Jérôme Belleman

Some sort of comparison of sort and :sort

10 Sep 2017

The UNIX sort command is part of any DevOps engineer's toolkit. Vim has its own :sort command, but its behaviour is in some ways radically different – for the better.

1 The UNIX sort command

It sorts the lines of a text file, without further ado. Among the essential options, -k specifies which key – read field – to sort on and -t changes the field separator.
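As a small illustration, here is a sketch which sorts a few colon-separated records by their second field. The sample records and the by_second_field function name are made up for the example, and LC_ALL=C is only there to keep the ordering independent of the locale.

```shell
# Made-up sample records in the form name:city
records='carol:Zurich
alice:Geneva
bob:Basel'

# -t: makes the colon the field separator; -k2,2 restricts the key to
# field 2 only (a bare -k2 would extend the key to the end of the line).
by_second_field() {
    printf '%s\n' "$records" | LC_ALL=C sort -t: -k2,2
}

by_second_field
```

The lines should come out ordered by city, i.e. Basel, Geneva, Zurich.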

The sort command offers a few options to sort numerically, such that e.g. 33 comes after 4, not before. By default it compares lines character by character, so 33 sorts before 4 simply because 3 comes before 4. The -n option is the one we use most of the time. There's also the much slower – but not much more useful – -g option, which is a bit more general in that it typically supports exponential notation. Where -n only reads the leading 1 of both 1e-1 and 1e-2, treats them as equal and falls back to comparing the lines as text, thus placing 1e-2 after 1e-1, -g will recognise the whole string for what it is and place 1e-1 after 1e-2. A much more useful numeric option for human concerns would be -h, which also understands SI suffixes such as k, M and G.
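The differences between these numeric modes are easier to see side by side. The following is a minimal sketch assuming GNU sort (the -g and -h options are not in POSIX); the variable names and sample values are made up for the example.

```shell
# Assumes GNU sort; LC_ALL=C keeps the ordering independent of the locale.
export LC_ALL=C

plain=$(printf '4\n33\n100\n' | sort)        # text comparison: 100, 33, 4
numeric=$(printf '4\n33\n100\n' | sort -n)   # numeric: 4, 33, 100
general=$(printf '1e-1\n1e-2\n' | sort -g)   # exponents understood: 1e-2, 1e-1
human=$(printf '2M\n1G\n512K\n' | sort -h)   # size suffixes: 512K, 2M, 1G

printf '%s\n' "$plain" "$numeric" "$general" "$human"
```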

There are a few other useful options which might come in handy in your everyday life: -f ignores case, -M sorts months written in letters (e.g. Jan, Feb, Mar, ...), -R doesn't actually sort but shuffles instead (useful if shuf isn't installed, although GNU sort keeps identical lines grouped together), -r reverses the sorting.
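A quick sketch of three of these options, again assuming GNU sort for -M; the variable names and inputs are made up for the example. The -R option is left out because its output is random by design.

```shell
# Assumes GNU sort; in the C locale, -M recognises English month names.
export LC_ALL=C

months=$(printf 'Mar\nJan\nFeb\n' | sort -M)     # calendar order: Jan, Feb, Mar
folded=$(printf 'c\nB\na\n' | sort -f)           # case ignored: a, B, c
reversed=$(printf '1\n3\n2\n' | sort -n -r)      # descending: 3, 2, 1

printf '%s\n' "$months" "$folded" "$reversed"
```

Without -f, the C locale would place the uppercase B before the lowercase a.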

A special mention for the -u option, which I was interested to learn sorts and removes duplicate lines, much like sort | uniq would.
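To convince yourself that the two are interchangeable for whole-line sorting, you can compare them on a small made-up list (the variable names are only for the example):

```shell
export LC_ALL=C

# Made-up input with duplicate lines
dupes='pear
apple
pear
fig
apple'

with_u=$(printf '%s\n' "$dupes" | sort -u)
with_uniq=$(printf '%s\n' "$dupes" | sort | uniq)

# Both variables hold the same three lines: apple, fig, pear
printf '%s\n' "$with_u"
```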

2 The Vim :sort command

Now, don't get me wrong. I'm not about to claim that :sort is a suitable alternative or replacement for sort. They both have their strengths and weaknesses in terms of available features, and as a result they're very much complementary.

The Vim :sort command offers some of the features that the UNIX sort command does, such as reversing the order (the ! modifier) or ignoring case (the i flag). It can sort numerically by integers (the n flag) or floats (the f flag), and in a variety of bases (hexadecimal with x, octal with o, binary with b). It can remove duplicate lines (the u flag) much like sort | uniq does.

Where the Vim :sort command stands out is that it lets you specify a pattern to skip characters when identifying the key to sort by, instead of having to specify a field number. You can very naturally use this to your advantage in a number of ways. Suppose for instance that you've got a list of three-column lines to sort, but the fields are delimited in a rather annoying way: they are only separated by a relatively long sequence of spaces. To make matters worse, some fields include spaces themselves. Looking at the data set, you can however see that the third fields all start at the same column. Indeed, baz, grault, fred and gazonk all start at column 23:

foo bar               baz
boo qux corge         grault garply
waldo                 fred
plugh xyzzy           gazonk                  thud

As a result you can use a pattern that will simply cause :sort to skip the first 23 characters and only sort by whatever comes next:

:sort /.\{23}/

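You could have used other patterns which might have been more meaningful to you – for instance one which causes :sort to skip everything up to and including the first run of two or more whitespace characters and only sort by whatever comes after. Matching the whole run matters here: a plain /\s\s/ would only skip the first two spaces and leave the rest of the padding, whose length differs from line to line, at the start of the key:

:sort /\s\s\+/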

Another example of a potentially annoying scenario with traditional sorting tools would be a list of dates which you'd like to sort by year. Unfortunately, the date format varies from one line to the next:

Friday, 21 April 2017
2 Feb 2012
9 Aug 2015
2009-05-23
2016-01-04

Could this possibly be addressed with a pattern matching specifically years, i.e. four digits?

:sort /\d\d\d\d/

This won't work, in fact, as it instructs Vim to skip the first match of a year and sort by what comes after. We don't actually want to skip the match, in such a case. Fortunately, Vim also offers the r flag which, instead of skipping what the pattern matches, will cause :sort to actually use the match to sort. As a result, this will work as intended:

:sort r /\d\d\d\d/
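For comparison, getting the same ordering out of the UNIX tools takes noticeably more work. One way to go about it is a decorate-sort-undecorate pipeline, roughly sketched below. It assumes GNU sort and a POSIX awk, the by_year variable name is made up, and unlike Vim it silently drops any line which contains no four-digit run.

```shell
export LC_ALL=C

dates='Friday, 21 April 2017
2 Feb 2012
9 Aug 2015
2009-05-23
2016-01-04'

# Decorate: prefix each line with its first run of four digits and a tab.
# Sort: numerically on that prefix only; -s keeps ties in input order.
# Undecorate: cut the prefix off again.
by_year=$(printf '%s\n' "$dates" |
    awk 'match($0, /[0-9][0-9][0-9][0-9]/) {
             print substr($0, RSTART, RLENGTH) "\t" $0
         }' |
    sort -s -n -k1,1 |
    cut -f2-)

printf '%s\n' "$by_year"
```

The lines should come out in the order 2009, 2012, 2015, 2016, 2017, which the one-line :sort r command achieves without any helper tools.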

3 References

The sort command is one of those which is much better documented via info than in the manpage. In particular, you'll prefer reading the former if you want to understand the different numeric sorts.