Some sort of comparison of sort and :sort
The UNIX sort command is part of any DevOps engineer's toolkit. Vim has its own :sort command, but its behaviour is in some ways radically different, and for the better.
1 The UNIX sort command
It sorts the lines of a text file, without further ado. Among its essential options, -k specifies which key (read: field) to sort on, and -t changes the field separator.
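As a minimal sketch of these two options, here is a sort on the second field of a made-up colon-separated list. The names and cities are purely illustrative, and LC_ALL=C is set only so that the ordering doesn't depend on your locale.

```shell
# Hypothetical sample data: name:city, one record per line.
data='carol:Paris
alice:Zurich
bob:Lima'

# -t: makes ':' the field separator; -k2,2 restricts the key to field 2.
by_city=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2)
printf '%s\n' "$by_city"
# bob:Lima
# carol:Paris
# alice:Zurich
```

Note that -k2 on its own would make the key run from field 2 to the end of the line, which is why the sketch spells out -k2,2.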
By default, sort compares lines character by character, so 33 comes before 4 simply because 3 comes before 4. The sort command therefore offers a few options to sort numerically, such that 33 comes after 4. The -n option is the one we use most of the time. There's also the much slower (but not much more useful) -g option, which is a bit more general in that it supports exponential notation. Where -n stops reading at the e, treats both 1e-2 and 1e-1 as 1, and ends up placing 1e-2 after 1e-1, -g will recognise the whole string for what it is and place 1e-1 after 1e-2. A much more useful numeric option for human concerns would be -h, which also understands SI suffixes such as k, M, G.
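The differences between these modes are easier to see side by side. The sketch below assumes GNU coreutils sort (-g and -h are not part of POSIX) and uses throwaway input. The variable names are mine.

```shell
# Assumes GNU coreutils sort. LC_ALL=C keeps results independent of the locale.
export LC_ALL=C

plain=$(printf '33\n4\n' | sort)        # character by character: 33, then 4
numeric=$(printf '33\n4\n' | sort -n)   # by value: 4, then 33

# -n stops at the 'e': both lines count as 1, and the tie is broken
# by comparing the lines as plain text.
exp_n=$(printf '1e-2\n1e-1\n' | sort -n)   # 1e-1, then 1e-2
# -g parses the exponent: 0.01 sorts before 0.1.
exp_g=$(printf '1e-2\n1e-1\n' | sort -g)   # 1e-2, then 1e-1

# -h understands the size suffixes.
human=$(printf '1M\n3G\n512\n2K\n' | sort -h)   # 512, 2K, 1M, 3G

printf '%s\n' "$plain" "$numeric" "$exp_n" "$exp_g" "$human"
```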
There are a few other useful options which might come in handy in your everyday life: -f ignores case, -M sorts months written in letters (e.g. Jan, Feb, Mar, ...), -R doesn't actually sort but shuffles instead (useful if shuf isn't installed, although unlike shuf it keeps identical lines next to each other), -r reverses the sorting.
A special mention for the -u option, which I was interested to learn sorts and removes duplicate lines, much like sort | uniq would.
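A quick sketch of some of these options on toy input follows. It assumes GNU coreutils sort for -M, and the C locale so that month names are the English abbreviations.

```shell
export LC_ALL=C   # C locale: byte ordering and English month names

reversed=$(printf 'a\nc\nb\n' | sort -r)       # c, b, a
months=$(printf 'Mar\nJan\nFeb\n' | sort -M)   # Jan, Feb, Mar
unique=$(printf 'b\na\nb\na\n' | sort -u)      # a, b
piped=$(printf 'b\na\nb\na\n' | sort | uniq)   # same result as -u here

printf '%s\n' "$reversed" "$months" "$unique" "$piped"
```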
2 The Vim :sort command
Now, don't get me wrong. I'm not about to claim that :sort is a suitable alternative or replacement for sort. They both have their strengths and weaknesses in terms of available features, and as a result they're very much complementary.
The Vim :sort command offers some of the features that the UNIX sort command does, such as reversing the order (the ! flag) or ignoring case (the i flag). It can sort numerically by decimal integers (the n flag), by floats (the f flag), or by numbers in a variety of other bases (hexadecimal with x, octal with o, binary with b). It can remove duplicate lines (the u flag) much like sort | uniq does.
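If you'd like to try these flags without opening an editor, one way is to drive Vim in silent Ex mode from the shell. This is only a sketch: vim_sort is a hypothetical helper of mine, and it assumes a vim binary on your PATH and a working mktemp.

```shell
# Hypothetical helper: apply one Ex command to the lines on stdin, print the result.
# -es is silent Ex mode, -u NONE skips the vimrc, -n disables the swap file.
vim_sort() {
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  vim -N -n -u NONE -i NONE -es -c "$1" -c 'wq' "$tmp" </dev/null
  cat "$tmp"
  rm -f "$tmp"
}

plain=$(printf 'item 33\nitem 4\n' | vim_sort 'sort')      # text order: item 33, item 4
numeric=$(printf 'item 33\nitem 4\n' | vim_sort 'sort n')  # first number: item 4, item 33
reversed=$(printf 'a\nc\nb\n' | vim_sort 'sort!')          # c, b, a
unique=$(printf 'b\na\nb\n' | vim_sort 'sort u')           # a, b

printf '%s\n' "$plain" "$numeric" "$reversed" "$unique"
```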
Where the Vim :sort command stands out is that it lets you specify a pattern to skip characters when identifying a key to sort by, instead of having to specify a field number. You can very naturally use this to your advantage in a number of ways. Suppose for instance that you've got a list of three-column lines to sort, but the fields are delimited in a rather annoying way: they are only separated by a relatively long sequence of spaces. To make matters worse, some fields include spaces themselves. Looking at the data set, however, you can see that the third fields all start at the same column. Indeed, baz, grault, fred and gazonk all start at column 23:
foo bar baz
boo qux corge grault garply
waldo fred
plugh xyzzy gazonk thud
As a result you can use a pattern that will simply cause :sort to skip the first 23 characters and only sort by whatever comes next:
:sort /.\{23}/
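Here is a sketch of that command run from the shell against data padded so that the last field begins right after the first 23 characters. The exact split of the leading columns is my own guess for illustration, and vim_sort is a hypothetical helper that assumes a vim binary on your PATH.

```shell
# Hypothetical helper: apply one Ex command to the lines on stdin, print the result.
vim_sort() {
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  vim -N -n -u NONE -i NONE -es -c "$1" -c 'wq' "$tmp" </dev/null
  cat "$tmp"
  rm -f "$tmp"
}

# Illustrative data: printf pads the leading fields to 23 characters,
# so the last field of every line starts immediately after them.
sorted=$(printf '%-23s%s\n' \
  'foo    bar'      'baz' \
  'boo qux   corge' 'grault garply' \
  'waldo'           'fred' \
  'plugh   xyzzy'   'gazonk thud' | vim_sort 'sort /.\{23}/')

printf '%s\n' "$sorted"

# Keep only the sort key (characters 24 onwards) to check the order.
keys=$(printf '%s\n' "$sorted" | cut -c24-)
printf '%s\n' "$keys"
# baz
# fred
# gazonk thud
# grault garply
```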
You could have used other patterns which might have been more meaningful to you, for instance one which causes :sort to skip everything up to and including the first two consecutive whitespace characters and only sort by whatever comes after:
:sort /\s\s/
Another example of a potentially annoying scenario with traditional sorting tools would be a list of dates which you'd like to sort by year. Unfortunately, the date format varies from line to line:
Friday, 21 April 2017
2 Feb 2012
9 Aug 2015
2009-05-23
2016-01-04
Could this possibly be addressed with a pattern matching specifically years, i.e. four digits?
:sort /\d\d\d\d/
This won't work, in fact, as it instructs Vim to skip the first match of a year and sort by what comes after. We don't actually want to skip the match in such a case. Fortunately, Vim also offers the r flag which, instead of skipping what the pattern matches, will cause :sort to sort on the match itself. As a result, this will work as intended:
:sort r /\d\d\d\d/
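To check this on the dates above without leaving the shell, here is a sketch that again drives Vim in silent Ex mode. vim_sort is a hypothetical helper and assumes a vim binary on your PATH.

```shell
# Hypothetical helper: apply one Ex command to the lines on stdin, print the result.
vim_sort() {
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  vim -N -n -u NONE -i NONE -es -c "$1" -c 'wq' "$tmp" </dev/null
  cat "$tmp"
  rm -f "$tmp"
}

dates='Friday, 21 April 2017
2 Feb 2012
9 Aug 2015
2009-05-23
2016-01-04'

# r: sort on the text the pattern matches, here the first run of four digits.
by_year=$(printf '%s\n' "$dates" | vim_sort 'sort r /\d\d\d\d/')
printf '%s\n' "$by_year"
# 2009-05-23
# 2 Feb 2012
# 9 Aug 2015
# 2016-01-04
# Friday, 21 April 2017
```

The years are compared as text here, which is fine as long as they all have four digits.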
3 References
The sort command is one of those which is much better documented via info than in the manpage. In particular, you'll prefer reading the former if you want to understand the different numeric sorts.