Trying to sort on two fields, second then first
I am trying to sort on multiple columns. The results are not as expected.
Here’s my data (people.txt):
Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56
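The listing above is presumably one person per line; that layout is an assumption, inferred from the sort output further down. A minimal sketch that recreates the file so the commands in this thread can be tried locally:

```shell
# Recreate the question's people.txt: one "First Last Age" record per line
# (one-record-per-line is an assumption about the original file).
cat > people.txt <<'EOF'
Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56
EOF

# Quick check: the file should contain 12 lines.
wc -l < people.txt
```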
The following works correctly:
bash-3.2$ sort -k2 -k3 <people.txt
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Pete Brown 37
Mark Brown 46
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Ana Villamor 44
Alice Villamor 50
But, the following does not work as expected:
bash-3.2$ sort -k2 -k1 <people.txt
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Pete Brown 37
Mark Brown 46
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Ana Villamor 44
Alice Villamor 50
I was trying to sort by surname and then by first name, but you will see the Villamors are not in the correct order. I was hoping to sort by surname, and then when surnames matched, to sort by first name.
It seems there is something about how this should work I don’t understand. I could do this another way of course (using awk), but I want to understand sort.
I am using the standard Bash shell on Mac OS X.
With sort, you do it like this (I'm not sure about macOS):
sort -k2,2 -k1 <people.txt
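A self-contained sketch of this command, with the data inlined instead of read from people.txt. LC_ALL=C is an addition not in the answer: it makes the comparison plain byte order, so the result does not depend on the user's locale.

```shell
# The question's data, one record per line.
people='Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56'

# -k2,2: primary key is field 2 only (the surname).
# -k1:   tie-breaker runs from field 1 to the end of the line (first name, then age).
# LC_ALL=C (assumption, not in the answer): byte-order comparison, locale-independent.
sorted=$(printf '%s\n' "$people" | LC_ALL=C sort -k2,2 -k1)
printf '%s\n' "$sorted"
```

With the surname isolated in the first key, the two Villamors now come out as Alice before Ana.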
Update according to comment. Quoted from
-k, --key=KEYDEF sort via a key; KEYDEF gives location and type
KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end.
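To illustrate the OPTS part of a KEYDEF, a sketch that sorts by age with -k3,3n: the key starts and stops at field 3, and the n option makes that key compare numerically. The choice of key is only an example, and LC_ALL=C is added for reproducible output.

```shell
# The question's data, one record per line.
people='Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56'

# F=3 for start and stop, OPTS=n: compare only the age, as a number.
# Lines with equal ages (Tony and Alice, both 50) fall back to a whole-line comparison.
by_age=$(printf '%s\n' "$people" | LC_ALL=C sort -k3,3n)
printf '%s\n' "$by_age"
```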
A key specification like -k2 means to take all the fields from 2 to the end of the line into account. So Villamor 44 ends up before Villamor 50. Since these two are not equal, the first comparison in sort -k2 -k1 is enough to discriminate these two lines, and the second sort key, -k1, is not invoked. If the two Villamors had had the same age, -k1 would have caused them to be sorted by first name.
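A minimal sketch of that reasoning with just the two Villamor lines. The second pair, with equal ages, is a hypothetical variation and not data from the question; LC_ALL=C is assumed for byte-order comparison.

```shell
# Key 1 (-k2) is " Villamor 50" vs " Villamor 44": they differ, so -k1 is never consulted.
differ=$(printf '%s\n' 'Alice Villamor 50' 'Ana Villamor 44' | LC_ALL=C sort -k2 -k1)
printf '%s\n' "$differ"

# Hypothetical equal ages: key 1 ties, so -k1 (first name onward) decides.
tied=$(printf '%s\n' 'Ana Villamor 50' 'Alice Villamor 50' | LC_ALL=C sort -k2 -k1)
printf '%s\n' "$tied"
```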
To sort by a single column, use -k2,2 as the key specification. This means to use the fields from #2 to #2, i.e. only the second field.
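A small sketch that makes the difference between -k2 and -k2,2 visible. It adds -s (stable sort), which is not part of the answer: -s switches off sort's whole-line fallback, so lines whose keys tie keep their input order and the tie itself becomes observable. LC_ALL=C is also assumed.

```shell
# Two lines with the same surname but different ages; Alice comes first in the input.
input=$(printf '%s\n' 'Alice Villamor 50' 'Ana Villamor 44')

# -k2: the key runs from field 2 to the end of the line, so the age is part of it.
to_end=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k2)
printf '%s\n' "$to_end"

# -k2,2: the key is field 2 only; both keys are " Villamor", and -s keeps input order.
only_2=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k2,2)
printf '%s\n' "$only_2"
```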
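sort -k2 -k3 <people.txt is redundant: it’s equivalent to sort -k2 <people.txt, because any two lines that tie on fields 2 through the end of the line also tie on fields 3 through the end. To sort by last names, then first names, then age, run the following command:
sort -k2,2 -k1,1 <people.txt
or, equivalently, sort -k2,2 -k1 <people.txt, since there are only these three fields and the separators are the same. In fact, you will get the same effect from sort -k2,2 <people.txt, because sort uses the whole line as a last resort when all the keys in a subset of lines are identical. Note that this last-resort comparison treats the age as text, which works here only because every age has two digits.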
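A sketch that checks these equivalences on the question's data by comparing the outputs directly. The helper name sort_people is hypothetical, and LC_ALL=C is assumed so the comparison is locale-independent.

```shell
# The question's data, one record per line.
people='Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56'

# Hypothetical helper: feed the sample data to sort with the given key options.
sort_people() { printf '%s\n' "$people" | LC_ALL=C sort "$@"; }

a=$(sort_people -k2,2 -k1,1)   # surname, then first name, then whole-line last resort
b=$(sort_people -k2,2 -k1)     # surname, then first name to end of line
c=$(sort_people -k2,2)         # surname only; the whole-line last resort breaks ties
d=$(sort_people -k2 -k3)       # fields 2..end, then fields 3..end
e=$(sort_people -k2)           # fields 2..end only

[ "$a" = "$b" ] && [ "$b" = "$c" ] && printf '%s\n' 'a, b and c are identical'
[ "$d" = "$e" ] && printf '%s\n' 'd and e are identical'
```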
Also note that the default field separator is the transition between a non-blank and a blank, so the keys include the leading blanks (in your example, for the line Emily Bedford 18, the first key is "Emily", but the second key is " Bedford"). Add the -b option to strip those blanks:
sort -b -k2,2 -k1,1
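The question's data has exactly one blank between fields, so the missing -b happens not to change the result there. A sketch with a hypothetical line that has two blanks before the surname shows where it matters. LC_ALL=C is assumed so that a blank compares lower than a letter; in other locales blanks may be ignored at the first level of collation, which can hide the effect.

```shell
# Hypothetical input: the "Zed" line has two blanks before the surname.
input=$(printf '%s\n' 'Zed  Brown 30' 'Amy Adams 40')

# Without -b, field 2 of the Zed line is "  Brown" (two leading blanks). In byte order a
# blank sorts before "A", so "  Brown" lands ahead of " Adams".
plain=$(printf '%s\n' "$input" | LC_ALL=C sort -k2,2 -k1,1)
printf '%s\n' "$plain"

# With -b, leading blanks are skipped, so the keys compared are "Brown" and "Adams".
skip=$(printf '%s\n' "$input" | LC_ALL=C sort -b -k2,2 -k1,1)
printf '%s\n' "$skip"
```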
It can also be done on a per-key basis by adding the b flag at the end of the key start specification:
sort -k2b,2 -k1,1 <people.txt
But something to bear in mind: as soon as you add one such flag to a key specification, the global flags (like -r…) no longer apply to that key, so it’s better to avoid mixing per-key flags and global flags.
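A sketch of that pitfall. The behaviour shown is that of GNU coreutils sort; other implementations may differ, and LC_ALL=C is assumed. With a plain -k2,2 the key inherits the global -r, so surnames come out in reverse; with -k2b,2 the key carries its own b flag, does not inherit -r, and stays ascending.

```shell
# The question's data, one record per line.
people='Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56'

# Key without its own flags: the global -r applies, so Villamor comes first.
rev_inherited=$(printf '%s\n' "$people" | LC_ALL=C sort -r -k2,2)
printf '%s\n' "$rev_inherited" | head -n 3

# Key with its own b flag: the global -r is not inherited, so Bedford comes first.
rev_ignored=$(printf '%s\n' "$people" | LC_ALL=C sort -r -k2b,2)
printf '%s\n' "$rev_ignored" | head -n 3
```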
You can do this:
$ sort -k2,2 -k1,1 people.txt
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Mark Brown 46
Pete Brown 37
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Alice Villamor 50
Ana Villamor 44
With -k2,2 you are sorting by last name. Then, with -k1,1, you are sorting by first name.