Trying to sort on two fields, second then first

I am trying to sort on multiple columns. The results are not as expected.

Here’s my data (people.txt):

Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56

The following works correctly:

bash-3.2$ sort -k2 -k3 <people.txt
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Pete Brown 37
Mark Brown 46
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Ana Villamor 44
Alice Villamor 50

But the following does not work as expected:

bash-3.2$ sort -k2 -k1 <people.txt
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Pete Brown 37
Mark Brown 46
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Ana Villamor 44
Alice Villamor 50

I was trying to sort by surname and then by first name, but as you can see, the Villamors are not in the correct order. I was hoping to sort by surname and then, when surnames matched, by first name.

It seems there is something about how this should work that I don’t understand. I could do this another way, of course (using awk), but I want to understand sort.

I am using the standard Bash shell on Mac OS X.

Asked By: Harry


With GNU sort, you do it like this (not sure about macOS):

sort -k2,2 -k1 <people.txt

Update, following a comment. Quoted from man sort:

   -k, --key=KEYDEF
          sort via a key; KEYDEF gives location and type

   KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where
   F is a field number and C a character position in the field; both are
   origin 1, and the stop position defaults to the line's end.

Answered By: manatwork

A key specification like -k2 means to take all the fields from 2 to the end of the line into account. So the key "Villamor 44" sorts before the key "Villamor 50". Since these two keys are not equal, the first comparison in sort -k2 -k1 is enough to distinguish the two lines, and the second sort key -k1 is never consulted. If the two Villamors had had the same age, -k1 would have caused them to be sorted by first name.

To sort by a single column, use -k2,2 as the key specification. This means to use the fields from 2 to 2, i.e. only the second field.
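To see the difference on the two lines in question, here is a minimal sketch, assuming a POSIX shell and a sort that accepts the standard -k syntax (GNU or BSD). Setting LC_ALL=C is an assumption added only so that comparisons are plain byte order and the result does not depend on your locale; the variable names are illustrative.

```shell
# Compare byte-wise so the result does not depend on the locale.
LC_ALL=C
export LC_ALL

# The two lines from people.txt that share a surname.
villamors='Ana Villamor 44
Alice Villamor 50'

# -k2 runs from field 2 to the end of the line, so the keys are
# " Villamor 44" and " Villamor 50": they differ, so -k1 is never consulted.
open_key=$(printf '%s\n' "$villamors" | sort -k2 -k1)

# -k2,2 stops at the end of field 2, so both keys are " Villamor";
# the tie is then broken by -k1,1 ("Alice" sorts before "Ana").
closed_key=$(printf '%s\n' "$villamors" | sort -k2,2 -k1,1)

printf '%s\n---\n%s\n' "$open_key" "$closed_key"
```

The first result keeps Ana before Alice (decided by the age), the second puts Alice first (decided by the first name).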

sort -k2 -k3 <people.txt is redundant: it’s equivalent to sort -k2 <people.txt. To sort by last names, then first names, then age, run the following command:

sort -k2,2 -k1,1 <people.txt

or equivalently sort -k2,2 -k1 <people.txt, since there are only these three fields and the separators are the same. In fact, you will get the same effect from sort -k2,2 <people.txt, because sort compares the whole lines as a last resort when all the keys of two lines compare equal.
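A quick way to check these equivalences on the question's data, as a sketch assuming a POSIX shell. The people variable stands in for people.txt, sort_people is a hypothetical helper, and LC_ALL=C is added so the results do not depend on the locale.

```shell
# Compare byte-wise so the results do not depend on the locale.
LC_ALL=C
export LC_ALL

# The sample data from the question (people.txt).
people='Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56'

# Hypothetical helper: sort the sample data with the given key options.
sort_people() { printf '%s\n' "$people" | sort "$@"; }

# -k2 already covers fields 2..end, so -k3 (fields 3..end) is only
# reached when it is guaranteed to tie as well.
k2_k3=$(sort_people -k2 -k3)
k2=$(sort_people -k2)

# Surname, then first name, written three ways.
k22_k11=$(sort_people -k2,2 -k1,1)
k22_k1=$(sort_people -k2,2 -k1)
# A single key: ties on the surname fall back to comparing whole lines,
# and each line starts with the first name.
k22=$(sort_people -k2,2)

[ "$k2_k3" = "$k2" ] && echo 'sort -k2 -k3 and sort -k2 agree'
[ "$k22_k11" = "$k22_k1" ] && [ "$k22_k1" = "$k22" ] &&
  echo 'sort -k2,2 -k1,1, sort -k2,2 -k1 and sort -k2,2 agree'
```

Both messages should be printed for this data; the last equivalence relies on the line beginning with the first name, so it would not hold for every file layout.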

Also note that the default field separator is the transition between a non-blank and a blank, so the keys will include the leading blanks (in your example, for the line Emily Bedford 18, the first key will be "Emily" but the second key will be " Bedford"). Add the -b option to strip those blanks:

sort -b -k2,2 -k1,1

It can also be done on a per-key basis by adding the b flag at the end of the key start specification:

sort -k2b,2 -k1,1 <people.txt
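A small sketch of why the blanks matter, assuming a POSIX shell. The input is hypothetical: Ana's line has two spaces before the surname, which is exactly where leading blanks start to affect the order. LC_ALL=C keeps the comparison in byte order, where a space sorts before any letter.

```shell
# Compare byte-wise: a space (0x20) sorts before any letter.
LC_ALL=C
export LC_ALL

# Hypothetical input: two spaces before "Villamor", one before "Bedford".
lines='Ana  Villamor 44
Tony Bedford 50'

# Without -b the keys are "  Villamor" and " Bedford"; the second space
# of the first key beats the "B", so Villamor wrongly sorts first.
no_b=$(printf '%s\n' "$lines" | sort -k2,2)

# Global -b: the key (which has no flags of its own) skips leading blanks.
global_b=$(printf '%s\n' "$lines" | sort -b -k2,2)

# Per-key b on the start position has the same effect here.
key_b=$(printf '%s\n' "$lines" | sort -k2b,2)

printf '%s\n---\n%s\n---\n%s\n' "$no_b" "$global_b" "$key_b"
```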

But something to bear in mind: as soon as you add one such flag to a key specification, the global flags (like -n, -r…) no longer apply to that key, so it’s better to avoid mixing per-key flags and global flags.
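A minimal sketch of that pitfall, assuming a POSIX shell and made-up two-line input (the pairs variable is illustrative):

```shell
# Compare byte-wise so the result does not depend on the locale.
LC_ALL=C
export LC_ALL

# Made-up input: the second field is what we sort on.
pairs='a 1
b 2'

# The key has no flags of its own, so it inherits the global -r: descending.
inherits=$(printf '%s\n' "$pairs" | sort -r -k2,2)

# The per-key b flag stops the key from inheriting -r: ascending.
overridden=$(printf '%s\n' "$pairs" | sort -r -k2b,2)

# Putting every flag on the key itself gives the intended descending order.
explicit=$(printf '%s\n' "$pairs" | sort -k2br,2)

printf '%s\n---\n%s\n---\n%s\n' "$inherits" "$overridden" "$explicit"
```

Writing all the flags on the key, as in -k2br,2, avoids depending on what is or is not inherited from the global options.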

You can do this:

$ sort -k2,2 -k1,1 people.txt 
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Mark Brown 46
Pete Brown 37
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Alice Villamor 50
Ana Villamor 44

So first, with -k2,2, you sort by last name. Then, with -k1,1, lines with the same last name are sorted by first name.

Answered By: Logan Lee

Using Raku (formerly known as Perl_6)

Adding this answer for U&L users who might be trying to sort Unicode text. Raku has high-level Unicode support built in, and this answer is (in part) to help this author understand Raku’s sorting rules.


Sorting on one column (last name) with a ‘unary’ key-extraction block (commented out at top), or with a binary block containing leg, the "less-than/equal-to/greater-than" string comparison operator. Ties stay in ‘encounter’ order (i.e. stable sort):

~$ #`{ raku -e '.put for lines.sort: { .words[1] };'  #unary block, OR binary block below: }

~$ raku -e '.put for lines.sort: { $^a.words[1] leg $^b.words[1] };'  file
Tony Bedford 50
James Bedford 21
Emily Bedford 18
Fred Bloggs 22
Pete Brown 37
Mark Brown 46
Francis Chepstow 56
Stefan Heinz 52
Simon Strange 62
John Strange 51
Ana Villamor 44
Alice Villamor 50
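For comparison with plain sort (not part of the Raku answer), a sketch assuming GNU or BSD sort: their -s option, an extension rather than a POSIX option, disables the whole-line last-resort comparison, so sorting on the surname alone keeps ties in input order, like the Raku output above. The people variable stands in for the file, and LC_ALL=C is added for predictable byte-order comparison.

```shell
# Compare byte-wise so the result does not depend on the locale.
LC_ALL=C
export LC_ALL

# The sample data from the question, in its original order.
people='Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56'

# -s (stable): no whole-line fallback, so equal surnames keep input order.
stable=$(printf '%s\n' "$people" | sort -s -k2,2)
printf '%s\n' "$stable"
```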

Sorting on two columns, last name then first name. At top (commented out), the block gives sort a list of unary keys to sort on. The second example below is more explicit, using two leg string comparisons with || ("short-circuit OR") in between:

~$ #`{ raku -e '.put for lines.sort: { .words[1], .words[0] };'  #list of unary elements, OR binary block below: }

~$ raku -e '.put for lines.sort: {$^a.words[1] leg $^b.words[1] || $^a.words[0] leg $^b.words[0] };'  file
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Mark Brown 46
Pete Brown 37
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Alice Villamor 50
Ana Villamor 44

The above Raku code satisfies the title of this question: "Trying to sort on two fields, second then first". But if a column is numeric, you can use <=> (commonly called the ‘spaceship’ operator) instead of leg to sort numerically. Example below:


The following sorts on three (3) columns: last name, reverse age (oldest first: $^b and $^a are swapped to reverse the sort), then first name. So in the sorted output, Simon Strange 62 will appear before John Strange 51.

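Raku also has a generic cmp operator, which chooses the comparison based on the types of its operands (string comparison, like leg, for strings; numeric comparison, like <=>, for numbers). In the second example below, three cmp comparisons give exactly the same sorted output as the first example. Note, though, that .words returns strings, so here cmp compares the ages as strings; that gives the same order only because every age has two digits.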

~$ #`{ raku -e '.put for lines.sort: {$^a.words[1] leg $^b.words[1] || $^b.words[2] <=> $^a.words[2] || $^a.words[0] leg $^b.words[0] };' #OR with cmp operator below: }

~$ raku -e '.put for lines.sort: {$^a.words[1] cmp $^b.words[1] || $^b.words[2] cmp $^a.words[2] || $^a.words[0] cmp $^b.words[0] };'  file
Tony Bedford 50
James Bedford 21
Emily Bedford 18
Fred Bloggs 22
Mark Brown 46
Pete Brown 37
Francis Chepstow 56
Stefan Heinz 52
Simon Strange 62
John Strange 51
Alice Villamor 50
Ana Villamor 44
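The same three-key order can be sketched with sort itself (again not part of the Raku answer), assuming a sort that supports the per-key n and r modifiers, as POSIX, GNU and BSD sort do. Here -k3,3nr compares the age numerically and in reverse, and only per-key flags are used, so nothing depends on inheriting global flags. The people variable stands in for the file; LC_ALL=C is added for predictable comparisons.

```shell
# Compare byte-wise so the result does not depend on the locale.
LC_ALL=C
export LC_ALL

# The sample data from the question.
people='Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56'

# Surname, then age numerically descending, then first name.
by_three=$(printf '%s\n' "$people" | sort -k2,2 -k3,3nr -k1,1)
printf '%s\n' "$by_three"
```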

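Finally, the "binary comparator blocks" above give you precise understanding of, and control over, your sorting mechanism. But if you prefer, the three-column sort above can be simplified to the code below, where prefix ~ treats a field as a string and prefix - turns the age into a negated number, so that larger ages sort first: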

~$ raku -e '.put for lines.sort: { ~.words[1], -.words[2], ~.words[0] };'  file
Tony Bedford 50
James Bedford 21
Emily Bedford 18
Fred Bloggs 22
Mark Brown 46
Pete Brown 37
Francis Chepstow 56
Stefan Heinz 52
Simon Strange 62
John Strange 51
Alice Villamor 50
Ana Villamor 44

https://perl6advent.wordpress.com/2013/12/23/day-23-unary-sort/
https://docs.raku.org/language/101-basics#Stable_sort
https://docs.raku.org/routine/sort
https://docs.raku.org/routine/cmp
https://raku.org

Answered By: jubilatious1