'sort' produces output in a weird order

Consider the following input to sort:

cat > foo <<EOM
D,,5014978
DD,,25
D,I,1972765530
D,Y,4223624
-,Y,71285059
YA,I,2
EOM

Now try running sort foo.

The output is not sorted on any of my Linux boxes (GNU coreutils versions 6.9 through 8.26). I get this instead:

$ sort foo
D,,5014978
DD,,25
D,I,1972765530
D,Y,4223624
-,Y,71285059
YA,I,2

Obviously, all the lines with D, should be together, and - should come before any letters.

The output is sorted when run under Cygwin (GNU coreutils 8.5). Comments?

Asked By: Leo Alekseyev

||

Sorting depends on the locale; specifically, it depends on $LC_COLLATE, which is overridden by $LC_ALL if that is set, and which falls back to $LANG if neither is set. The locale command will show you what values you’re effectively working with. See man 3 strcoll, man 3 setlocale, etc.
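As a rough illustration of that lookup order, here is a minimal sketch for a POSIX shell. The function name effective_collate is hypothetical and not part of any tool. It only mirrors the precedence described above (an empty variable counts as unset); the locale command remains the authoritative check.

```shell
# Hypothetical helper: print the locale name that would govern collation.
# Precedence: LC_ALL, then LC_COLLATE, then LANG, then the default "C".
effective_collate() {
    printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

# Print the value for the current environment.
effective_collate
```

If this prints something like en_US.utf8 rather than C or POSIX, sort is using that locale's collation rules.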

LC_COLLATE=C (or POSIX or no locale at all) results in a strict byte-by-byte comparison.

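LC_COLLATE=en_US.utf8 results in an alphabetical-equivalence sort, with punctuation ignored and characters within the same equivalence class treated as equal. With the commas and the hyphen ignored, the lines in the question compare roughly as D5014978, DD25, DI1972765530, DY4223624, Y71285059, YAI2, which is exactly the order sort printed.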
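To get the byte-by-byte order the question expected, the locale can be overridden for a single command. The following is a sketch using the question's input; the temporary file and the variable names input and sorted are only for illustration.

```shell
# Recreate the question's input in a temporary file (the path is arbitrary).
input=$(mktemp)
cat > "$input" <<'EOM'
D,,5014978
DD,,25
D,I,1972765530
D,Y,4223624
-,Y,71285059
YA,I,2
EOM

# Force the C locale for this one invocation only; LC_ALL overrides
# LC_COLLATE and LANG, so sort compares lines byte by byte.
sorted=$(LC_ALL=C sort "$input")
printf '%s\n' "$sorted"
# Output:
# -,Y,71285059
# D,,5014978
# D,I,1972765530
# D,Y,4223624
# DD,,25
# YA,I,2

rm -f "$input"
```

Here - (0x2D) sorts before any letter, and the comma (0x2C) sorts before D, so all the D, lines come together ahead of DD. Setting the variable on the command line leaves the rest of the session's locale untouched.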

Answered By: ephemient