'sort' produces output in a weird order
Consider the following input to sort:
cat > foo <<EOM
D,,5014978
DD,,25
D,I,1972765530
D,Y,4223624
-,Y,71285059
YA,I,2
EOM
Now try running sort foo.
The output is not sorted when I try this on any of my Linux boxes (GNU coreutils versions 6.9 through 8.26). I get this instead:
$ sort foo
D,,5014978
DD,,25
D,I,1972765530
D,Y,4223624
-,Y,71285059
YA,I,2
Obviously, all the lines starting with "D," should be together, and "-" should come before any letters.
The output is sorted when run under Cygwin (GNU coreutils 8.5). Comments?
Sorting depends on the locale; specifically, it depends on $LC_COLLATE (possibly overridden by $LC_ALL), falling back to $LANG if neither is set. The command locale will show you what values you’re effectively working with. See man 3 strcoll, man 3 setlocale, etc.
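A quick way to see that precedence in action is sketched below. The en_US.UTF-8 value is only an illustration and does not need to be installed for this check; the variable name effective is my own.

```shell
# Show every locale category as the current shell sees it.
locale

# LC_ALL, when set, overrides LC_COLLATE (and LANG) for that one command.
# Here LC_COLLATE is deliberately set to something else to show it loses.
effective=$(LC_ALL=C LC_COLLATE=en_US.UTF-8 locale | grep '^LC_COLLATE=')
printf '%s\n' "$effective"
```

If the last line reports C for LC_COLLATE, the override took effect.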
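LC_COLLATE=C (or POSIX, or no locale at all) results in a strict byte-by-byte comparison.
LC_COLLATE=en_US.utf8 results in an alphabetical-equivalence sort, with punctuation ignored and characters within the same equivalence class treated equally. That explains your output: with the commas and the hyphen ignored, the lines compare roughly as D5014978, DD25, DI1972765530, DY4223624, Y71285059, YAI2, which is exactly the order you got.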
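To get the byte-order result from the question, one option is to force the C locale for just that one sort invocation. This is a minimal sketch using the input from the question; the variable name sorted is my own.

```shell
# Recreate the input file from the question.
cat > foo <<'EOM'
D,,5014978
DD,,25
D,I,1972765530
D,Y,4223624
-,Y,71285059
YA,I,2
EOM

# Set LC_ALL=C for this command only; the rest of the session keeps its locale.
sorted=$(LC_ALL=C sort foo)
printf '%s\n' "$sorted"
# Prints:
# -,Y,71285059
# D,,5014978
# D,I,1972765530
# D,Y,4223624
# DD,,25
# YA,I,2
```

Prefixing the variable to the command keeps the change local. Exporting LC_ALL=C in a script has the same effect on every later command, which may or may not be what you want.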