Bash sort lines starting with punctuation in non-dictionary order

I have a file containing lines, some of which start with a ! character, some with a ? character, and some with a space ( ) character. The second character is always a letter of the alphabet.

When I try to use the bash sort command from coreutils, it seems to ignore the first character and sort according to the second only.

This surprised me very much, as I assumed the sort would order the punctuation characters by their ASCII value, and lump all the ! lines together, followed by all the ? lines together, etc.

In particular, the documentation says there’s a -d option, which explicitly instructs the sort command to ignore such punctuation marks. But what I want is the opposite behaviour, and there’s no option to ‘reverse’ this behaviour. It’s as if the -d option has been "baked in" somehow.

I have checked, and as far as I know, I don’t have an alias defined somewhere that might activate the -d flag by accident.

Is this a bug in sort (coreutils v8.32)? Is there a way to force it NOT to sort by dictionary order but by strict ASCII value?

OS: Linux Mint 21.1 (based on ubuntu jammy, afaik) in case this is relevant

EDIT: Providing locale and a minimal example as requested

$ locale
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY=en_GB.UTF-8
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER=en_GB.UTF-8
LC_NAME=en_GB.UTF-8
LC_ADDRESS=en_GB.UTF-8
LC_TELEPHONE=en_GB.UTF-8
LC_MEASUREMENT=en_GB.UTF-8
LC_IDENTIFICATION=en_GB.UTF-8
LC_ALL=

$ echo '
> !a
> ?b
>  c
> !f
>  e
> ?d' | sort 

!a
?b
 c
?d
 e
!f

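You probably want to sort in the C locale, which compares lines byte by byte instead of applying the collation rules of your current locale (en_GB.UTF-8). For example, given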

$ printf '%2s\n' '!a' '?b' 'c' '!f' 'e' '?d'
!a
?b
 c
!f
 e
?d

then

$ printf '%2s\n' '!a' '?b' 'c' '!f' 'e' '?d' | LC_COLLATE=C sort
 c
 e
!a
!f
?b
?d

Or, perhaps better, use LC_ALL=C, since according to info sort, setting only LC_COLLATE can be overridden or undermined by other locale variables:

---------- Footnotes ----------

(1) If you use a non-POSIX locale (e.g., by setting ‘LC_ALL’ to
‘en_US’), then ‘sort’ may produce output that is sorted differently
than you’re accustomed to. In that case, set the ‘LC_ALL’ environment
variable to ‘C’. Note that setting only ‘LC_COLLATE’ has two
problems. First, it is ineffective if ‘LC_ALL’ is also set. Second,
it has undefined behavior if ‘LC_CTYPE’ (or ‘LANG’, if ‘LC_CTYPE’ is
unset) is set to an incompatible value. For example, you get
undefined behavior if ‘LC_CTYPE’ is ‘ja_JP.PCK’ but ‘LC_COLLATE’ is
‘en_US.UTF-8’.
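
As a minimal sketch of that suggestion, the same sample can be sorted with LC_ALL=C set for the single sort invocation only, so the rest of the session keeps its en_GB.UTF-8 settings. The variable name sorted is just for illustration. In the C locale the lines are compared by byte value, and space (0x20) comes before ! (0x21), which comes before ? (0x3F):

```shell
# Build the six sample lines; '%2s' right-justifies each argument in a
# width of 2, so 'c' and 'e' get a leading space.
# LC_ALL=C is placed in front of sort, so it applies to that command only
# and takes precedence over LC_COLLATE and LANG.
sorted=$(printf '%2s\n' '!a' '?b' 'c' '!f' 'e' '?d' | LC_ALL=C sort)

# Print the result: the space lines first, then the ! lines, then the ? lines.
printf '%s\n' "$sorted"
```

If every sort in a script should behave this way, running export LC_ALL=C once near the top has the same effect, at the cost of also changing the locale that other commands in that script see.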

Answered By: steeldriver