Difference between non-graphic characters and non-printable characters

My system:

  • Ubuntu 22.04.3 LTS
  • GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)

man ls describes -b as follows:

   -b, --escape
          print C-style escapes for nongraphic characters

The Wikipedia page for "control character" states:

a control character or non-printing character (NPC) is a code point
in a character set that does not represent a written character or
symbol. All other characters are mainly graphic characters, also known
as printing characters (or printable characters), except perhaps for
"space" characters.

This is ambiguous.

What authoritative resource explains what nongraphic characters are, and how this term may differ from non-printing characters?

Asked By: yossi-matkal

||

This Bash script tabulates the character classes associated with each character in the ASCII set (according to GNU/awk definitions).

#! /bin/bash --

Awk='
BEGIN {
    # Names for 0x01..0x20, in code order; NUL and DEL are added separately.
    Ctl1 = "SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI ";
    Ctl2 = "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US SPACE";
    split (Ctl1 Ctl2, Ctl); Ctl[0] = "NUL"; Ctl[127] = "DEL";
    C = "cntrl print graph space blank punct alnum alpha digit lower upper xdigit";
    split (C, Class);
}
# Print one row: the code, the character (or its name), and every class it matches.
function Char (n, ch, Local, j) {
    printf ("0x%.2X  %5s", n, (n <= 32 || n == 127) ? Ctl[n] : ch);
    for (j = 1; j in Class; ++j)
        if (ch ~ "[[:" Class[j] ":]]") printf ("  :%s:", Class[j]);
    printf ("\n");
}
# The single empty input line from echo triggers the loop over all 128 codes.
{ for (j = 0; j < 128; j++) Char( j, sprintf ("%c", j)); }
'
echo | awk -f <( printf '%s' "${Awk}" )
    
Answered By: Paul_Pedant

The graphic characters are the ones for which the standard isgraph()/iswgraph() functions return true, or equivalently the ones matched by the [[:graph:]] bracket expression in a regular expression: that is, the members of the graph character class in the current locale.
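As a rough illustration, the same class test can be done from the shell, since shell patterns accept the same [[:class:]] notation. The sketch below is only that: in_class is a hypothetical helper name (not a standard utility), and the result depends on the current locale.

```shell
#!/bin/bash
# in_class CLASS CHAR: succeed if CHAR is in the POSIX character class
# CLASS (graph, print, cntrl, space, ...) in the current locale.
# in_class is a hypothetical helper, not a standard utility.
in_class() {
    pat="[[:$1:]]"          # for example [[:graph:]]
    case $2 in
        $pat) return 0 ;;   # unquoted, so it is used as a pattern
    esac
    return 1
}

tab=$(printf '\t')
for ch in A 7 '~' ' ' "$tab"; do
    printf '0x%02X:' "'$ch"            # numeric code of the character
    for class in graph print cntrl space; do
        if in_class "$class" "$ch"; then
            printf ' %s' "$class"
        fi
    done
    printf '\n'
done
```

In the usual locales, the space character should report print and space but not graph, while the tab should report cntrl and space only.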

Per POSIX, the print class must be a superset of graph and must be disjoint from cntrl. In turn, graph must be a superset of upper, lower, alpha, digit, xdigit and punct, and must not include the space (U+0020) character; POSIX makes no mention of other whitespace characters.
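These constraints can be checked empirically for the ASCII range. The following is a minimal sketch, assuming a shell whose patterns honour the locale's character classes; in_class and count_violations are hypothetical helper names, and NUL is skipped because a shell variable cannot hold it.

```shell
#!/bin/bash
# Hypothetical helper: succeed if character $2 is in POSIX class $1.
in_class() {
    pat="[[:$1:]]"
    case $2 in
        $pat) return 0 ;;
    esac
    return 1
}

# Count the ASCII characters (codes 1..127) that break one of the
# POSIX constraints on print and graph, and print that count.
count_violations() {
    bad=0
    n=1
    while [ "$n" -le 127 ]; do
        # Build the character from its octal code; the trailing x keeps a
        # newline from being stripped by the command substitution.
        ch=$(printf "\\$(printf '%03o' "$n")x")
        ch=${ch%x}
        # graph must be contained in print
        if in_class graph "$ch" && ! in_class print "$ch"; then
            bad=$((bad + 1))
        fi
        # control characters may be in neither print nor graph
        if in_class cntrl "$ch"; then
            if in_class print "$ch" || in_class graph "$ch"; then
                bad=$((bad + 1))
            fi
        fi
        # each of these classes must be contained in graph
        for class in upper lower alpha digit xdigit punct; do
            if in_class "$class" "$ch" && ! in_class graph "$ch"; then
                bad=$((bad + 1))
            fi
        done
        n=$((n + 1))
    done
    # the space character must not be in graph
    if in_class graph ' '; then
        bad=$((bad + 1))
    fi
    echo "$bad"
}

count_violations
```

A locale that follows the POSIX rules should report no violations for these characters.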

The idea is that the graphic characters are the ones that take ink to draw, while the printable characters are all the non-control ones.

In practice, on GNU systems (such as Ubuntu) at least, print is graph plus the non-control characters from the space class. With glibc 2.35 (as used on Ubuntu 22.04) and in UTF-8 locales, those extra characters are:

U+0020 SPACE
U+1680 OGHAM SPACE MARK
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE

The space class, by comparison, contains:

U+0009 CHARACTER TABULATION
U+000A LINE FEED
U+000B LINE TABULATION
U+000C FORM FEED
U+000D CARRIAGE RETURN
U+0020 SPACE
U+1680 OGHAM SPACE MARK
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
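
The ASCII part of the two lists above can be reproduced with a short script. This is a sketch restricted to codes 1..127 (so it does not cover the non-ASCII entries), and in_class and list_ascii are hypothetical helper names introduced for the illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed if character $2 is in POSIX class $1.
in_class() {
    pat="[[:$1:]]"
    case $2 in
        $pat) return 0 ;;
    esac
    return 1
}

# list_ascii CLASS [EXCLUDED]: print the hex codes of the ASCII characters
# (1..127) that are in CLASS and, if EXCLUDED is given, not in EXCLUDED.
list_ascii() {
    out=
    n=1
    while [ "$n" -le 127 ]; do
        # The trailing x keeps a newline from being stripped.
        ch=$(printf "\\$(printf '%03o' "$n")x")
        ch=${ch%x}
        if in_class "$1" "$ch" && { [ -z "${2:-}" ] || ! in_class "$2" "$ch"; }; then
            out="$out$(printf '0x%02X' "$n") "
        fi
        n=$((n + 1))
    done
    echo "${out% }"
}

echo "print but not graph: $(list_ascii print graph)"
echo "space but not cntrl: $(list_ascii space cntrl)"
echo "space:               $(list_ascii space)"
```

If print really is graph plus the non-control members of space, the first two lines should list the same codes.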
Answered By: Stéphane Chazelas