Untangling pathname expansion and quote removal in echo 'a'*

Shell is: GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)

In the current working directory, there are two files:

  • a file named abc.txt
  • a file named 'a'bc.txt (created with touch 'a'bc.txt)

I run the following command:

echo 'a'*

The output is:

abc.txt

The GNU bash manual specifies that quote removal is processed AFTER pathname expansion.

Therefore, I expected this command to match 'a'bc.txt but NOT to match abc.txt.

I expected the above command to proceed as follows:

  • at the pathname expansion stage, try to match any file with a filename
    that starts with 'a' ('a' taken as a literal string), and THUS
    match 'a'bc.txt
  • at the quote removal stage, remove the single quotes ' in 'a'
    without impacting the results of the pathname expansion that took place at the previous
    step.

There’s obviously something I don’t understand here.

I could NOT find any documentation or answers to this specific question.

Asked By: yossi-matkal

||

The way you describe it would make it impossible to match filenames that, e.g., start with a *. As it currently works, one can write '*'*, where the first asterisk is quoted and therefore literal, while the second retains its special meaning of matching anything. If the quotes (and, symmetrically, backslashes) themselves had to match characters in the filename, that would be impossible.
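A minimal sketch of that '*'* case, run in a scratch directory. The file names *star.txt and plain.txt and the variable names are made up for illustration:

```shell
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# One name starts with a literal asterisk, the other does not (hypothetical names).
touch -- '*star.txt' 'plain.txt'

# First asterisk is quoted (literal), second is unquoted (wildcard).
literal_star=$(echo '*'*)
echo "$literal_star"    # prints: *star.txt

# A fully unquoted asterisk matches both files.
set -- *
total=$#
echo "$total"           # prints: 2

# Clean up.
cd "$OLDPWD" || exit 1
rm -rf "$demo_dir"
```

Only the file whose name really begins with an asterisk is picked up by '*'*, even though the quote characters themselves appear nowhere in that name.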


I’m not sure how the internal implementation of the shell works, or what the history behind the phrase "quote removal" is, but I find it best to consider the state of being quoted a (hidden) property of the characters, instead of thinking of the quote characters as separate entities that are still there after the command line was initially processed.

So, when you write '*'*, you get **, where the first asterisk carries the hidden "quoted" mark and the second does not. Then, if you want to match a literal quote, you need to quote or escape that. E.g. "'*'"* would give '*'*, where the first three characters (quote, asterisk, quote) are all marked as quoted, followed by a normally-special asterisk. (I think I’ve heard that some early shell implementation used the 8th bit of the byte to mark quoted characters, but of course that only works with 7-bit charsets.)
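Applied to the two files from the question, this can be sketched as follows. The scratch directory and the variable names are assumptions for the demo; the file names are the ones from the question:

```shell
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# The two files from the question.
touch -- abc.txt "'a'bc.txt"

# The quotes only mark the a as quoted; the pattern is effectively a*.
plain_match=$(echo 'a'*)
echo "$plain_match"     # prints: abc.txt

# To match literal single quotes, quote them ...
dq_match=$(echo "'a'"*)
echo "$dq_match"        # prints: 'a'bc.txt

# ... or escape them with backslashes.
bs_match=$(echo \'a\'*)
echo "$bs_match"        # prints: 'a'bc.txt

# Clean up.
cd "$OLDPWD" || exit 1
rm -rf "$demo_dir"
```

In all three commands the final asterisk is unquoted, so it keeps its wildcard meaning; only the characters before it differ in whether they are quoted.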

Or, if you like, just think of the quotes as special characters that are only used for determining if another character is quoted or not, and not for matching against characters in the target string.
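The same rule can be seen without touching the filesystem, since case patterns treat quoting the same way. This is only an illustrative sketch; the function name classify and the sample strings are made up:

```shell
# classify is a hypothetical helper: it reports whether its argument
# starts with a literal asterisk.
classify() {
  case $1 in
    # Quoted asterisk is literal, unquoted asterisk matches anything.
    '*'*) echo "starts with a literal asterisk" ;;
    *)    echo "anything else" ;;
  esac
}

classify '*star.txt'    # prints: starts with a literal asterisk
classify 'plain.txt'    # prints: anything else
```

The quote characters in the pattern never have to match anything in the string being tested; they only decide which asterisk is literal.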

Answered By: ilkkachu