Why does the POSIX for-loop allow a `for name in ; do` (no words in list) syntax?

I’m trying to understand the for-loop syntax in the POSIX shell grammar rules:

for_clause       : For name                                      do_group
                 | For name                       sequential_sep do_group
                 | For name linebreak in          sequential_sep do_group
                 | For name linebreak in wordlist sequential_sep do_group

The last form (pretty much the default) and the first two make sense. Quoting from POSIX:

Omitting:
in word...
shall be equivalent to:
in "$@"

so the first two are just looping over the arguments to the shell script.

I just don’t understand how a for-loop of the form for name in; do echo $name; done makes sense. I thought it would still loop over the arguments to the shell script, but a small test script shows that this is not the case: the for loop just seems to be ignored.
So what is the purpose of that third variation?

Asked By: justsome631

||

It’s just that wordlist is defined as:

wordlist         : wordlist WORD
                 |          WORD

So one or more words, but the for var in ...; do ...; done loop can loop over zero or more words, so it needs For name linebreak in sequential_sep do_group to cover the case of zero words.
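As a rough illustration of how the grammar forms differ at run time, here is a minimal sketch. The function names loop_implicit, loop_empty and loop_words are made up for this example and do not come from the standard.

```shell
#!/bin/sh
# Hypothetical helper functions, named only for this illustration.

# No "in" at all: iterates over the positional parameters ("$@").
loop_implicit() {
  for name do
    printf '%s\n' "$name"
  done
}

# "in" followed by no words: the body never runs.
loop_empty() {
  for name in; do
    printf '%s\n' "$name"
  done
}

# "in" followed by a wordlist: iterates over the listed words.
loop_words() {
  for name in a b; do
    printf '%s\n' "$name"
  done
}

loop_implicit x y   # prints x and y, one per line
loop_empty x y      # prints nothing
loop_words x y      # prints a and b, one per line
```

The positional parameters are ignored as soon as the in keyword is present, which is why the third grammar form does not fall back to "$@".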

Having wordlist defined as zero or more words:

wordlist         : wordlist WORD
                 | /* empty */

Would probably have been less confusing.

If you look at SUSv2, there was:

 for_clause       : For name linebreak                            do_group
                  | For name linebreak in wordlist sequential_sep do_group
 wordlist         : wordlist WORD
                  |          WORD

Meaning that an empty list was not allowed. So it looks like what happened is that in SUSv3, they fixed that omission by adding the:

For name linebreak in sequential_sep do_group

Rather than changing the definition of wordlist.


Note that we’re talking of the shell language grammar here, so it’s not about for i in $var; do ...; done or for i in $(some cmd); do ...; done where the $var or $(some cmd) expansions could result in an empty list. Those $var and $(some cmd) are each one WORD token in the POSIX formalisation of the shell language.

We are talking of for i in; do ...; done, where there is literally no WORD in between the in and the do.
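To make the distinction between one WORD token in the grammar and the number of items it yields after expansion more concrete, here is a small sketch. The helpers count_unquoted and count_quoted are hypothetical, and the counts assume the default IFS and arguments without glob characters.

```shell
#!/bin/sh
# Hypothetical helpers: each loop has exactly one WORD after "in",
# yet the number of iterations depends on how that word expands.

count_unquoted() {
  n=0
  for i in $1; do        # unquoted: subject to field splitting
    n=$((n + 1))
  done
  echo "$n"
}

count_quoted() {
  n=0
  for i in "$1"; do      # quoted: always exactly one field
    n=$((n + 1))
  done
  echo "$n"
}

count_unquoted ''        # 0: the single word expands to no fields
count_unquoted 'a b c'   # 3: the single word splits into three fields
count_quoted ''          # 1: one empty field
count_quoted 'a b c'     # 1: one field containing spaces
```

In every one of these loops the parser sees a non-empty wordlist, so none of them needs the extra grammar rule; only a literal for i in; do ...; done does.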

I agree that code is not particularly useful. There is little reason for one to write a loop that explicitly loops over nothing. Some reasons that one could think of:

Generated code:

if ...; then
  list=' "foo" "bar baz" "$var" '
else
  list=
fi

eval '
  for i in '"$list"'; do
    printf "%s\n" "$i"
  done
'

Or anything that generates sh scripts and constructs a for loop over explicit arguments.
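A self-contained variant of the generated-code case might look like the following. The function name run_loop is an assumption made for this sketch, and its argument is expected to be a fragment of already-quoted shell words (or nothing at all).

```shell
#!/bin/sh
# run_loop is a hypothetical wrapper around the eval idiom shown above.
# $1 is a fragment of shell source: zero or more quoted words.
run_loop() {
  list=$1
  # The fragment is spliced into the loop source before it is parsed,
  # so an empty fragment yields a literal "for i in ; do ...; done".
  eval 'for i in '"$list"'; do printf "%s\n" "$i"; done'
}

run_loop '"foo" "bar baz"'   # prints foo, then bar baz
run_loop ''                  # prints nothing, and is not a syntax error
```

If the grammar rejected an empty list, the generator would have to special-case the empty fragment instead of emitting the same template every time.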

Or to comment out some code:

for commented_out in; do
  this code is commented out
done

Though something like:

:||:<<'EOF'
  this code is commented out and (harmless) even
  if it contains invalid syntax
EOF

Would be more idiomatic.

I did not find the change request that led to the modification of the standard, but I suppose they changed it mainly because all existing shell implementations allowed for i in; do ...; done, so there was no point forbidding it in the standard.

You’ll find that the standard doesn’t allow if then ...; else ...; fi, nor if; then ...; else ...; fi, nor if ...; then;else ...; fi¹ (though it obviously allows if $empty; then $empty; else $empty; fi), even though there’s no really good reason to forbid them. That is because most implementations choke on them in practice, starting with the Bourne shell, which introduced the construct, and the Korn shell, on which the POSIX sh specification is based. zsh is a notable exception.


¹ Actually, in the Bourne shell, which did not have the ! keyword to negate the exit status of a pipeline, you had to write if cmd; then :; else command if cmd fails; fi in place of if ! cmd; then command if cmd fails; fi. That is, you had to use an explicit : null command in the then part, as the Bourne shell did not allow an empty then part.
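The two spellings mentioned in the footnote can be compared side by side in a POSIX shell. The function names report_old and report_new are invented for this sketch.

```shell
#!/bin/sh
# Hypothetical helpers: both run the command given as arguments
# and print "failed" only when it returns a non-zero status.

# Bourne-style: an explicit ":" null command fills the then part.
report_old() {
  if "$@"; then :; else echo "failed"; fi
}

# POSIX-style: "!" negates the exit status of the pipeline.
report_new() {
  if ! "$@"; then echo "failed"; fi
}

report_old false   # prints failed
report_new false   # prints failed
report_old true    # prints nothing
report_new true    # prints nothing
```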

Answered By: Stéphane Chazelas

The reasoning for this behavior is best illustrated with an example. Assume you have a variable named ${items} and you need to call a function on each word in that variable, but it might evaluate to an empty string in some cases.

With POSIX compliant behavior, you can just use:

for x in ${items} ; do
   do_something "$x"
done

Here, it doesn’t matter if ${items} evaluates to an empty string or not, because if it does, the loop will have nothing to loop over and will just be skipped.

But if the shell handled the case of an empty word list differently (either not at all, or equivalent to one of the forms without the in), then you would instead need (at minimum):

if [ -n "${items}" ]; then
    for x in ${items} ; do
        do_something "$x"
    done
fi

So this behavior saves you (at minimum) a level of indentation and an extra conditional check in the relatively common case of having to iterate over a potentially empty list of items of unknown length. Many other languages whose for loops iterate over a collection of items (such as Python or JavaScript) behave in exactly the same manner. They just explicitly require a variable (or literal) to be provided to the loop, while shell script does not appear to, because of how a literal list of items to loop over is defined (namely, an empty one is just an unquoted empty string).

Answered By: Austin Hemmelgarn