Why is `while IFS= read` used so often, instead of `IFS=; while read..`?

It seems that normal practice would put the setting of IFS outside the while loop in order to not repeat setting it for each iteration… Is this just a habitual “monkey see, monkey do” style, as it has been for this monkey until I read man read, or am I missing some subtle (or blatantly obvious) trap here?

Asked By: Peter.O

||

The trap is that

IFS=; while read..

sets the IFS for the whole shell environment outside the loop, whereas

while IFS= read

redefines it only for the read invocation (except in the Bourne shell).
You can check that doing a loop like

while IFS= read xxx; ... done

then, after such a loop, echo "blabalbla $IFS ooooooo" prints

blabalbla
 ooooooo

whereas after

IFS=; while read xxx; ... done

the IFS stays redefined: now echo "blabalbla $IFS ooooooo" prints

blabalbla  ooooooo

So if you use the second form, you have to remember to reset it: IFS=$' \t\n'.
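
The difference can be checked with a small script. This is only a minimal sketch, assuming a POSIX-style shell such as bash or dash; the variable names (old_ifs, first, second, prefix_scope, plain_scope) are made up for illustration.

```shell
# Remember the IFS value in effect before the experiment.
old_ifs=$IFS

# Form 1: the assignment is a prefix of the read command only.
IFS= read -r first <<'EOF'
  padded line
EOF
if [ "$IFS" = "$old_ifs" ]; then prefix_scope=unchanged; else prefix_scope=changed; fi

# Form 2: a separate assignment, which stays in effect afterwards.
IFS=
read -r second <<'EOF'
  padded line
EOF
if [ -z "$IFS" ]; then plain_scope=empty; else plain_scope=restored; fi

# Undo form 2 by hand.
IFS=$old_ifs

printf 'after "IFS= read":  IFS is %s\n' "$prefix_scope"
printf 'after "IFS=; read": IFS is %s\n' "$plain_scope"
```

Both reads keep the leading spaces of the line; only the second form leaves IFS empty for whatever code follows.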



The second part of this question has been merged here, so I’ve removed the related answer from here.

Answered By: rozcietrzewiacz

Let’s look at an example, with some carefully-crafted input text:

text=' hello  world\
foo\bar'

That’s two lines, the first beginning with a space and ending with a backslash. First, let’s look at what happens without any precautions around read (but using printf '%s\n' "$text" to carefully print $text without any risk of expansion). (Below, $ is the shell prompt.)

$ printf '%s\n' "$text" |
  while read line; do printf '%s\n' "[$line]"; done
[hello  worldfoobar]

read ate up the backslashes: backslash-newline causes the newline to be ignored, and backslash-anything ignores that first backslash. To avoid backslashes being treated specially, we use read -r.

$ printf '%s\n' "$text" |
  while read -r line; do printf '%s\n' "[$line]"; done
[hello  world\]
[foo\bar]

That’s better, we have two lines as expected. The two lines almost contain the desired content: the double space between hello and world has been retained, because it’s within the line variable. On the other hand, the initial space was eaten up. That’s because read reads as many words as you pass it variables, except that the last variable contains the rest of the line — but it still starts with the first word, i.e. the initial spaces are discarded.
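
The word-splitting behaviour described above is easier to see when read is given several variables. A minimal sketch; the input line and the names first, second and rest are invented for illustration.

```shell
# Three variables, four words: the last variable receives everything left over,
# including the separators between the remaining words.
read -r first second rest <<'EOF'
   alpha   beta   gamma   delta
EOF

printf '[%s] [%s] [%s]\n' "$first" "$second" "$rest"
# prints: [alpha] [beta] [gamma   delta]
```

With a single variable, that variable is "the last one", which is why the inner double space survives while the leading space does not.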

So, in order to read each line literally, we need to make sure that no word splitting is going on. We do this by setting the IFS variable to an empty value.

$ printf '%s\n' "$text" |
  while IFS= read -r line; do printf '%s\n' "[$line]"; done
[ hello  world\]
[foo\bar]

Note how we set IFS specifically for the duration of the read built-in. The IFS= read -r line sets the environment variable IFS (to an empty value) specifically for the execution of read.
This is an instance of the general simple command syntax: a (possibly empty) sequence of variable assignments followed by a command name and its arguments (also, you can throw in redirections at any point). Since read is a built-in, the variable never actually ends up in an external process’s environment; nonetheless the value of $IFS is what we’re assigning there as long as read is executing¹. Note that read is not a special built-in, so the assignment does last only for its duration.

Thus we’re taking care not to change the value of IFS for other instructions that may rely on it. This code will work no matter what the surrounding code has set IFS to initially, and it will not cause any trouble if the code inside the loop relies on IFS.
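
The same prefix-assignment syntax works for any simple command, not only read. The sketch below uses a hypothetical variable demo_var and an external sh child process purely to illustrate the scoping.

```shell
unset demo_var

# The assignment applies to this one command (here an external sh) only.
demo_var=temporary sh -c 'printf "child sees: %s\n" "$demo_var"'

# Back in the calling shell, the variable was never set.
printf 'parent sees: %s\n' "${demo_var-<unset>}"
```

This should print "child sees: temporary" followed by "parent sees: <unset>".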

Contrast with this code snippet, which looks files up in a colon-separated path. The list of file names is read from a file, one file name per line.

IFS=":"; set -f
while IFS= read -r name; do
  for dir in $PATH; do
    ## At this point, "$IFS" is still ":"
    if [ -e "$dir/$name" ]; then echo "$dir/$name"; fi
  done
done <filenames.txt

If the loop was while IFS=; read -r name; do …, then for dir in $PATH would not split $PATH into colon-separated components. If the code was IFS=; while read …, it would be even more obvious that IFS is not set to : in the loop body.

Of course, it would be possible to restore the value of IFS after executing read. But that would require knowing the previous value, which is extra effort. IFS= read is the simple way (and, conveniently, also the shortest way).
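
For comparison, a manual save-and-restore might look like the following sketch. The helper names saved_ifs and ifs_was_set are hypothetical. The extra branch exists because an unset IFS and an empty IFS behave differently, which is part of the extra effort.

```shell
# Save IFS, distinguishing "unset" from "set (possibly to empty)".
if [ "${IFS+set}" = set ]; then
  saved_ifs=$IFS
  ifs_was_set=yes
else
  ifs_was_set=no
fi

# Change IFS for the whole shell, then read one line literally.
IFS=
read -r line <<'EOF'
  keep my spaces
EOF

# Restore the previous state by hand.
if [ "$ifs_was_set" = yes ]; then
  IFS=$saved_ifs
else
  unset IFS
fi

printf '[%s]\n' "$line"
```

The one-line IFS= read -r line does the same job without any of this bookkeeping.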

¹ And, if read is interrupted by a trapped signal, possibly while the trap is executing — this is not specified by POSIX and depends on the shell in practice.

Apart from the (already clarified) IFS scoping differences between the while IFS='' read, IFS=''; while read and while IFS=''; read idioms (per-command vs script/shell-wide IFS variable scoping), the take-home lesson is that you lose the leading and trailing spaces of an input line if the IFS variable is set to (contain a) space.

This can have pretty serious consequences if file paths are being processed.

Therefore setting the IFS variable to the empty string is anything but a bad idea since it ensures that a line’s leading and trailing whitespace does not get stripped.

See also: Bash, read line by line from file, with IFS

(
shopt -s nullglob
touch '  file with spaces   '
IFS=$' \t\n' read -r file <<<"$(printf '%s' *file*with*spaces*)"
ls -l "$file"
IFS='' read -r file <<<"$(printf '%s' *file*with*spaces*)"
ls -l "$file"
)
Answered By: jon

Inspired by Yuzem’s answer

If you want to set IFS to an actual character, this worked for me

iconv -f cp1252 zapni.tv.php | while IFS='#' read -d'#' line
do
  echo "$line"
done
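
Setting IFS to a real character on the read command is also a way to split a single line into fields. A small sketch with made-up, passwd-style input; the variable names user, pw, uid and rest are only illustrative.

```shell
# Split on ":" for this read only; the fourth variable collects the remainder.
IFS=: read -r user pw uid rest <<'EOF'
alice:x:1001:1001:Alice Example:/home/alice:/bin/sh
EOF

printf 'user=%s uid=%s\n' "$user" "$uid"
printf 'rest=%s\n' "$rest"
```

As with IFS= read, the assignment is limited to the read command, so the surrounding code keeps its own IFS.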
Answered By: Zombo