Why is using a shell loop to process text considered bad practice?

Is using a while loop to process text generally considered bad practice in POSIX shells?

As Stéphane Chazelas pointed out, the reasons for not using shell loops fall under several headings: conceptual, reliability, legibility, performance and security.

This answer explains the reliability and legibility aspects:

while IFS= read -r line <&3; do
  printf '%s\n' "$line"
done 3< "$InputFile"

For performance, the while read loop is tremendously slow when reading from a file or a pipe, because the read shell built-in reads one character at a time.

How about conceptual and security aspects?

Asked By: cuonglm


As far as the conceptual and legibility aspects go, shells typically are interested in files. Their “addressable unit” is the file, and the “address” is the file name. Shells have all kinds of methods of testing for file existence, file type, file name formatting (beginning with globbing). Shells have very few primitives for dealing with file contents. Shell programmers have to invoke another program to deal with file contents.

Because of the file and file name orientation, doing text manipulation in the shell is really slow, as you’ve noted, but also requires an unclear and contorted programming style.

Answered By: user732

Yes, we see a number of things like:

while read line; do
  echo $line | cut -c3
done

Or worse:

for line in $(cat file); do
  foo=$(echo $line | awk '{print $2}')
  bar=$(echo $line | awk '{print $3}')
  doo=$(echo $line | awk '{print $5}')
  echo $foo whatever $doo $bar
done

(don’t laugh, I’ve seen many of those).

Generally from shell scripting beginners. Those are naive literal translations of what you would do in imperative languages like C or Python, but that’s not how you do things in shells, and those examples are very inefficient (that latter one spawns six child processes for each input line), completely unreliable (potentially leading to security issues), and if you ever manage to fix most of the bugs, your code becomes illegible.


In C or most other languages, building blocks are just one level above computer instructions. You tell your processor what to do and then what to do next. You take your processor by the hand and micro-manage it: you open that file, you read that many bytes, you do this, you do that with it.

Shells are a higher level language. One may say it’s not even a language. They are, above all, command line interpreters. The job is done by the commands you run, and the shell is only meant to orchestrate them.

One of the great things that Unix introduced was the pipe and those default stdin/stdout/stderr streams that all commands handle by default.

In 50 years, we’ve not found better than that API to harness the power of commands and have them cooperate to a task. That’s probably the main reason why people are still using shells today.

You’ve got a cutting tool and a transliterate tool, and you can simply do:

cut -c4-5 < in | tr a b > out

The shell is just doing the plumbing (open the files, set up the pipes, invoke the commands) and when it’s all ready, it just flows without the shell doing anything. The tools do their job concurrently, efficiently, at their own pace, with enough buffering so that neither blocks the other; it’s just beautiful and yet so simple.

Invoking a tool, though, has a cost (and we’ll develop that on the performance point). Those tools may be written with thousands of instructions in C. A process has to be created, the tool loaded and initialised, then cleaned up, and the process destroyed and waited for.

Invoking cut is like opening the kitchen drawer, taking the knife, using it, washing it, drying it, and putting it back in the drawer. When you do:

while read line; do
  echo $line | cut -c3
done < file

It’s like for each line of the file, getting the read tool from the kitchen drawer (a very clumsy one because it’s not been designed for that), read a line, wash your read tool, put it back in the drawer. Then schedule a meeting for the echo and cut tool, get them from the drawer, invoke them, wash them, dry them, put them back in the drawer and so on.

Some of those tools (read and echo) are built in most shells, but that hardly makes a difference here since echo and cut still need to be run in separate processes.

It’s like cutting an onion but washing your knife and putting it back in the kitchen drawer between each slice.

Here the obvious way is to get your cut tool from the drawer, slice your whole onion and put it back in the drawer after the whole job is done.

IOW, in shells, especially to process text, you invoke as few utilities as possible and have them cooperate to the task, not run thousands of tools in sequence waiting for each one to start, run, clean up before running the next one.

Further reading in Bruce’s fine answer. The low-level text processing internal tools in shells (except maybe for zsh) are limited, cumbersome, and generally not fit for general text processing.


As said earlier, running one command has a cost. A huge cost if that command is not builtin, but even if it is builtin, the cost is big.

And shells have not been designed to run like that, they have no pretension to being performant programming languages. They are not, they’re just command line interpreters. So, little optimisation has been done on this front.

Also, shells run commands in separate processes. Those building blocks don’t share a common memory or state. When you do an fgets() or fputs() in C, that’s a function in stdio. stdio keeps internal buffers for input and output for all the stdio functions, to avoid making costly system calls too often.

Even the corresponding builtin shell utilities (read, echo, printf) can’t do that. read is meant to read one line. If it read past the newline character, the next command you run would miss it. So read has to read the input one byte at a time (some implementations have an optimisation when the input is a regular file: they read chunks and seek back, but that only works for regular files, and bash for instance only reads 128-byte chunks, which is still a lot less than text utilities typically read).
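That constraint is observable: because read stops exactly at the newline, whatever follows is still available to the next command. A small sketch, assuming a POSIX sh (the file name is arbitrary):

```shell
# read consumes only up to the first newline; cat sees the rest.
printf 'line1\nline2\nline3\n' > demo.txt
{
  IFS= read -r first    # consumes "line1" only
  cat                   # prints the remaining two lines: line2, line3
} < demo.txt
rm -f demo.txt
```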

Same on the output side, echo can’t just buffer its output, it has to output it straight away because the next command you run will not share that buffer.

Obviously, running commands sequentially means you have to wait for them; it’s a little scheduler dance that passes control from the shell to the tools and back. That also means (as opposed to using long running instances of tools in a pipeline) that you cannot harness several processors at the same time when available.

Between that while read loop and the (supposedly) equivalent cut -c3 < file, in my quick tests, there’s a CPU time ratio of around 40000 (one second versus half a day). But even if you use only shell builtins:

while read line; do
  echo ${line:2:1}
done < file

(here with bash), that’s still around 1:600 (one second vs 10 minutes).
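You can reproduce that kind of measurement yourself with time. A rough sketch; the exact ratio varies by machine and input size, and it assumes bash (where `time` can prefix a compound command) plus a `seq` utility:

```shell
# Compare one cut over the whole file with one cut per line.
seq 1000 > bench.txt                     # 1000 input lines (arbitrary size)
time cut -c3 < bench.txt > /dev/null     # one process for the whole file
time while IFS= read -r line; do         # one pipeline + one cut per line
  printf '%s\n' "$line" | cut -c3
done < bench.txt > /dev/null
rm -f bench.txt
```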


It’s very hard to get that code right. The examples I gave are seen too often in the wild, but they have many bugs.

read is a handy tool that can do many different things. It can read input from the user and split it into words to store in different variables. read line does not read a line of input, or rather it reads a line in a very special way: it actually reads words from the input, those words being separated by $IFS, and where backslash can be used to escape the separators or the newline character.

With the default value of $IFS, on an input like:

   foo\/bar \
baz
biz

read line will store "foo/bar baz" into $line, not "   foo\/bar \" as you’d expect.

To read a line, you actually need:

IFS= read -r line

That’s not very intuitive, but that’s the way it is, remember shells were not meant to be used like that.
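The difference is easy to see (a sketch; the input string is just an illustration):

```shell
# Plain read: IFS strips the blanks and backslash escapes are eaten.
printf '   foo\\bar \n' | { read line; printf '<%s>\n' "$line"; }
# prints <foobar>

# IFS= read -r: the line is preserved byte for byte (minus the newline).
printf '   foo\\bar \n' | { IFS= read -r line; printf '<%s>\n' "$line"; }
# prints <   foo\bar >
```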

Same for echo. echo expands escape sequences in some shells and treats some arguments starting with - as options. You can’t use it for arbitrary content like the content of a random file. You need printf here instead.
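For instance (a sketch; what the echo line prints varies between shells, which is precisely the problem):

```shell
var='-n some\ttext'
echo "$var"            # unportable: may swallow -n, may expand \t to a tab
printf '%s\n' "$var"   # always prints the content verbatim, one newline after
```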

And of course, there’s the typical forgetting of quoting your variable which everybody falls into. So it’s more:

while IFS= read -r line; do
  printf '%s\n' "$line" | cut -c3
done < file

Now, a few more caveats:

  • except for zsh, that doesn’t work if the input contains NUL characters while at least GNU text utilities would not have the problem.
  • if there’s data after the last newline, it will be skipped
  • inside the loop, stdin is redirected so you need to pay attention that the commands in it don’t read from stdin.
  • for the commands within the loops, we’re not paying attention to whether they succeed or not. Usually, error (disk full, read errors…) conditions will be poorly handled, usually more poorly than with the correct equivalent. Many commands, including several implementations of printf also don’t reflect their failure to write to stdout in their exit status.
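The second caveat, for instance, bites silently; a common mitigation (a sketch, assuming a POSIX sh) is to also test $line after read fails:

```shell
# Without the extra test, "c" (no trailing newline) is silently dropped:
printf 'a\nb\nc' | while IFS= read -r line; do
  printf '%s\n' "$line"
done                                   # prints a and b only

# read fails on the incomplete last line but still fills $line:
printf 'a\nb\nc' | while IFS= read -r line || [ -n "$line" ]; do
  printf '%s\n' "$line"
done                                   # prints a, b and c
```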

If we want to address some of those issues above, that becomes:

while IFS= read -r line <&3; do
  {
    printf '%s\n' "$line" | cut -c3 || exit
  } 3<&-
done 3< file
if [ -n "$line" ]; then
    printf '%s' "$line" | cut -c3 || exit
fi

That’s becoming less and less legible.

There are a number of other issues with passing data to commands via the arguments or retrieving their output in variables:

  • the limitation on the size of arguments (some text utility implementations have a limit there as well, though the effect of those being reached are generally less problematic)
  • the NUL character (also a problem with text utilities).
  • arguments taken as options when they start with - (or + sometimes)
  • various quirks of various commands typically used in those loops like expr, test
  • the (limited) text manipulation operators of various shells that handle multi-byte characters in inconsistent ways.

Security considerations

When you start working with shell variables and arguments to commands, you’re entering a mine-field.

If you forget to quote your variables, forget the end of option marker, work in locales with multi-byte characters (the norm these days), you’re certain to introduce bugs which sooner or later will become vulnerabilities.

When you may want to use loops

Using a shell loop to process text may make sense when your task involves what the shell is good at: launching external programs.

E.g. a loop like the following might make some sense:

while IFS= read -r line; do
    someprog -f "$line"
done < file-list.txt

though the simple case above where the input is passed unmodified to someprog could be also done with e.g. xargs:

<file-list.txt tr '\n' '\0' | xargs -r0 -n1 someprog -f

Or with GNU xargs:

xargs -rd '\n' -n1 -a file-list.txt someprog -f
Answered By: Stéphane Chazelas

There are some complicated answers, giving a lot of interesting details for the geeks among us, but it’s really quite simple – processing a large file in a shell loop is just too slow.

I think the questioner is interested in a typical kind of shell script, which may start with some command-line parsing, environment setting, checking files and directories, and a bit more initialization, before getting on to its main job: going through a large line-oriented text file.

For the first parts (initialization), it doesn’t usually matter that shell commands are slow – it’s only running a few dozen commands, maybe with a couple of short loops.
Even if we write that part inefficiently, it’s usually going to take less than a second to do all that initialization, and that’s fine – it only happens once.

But when we get on to processing the big file, which could have thousands or millions of lines, it is not fine for the shell script to take a significant fraction of a second (even if it’s only a few dozen milliseconds) for each line, as that could add up to hours.

That’s when we need to use other tools, and the beauty of Unix shell scripts is that they make it very easy for us to do that.

Instead of using a loop to look at each line, we need to pass the whole file through a pipeline of commands.
This means that, instead of calling the commands thousands or millions of times, the shell calls them only once.
It’s true that those commands will have loops to process the file line-by-line, but they are not shell scripts and they are designed to be fast and efficient.
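As a sketch (the log file name and its format are hypothetical here), counting the most frequent first field of a large file is one pipeline, not one loop iteration per line:

```shell
cut -d' ' -f1 < access.log |  # first space-separated field of every line
  sort |                      # group identical values together
  uniq -c |                   # count each group
  sort -rn |                  # most frequent first
  head -5                     # top five
```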

Unix has many wonderful built-in tools, ranging from the simple to the complex, that we can use to build our pipelines. I would usually start with the simple ones, and only use more complex ones when necessary.

I would also try to stick with standard tools that are available on most systems, and try to keep my usage portable, although that’s not always possible. And if your favourite language is Python or Ruby, maybe you won’t mind the extra effort of making sure it’s installed on every platform your software needs to run on 🙂

Simple tools include head, tail, grep, sort, cut, tr, sed, join (when merging 2 files), and awk one-liners, among many others.
It’s amazing what some people can do with pattern-matching and sed commands.

When it gets more complex, and you really have to apply some logic to each line, awk is a good option – either a one-liner (some people put whole awk scripts in ‘one line’, although that’s not very readable) or in a short external script.

As awk is an interpreted language (like your shell), it’s amazing that it can do line-by-line processing so efficiently, but it’s purpose-built for this and it’s really very fast.

And then there’s Perl and a huge number of other scripting languages that are very good at processing text files, and also come with lots of useful libraries.

And finally, there’s good old C, if you need maximum speed and high flexibility (although text processing is a bit tedious).
But it’s probably a very bad use of your time to write a new C program for every different file-processing task you come across.
I work with CSV files a lot, so I have written several generic utilities in C that I can re-use in many different projects. In effect, this expands the range of ‘simple, fast Unix tools’ that I can call from my shell scripts, so I can handle most projects by only writing scripts, which is much faster than writing and debugging bespoke C code each time!

Some final hints:

  • don’t forget to start your main shell script with export LANG=C, or many tools will treat your plain-old-ASCII files as Unicode, making them much much slower
  • also consider setting export LC_ALL=C if you want sort to produce consistent ordering, regardless of the environment!
  • if you need to sort your data, that will probably take more time (and resources: CPU, memory, disk) than everything else, so try to minimize the number of sort commands and the size of the files they’re sorting
  • a single pipeline, when possible, is usually most efficient – running multiple pipelines in sequence, with intermediate files, may be more readable and debug-able, but will increase the time that your program takes
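The effect of the locale hints above is easy to check (a sketch):

```shell
printf 'b\nA\na\nB\n' | LC_ALL=C sort    # byte order: A B a b
printf 'b\nA\na\nB\n' | sort             # locale-dependent; often a A b B
```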
Answered By: Laurence Renshaw

Yes, but…

The correct answer of Stéphane Chazelas is based on the concept of delegating every text operation to specific binaries, like grep, awk, sed and others.

As bash is capable of doing a lot of things by itself, dropping forks may become quicker (even quicker than running another interpreter to do the whole job).

For example, have a look at this post:

test and compare…

Of course,

there is no consideration here of user input and security!

Don’t write web applications in bash!!

But for a lot of server administration tasks, where bash could be used in place of sh, using bash built-ins could be very efficient.

My opinion:

Writing tools like the bin utils is not the same kind of work as system administration.

So, not the same people!

Where sysadmins have to know the shell, they could write prototypes using their preferred (and best-known) tool.

If this new utility (prototype) is really useful, some other people could develop a dedicated tool using some more appropriate language.

The accepted answer is good as it states clearly the drawbacks of parsing text files in the shell, but people have been cargo culting the main idea (mainly, that shell scripts deal poorly with text processing tasks) to criticize anything that uses a shell loop.

There is nothing inherently wrong with shell loops to the extent there is nothing wrong with loops in shell scripts or command substitutions outside of loops. It is certainly true that in most cases, you can replace them with more idiomatic constructs. For example, instead of writing

for i in $(find . -iname "*.txt"); do

write this:

for i in *.txt; do

In other scenarios, it is better to rely on more specialized tools such as awk, sed, cut, join, paste, datamash, miller, general purpose programming languages with good text processing capabilities (e.g. perl, python, ruby) or parsers for specific file types (XML, HTML, JSON)
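When you do need find’s recursion, a portable way to avoid the word-splitting pitfalls is to let find invoke the command itself (a sketch; printf stands in for whatever per-file command you actually need):

```shell
# Each file name is passed as a single argument, whatever bytes it contains.
find . -name '*.txt' -exec printf 'found: %s\n' {} \;
```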

Having said that, using a shell loop is the right call as long as you know:

  1. Performance is not a priority. Is it important that your script runs fast? Are you running a task once every few hours as a cron job? Then maybe performance is not an issue. Or if it is, run benchmarks to make sure your shell loop is not a bottleneck. Intuition or preconceptions about which tool is "fast" or "slow" cannot serve as a replacement for accurate benchmarks.
  2. Legibility is maintained. If you’re adding too much logic in your shell loop that it becomes hard to follow, then you may need to rethink this approach.
  3. Complexity does not increase substantially.
  4. Security is preserved.
  5. Testability doesn’t become an issue. Properly testing shell scripts is already difficult. If using external commands makes it more difficult to know when you have a bug in your code or you’re working under incorrect assumptions about return values, then that’s a problem.
  6. The shell loop has the same semantics as the alternative, or the differences don’t matter for what you’re doing at the moment. For example, the find command above recurses into subdirectories and matches files whose names start with a dot. (Both are likely to have problems if you have files with spaces in their names.)

As an example demonstrating it is not an impossible task to satisfy the previous statements, this is the pattern used in an installer for a well-known commercial software:

MD5=... # embedded checksum
for s in $sizes; do
    checksum=`echo $VAR | cut -d" " -f $i`
    if <checksum condition>; then
       md5=`echo $MD5 | cut -d" " -f $i`
    fi
done

This runs for a very small number of times, its purpose is clear, it is concise and doesn’t increase complexity unnecessarily, no user-controlled input is used and therefore, security is not a concern. Does it matter that it is invoking additional processes in a loop? Not at all.

Answered By: r_31415