Piping awk's print/printf output into a shell command makes that statement run after all other unrelated print/printf statements

Given this awk script:

END {
    print "Y" | "cat"

    print "X"
    print "X"
}

# Output: 
# X
# X
# Y

Why isn’t Y printed first given that it’s supposed to run before the other statements?

Asked By: Asker321


If you want the cat process to terminate (and the Y to be printed) before the Xs, then just call close("cat") after the print "Y" | "cat".
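
As a minimal sketch of that fix, the question's script with close("cat") added can be run from a shell as below. Reading from /dev/null and capturing the result in a variable named out are choices made for illustration only; the point is that close waits for cat to finish, so Y reaches the output before awk prints the Xs.

```shell
# Run the question's END block with close("cat") added.
# Input comes from /dev/null so the END block runs immediately.
out=$(awk 'END {
    print "Y" | "cat"
    close("cat")      # wait for cat to write Y and exit
    print "X"
    print "X"
}' </dev/null)
printf '%s\n' "$out"
```

This should print Y first, followed by the two X lines.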

The rest is explained in the awk man page, which is worth reading.

Why isn’t Y printed first given that it’s supposed to run before the other statements?

The cat is not supposed to write its output and terminate before the other statements. It may write its output before, after or in between your two print "X" calls.

When you use something like print ... | "command ..." in awk, command ... is started as an asynchronous process with its stdin connected to a pipe (via popen("command ...", "w")). That process will not necessarily write its output and terminate until you call close("command ...") (or until awk does so implicitly when it terminates).

See an example like:

   print "foo" | "cat > file"
   print "bar" | "cat > file"

The result will be that file will contain both lines, foo and bar; the cat > file command will not be run separately for each line.
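
A runnable sketch of this, assuming a POSIX shell with mktemp available. The scratch directory and the tmpdir and contents variables are illustrative additions, not part of the original example.

```shell
# Work in a scratch directory so "file" does not clobber anything.
tmpdir=$(mktemp -d)
(
    cd "$tmpdir" &&
    awk 'BEGIN {
        print "foo" | "cat > file"
        print "bar" | "cat > file"   # same command string: reuses the open pipe
        close("cat > file")          # wait for cat to finish writing
    }'
)
contents=$(cat "$tmpdir/file")
printf '%s\n' "$contents"
rm -r "$tmpdir"
```

Both lines end up in file because the second print finds the pipe to "cat > file" already open and writes to the same cat process.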

Answered By: guest

Redirections and pipes in awk are similar to redirections and pipes in sh, but there is one major difference. In sh, foo >bar keeps bar open only for the duration of the foo command, and foo | bar waits for both foo and bar to terminate. In awk, a redirection or pipe remains open until it’s closed explicitly, and redirecting or piping to the same file name or command multiple times reuses the open redirection/pipe.

For example, in sh, this prints a, b, c, a, b, because each sort command gets just two lines of input:

{ echo b; echo a; } | sort
echo c
{ echo b; echo a; } | sort

But in awk, this prints c, a, a, b, b (assuming that awk’s output is line-buffered, otherwise c could be delayed) because there is a single sort command and it won’t print anything until it has all of its input data, which only happens when the input side of the pipe gets closed.

print "b" | "sort"; print "a" | "sort";
print "c";
print "b" | "sort"; print "a" | "sort";

To make a piped command terminate, call the close function explicitly. Awk implicitly closes all open pipes and redirections when it exits. This prints a, b, c, a, b:

print "b" | "sort"; print "a" | "sort"; close("sort");
print "c";
print "b" | "sort"; print "a" | "sort"; close("sort");

Likewise, this awk snippet creates a two-line file, since foo is opened once by the first line and is still open when the second line runs:

print "hello" >"foo";
print "world" >"foo";

Whereas this sh snippet creates a one-line file, because the second line opens the file that was created by the first line and truncates it before writing world:

echo hello >foo
echo world >foo
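
The two behaviours can be compared side by side with a sketch like the following. The file names foo_awk and foo_sh and the scratch directory are hypothetical, chosen so the two runs do not overwrite each other.

```shell
tmpdir=$(mktemp -d)
(
    cd "$tmpdir" || exit 1
    # awk: foo_awk is opened (and truncated) once, then stays open.
    awk 'BEGIN {
        print "hello" > "foo_awk"
        print "world" > "foo_awk"
    }'
    # sh: each redirection reopens and truncates foo_sh.
    echo hello > foo_sh
    echo world > foo_sh
)
# Count lines with awk to avoid wc's platform-dependent padding.
awk_lines=$(awk 'END { print NR }' "$tmpdir/foo_awk")
sh_lines=$(awk 'END { print NR }' "$tmpdir/foo_sh")
echo "awk: $awk_lines line(s), sh: $sh_lines line(s)"
rm -r "$tmpdir"
```

The awk file should have two lines and the sh file one.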

The main reason awk is designed this way is that there’s an implicit loop around the processing of each line. In sh, if you want to process lines in a loop, you’d typically write the redirection around the loop:

while IFS= read -r line; do
  if condition "$line"; then
    process "$line"
  fi
done >output

But in awk, there would be no way to apply the redirection to the implicit loop, so you write

condition($0) { process $0 >"output" }
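
To make the two placeholders above concrete, here is a sketch where the condition is "the line starts with a" and the processing is simply printing the line. The sample input and the names out_sh and out_awk are assumptions made for the illustration.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' apple banana avocado > "$tmpdir/input"

# sh: explicit loop, one redirection around the whole loop.
while IFS= read -r line; do
    case $line in
        a*) printf '%s\n' "$line" ;;
    esac
done < "$tmpdir/input" > "$tmpdir/out_sh"

# awk: implicit loop, redirection on the print; the output file is opened once.
awk -v out="$tmpdir/out_awk" '/^a/ { print $0 > out }' "$tmpdir/input"

out_sh=$(cat "$tmpdir/out_sh")
out_awk=$(cat "$tmpdir/out_awk")
printf '%s\n' "$out_awk"
rm -r "$tmpdir"
```

Both versions should produce the same file contents, because awk keeps the output file open across input lines instead of truncating it on every print.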

The awk way is also more powerful because you can open and close pipes at will, even in the middle of a loop or other blocks. In sh, this is possible for redirections with the exec builtin, but not for pipes: a pipe has to be applied to a (possibly compound) command as a whole.
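
As a sketch of opening and closing a pipe in the middle of awk's implicit loop, the following sorts each blank-line-separated group of input lines on its own. The sample input and the grouping rule are assumptions chosen for the illustration.

```shell
out=$(printf '%s\n' z y '' b a | awk '
    NF == 0 { close("sort"); next }   # blank line: finish the current sort
    { print | "sort" }                # first print after a close starts a new sort
')
printf '%s\n' "$out"
```

Each group is handed to its own sort process, so the output is y, z, a, b rather than the globally sorted a, b, y, z. The last sort is closed implicitly when awk exits.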
