How to write a script that can take input from stdout

I want to be able to write a script that can take stdout as an argument, if anything is piped into it (ultimately, I would like it to be polymorphic).

The trouble is, I have searched and searched for how to do this, to no avail – lots of alternative suggestions about how to do other things, but not this:

cat /var/log/some.log | grep something | awk '{print $1 $6 $8}' | myscript

Why do that instead of myscript $(!!) ?
At this point, it is solely to prove that it is possible…

I know that you can ‘read variable’ in a script, but say I don’t care about the lines – let’s say I want to accept the whole of it as a blob of text and do something with it in the script.

Do I really have to:

while read x; do
stdin=$stdin" "$x;
done;

solely in order to read from STDIN ?

There must be a better way …

Asked By: rm-vanda


If you want to read all of stdin into a shell script, usually you just capture it into a temp file:

TMPFILE=$(mktemp -- "${TMPDIR:-/tmp}/${0##*/}.$$.XXXXXX") || exit
cat > "$TMPFILE"
# Script works with $TMPFILE and its contents,
# ultimately writing everything to stdout.
rm -f -- "$TMPFILE"

Even system utilities do things very much like this. sort has to have all of stdin before it can print anything to stdout, for example.
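As a rough, self-contained sketch of that pattern (the slurp function and the line counting are invented for illustration):

```shell
# Hypothetical example: buffer all of stdin in a temp file, then
# work with it -- here, just count the lines received.
slurp() {
    tmpfile=$(mktemp -- "${TMPDIR:-/tmp}/slurp.$$.XXXXXX") || return
    cat > "$tmpfile"                 # capture the whole of stdin
    wc -l < "$tmpfile" | tr -d ' '   # "work" with the temp file
    rm -f -- "$tmpfile"              # clean up
}

printf 'one\ntwo\nthree\n' | slurp   # prints 3
```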

Answered By: user732

You can read from stdout by redirecting input from file descriptor 1, since stdout is file descriptor 1 by definition. The fact that file descriptor 1 is used for output is a matter of convention, not a technical obligation. However, it’s a bizarre thing to do, and it is bound to confuse the people who use your script.

read line <&1

If you want to read a whole file, use cat. To stuff the content into a variable, use a command substitution.

whole_input=$(cat)
whole_input_from_stdout=$(cat <&1)

Some shells let you write $(</dev/stdin) as a slightly more efficient shortcut for $(cat).

Note that this strips trailing newlines. This behavior is built into command substitution. To retain trailing newlines, hide them behind another character and remove that character.

whole_input=$(cat; echo .); whole_input=${whole_input%?}

If there is a null byte in the input, the shell variable will only contain the data up to that first null byte. With some shells, you’ll get all the data but with the null bytes stripped. Shells can’t deal with binary data; zsh is an exception, as it retains the null bytes.
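A small sketch of the difference the dot trick makes (the byte counts assume a POSIX sh):

```shell
# Command substitution strips trailing newlines:
stripped=$(printf 'abc\n\n')

# Hiding the end behind a dot and removing it afterwards keeps them:
kept=$(printf 'abc\n\n'; echo .)
kept=${kept%?}

printf '%s' "$stripped" | wc -c   # 3 bytes: just "abc"
printf '%s' "$kept" | wc -c       # 5 bytes: "abc" plus the two newlines
```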

You don’t have to get the whole lump if you don’t want to – you can chunk out stdin and work it as a stream if you like. I did some googling, though, and I don’t think I can offer you any advice on how to change the sex of your script, or whatever it is you meant by polymorphic.

In any case, I typically find that putting the whole of input aside in some storage is rarely what I’m after. I usually want to handle it in delimited lumps of some kind. Here’s an example of how you can get input of any kind split out into 4k chunks per while loop iteration:

splitin() {
    dd obs=4k | {
        j=$1 f=$2; shift 2 &&
        while dd bs=4k count=1 of="$f" &&
              [ -s "$f" ]
        do    "$j" "$@" < "$f"
        done
    }
}

…which is a function you might call from your shell script with the name of some job you want to perform on 4k input intervals and the name of a temp file it can use to store each latest chunk. Like:

splitin handle_chunk /tmp/work/chunk 
        other args to pass on as appropriate   
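As a rough runnable sketch (handle_chunk and the input size are invented; the function is repeated here with dd's transfer statistics silenced so only the handler's output shows):

```shell
# Hypothetical chunk handler: report how many bytes each chunk holds.
handle_chunk() {
    printf 'chunk of %s bytes\n' "$(wc -c | tr -d ' ')"
}

splitin() {
    dd obs=4k 2>/dev/null | {
        j=$1 f=$2; shift 2 &&
        while dd bs=4k count=1 of="$f" 2>/dev/null &&
              [ -s "$f" ]
        do    "$j" "$@" < "$f"
        done
    }
}

# 5000 bytes of input -> one 4096-byte chunk, then a 904-byte chunk:
printf '%5000s' x | splitin handle_chunk "${TMPDIR:-/tmp}/chunk.$$"
rm -f -- "${TMPDIR:-/tmp}/chunk.$$"
```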
Answered By: mikeserv

My understanding is that you have to consider separately: 1) the device, 2) the data-stream interface in general, 3) the line reader/writer interface in particular, and 4) the semantics.

stdout is a technical endpoint private to your process, sharing its lifecycle; it exists to write a byte stream, not to read one. You can duplicate a reference to it so that it is referred to by another name, but its capability is always output. It is created and deleted with the process. stdin is its equivalent for reading a byte stream; its capability is always input. Depending on the underlying device, both capabilities may be available, but the standard, reliable model is agnostic about that.

A byte stream is a general interface that accepts two signals on its write end, put and close, and emits two corresponding signals on its read end, byte and EOD. When the write end (e.g. stdout) receives put, the read end gets a byte. When the write end accepts close, the read end gets EOD. EOD (close) is sent either explicitly, when the stdout descriptor is closed by exec >&-, or implicitly, when the process terminates.

A line reader or writer is a special way to use a byte stream, with an additional newline signal, but it is not inherent to stdin or stdout; it is just data transformed to and from text lines because that is convenient, manageable and robust. This is a line protocol or discipline: a historic standard, iterable and reasonably agnostic. The newline signal is a convention; whatever it may be, it has to take one value. There is no alternative short of relying on another state, and the device has only two states, open or closed. In other words, the byte stream is device-driven while the line reader is data-driven.

Given that, supposing you want to communicate between two processes via a pipe, the minimal signals the speaker can send to the listener are byte and EOD. But EOD can’t be used for iteration the way text lines can, because once the descriptor is closed (EOD sent), it is not reusable (even if that may be hackable). As long as you don’t need to iterate, though, you can rely on EOD.
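A tiny sketch of that last point: the writer can send EOD explicitly with exec >&-, and the listener stops there even though the speaker process is still alive (the payload text is made up):

```shell
# The left side sends some bytes, then EOD by closing its stdout;
# cat, the listener, returns as soon as EOD arrives, while the
# writer process lives on for another second.
{ printf 'payload\n'; exec >&-; sleep 1; } | cat
# prints: payload
```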

Then, let’s be realistic: polymorphism, as you say, is not a matter of raw data in a byte stream if you lose the iterable line model in the process. Polymorphism is a matter of a model on top of a simple iterable protocol (say, the line discipline), itself on top of a raw protocol (say, the byte stream) that can be implemented on a socket, pipe, file, tape, printer or whatever.

However, on top of the stream model, there are only two iteration controls: either you know in advance how much to read, or you read byte by byte until a signal. Knowing that, you can adopt whatever discipline you prefer on top of the raw stream instead of text lines; it is always a matter of parsing. A parser is a machine that reads a stream and builds other objects in its own language model.

And, to be an acceptable interface on top of a stream, it has to resume parsing at the exact point where the last iteration left off, sequentially and forward.
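As a toy illustration of the first control (“know in advance how much to read”), here is a made-up length-prefixed record discipline layered on a byte stream; every name in it is invented:

```shell
# Writer side: frame each record as "<length>\n<bytes>".
emit_record() {
    printf '%s\n' "${#1}"
    printf '%s' "$1"
}

# Reader side: a sequential parser that resumes exactly where the
# previous iteration left off -- read a length line, then that many bytes.
read_records() {
    while IFS= read -r len; do
        printf 'got: %s\n' "$(dd bs=1 count="$len" 2>/dev/null)"
    done
}

{ emit_record 'hello'; emit_record 'byte stream'; } | read_records
# prints:
#   got: hello
#   got: byte stream
```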

Now that we know that polymorphism + iteration implies a sequential parser, we can come back to the standard tools and be very satisfied. What we need is: Keep It Simple, Stupid.

Answered By: Thibault LE PAUL

I think you mean that you want to write your script so it can process the stdout of a command either from its stdin or in its arguments.

Then, the obvious way to do it is to check the number of arguments.

If your script processes its input in a shell variable:

#! /bin/sh -

if [ "$#" -gt 0 ]; then
  # arguments joined with space, the first character of $IFS
  input="$*"
else
  input=$(cat) # stdin without the trailing newline characters
fi

printf 'I got "%s"\n' "$input"

Or you could read each non-empty line of stdin into the positional parameters ($@: $1, $2…) if the script is not passed any arguments:

#! /bin/sh -
NL='
'
if [ "$#" -eq 0 ]; then
  set -o noglob
  IFS=$NL
  set -- $(cat) # split+glob with splitting on NL and glob disabled
fi

echo "I got $# arguments:"
[ "$#" -eq 0 ] || printf ' - "%s"n' "$@"

Or if it processes its input as a stream:

#! /bin/sh -
main() {
  # process standard input as a stream 
  grep -i foo | tr '[:lower:]' '[:upper:]'
  # as an example
}

if [ "$#" -gt 0 ]; then
  # feed arguments as separate lines through a pipe:
  printf '%s\n' "$@" | main
else
  # stdin just passed along
  main
fi

All can be invoked with:

cmd | the-script

Or:

the-script "$(cmd)"

(don’t forget the quotes without which the output of cmd would be subject to split+glob!)
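A quick self-contained check that the two invocations agree, inlining the argument-or-stdin logic from the first script above as a function (the function name is made up):

```shell
the_script() {
    if [ "$#" -gt 0 ]; then
        input="$*"          # arguments joined with spaces
    else
        input=$(cat)        # stdin, minus trailing newlines
    fi
    printf 'I got "%s"\n' "$input"
}

printf '1\n2\n3\n' | the_script     # piped form
the_script "$(printf '1\n2\n3\n')"  # argument form; same output
```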


Answered By: Stéphane Chazelas