cut for key-value pairs

I am looking for an easy way to cut key-value pairs (where the key is unique and specified) from text, much like how cut can be used to cut a specified column in a CSV file. The keys I’m looking for are not always in the same relative position in the line; that is, using cut followed by sed won’t do the trick, because the key I’m looking for is not always in the same ,-delimited column.

The text in question is indeed CSV, but it so happens that the values are key/value pairs, delimited with an =.

For example, I might parse a file with the following three lines:

Foo=1, Bar=2, Baz=3
Bar=4, Foo=2, Baz=3
Bar=42, Baz=42, Foo=3

And I would like to cut this text to yield the key/value pair for a specific key. If I was looking for Foo, then my desired output would be:

Foo=1
Foo=2
Foo=3

Ideally I would like a command-line tool that has similar syntax as cut, and can read both from stdin and from a file.

Is there such a tool?

Asked By: John Dibling

||

grep can do this with the -o option, which prints only the matching part of each line:

grep -o 'Foo=[^,]*' file
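
To get closer to the cut-like interface the question asks for, the grep call can be wrapped in a small shell function. This is only a sketch: the name kvcut is made up for illustration, it assumes a grep that supports -o and -h (GNU and BSD grep do), and it assumes the key contains no regex metacharacters.

```shell
# kvcut KEY [FILE...]: print every KEY=value pair found in the input.
# Hypothetical helper; reads stdin when no FILE is given.
kvcut() {
    key=$1
    shift
    # -o prints only the matched text; -h suppresses file name prefixes
    # when more than one file is given.
    grep -oh -- "${key}=[^,]*" "$@"
}

# Usage: from stdin or from a file, much like cut.
printf '%s\n' 'Foo=1, Bar=2, Baz=3' 'Bar=4, Foo=2, Baz=3' | kvcut Foo
```

Note that the pattern is not anchored, so a key such as XFoo would also produce a Foo=... match; whether that matters depends on the real data.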
Answered By: jimmij

Depending on the complexity of your real-world situation, this sed command may be sufficient:

sed -n 's/^.*\(\<Foo=[^,]*\).*/\1/p'

Here’s a worked example:

FIELD='Foo'
sed -n "s/^.*\(\<${FIELD}=[^,]*\).*/\1/p" << xxEOFxx
Foo=1, Bar=2, Baz=3
Bar=4, Foo=2, Baz=3
Bar=42, Baz=42, Foo=3
xxEOFxx
Foo=1
Foo=2
Foo=3
Answered By: roaima

Given your example, a brittle solution could involve cut:

tr ', ' '[\n*]' <input | cut -sd F -f1-

…which would put each key/value pair on a separate line by transforming the intervening commas and spaces into newlines, and then cutting out lines which don’t contain an F. But that is a highly specialized example, and can only work if you can be sure an F only occurs in the wanted key/value pairs.
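
To see how brittle that is, here is a small demonstration. The input line is made up: the value of Bar happens to contain an F, so the unwanted pair slips through. The function name brittle is only for illustration.

```shell
# Same pipeline as above, wrapped so it can read from stdin.
brittle() {
    tr ', ' '[\n*]' | cut -sd F -f1-
}

# Made-up input where an F appears outside the wanted key.
printf '%s\n' 'Foo=1, Bar=2F' | brittle
# prints:
# Foo=1
# Bar=2F
```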

Otherwise, sed would be what I would use:

sed 'y/ ,/\n\n/;/^Foo=/P;D' <input

Which would also transform intervening commas and spaces into newlines, but then only Print those key/value pairs which begin w/ the string Foo=. So long as the spaces are reliable separators, the above would work to portably print the Foo key/value pairs each on a separate line no matter how many times they might occur on an input line, and would print nothing else – even for lines which do not contain the key/values you want printed.
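
A short check of those two claims, using made-up input in which one line carries the key twice and another line does not carry it at all. The wrapper name kv_lines is an assumption for illustration only.

```shell
# The sed command from above, reading stdin instead of a file.
kv_lines() {
    sed 'y/ ,/\n\n/;/^Foo=/P;D'
}

# Line 1 has Foo twice, line 2 has no Foo, line 3 has Foo last.
printf '%s\n' 'Foo=1, Bar=2, Foo=7' 'Bar=4, Baz=3' 'Baz=42, Foo=3' | kv_lines
# prints:
# Foo=1
# Foo=7
# Foo=3
```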

Answered By: mikeserv

If you don’t have a grep available with the -o option, this ought to do the trick as well:

sed -e 's/, /\n/g' | grep '^Foo='

That’s using sed to replace every comma+space with a newline (breaking each key-value pair onto its own line), and then grep to search for only the ‘Foo’ key.

Test case:

printf "%s\n" "Foo=1, Bar=2, Baz=3" "Bar=4, Foo=2, Baz=3" "Bar=42, Baz=42, Foo=3" \
    | sed -e 's/, /\n/g' | grep '^Foo='
Answered By: godlygeek

The awk solution to round up the list of alternatives:

awk -v RS=', ' -F'=' '$1=="Foo"' <file>

Treat each record as delimited by ', ', and split each record into fields on the = character (using -F). Then it’s just a matter of matching on the first field, $1. The example shown here uses simple string matching; feel free to use a regex instead, e.g. $1~/\<Foo\>/.
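
As an alternative sketch (a variant, not the exact command above), the same first-field comparison can be done while keeping awk's default newline record separator and splitting each line on ', ' instead, so that no pair ever spans a line break. The function name kv_awk and the awk variable key are assumptions for illustration.

```shell
# kv_awk KEY: print each KEY=value pair from stdin, one per line.
kv_awk() {
    awk -F', ' -v key="$1" '{
        for (i = 1; i <= NF; i++) {
            # Split the pair on "=" and compare the key as a plain string.
            split($i, kv, "=")
            if (kv[1] == key) print $i
        }
    }'
}

printf '%s\n' 'Foo=1, Bar=2, Baz=3' 'Bar=4, Foo=2, Baz=3' 'Bar=42, Baz=42, Foo=3' | kv_awk Foo
```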

Answered By: h.j.k.