Can grep output only specified groupings that match?
Say I have a file:
# file: 'test.txt'
foobar bash 1
bash
foobar happy
foobar
I only want to know what words appear after “foobar”, so I can use this regex:
"foobar (\w+)"
The parentheses indicate that I have a special interest in the word right after foobar. But when I do a grep "foobar (\w+)" test.txt, I get the entire lines that match the entire regex, rather than just “the word after foobar”:
foobar bash 1
foobar happy
I would much prefer that the output of that command looked like this:
bash
happy
Is there a way to tell grep to only output the items that match the grouping (or a specific grouping) in a regular expression?
Standard grep can’t do this, but recent versions of GNU grep can. You can turn to sed, awk or perl. Here are a few examples that do what you want on your sample input; they behave slightly differently in corner cases.
Replace foobar word other stuff by word, and print only if a replacement is done.
sed -n -e 's/^foobar \([[:alnum:]]\+\).*/\1/p'
If the first word is foobar, print the second word.
awk '$1 == "foobar" {print $2}'
Strip foobar if it’s the first word, and skip the line otherwise; then strip everything after the first whitespace and print.
perl -lne 's/^foobar\s+// or next; s/\s.*//; print'
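As a rough illustration of one such corner case, the sketch below recreates the sample file from the question in a temporary file and counts how many lines the sed and awk commands print. It assumes GNU sed (for \+), and the variable names are made up for the example.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 'foobar bash 1' 'bash' 'foobar happy' 'foobar' > "$sample"

# sed prints only when the substitution succeeds, so the bare "foobar"
# line is skipped. (\+ is a GNU extension in basic regular expressions.)
sed_lines=$(sed -n -e 's/^foobar \([[:alnum:]]\+\).*/\1/p' "$sample" | awk 'END { print NR }')

# awk prints $2 even when it is empty, so the bare "foobar" line
# produces a blank output line.
awk_lines=$(awk '$1 == "foobar" { print $2 }' "$sample" | awk 'END { print NR }')

echo "sed printed $sed_lines lines, awk printed $awk_lines lines"
rm -f "$sample"
```

If the blank line from awk is unwanted, adding a condition such as NF >= 2 to the awk pattern should avoid it.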
Well, if you know that foobar is always the first word on the line, then you can use cut. Like so:
grep "foobar" test.file | cut -d" " -f2
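One thing to watch with cut: a line that contains no delimiter at all is passed through whole unless -s is given. The sketch below feeds the question's sample input through the pipeline with and without -s; the variable names are only for illustration.

```shell
# Sample input from the question, held in a variable instead of a file.
sample='foobar bash 1
bash
foobar happy
foobar'

# Without -s, the bare "foobar" line has no space delimiter, so cut
# prints the whole line unchanged.
without_s=$(printf '%s\n' "$sample" | grep "foobar" | cut -d" " -f2)

# With -s, lines that contain no delimiter are suppressed.
with_s=$(printf '%s\n' "$sample" | grep "foobar" | cut -s -d" " -f2)

printf 'without -s:\n%s\n' "$without_s"
printf 'with -s:\n%s\n' "$with_s"
```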
GNU grep has the -P option for perl-style regexes, and the -o option to print only what matches the pattern. These can be combined using look-around assertions (described under Extended Patterns in the perlre manpage) to remove part of the grep pattern from what is determined to have matched for the purposes of -o.
$ grep -oP 'foobar \K\w+' test.txt
bash
happy
$
The \K is the short-form (and more efficient form) of (?<=pattern), which you use as a zero-width look-behind assertion before the text you want to output. (?=pattern) can be used as a zero-width look-ahead assertion after the text you want to output.
For instance, if you wanted to match the word between foo and bar, you could use:
$ grep -oP 'foo \K\w+(?= bar)' test.txt
or (for symmetry)
$ grep -oP '(?<=foo )\w+(?= bar)' test.txt
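To see both forms side by side, here is a small sketch run against a made-up input line (the sample file in the question has no "foo ... bar" line). It assumes a GNU grep built with PCRE support; the variable names are hypothetical.

```shell
# Hypothetical sample line; the word of interest sits between "foo " and " bar".
line='foo middle bar'

# \K drops "foo " from the reported match; (?= bar) requires " bar"
# to follow without including it in the output.
with_k=$(printf '%s\n' "$line" | grep -oP 'foo \K\w+(?= bar)')

# The same extraction using a look-behind in place of \K.
with_lookbehind=$(printf '%s\n' "$line" | grep -oP '(?<=foo )\w+(?= bar)')

echo "$with_k $with_lookbehind"
```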
If PCRE is not supported you can achieve the same result with two invocations of grep. For example to grab the word after foobar do this:
<test.txt grep -o 'foobar *[^ ]*' | grep -o '[^ ]*$'
This can be expanded to an arbitrary word after foobar like this (with EREs for readability):
i=1
<test.txt grep -E -o 'foobar +([^ ]+ +){'$i'}[^ ]+' | grep -o '[^ ]*$'
Output:
1
Note the index i is zero-based.
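The two-grep idea can also be wrapped in a small function so the index becomes an argument. This is only a sketch: nth_word_after is a made-up helper name, it reads from standard input, and it uses grep -E rather than egrep.

```shell
# nth_word_after is a hypothetical helper, not part of grep.
# $1: zero-based index of the word after "foobar"; input comes from stdin.
nth_word_after() {
    # First grep keeps "foobar" plus the words up to the wanted one;
    # second grep keeps only the last word of that match.
    grep -E -o "foobar +([^ ]+ +){$1}[^ ]+" | grep -o '[^ ]*$'
}

# Index 0 is the word right after foobar, index 1 the one after that.
first=$(printf '%s\n' 'foobar bash 1' 'bash' 'foobar happy' 'foobar' | nth_word_after 0)
second=$(printf '%s\n' 'foobar bash 1' 'bash' 'foobar happy' 'foobar' | nth_word_after 1)

printf 'index 0:\n%s\n' "$first"
printf 'index 1:\n%s\n' "$second"
```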
sed -n "s/^.*foobar\s*\(\S*\).*$/\1/p"
-n suppress printing
s substitute
^.* anything before foobar
foobar initial search match
\s* zero or more white space characters (spaces)
\( start capture group
\S* capture zero or more non-white space characters (the word)
\) end capture group
.*$ anything after the capture group
\1 substitute everything with the 1st capture group
p print it
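\s and \S are GNU extensions, so on other sed implementations the same idea may need the POSIX character classes instead. A minimal sketch of that variant, run on the question's sample input (the variable name is only for illustration):

```shell
# Portable variant: [[:space:]] and [^[:space:]] are POSIX bracket
# expressions, standing in for the GNU-only \s and \S.
# Because the capture group may match nothing, the bare "foobar" line
# yields an empty output line; $(...) strips it here as a trailing newline.
words=$(printf '%s\n' 'foobar bash 1' 'bash' 'foobar happy' 'foobar' |
    sed -n 's/^.*foobar[[:space:]]*\([^[:space:]]*\).*$/\1/p')

printf '%s\n' "$words"
```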
pcregrep has a smarter -o option that lets you choose which capturing groups you want output.
So, using your example file,
$ pcregrep -o1 "foobar (\w+)" test.txt
bash
happy
Using grep is not cross-platform compatible, since -P/--perl-regexp is only available on GNU grep, not BSD grep.
Here is the solution using ripgrep:
$ rg -o "foobar (\w+)" -r '$1' <test.txt
bash
happy
As per man rg:
-r/--replace REPLACEMENT_TEXT
Replace every match with the text given. Capture group indices (e.g., $5) and names (e.g., $foo) are supported in the replacement string.
Related: GH-462.
I found the answer of @jgshawkey very helpful. grep is not such a good tool for this, but sed is, although here we have an example that uses grep to grab a relevant line.
Regex syntax of sed is idiosyncratic if you are not used to it.
Here is another example: this one parses output of xinput to get an ID integer
⎜ ↳ SynPS/2 Synaptics TouchPad id=19 [slave pointer (2)]
and I want 19
export TouchPadID=$(xinput | grep 'TouchPad' | sed -n "s/^.*id=\([[:digit:]]\+\).*$/\1/p")
Note the class syntax [[:digit:]] and the need to escape the following +.
I assume only one line matches.
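If more than one line could match, one option is to keep only the first extracted ID with head. The sketch below uses canned text in place of real xinput output so it can be tried anywhere; the second device line is invented for the example, and \+ again assumes GNU sed.

```shell
# Canned text standing in for `xinput` output; the second line is hypothetical.
sample_output='SynPS/2 Synaptics TouchPad id=19 [slave pointer (2)]
Hypothetical Other TouchPad id=23 [slave pointer (2)]'

# Same grep and sed stages as above, then head keeps only the first ID
# in case several lines contain "TouchPad".
TouchPadID=$(printf '%s\n' "$sample_output" |
    grep 'TouchPad' |
    sed -n 's/^.*id=\([[:digit:]]\+\).*$/\1/p' |
    head -n 1)

echo "$TouchPadID"
```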