Can grep output only specified groupings that match?

Say I have a file:

# file: 'test.txt'
foobar bash 1
bash
foobar happy
foobar

I only want to know what words appear after “foobar”, so I can use this regex:

"foobar (\w+)"

The parentheses indicate that I have a special interest in the word right after foobar. But when I run grep -E "foobar (\w+)" test.txt, I get the entire lines that match the regex, rather than just “the word after foobar”:

foobar bash 1
foobar happy

I would much prefer that the output of that command looked like this:

bash
happy

Is there a way to tell grep to only output the items that match the grouping (or a specific grouping) in a regular expression?

Asked By: Cory Klein

||

Standard grep can’t do this, but recent versions of GNU grep can. You can turn to sed, awk or perl. Here are a few examples that do what you want on your sample input; they behave slightly differently in corner cases.

Replace foobar word other stuff by word, print only if a replacement is done.

sed -n -e 's/^foobar \([[:alnum:]]\+\).*/\1/p'

If the first word is foobar, print the second word.

awk '$1 == "foobar" {print $2}'

Strip foobar if it’s the first word, and skip the line otherwise; then strip everything after the first whitespace and print.

perl -lne 's/^foobar\s+// or next; s/\s.*//; print'
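To make the corner-case differences concrete, here is a minimal sketch that runs the sed and awk variants over the sample input plus two extra lines. The extra lines (a leading space and a hyphenated word) are assumptions added for illustration, and the sed pattern uses the POSIX `\{1,\}` in place of the GNU-only `\+`.

```shell
# Sample input from the question plus two hypothetical corner cases:
# a line with a leading space and a word containing punctuation.
sample=$(printf '%s\n' 'foobar bash 1' 'bash' 'foobar happy' 'foobar' \
    ' foobar indented' 'foobar x-ray')

# sed: requires "foobar " at the very start of the line and keeps only
# the leading run of alphanumeric characters of the next word.
sed_words=$(printf '%s\n' "$sample" |
    sed -n 's/^foobar \([[:alnum:]]\{1,\}\).*/\1/p')

# awk: splits on any whitespace, so leading blanks are ignored, and it
# prints an empty line when "foobar" is the only word on the line.
awk_words=$(printf '%s\n' "$sample" | awk '$1 == "foobar" {print $2}')

printf 'sed:\n%s\n\nawk:\n%s\n' "$sed_words" "$awk_words"
```

On this input the sed variant skips the indented line and truncates "x-ray" to "x", while the awk variant keeps both words but also emits a blank line for the bare "foobar".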

Well, if you know that foobar is always the first word of the line, then you can use cut. Like so:

grep "foobar" test.txt | cut -d" " -f2
Answered By: Dave

GNU grep has the -P option for perl-style regexes, and the -o option to print only what matches the pattern. These can be combined using look-around assertions (described under Extended Patterns in the perlre manpage) to remove part of the grep pattern from what is determined to have matched for the purposes of -o.

$ grep -oP 'foobar \K\w+' test.txt
bash
happy
$

The \K is the short-form (and more efficient form) of (?<=pattern), which you use as a zero-width look-behind assertion before the text you want to output. (?=pattern) can be used as a zero-width look-ahead assertion after the text you want to output.

For instance, if you wanted to match the word between foo and bar, you could use:

$ grep -oP 'foo \K\w+(?= bar)' test.txt

or (for symmetry)

$ grep -oP '(?<=foo )\w+(?= bar)' test.txt
Answered By: camh

If PCRE is not supported you can achieve the same result with two invocations of grep. For example to grab the word after foobar do this:

<test.txt grep -o 'foobar  *[^ ]*' | grep -o '[^ ]*$'

This can be expanded to an arbitrary word after foobar like this (with EREs for readability):

i=1
<test.txt grep -E -o 'foobar +([^ ]+ +){'$i'}[^ ]+' | grep -o '[^ ]*$'

Output:

1

Note the index i is zero-based.
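The two-grep pipeline above can be wrapped in a small function. This is only a sketch: `word_after` is a hypothetical helper name, not a standard command, and it assumes words are separated by plain spaces and that the keyword contains no regex metacharacters.

```shell
# word_after KEYWORD INDEX [FILE...]
# Hypothetical helper: prints the INDEX-th (zero-based) space-separated
# word that follows KEYWORD. KEYWORD is used as part of an ERE, so it
# must not contain regex metacharacters. Reads stdin if no FILE is given.
word_after() {
    keyword=$1
    index=$2
    shift 2
    # First grep keeps KEYWORD plus the next INDEX+1 words;
    # second grep keeps only the last word of that match.
    grep -E -o "$keyword +([^ ]+ +){$index}[^ ]+" "$@" | grep -o '[^ ]*$'
}

printf '%s\n' 'foobar bash 1' 'bash' 'foobar happy' 'foobar' | word_after foobar 0
printf '%s\n' 'foobar bash 1' 'bash' 'foobar happy' 'foobar' | word_after foobar 1
```

With index 0 this yields the word directly after foobar on each matching line; with index 1 only lines that have a second word after foobar produce output.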

Answered By: Thor
    sed -n "s/^.*foobar\s*\(\S*\).*$/\1/p"

-n     suppress printing
s      substitute
^.*    anything before foobar
foobar initial search match
\s*    any white space character (space)
\(     start capture group
\S*    capture any non-white space character (word)
\)     end capture group
.*$    anything after the capture group
\1     substitute everything with the 1st capture group
p      print it
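
The \s and \S escapes are GNU extensions, so the command may not work in other sed implementations. A possible portable variant, sketched here with POSIX bracket classes, also requires at least one character in the captured word (a deliberate change from the command above, so that a bare "foobar" line prints nothing):

```shell
# Portable sketch: [[:space:]] stands in for \s, [^[:space:]] for \S,
# and \{1,\} demands a non-empty word after foobar.
words=$(printf '%s\n' 'foobar bash 1' 'bash' 'foobar happy' 'foobar' |
    sed -n 's/^.*foobar[[:space:]]*\([^[:space:]]\{1,\}\).*$/\1/p')

printf '%s\n' "$words"
```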
Answered By: jgshawkey

pcregrep has a smarter -o option
that lets you choose which capturing groups you want output. 
So, using your example file,

$ pcregrep -o1 "foobar (\w+)" test.txt
bash
happy

Relying on grep -P is not portable, since -P/--perl-regexp is only available in GNU grep, not BSD grep.

Here is the solution using ripgrep:

$ rg -o "foobar (\w+)" -r '$1' <test.txt
bash
happy

As per man rg:

-r/--replace REPLACEMENT_TEXT Replace every match with the text given.

Capture group indices (e.g., $5) and names (e.g., $foo) are supported in the replacement string.

Related: GH-462.

Answered By: kenorb

I found the answer of @jgshawkey very helpful. grep is not such a good tool for this, but sed is, although here we have an example that uses grep to grab a relevant line.

Regex syntax of sed is idiosyncratic if you are not used to it.

Here is another example: this one parses output of xinput to get an ID integer

⎜   ↳ SynPS/2 Synaptics TouchPad                id=19   [slave  pointer  (2)]

and I want 19

export TouchPadID=$(xinput | grep 'TouchPad' | sed  -n "s/^.*id=\([[:digit:]]\+\).*$/\1/p")

Note the class syntax:

[[:digit:]]

and the need to escape the following +

I assume only one line matches.
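
If more than one line could match, one option is to let sed do the line selection itself and quit after the first hit. The sketch below uses a hypothetical `xinput_sample` function as a stand-in for real xinput output (the device lines are made-up data), and the POSIX `\{1,\}` instead of the escaped +.

```shell
# Stand-in for `xinput`; the device lines are assumed sample data.
xinput_sample() {
    printf '%s\n' \
        'Virtual core pointer          id=2    [master pointer  (3)]' \
        'SynPS/2 Synaptics TouchPad    id=19   [slave  pointer  (2)]' \
        'Another TouchPad              id=23   [slave  pointer  (2)]'
}

# On the first line containing TouchPad: print the id digits, then quit,
# so later matching lines are ignored.
TouchPadID=$(xinput_sample |
    sed -n '/TouchPad/{s/^.*id=\([[:digit:]]\{1,\}\).*$/\1/p;q;}')

echo "$TouchPadID"
```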

Answered By: Tim Richardson

Compare Perl and Raku solutions:


Using Perl (answers from @vault and @Gilles ‘SO- stop being evil’):

~$ perl -lne 'print $1 if /^foobar (\w+)/;'  file

#OR:

~$ perl -lne 's/^foobar\s+// or next; s/\s.*//; print'  file

Using Raku (formerly known as Perl_6)

~$ raku -ne 'put $0 if /^foobar \s+ (\w+)/;'  file

#OR:

~$ raku -ne 's/^foobar\s+// or next; s/\s.*//; .put;'  file

A few more Raku answers (including Raku grep)

~$ raku -pe 's/^foobar \s+ ( \S+ ) [ \s+ .*?]? $ /$0/ or next;'  file

#OR:

~$ raku -ne '.grep(/^foobar\s+/) or next; .words[1].put;'  file

#OR:

~$ raku -ne '$_ .= words; if .[0] eq "foobar" { put .[1] // next };'  file

#OR:

~$ raku -ne 'put .[1] || next if $_.=words[0] eq "foobar";'  file

Note: for the last few examples using indexing, stray spaces can be cleaned up first if they are problematic. Try using any of the various trim routines in Raku: .trim, .trim-leading, or .trim-trailing, like so:

~$ raku -ne '.trim-trailing.grep(/foobar\s+/) or next; .words[1].put;'  file

(Of course, an advantage of solutions in Perl/Raku is that these languages are cross-platform, having binaries available for Windows, etc.).


Perl References:
https://perldoc.perl.org
https://www.perl.org

Raku References:
https://docs.raku.org
https://raku.org

Answered By: jubilatious1