Does GNU grep's -o option ignore zero-length matches?

I found an answer on another site that suggested grep -oP '^\w+|$'. I pointed out that the |$ is pointless in PCRE, since it just means “OR end of line” and will therefore always be true for regular lines. However, I can’t figure out exactly what it does in GNU grep PCREs when combined with -o. Consider the following:

$ printf 'ab\na\nc\n\n' | perl -ne 'print if /ab|$/'
ab
a
c

$

(I am including the second prompt ($) character to show that the empty line is included in the results).

As expected, in Perl, that matches every line, either because the line contains an ab or because the $ matches the end of the line. GNU grep behaves the same way without the -o flag:

$ printf 'ab\na\nc\n\n' | grep -P 'ab|$'
ab
a
c

$

However, -o changes the behavior:

$ printf 'ab\na\nc\n\n' | grep -oP 'ab|$'
ab
$

This is the same as simply grepping for ab. The second part, the “OR end of line” seems to be ignored, even though it works as expected without the -o flag.

What’s going on? Does -o ignore 0-length matches? Is that a bug or is it expected?

Asked By: terdon

||

My GNU grep man page says the following:

-o, --only-matching

Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.

(Emphasis mine, on “non-empty”.)

I’m guessing it considers the end-of-line match to be an “empty match”.
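To make the distinction concrete, here is a minimal sketch that compares how many lines grep selects with what -o actually prints. It assumes GNU grep; it uses -E instead of -P so that it does not depend on PCRE support, on the assumption that -o handles empty matches the same way for both. The helper name sample is made up for illustration.

```shell
# Hypothetical helper: the same four input lines as in the question
# ("ab", "a", "c", and one empty line).
sample() { printf 'ab\na\nc\n\n'; }

# Count the selected lines: every line matches, because $ matches
# (with zero length) at the end of each line.
matching_lines=$(sample | grep -cE 'ab|$')

# With -o, only the non-empty matched parts are printed.
printed_parts=$(sample | grep -oE 'ab|$')

echo "matching lines: $matching_lines"
echo "printed by -o:  $printed_parts"
```

So the lines are still selected; it is only the zero-length matched parts that -o leaves out of the output.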

Answered By: jesse_b

Sides of an OR

The second part, the “OR end of line” seems to be ignored.

No, it isn’t, if we change the match slightly:

$ printf 'ab\na\n\nc\n' | grep -oP 'ab|.$'
ab
a
c

Both parts of the OR are explicitly matched.

Empty match

What is ignored are “empty” matches (the resulting string has zero length):

$ printf '%s\n' ab " " a "" c | grep -oP '^.*$'
ab

a
c

This is documented in the GNU grep man page; LESS=+'/^ *-o,' man grep opens it at the relevant entry (emphasis mine):

-o, --only-matching

Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
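As a rough check of the '^.*$' example above, the following sketch counts the input lines, the lines grep selects, and the lines -o prints. It assumes GNU grep and uses -E rather than -P to avoid depending on PCRE support; the helper name sample is hypothetical.

```shell
# Hypothetical helper: five lines, one holding a single space and one empty.
sample() { printf '%s\n' ab " " a "" c; }

# tr strips the padding that some wc implementations add.
lines_in=$(sample | wc -l | tr -d ' ')

# ^.*$ matches every line, including the empty one.
lines_matched=$(sample | grep -cE '^.*$')

# -o drops the zero-length match from the empty line.
lines_printed=$(sample | grep -oE '^.*$' | wc -l | tr -d ' ')

echo "in=$lines_in matched=$lines_matched printed=$lines_printed"
```

The line containing a single space is still printed, since its match is one character long; only the truly empty line disappears.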

Answered By: user232326