Isolate one specific instance of a word in a string

I have a grep command to process the string

test-test test

I want the command to only show me the "free-standing" word test highlighted, but not test-test.

However, when I run

echo "test-test test" | grep -w "test"

I get all test occurrences highlighted.

Is it possible to restrict the matching to only the isolated occurrence of test?

Asked By: xUr

||

With Perl regexes (supported in GNU grep with the -P flag), you can use a negative lookbehind and lookahead to assert that the match for test is not adjacent to a -:

echo "test-test test" | grep -P '(?<!-)test(?!-)'

This outputs

test-test test
Answered By: Brian61354270

You get all test occurrences in bold because for -w a word is a sequence of word characters delimited by non-word characters, word characters being alphanumerics and underscores. - is not a word character.

In testing-test+test2;test_3;untested, only the second test occurrence matches because that’s the only one that is neither preceded nor followed by a word character.
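As a quick check of that rule, here is a minimal sketch that counts the -w matches in both sample strings. It assumes a grep with the -o option (GNU, BSD and busybox grep have it, though POSIX does not require it); count_word_matches is just a helper name made up for this illustration.

```shell
# Hypothetical helper: count how many times "grep -w test" matches in a string.
# -o prints every match on its own line, so wc -l yields the number of matches.
count_word_matches() {
  printf '%s\n' "$1" | grep -ow 'test' | wc -l | tr -d ' '
}

count_word_matches 'test-test test'                      # prints 3: - and space are both non-word characters
count_word_matches 'testing-test+test2;test_3;untested'  # prints 1: only the test between - and +
```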

If you want to match on whitespace-delimited words, you’ll have to specify those look-around assertions by hand, and you’ll need perl-compatible regexps for that, which not all grep implementations support. Since you seem to have a grep='grep --color' alias, if your grep supports --color, there’s a chance it also supports a -P / -X perl option, and then you can do:

grep --color -P '(?<!\S)test(?!\S)'

(?<!...) and (?!...) being respectively the negative look-behind and look-ahead assertion operators in perl regexps. \S, like POSIX’s [^[:space:]], matches any character that is not classified as whitespace. So that says: "match on test provided it’s neither preceded nor followed by a non-whitespace character".
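To see which occurrences that pattern accepts, a small sketch along the same lines can count the matches. It assumes GNU grep built with PCRE support for -P; whitespace_delimited_matches is a hypothetical helper name, not something grep provides.

```shell
# Assumption: GNU grep with -P (PCRE) support; other grep implementations may lack it.
# Hypothetical helper: count whitespace-delimited occurrences of "test" in a string.
whitespace_delimited_matches() {
  printf '%s\n' "$1" | grep -oP '(?<!\S)test(?!\S)' | wc -l | tr -d ' '
}

whitespace_delimited_matches 'test-test test'   # prints 1: only the free-standing test
whitespace_delimited_matches 'test test test'   # prints 3: the assertions consume no characters
```

Because the look-around assertions are zero-width, neighbouring words can share the same separating space and still all match, and a word at the very start or end of the line matches too.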

Answered By: Stéphane Chazelas

Using Perl:

~$ echo "test-test test\ntest\n\n" | perl -lpe 's/(?<!\S)(test)(?!\S)/**$1**/g;'
test-test **test**
**test**


Using Raku:

~$ echo "test-test test\ntest\n\n" | raku -pe 's:g/ <!after \S > (test) <!before \S >  /**$0**/;'
test-test **test**
**test**


[Special thanks to @StéphaneChazelas for expounding on Perl-style lookarounds in his grep answer].

If you want something akin to highlighting, here’s a simple idea from PerlMonks. Instead of grep’s color option, use Perl/Raku and surround the matches with ** asterisks.

Raku note: Lookbehinds are spelled after in Raku with ? for positive and ! for negative, while Lookaheads are spelled before in Raku with ? for positive and ! for negative. So you can literally read the Raku code above as "substitute globally any matches to the word "test" not after a non-whitespace character and also not before a non-whitespace character… ".

https://perldoc.perl.org/perlre#Lookaround-Assertions
https://docs.raku.org/language/regexes#Lookahead_assertions
https://docs.raku.org/language/regexes#Lookbehind_assertions

Answered By: jubilatious1

Using extended regex, you can achieve it like so

grep -E "(^| )test( |$)"

The minor caveat is that this will also highlight any leading or trailing space, but you won’t see it because a space is not a visible character.
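The extra space becomes visible if you print only the matched text between brackets. This is a small sketch assuming a grep that supports -o; ere_match is just a variable name chosen for the illustration.

```shell
# -o prints only the matched text instead of the whole line.
ere_match=$(printf '%s\n' 'test-test test' | grep -oE '(^| )test( |$)')

# The brackets make the captured space visible.
printf '[%s]\n' "$ere_match"   # prints "[ test]": the leading space is part of the match
```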

Answered By: bxm