Isolate one specific instance of a word in a string
I have a grep command to process the string
test-test test
I want the command to show me only the "free-standing" word test highlighted, but not test-test.
However, when I run
echo "test-test test" | grep -w "test"
all occurrences of test are highlighted.
Is it possible to restrict the matching to only the isolated occurrence of test?
With Perl regexes (supported in GNU grep with the -P flag), you can use a negative lookbehind and lookahead to assert that the match for test is not adjacent to a -:
echo "test-test test" | grep -P '(?<!-)test(?!-)'
This outputs
test-test test
with only the last test highlighted.
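Since highlighting does not survive in plain text, here is a minimal sketch that shows which occurrence matched by printing only the match and its byte offset. It assumes a grep built with PCRE support (-P) and the -o and -b options, as in typical GNU grep builds; the variable name isolated is just for illustration.

```shell
# -o prints only the matched text; -b prefixes its byte offset within the input.
# Assumption: grep supports -P (PCRE), -o and -b, as GNU grep usually does.
isolated=$(printf '%s\n' 'test-test test' | grep -obP '(?<!-)test(?!-)')
printf '%s\n' "$isolated"   # 10:test
```

Offset 10 is where the third test begins, so the two halves of test-test are skipped.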
You get all test occurrences in bold because, for -w, a word is a sequence of word characters delimited by non-word characters, word characters being alphanumerics and underscores. - is not a word character.
In testing-test+test2;test_3;untested, only the second test occurrence matches because that's the only one that is neither preceded nor followed by a word character.
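The -w behaviour described above can be checked without relying on colour. This is a small sketch, assuming a grep that supports -o and -b (GNU grep does); the variable names are illustrative only.

```shell
# -w keeps only matches that are not adjacent to a word character
# (alphanumerics and underscore); -o -b prints each match with its byte offset.
line='testing-test+test2;test_3;untested'
word_matches=$(printf '%s\n' "$line" | grep -obw 'test')
printf '%s\n' "$word_matches"   # 8:test
```

The single match at offset 8 is the test between - and +, both of which are non-word characters.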
If you want to match on whitespace-delimited words, you'll have to specify those look-around assertions by hand, and you'll need Perl-compatible regexps for that, which not all grep implementations support. Since you seem to have a grep='grep --color' alias, if your grep supports --color, there's a chance it also supports a -P / -X perl option, and then you can do:
grep --color -P '(?<!\S)test(?!\S)'
(?<!...) and (?!...) are, respectively, the negative look-behind and look-ahead assertion operators in perl regexps. \S, like POSIX' [^[:space:]], matches any character that is not classified as whitespace. So that says: "match on test provided it's neither preceded nor followed by a non-whitespace character".
Using Perl:
~$ echo "test-test test\ntest\n\n" | perl -lpe 's/(?<!\S)(test)(?!\S)/**$1**/g;'
test-test **test**
**test**
Using Raku:
~$ echo "test-test test\ntest\n\n" | raku -pe 's:g/ <!after \S > (test) <!before \S > /**$0**/;'
test-test **test**
**test**
[Special thanks to @StéphaneChazelas for expounding on Perl-style lookarounds in his grep answer].
If you want something akin to highlighting, here's a simple idea from PerlMonks. Instead of grep's color option, use Perl/Raku and surround the matches with ** asterisks.
Raku note: Lookbehinds are spelled after in Raku, with ? for positive and ! for negative, while lookaheads are spelled before, with ? for positive and ! for negative. So you can literally read the Raku code above as "substitute globally any matches to the word "test" not after a non-whitespace character and also not before a non-whitespace character…".
https://perldoc.perl.org/perlre#Lookaround-Assertions
https://docs.raku.org/language/regexes#Lookahead_assertions
https://docs.raku.org/language/regexes#Lookbehind_assertions
Using extended regex, you can achieve it like so
grep -E "(^| )test( |$)"
One minor caveat: this will also highlight any leading or trailing space, but you won't see it because a space is not a visible character.
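The caveat can be made visible with -o, which prints exactly what the pattern consumed. This is a rough sketch assuming a grep that supports -o; the brackets are only there to expose the space.

```shell
# The alternation ( |^) consumes the separator, so the space is part of the match.
ere_match=$(printf '%s\n' 'test-test test' | grep -oE '(^| )test( |$)')
printf '[%s]\n' "$ere_match"   # [ test]
```

The match is " test" with its leading space, whereas the look-around version asserts on the neighbours without including them in the match.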