How to run grep with multiple AND patterns?

I would like to get a multi-pattern match with an implicit AND between the patterns, i.e. the equivalent of running several greps in sequence:

grep pattern1 | grep pattern2 | ...

So how can I convert it to something like this?

grep pattern1 & pattern2 & pattern3

I would like to use a single grep because I am building the arguments dynamically, so everything has to fit in one string. Chaining filters with pipes is a system feature, not a grep feature, so it does not answer the question.


Don’t confuse this question with:

grep "pattern1|pattern2|..."

This is an OR multi pattern match. I am looking for an AND pattern match.

Asked By: greenoldman

||

You didn’t specify which grep version you use, and this matters. Some regexp engines support combining multiple patterns with AND using ‘&’, but this is a non-standard, non-portable feature. GNU grep, at least, doesn’t support it.

OTOH, you can simply replace grep with sed, awk, perl, etc. (listed in order of increasing weight). With awk, the command would look like

awk '/regexp1/ && /regexp2/ && /regexp3/ { print; }'

and it is easy to construct on the command line.
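Since the question is about building the arguments dynamically, here is a minimal sketch of how such an awk program could be assembled from a list of patterns. The helper name and_grep is hypothetical, and the slash escaping is only an assumption about what the patterns may contain; the patterns are interpreted as awk (ERE) regular expressions, not grep ones.

```shell
# Hypothetical helper: and_grep PATTERN...   (reads standard input)
# Builds the awk program  /p1/ && /p2/ && ...  from its arguments.
and_grep() {
    prog=''
    for p in "$@"; do
        # Escape slashes so the pattern can sit between the /.../ delimiters.
        esc=$(printf '%s' "$p" | sed 's,/,\\/,g')
        prog="${prog:+$prog && }/$esc/"
    done
    # With no patterns, the program "1" prints every line.
    awk "${prog:-1}"
}
```

For example, printf 'foo bar\nfoo\n' | and_grep foo bar prints only the line containing both words.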

Answered By: Netch

To find the lines that match each and every one of a list of patterns, agrep (the original one, now shipped with glimpse, not the unrelated one in the TRE regexp library) can do it with this syntax:

agrep 'pattern1;pattern2'

With GNU grep, when built with PCRE support, you can do:

grep -P '^(?=.*pattern1)(?=.*pattern2)'
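Because this form is a single string, it lends itself to being generated. The following is a sketch only: pcre_and is a hypothetical helper name, and wrapping each pattern in a non-capturing group (?:...) is an addition of this sketch so that a pattern containing an alternation such as a|b stays inside its lookahead.

```shell
# Hypothetical helper: prints one PCRE that requires every argument to match.
pcre_and() {
    re='^'
    for p in "$@"; do
        # One lookahead per pattern; (?:...) keeps "a|b" style patterns grouped.
        re="${re}(?=.*(?:${p}))"
    done
    printf '%s\n' "$re"
}

# Usage, assuming a grep built with PCRE support:
#   grep -P -- "$(pcre_and pattern1 pattern2)" file
```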

With ast grep:

grep -X '.*pattern1.*&.*pattern2.*'

(adding .*s as <x>&<y> matches strings that match both <x> and <y> exactly, a&b would never match as there’s no such string that can be both a and b at the same time).

If the patterns don’t overlap, you may also be able to do:

grep -e 'pattern1.*pattern2' -e 'pattern2.*pattern1'

The best portable way is probably with awk as already mentioned:

awk '/pattern1/ && /pattern2/'

Or with sed:

sed -e '/pattern1/!d' -e '/pattern2/!d'

Or perl:

perl -ne 'print if /pattern1/ && /pattern2/'

Beware that all of these use different regular expression syntaxes.

The awk/sed/perl ones don’t reflect whether any line matched the patterns in their exit status. To do that, you need:

awk '/pattern1/ && /pattern2/ {print; found = 1}
     END {exit !found}'
perl -ne 'if (/pattern1/ && /pattern2/) {print; $found = 1}
          END {exit !$found}'

Or pipe the command to grep '^'.
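As a small illustration of the grep '^' approach, the sketch below wraps the sed variant in a function (the name match_both is made up for this example). grep '^' matches every line it receives, so it passes the output through unchanged and exits non-zero only when there was no output at all.

```shell
# Hypothetical wrapper: prints lines containing both patterns and
# returns success only if at least one such line was found.
match_both() {
    sed -e '/pattern1/!d' -e '/pattern2/!d' | grep '^'
}

# Usage:
#   if match_both < file; then echo "at least one line matched"; fi
```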

For potentially gzip-compressed files, you can use zgrep, which is generally a shell script wrapper around grep, with one of the grep solutions above (not the ast-open one, as that grep implementation cannot be used by zgrep). Alternatively, you could use the PerlIO::gzip module of perl, which can transparently uncompress files upon input:

perl -MPerlIO::gzip -Mopen='IN,gzip(autopop)' -ne '
  print "$ARGV:$_" if /pattern1/ && /pattern2/' -- *.gz

(which, at least if the files are small enough, is likely to be more efficient than zgrep, as the decompression is done internally without having to run gunzip for each file).

Answered By: Stéphane Chazelas

If a file named patterns contains one pattern per line, you can do something like this:

awk 'NR==FNR{a[$0];next}{for(i in a)if($0!~i)next}1' patterns -

Or this, which matches substrings instead of regular expressions:

awk 'NR==FNR{a[$0];next}{for(i in a)if(!index($0,i))next}1' patterns -

If the patterns file is empty, these commands print no lines. To print all lines of the input in that case instead, replace NR==FNR with FILENAME==ARGV[1], or with ARGIND==1 in gawk.
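A small worked run of the regular expression variant may make the behaviour clearer. Everything here is illustrative: the function name, the temporary file and the sample input are assumptions of this sketch, not part of the commands above.

```shell
# Demo: write two patterns to a throwaway file, then filter three input lines.
demo_patterns_file() {
    pats=$(mktemp) || return
    printf '%s\n' 'ab+c' '^x' > "$pats"
    # Only 'x abbc' matches both ab+c and ^x.
    printf '%s\n' 'x abbc' 'abbc x' 'x ac' |
        awk 'NR==FNR{a[$0];next}{for(i in a)if($0!~i)next}1' "$pats" -
    rm -f "$pats"
}
```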

These functions print the lines of STDIN which contain each string specified as an argument as a substring. ga stands for grep all and gai ignores case.

ga(){ awk 'FILENAME==ARGV[1]{a[$0];next}{for(i in a)if(!index($0,i))next}1' <(printf '%s\n' "$@") -; }
gai(){ awk 'FILENAME==ARGV[1]{a[tolower($0)];next}{for(i in a)if(!index(tolower($0),i))next}1' <(printf '%s\n' "$@") -; }
Answered By: nisetama

grep pattern1 | grep pattern2 | ...

I would like to use single grep because I am building arguments dynamically, so everything has to fit in one string

It’s actually possible to build the pipeline dynamically (without resorting to eval):

# Executes: grep "$1" | grep "$2" | grep "$3" | ...
function chained-grep {
    local pattern="$1"
    if [[ -z "$pattern" ]]; then
        cat
        return
    fi    

    shift
    grep -- "$pattern" | chained-grep "$@"
}

cat something | chained-grep all patterns must match order but matter dont

It’s probably not a very efficient solution though.

Answered By: olejorgenb

git grep

Here is the syntax using git grep combining multiple patterns using Boolean expressions:

git grep --no-index -e pattern1 --and -e pattern2 --and -e pattern3

The above command will print lines matching all the patterns at once.

--no-index Search files in the current directory that are not managed by Git.

Check man git-grep for help.

Answered By: kenorb

ripgrep

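Here is an example using rg. Because the three groups are concatenated, it matches only lines in which the patterns appear in this order: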

rg -N '(?P<p1>.*pattern1.*)(?P<p2>.*pattern2.*)(?P<p3>.*pattern3.*)' file.txt

It’s one of the quickest grepping tools, since it’s built on top of Rust’s regex engine which uses finite automata, SIMD and aggressive literal optimizations to make searching very fast.

See also related feature request at GH-875.

Answered By: kenorb

To find all of the words (or patterns), you can run grep in a for loop. The main advantage here is searching from a list of regular expressions.

A real example:

# File 'search_all_regex_and_error_if_missing.sh'

find_list="
^a+$ 
^b+$ 
^h+$ 
^d+$ 
"

for item in $find_list; do
   if grep -E "$item" file_to_search_within.txt
   then
       echo "$item found in file."
   else
       echo "Error: $item not found in file. Exiting!"
       exit 1
   fi
done

Now let’s run it on this file:

hhhhhhhhhh
aaaaaaa
bbbbbbbbb
ababbabaabbaaa
ccccccc
dsfsdf
bbbb
cccdd
aa
caa

$ ./search_all_regex_and_error_if_missing.sh
aaaaaaa
aa
^a+$ found in file.
bbbbbbbbb
bbbb
^b+$ found in file.
hhhhhhhhhh
^h+$ found in file.
Error: ^d+$ not found in file. Exiting!
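
If only the yes/no answer is needed, the same loop can be packaged as a function that stays silent and reports through its exit status. This is a sketch; file_has_all is a hypothetical name, and it checks that each regular expression matches some line of the file, not that a single line matches all of them.

```shell
# Hypothetical helper: file_has_all FILE REGEX...
# Succeeds only if every extended regex matches at least one line of FILE.
file_has_all() {
    file=$1
    shift
    for re in "$@"; do
        # -q: no output; stop at the first regex that is missing.
        grep -Eq -- "$re" "$file" || return 1
    done
}

# Usage:
#   file_has_all file_to_search_within.txt '^a+$' '^b+$' && echo "all found"
```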
Answered By: Noam Manos

Here’s my take, and this works for words that appear on different lines of a file:

Use find . -type f followed by as many
-exec grep -q 'first_word' {} \;
as needed, and the last keyword with
-exec grep -l 'nth_word' {} \;

-q quiet / silent
-l show files with matches

The following returns the list of filenames with the words ‘rabbit’ and ‘hole’ in them:
find . -type f -exec grep -q 'rabbit' {} \; -exec grep -l 'hole' {} \;
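
For a variable number of words, the chain of -exec clauses can be generated. The sketch below is one possible way to do that in POSIX sh; files_with_all is a hypothetical name, and it uses grep -q for every word followed by -print rather than grep -l for the last one.

```shell
# Hypothetical helper: files_with_all DIR WORD...
# Prints the files under DIR that contain every WORD somewhere in the file.
files_with_all() {
    dir=$1
    shift
    n=$#
    # "$@" is expanded once when the loop starts, so appending is safe here.
    for w in "$@"; do
        set -- "$@" -exec grep -q -e "$w" {} \;
    done
    # Drop the original words, keeping only the generated -exec clauses.
    shift "$n"
    find "$dir" -type f "$@" -print
}

# Usage:
#   files_with_all . rabbit hole
```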

Answered By: StackRover

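To search multiple files for the presence of two patterns, use awk in paragraph mode (RS="" makes each blank-line-separated block one record, so both patterns must fall within the same block; a file without blank lines is a single record):

awk -v RS="" '/pattern1/ && /pattern2/ {print FILENAME}' file1 ... filen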
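
With RS="" awk reads blank-line-separated paragraphs, so two patterns separated by an empty line end up in different records. If they should count anywhere in the file, one option is to keep a flag per pattern and report at each file boundary. This is a sketch under assumptions: files_with_both is a hypothetical name, and the patterns are passed with -v, which means awk processes backslash escapes in them.

```shell
# Hypothetical helper: files_with_both REGEX1 REGEX2 FILE...
# Prints the name of each file in which both regexes match on some line.
files_with_both() {
    p1=$1
    p2=$2
    shift 2
    awk -v p1="$p1" -v p2="$p2" '
        # A new file starts: report on the previous one and reset the flags.
        FNR == 1 && NR > 1 { if (a && b) print prev; a = b = 0 }
        { prev = FILENAME }
        $0 ~ p1 { a = 1 }
        $0 ~ p2 { b = 1 }
        # The last file has no following boundary, so report it here.
        END { if (a && b) print prev }
    ' "$@"
}
```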
Answered By: concerned

Just MULTIPLY the patterns directly if you want them all to be true, thus eliminating any and all conditional branching:

awk '/regexp1/ * /regexp2/ * /regexp3/ … '

Say you need regex 4 to be FALSE while regexes 5 and 6 are both TRUE; then you can lump them all into a single comparison:

awk '/regexp4/ < /regexp5/ * /regexp6/'

Or say you want to match either regex 7 or regex 8, but not both at the same time; then use either one of these:

logical "!=" NOT EQUAL

awk '/regexp7/ != /regexp8/'

arithmetic "-" MINUS, since [ A XOR B ] on a single-bit level is the same as checking for a non-zero result of subtraction

awk '/regexp7/ - /regexp8/'

A real world example of this particular combination would be for checking whether a certain month has 31 days or not :

jot 12 | awk '(_ = +$1) % 2 != (7 < _)'

or

jot 12 | awk '((_ = +$1) + (7 < _)) % 2'

     1
     3
     5
     7
     8
    10
    12

Conversely, checking for a month being short would be :

jot 12 | awk  '(_ = +$1) % 2 == (7 < _)'
         awk  '(_ = +$1) % 2 -  (_ < 8)'                               
         awk '((_ = +$1)     +  (_ < 8)) % 2'

     2
     4
     6
     9
    11

here’s the strangest one of them all – if you want either regex 9 to be TRUE or regex 10 to be FALSE, and wanna do it without conditional branching :

awk '/regexp9/ ^ /regexp10/'

That’s right – regex 9 RAISED TO THE POWER of regex 10. It works because

    1 1 1^1 ->  1
    1 0 1^0 ->  1
    0 1 0^1 -> [0]
    0 0 0^0 ->  1

So the only time this algebraic expression yields FALSE is when regex 9 is FALSE while regex 10 is TRUE. Its twin via logical comparison operators would be:

awk '/regexp9/ >= /regexp10/'
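
The truth table above can be checked with a short sketch that feeds awk one input line per combination and prints both forms side by side. The function name and the sample lines are assumptions of this example, and it relies on the awk in use evaluating 0^0 as 1.

```shell
# Prints, for each of the four combinations, the value of the power form
# followed by the value of the comparison form.
implication_table() {
    # Lines: both a and b, only a, only b, neither.
    printf '%s\n' 'a b' 'a' 'b' '-' |
        awk '{ pow = (/a/ ^ /b/); cmp = (/a/ >= /b/); print pow, cmp }'
}
```

The two columns should agree on every row, with the only zeros on the row where just b is present.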

All these might not appear to be idiomatic, but they’re all POSIX-compliant awk syntax that is fully portable.

Answered By: RARE Kpop Manifesto