How do I grep for multiple patterns with pattern having a pipe character?

I want to find all lines in several files that match one of two patterns. I tried to find the patterns I’m looking for by typing

grep (foo|bar) *.txt

but the shell interprets the | as a pipe and complains when bar isn’t an executable.

How can I grep for multiple patterns in the same set of files?

Asked By: Dan

||

First, you need to quote special characters. Second, even so, grep will not understand alternation directly; you would need to use egrep or grep -E (the latter is specified by POSIX).

egrep 'foo|bar' *.txt

(The parentheses are unnecessary unless the alternation is part of a larger regex.)
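As a minimal sketch of the points above, assuming a scratch directory made with mktemp and made-up file contents, the following shows the quoted alternation and a case where the parentheses do matter:

```shell
# Scratch directory with hypothetical sample files.
demo_dir=$(mktemp -d)
printf 'foo one\nnothing here\n' > "$demo_dir/a.txt"
printf 'bar two\nbaz three\n' > "$demo_dir/b.txt"

# Quoted, so the shell hands the | to grep instead of building a pipeline.
either=$(cd "$demo_dir" && grep -E 'foo|bar' *.txt)

# Here the alternation is part of a larger regex, so it needs grouping:
# "bar" or "baz" at the start of a line, followed by a space.
grouped=$(cd "$demo_dir" && grep -E '^ba(r|z) ' *.txt)

printf '%s\n' "$either"
printf '%s\n' '---'
printf '%s\n' "$grouped"
rm -rf "$demo_dir"
```

Without the parentheses, '^bar|z ' would mean "bar at the start of a line, or z followed by a space anywhere", which is a different pattern.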

Answered By: geekosaur
egrep "foo|bar" *.txt

or

grep "foo\|bar" *.txt
grep -E "foo|bar" *.txt

selectively citing the man page of gnu-grep:

   -E, --extended-regexp
          Interpret PATTERN as an extended regular expression (ERE, see below).  (-E is specified by POSIX.)

Matching Control
   -e PATTERN, --regexp=PATTERN
          Use PATTERN as the pattern.  This can be used to specify multiple search patterns, or to protect  a  pattern
          beginning with a hyphen (-).  (-e is specified by POSIX.)

(…)

   grep understands two different versions of regular expression syntax: “basic” and “extended.”  In  GNU grep,  there
   is  no  difference  in  available  functionality  using  either  syntax.   In  other implementations, basic regular
   expressions are less powerful.  The following description applies to extended regular expressions; differences  for
   basic regular expressions are summarized afterwards.

In the beginning I didn’t read further, so I didn’t recognize the subtle differences:

Basic vs Extended Regular Expressions
   In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead  use  the
   backslashed versions \?, \+, \{, \|, \(, and \).
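A small sketch of that difference, using made-up input lines: in a basic regular expression an unescaped | is just a literal character, while with -E it means alternation.

```shell
# Three hypothetical input lines; the last one contains a literal pipe.
sample=$(printf 'foo\nbar\nfoo|bar\n')

# Basic syntax: | is an ordinary character, so only the third line matches.
bre=$(printf '%s\n' "$sample" | grep 'foo|bar')

# Extended syntax: | is alternation, so all three lines match.
ere=$(printf '%s\n' "$sample" | grep -E 'foo|bar')

printf 'basic:\n%s\n' "$bre"
printf 'extended:\n%s\n' "$ere"
```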

I always used egrep, with needless parentheses, because I learned from examples. Now I've learned something new. 🙂

Answered By: user unknown

First, you need to protect the pattern from expansion by the shell. The easiest way to do that is to put single quotes around it. Single quotes prevent expansion of anything between them (including backslashes); the only thing you can’t do then is have single quotes in the pattern.

grep -- 'foo*' *.txt

(Also note the -- end-of-options marker. Without it, some grep implementations, including GNU grep, would treat a file called -foo-.txt (which the shell could produce when expanding *.txt) as an option, even though it follows a non-option argument here.)

If you do need a single quote, you can write it as '\'' (end string literal, literal quote, open string literal).

grep -- 'foo*'\''bar' *.txt
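To see what grep actually receives from that quoting, here is a short sketch with hypothetical input lines; the variable names are only for illustration.

```shell
# Close the string, add an escaped quote, reopen the string.
pattern='foo*'\''bar'
printf 'pattern seen by grep: %s\n' "$pattern"

# The pattern is: "fo", any number of extra "o"s, a single quote, "bar".
matched=$(printf '%s\n' "fo'bar" "foo'bar" "foobar" | grep -- 'foo*'\''bar')
printf '%s\n' "$matched"
```

The line foobar is not selected because it has no single quote before bar.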

Second, grep supports at least¹ two syntaxes for patterns. The old, default syntax (basic regular expressions) doesn’t support the alternation (|) operator, though some versions have it as an extension, but written with a backslash.

grep -- 'foo\|bar' *.txt

The portable way is to use the newer syntax, extended regular expressions. You need to pass the -E option to grep to select it (formerly that was done with the egrep separate command²)

grep -E -- 'foo|bar' *.txt

Another possibility when you’re just looking for any of several patterns (as opposed to building a complex pattern using disjunction) is to pass multiple patterns to grep. You can do this by preceding each pattern with the -e option.

grep -e foo -e bar -- *.txt

Or put patterns on several lines:

grep -- 'foo
bar' *.txt

Or store those patterns in a file, one per line and run

grep -f that-file -- *.txt
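The three forms above should select the same lines. A minimal sketch to check this, assuming a scratch directory and a hypothetical pattern file named patterns:

```shell
# Scratch directory with made-up files.
demo_dir=$(mktemp -d)
printf 'foo\nqux\n' > "$demo_dir/a.txt"
printf 'bar\nquux\n' > "$demo_dir/b.txt"
printf 'foo\nbar\n' > "$demo_dir/patterns"   # one pattern per line

# One -e per pattern.
with_e=$(cd "$demo_dir" && grep -e foo -e bar -- *.txt)

# Patterns separated by a newline inside one argument.
with_newline=$(cd "$demo_dir" && grep -- 'foo
bar' *.txt)

# Patterns read from a file.
with_f=$(cd "$demo_dir" && grep -f patterns -- *.txt)

printf '%s\n' "$with_e"
rm -rf "$demo_dir"
```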

Note that if *.txt expands to a single file, grep won’t prefix matching lines with its name like it does when there is more than one file. To work around that, with some grep implementations like GNU grep, you can use the -H option, or with any implementation, you can pass /dev/null as an extra argument.
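A quick sketch of the /dev/null workaround, assuming a scratch directory that holds a single made-up file:

```shell
demo_dir=$(mktemp -d)
printf 'foo\n' > "$demo_dir/only.txt"

# Only one file argument after expansion: no file name prefix.
bare=$(cd "$demo_dir" && grep -- foo *.txt)

# /dev/null counts as a second file, so grep prefixes each match.
named=$(cd "$demo_dir" && grep -- foo /dev/null *.txt)

printf '%s\n' "$bare" "$named"
rm -rf "$demo_dir"
```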


¹ some grep implementations support even more like perl-compatible ones with -P, or augmented ones with -X, -K for ksh wildcards…

² While egrep has been deprecated by POSIX and is no longer found on some systems, on others, such as Solaris without the POSIX or GNU utilities installed, egrep is your only option, as its /bin/grep supports none of -e, -f, -E, \| or multi-line patterns.

As TC1 said, -F seems to be a usable option:

$> cat text
some text
foo
another text
bar
end of file

$> patterns="foo
bar" 

$> grep -F "${patterns}" text
foo
bar

I had access logs where the dates were stupidly formatted: [30/Jun/2013:08:00:45 +0200]

But I needed to display it as: 30/Jun/2013 08:00:45

The problem is that using “OR” in my grep statement, I was receiving the two match expressions on two separate lines.

Here is the solution:

grep -in myURL_of_interest *access.log |
grep -Eo '(\b[[:digit:]]{2}/[[:upper:]][[:lower:]]{2}/[[:digit:]]{4}|[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}\b)' |
paste -d" " - - > MyAccess.log
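The idea can be tried on a single line first. This is a simplified sketch: the log line is invented, and the \b anchors are left out because they are an extension not every grep supports.

```shell
# Hypothetical access-log line in the format described above.
log_line='10.0.0.1 - - [30/Jun/2013:08:00:45 +0200] "GET /myURL HTTP/1.1" 200 512'

# grep -o prints each match (date, then time) on its own line;
# paste then joins every pair of lines with a space.
stamp=$(printf '%s\n' "$log_line" |
  grep -Eo '[[:digit:]]{2}/[[:upper:]][[:lower:]]{2}/[[:digit:]]{4}|[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}' |
  paste -d' ' - -)

printf '%s\n' "$stamp"
```

Note that this pairing relies on every input line producing exactly two matches; a line with one or three matches would shift the pairs.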
Answered By: tsmets

This works for me

root@gateway:/home/sshuser# aws ec2 describe-instances --instance-ids i-2db0459d |grep 'STATE\|TAG'

**STATE**   80      stopped

**STATE**REASON     Client.UserInitiatedShutdown    Client.UserInitiatedShutdown: User initiated shutdown

**TAGS**    Name    Magento-Testing
root@gateway:/home/sshuser#
Answered By: Mansur Ul Hasan

You can try the below command to get the result:

egrep 'rose.*lotus|lotus.*rose' some_file
Answered By: Abhishek

If you don’t need regular expressions, it’s much faster to use fgrep or grep -F with multiple -e parameters, like this:

fgrep -efoo -ebar *.txt

fgrep (alternatively grep -F) is much faster than regular grep because it searches for fixed strings instead of regular expressions.
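Apart from speed, -F also changes what matches, because regex metacharacters become ordinary text. A small sketch with made-up input:

```shell
# Hypothetical input lines.
sample=$(printf 'a.c\nabc\n1+1\n')

# As a regex, the dot matches any character: both a.c and abc are selected.
regex_hits=$(printf '%s\n' "$sample" | grep -e 'a.c')

# As fixed strings, the dot and plus are literal: only a.c and 1+1 are selected.
fixed_hits=$(printf '%s\n' "$sample" | grep -F -e 'a.c' -e '1+1')

printf 'regex:\n%s\n' "$regex_hits"
printf 'fixed:\n%s\n' "$fixed_hits"
```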

Answered By: Moustafa Elqabbany

A cheap and cheerful way to grep for multiple patterns:

$ echo "foo" > ewq ; echo "bar" >> ewq ; grep -H -f ewq *.txt ; rm ewq
Answered By: DHDHDHD

There are multiple ways to do this.

  1. grep 'foo\|bar' *.txt
  2. egrep 'foo|bar' *.txt
  3. find . -maxdepth 1 -type f -name "*.txt" | xargs grep 'foo\|bar'
  4. find . -maxdepth 1 -type f -name "*.txt" | xargs egrep 'foo|bar'

The 3rd and 4th option will grep only in the files and avoid directories having .txt in their names.
So, as per your use-case, you can use any of the option mentioned above.
Thanks!!
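One caveat with piping find into xargs is that file names containing spaces get split into several arguments. A possible variant, sketched here with made-up file names, lets find run grep itself through -exec ... {} +:

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/dir.txt"                   # a directory that matches *.txt
printf 'foo\n' > "$demo_dir/my notes.txt"   # a file name with a space
printf 'bar\n' > "$demo_dir/b.txt"

# -type f skips the directory; {} + passes file names to grep unsplit.
# /dev/null keeps the file name prefix even if only one file is found.
hits=$(cd "$demo_dir" &&
  find . -maxdepth 1 -type f -name '*.txt' \
    -exec grep -E -- 'foo|bar' /dev/null {} + | LC_ALL=C sort)

printf '%s\n' "$hits"
rm -rf "$demo_dir"
```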

Answered By: Bhagyesh Dudhediya

Pipe (|) is a special shell character, so it either needs to be escaped (\|) or quoted, as per the manual (man bash):

Quoting is used to remove the special meaning of certain characters or words to the shell. It can be used to disable special treatment for special characters, to prevent reserved words from being recognized as such, and to prevent parameter expansion.

Enclosing characters in double quotes preserves the literal value of all characters within the quotes

A non-quoted backslash (\) is the escape character.
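As a brief illustration (the variable names are only for this sketch), all three spellings hand the same string to the command:

```shell
# Backslash escape, single quotes, and double quotes all protect the pipe.
escaped=foo\|bar
single='foo|bar'
double="foo|bar"

printf '%s\n' "$escaped" "$single" "$double"

# Any of them can then be passed to grep -E as an alternation.
hits=$(printf 'foo\nbaz\nbar\n' | grep -E -e "$single")
printf '%s\n' "$hits"
```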

See: Which characters need to be escaped in Bash?

Here are a few examples (using tools not mentioned yet):

  • Using ripgrep:

    • rg "foo|bar" *.txt
    • rg -e foo -e bar *.txt
  • Using git grep:

    • git grep --no-index -e foo --or -e bar

      Note: It also supports Boolean expressions such as --and, --or and --not.

For AND operation per line, see: How to run grep with multiple AND patterns?

For AND operation per file, see: How to check all of multiple strings or regexes exist in a file?

Answered By: kenorb

TL;DR: if you want to do more things after matching one of the multiple patterns, enclose them as in (pattern1|pattern2)

example: I want to find all the places where a variable that contains the name ‘date’ is defined as a String or int. (e.g., “int cronDate =” or “String textFormattedDateStamp =”):

cat myfile | grep '\(int\|String\) [a-zA-Z_]*date[a-zA-Z_]* =' 

With grep -E, you don’t need to escape the parentheses or the pipe, i.e., grep -E '(int|String) [a-zA-Z_]*date[a-zA-Z_]* ='
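A sketch of the grep -E form on a few invented declarations. The -i flag is an assumption added here so that the lowercase date in the pattern also matches names such as cronDate:

```shell
# Hypothetical source lines.
sample=$(printf '%s\n' \
  'int cronDate = 5;' \
  'String textFormattedDateStamp = "x";' \
  'float updateRate = 1.0;' \
  'int counter = 0;')

# The group limits the alternation to the type name; the rest of the
# pattern applies to both alternatives.
hits=$(printf '%s\n' "$sample" |
  grep -E -i '(int|String) [a-zA-Z_]*date[a-zA-Z_]* =')

printf '%s\n' "$hits"
```

The float line is skipped because its type is not in the group, and the counter line is skipped because its name does not contain date.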

Answered By: jeremysprofile

To add to @geekosaur’s answer, if you have multiple patterns that also contain tabs and spaces, you can use the following command:

grep -E "foo[[:blank:]]|bar[[:blank:]]"

where [[:blank:]] is the RE character class that matches either a space or a tab character.

Answered By: Fuseteam