Count total number of occurrences using grep
grep -c is useful for finding how many times a string occurs in a file, but it only counts each occurrence once per line. How can I count multiple occurrences per line?
I’m looking for something more elegant than:
perl -e '$_ = <>; print scalar ( () = m/needle/g ), "\n"'
Your example only prints out the number of occurrences per-line, and not the total in the file. If that’s what you want, something like this might work:
perl -nle '$c+=scalar(()=m/needle/g);END{print $c}'
grep’s -o will only output the matches, one per output line, regardless of which input line they came from; wc can count them:
grep -o 'needle' file | wc -l
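To make the difference from grep -c concrete, here is a small sketch. The helper name count_occurrences and the sample text are made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep do).

```shell
# count_occurrences is a hypothetical helper: it reads text on stdin and
# prints how many times the basic regex in $1 matches, across all lines.
count_occurrences() {
    grep -o -- "$1" | wc -l
}

# Made-up sample: three lines, four occurrences of "needle" on two of them.
sample='needle needle
hay
one needle, another needle'

printf '%s\n' "$sample" | grep -c 'needle'          # counts matching lines
printf '%s\n' "$sample" | count_occurrences needle  # counts every match
```

On this sample the first command should report 2 and the second 4.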
This will also match ‘needles’ or ‘multineedle’.
To match only single words use one of the following commands:
grep -ow 'needle' file | wc -l
grep -o '\bneedle\b' file | wc -l
grep -o '\<needle\>' file | wc -l
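A quick way to see what -w changes, using a made-up line of text (the variable names are only for illustration, and a grep with -o and -w is assumed):

```shell
# Made-up sample: "needle" appears twice as a word and twice inside longer words.
text='needle needles multineedle needle.'

# Plain -o counts every substring match.
substring_hits=$(printf '%s\n' "$text" | grep -o 'needle' | wc -l)
# Adding -w keeps only matches bounded by non-word characters.
whole_word_hits=$(printf '%s\n' "$text" | grep -ow 'needle' | wc -l)

echo "substring matches:  $substring_hits"
echo "whole-word matches: $whole_word_hits"
```

Here the substring count should be 4 and the whole-word count 2, since the trailing period does not count as part of a word.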
If you have GNU grep (always on Linux and Cygwin, occasionally elsewhere), you can count the output lines from grep -o: grep -o needle | wc -l.
With Perl, here are a few ways I find more elegant than yours (even after it’s fixed).
perl -lne 'END {print $c} map ++$c, /needle/g'
perl -lne 'END {print $c} $c += s/needle//g'
perl -lne 'END {print $c} ++$c while /needle/g'
With only POSIX tools, one approach, if possible, is to split the input into lines with a single match before passing it to grep. For example, if you’re looking for whole words, then first turn every non-word character into a newline.
# equivalent to grep -ow 'needle' | wc -l
tr -c '[:alnum:]' '[\n*]' | grep -c '^needle$'
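Wrapped up as a function, the same idea might look like this. The name count_word_posix is hypothetical, and the sketch assumes the word consists only of letters and digits, since it is pasted into the regex unescaped.

```shell
# count_word_posix is a hypothetical helper: stdin is the text, $1 is a word
# made only of letters and digits.
count_word_posix() {
    # Turn every non-alphanumeric character into a newline so each word
    # sits alone on its own line, then count the lines equal to the word.
    tr -c '[:alnum:]' '[\n*]' | grep -c -- "^$1\$"
}

# Made-up sample: "needle" occurs three times as a whole word.
printf 'needle, needles; needle-needle\n' | count_word_posix needle
```

One difference from grep -w worth keeping in mind: an underscore is treated as a separator here, so needle_x would be counted, whereas grep -w considers the underscore part of the word.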
Otherwise, there’s no standard command to do this particular bit of text processing, so you need to turn to sed (if you’re a masochist) or awk.
awk '{while (match($0, /set/)) {++c; $0=substr($0, RSTART+RLENGTH)}}
END {print c}'
sed -n -e 's/set/\n&\n/g' -e 's/^/\n/' -e 's/$/\n/' \
       -e 's/\n[^\n]*\n/\n/g' -e 's/^\n//' -e 's/\n$//' \
       -e '/./p' | wc -l
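For reuse, the awk loop can be wrapped in a function that takes the pattern as an argument. This is only a sketch: count_regex_awk is a made-up name, the pattern is passed with -v (so awk processes backslash escapes in it), and the pattern must not be able to match the empty string or the loop would never end.

```shell
# count_regex_awk is a hypothetical helper: stdin is the text, $1 is an
# awk extended regex that never matches the empty string.
count_regex_awk() {
    awk -v re="$1" '
        {
            line = $0
            # Count one match, drop everything up to its end, and repeat.
            while (match(line, re)) {
                ++c
                line = substr(line, RSTART + RLENGTH)
            }
        }
        END { print c + 0 }   # "+ 0" prints 0 instead of an empty line
    '
}

# Made-up sample: "set" occurs five times over three lines.
printf 'set reset setset\nnone\nsets\n' | count_regex_awk set
```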
Here’s a simpler solution using sed and grep, which works for strings or even by-the-book regular expressions but fails in a few corner cases with anchored patterns (e.g. it finds two occurrences of ^needle or \bneedle in needleneedle).
sed 's/needle/\n&\n/g' | grep -cx 'needle'
Note that in the sed substitutions above, I used \n to mean a newline. This is standard in the pattern part, but in the replacement text, for portability, substitute backslash-newline for \n.
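As an illustration of the backslash-newline form, the simpler sed and grep pipeline could be written like this. The function name count_needle_portable is made up, and the search string is hard-coded to keep the quoting easy to follow.

```shell
# count_needle_portable is a hypothetical helper: stdin is the text.
count_needle_portable() {
    # Inside single quotes the shell passes backslash + newline through
    # unchanged; sed reads that pair in the replacement as a newline.
    sed 's/needle/\
&\
/g' | grep -cx 'needle'
}

# Made-up sample: three occurrences, two of them adjacent.
printf 'needleneedle and needle\nhay\n' | count_needle_portable
```

Each match ends up alone on its own line, and grep -cx counts the lines that consist of exactly needle.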
Another solution using awk and needle as field separator:
awk -F'^needle | needle | needle$' '{c+=NF-1}END{print c}'
If you want to match needle followed by punctuation, change the field separator accordingly, i.e.
awk -F'^needle[ ,.?]|[ ,.?]needle[ ,.?]|[ ,.?]needle$' '{c+=NF-1}END{print c}'
Or use the class [^[:alnum:]] to cover all non-alphanumeric characters.
This is my bash solution (it still relies on grep and sort):
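#!/bin/bash
# For each word taken from the unique lines of /tmp/a,
# print the number of lines that contain it, highest count first.
B=$(for i in $(sort -u /tmp/a); do
  echo "$(grep -c -- "$i" /tmp/a) $i"
done)
echo "$B" | sort -rn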
If, like me, you actually wanted “both; each exactly once” (what this really checks is “either; twice”), then it’s simple:
grep -E "thing1|thing2" -c
and check for the output 2.
The benefit of this approach (if exactly once is what you want) is that it scales easily.
I had a need to do this but for more than one search term. And I wanted them to be listed in columns with the number of occurrences of each.
My bash-only one-liner solution is as follows:
grep -o -E 'borp|flarb' flarb.log | sort | uniq -c
910 borp
9090 flarb
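The same pipeline can be tried on inline text instead of a log file. In this sketch the function name count_terms and the sample input are made up, and a grep with -o is assumed.

```shell
# count_terms is a hypothetical helper: stdin is the text, $1 is an
# extended regex listing the alternatives to tally.
count_terms() {
    # One match per line, grouped by sort, then tallied by uniq -c.
    grep -o -E -- "$1" | sort | uniq -c
}

# Made-up sample: "borp" occurs twice and "flarb" three times.
printf 'borp flarb borp\nflarb flarb\n' | count_terms 'borp|flarb'
```

uniq -c pads the counts with leading spaces, so pipe the result through something like awk '{print $1, $2}' if exact column widths matter.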