How to count the number of a specific character in each line?

I was wondering how to count the number of occurrences of a specific character in each line, using some text-processing utilities.

For example, to count " in each line of the following text

"hello!" 
Thank you!

The first line has two, and the second line has zero.

Another example is to count ( in each line.

Asked By: Tim


Using tr and wc:

function countchar()
{
    while IFS= read -r i; do printf "%s" "$i" | tr -dc "$1" | wc -m; done
}

Usage:

$ countchar '"' <file.txt  #returns one count per line of file.txt
1
3
0

$ countchar ')'           #will count parenthesis from stdin
$ countchar '0123456789'  #will count numbers from stdin
Answered By: Stéphane Gimenez

You can do it with sed and awk:

$ sed 's/[^"]//g' dat | awk '{ print length }'
2
0

Where dat is your example text, sed deletes (for each line) all non-" characters and awk prints for each line its size (i.e. length is equivalent to length($0), where $0 denotes the current line).

For another character you just have to change the sed expression. For example for ( to:

's/[^(]//g'

Update: sed is kind of overkill for the task – tr is sufficient. An equivalent solution with tr is:

$ tr -d -c '"\n' < dat | awk '{ print length; }'

Meaning that tr deletes all characters which are not (-c means complement) in the character set "\n.
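To see the deletion step in isolation, here is a quick sketch (the sample line is made up):

```shell
# The sample line contains three double quotes; -d -c deletes everything
# that is NOT a double quote or a newline.
printf '"a"b"\n' | tr -d -c '"\n'                          # leaves """ plus the newline
printf '"a"b"\n' | tr -d -c '"\n' | awk '{ print length }'  # prints 3
```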

Answered By: maxschlepzig

Another possible implementation with awk and gsub:

awk '{ gsub("[^\"]", ""); print length }' input-file

The function gsub is the equivalent of sed's 's///g'.

Use gsub("[^(]", "") for counting (.

Answered By: enzotib

Yet another implementation that does not rely on external programs, in bash, zsh, yash and some implementations/versions of ksh:

while IFS= read -r line; do 
  line="${line//[!\"]/}"
  echo "${#line}"
done <input-file

Use line="${line//[!(]/}" for counting (.

Answered By: enzotib

I would just use awk

awk -F'"' '{print NF-1}' <fileName>

Here we set the field separator (with the -F flag) to the character ", then all we do is print the number of fields, NF, minus 1. The number of occurrences of the target character is one less than the number of separated fields.
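Applying this to the question's sample line makes the field arithmetic concrete (a sketch, not from the original answer):

```shell
# "hello!" splits on " into three fields: "", hello!, "" -> NF is 3,
# so NF-1 gives the two quote characters.
echo '"hello!"' | awk -F'"' '{ print NF - 1 }'   # prints 2
```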

For characters that are interpreted by the shell you just need to make sure you escape them, otherwise the shell will try to interpret them. So for both " and ) you need to escape the field separator (with \).

Answered By: Martin York

I decided to write a C program cause I was bored.

You should probably add input validation, but other than that’s all set.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
        char c = argv[1][0];
        char *line = NULL;
        size_t len = 0;
        while (getline(&line, &len, stdin) != -1)
        {
                int count = 0;
                char *s = line;
                while (*s) if (*s++ == c) count++;
                printf("%d\n", count);
        }
        if (line) free(line);
}
Answered By: user606723

Here is another C solution that only needs STD C and less memory:

#include <stdio.h>

int main(int argc, char **argv)
{
  if (argc < 2 || !*argv[1]) {
    puts("Argument missing.");
    return 1;
  }
  char c = *argv[1];
  int x;                  /* int, not char, so EOF is detected reliably */
  size_t count = 0;
  while ((x = getc(stdin)) != EOF)
    if (x == '\n') {
      printf("%zu\n", count);
      count = 0;
    } else if (x == c)
      ++count;
  return 0;
}
Answered By: maxschlepzig

For a pure bash solution (however, it’s bash-specific): If $x is the variable containing your string:

x2="${x//[^\"]/}"
echo ${#x2}

The ${x//…} expansion removes all characters except ", and ${#x2} gives the length of what remains.
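A quick sketch with the question's sample string (bash-specific, as noted above):

```shell
# Two double quotes in the sample string.
x='"hello!"'
x2="${x//[^\"]/}"   # removes every character that is not a double quote
echo "${#x2}"       # prints 2
```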

(Original suggestion using expr, which has problems; see comments:)

expr length "${x//[^\"]/}"
Answered By: Marian

The answers using awk fail if the number of matches is too large (which happens to be my situation). For the answer from loki-astari, the following error is reported:

awk -F'"' '{print NF-1}' foo.txt
awk: program limit exceeded: maximum number of fields size=32767
    FILENAME="foo.txt" FNR=1 NR=1

For the answer from enzotib (and the equivalent from manatwork), a segmentation fault occurs:

awk '{ gsub("[^\"]", ""); print length }' foo.txt
Segmentation fault

The sed solution by maxschlepzig works correctly, but is slow (timings below).

Some solutions not yet suggested here. First, using grep:

grep -o '"' foo.txt | wc -w

And using perl:

perl -ne '$x+=s/"//g; END {print "$x\n"}' foo.txt

Here are some timings for a few of the solutions (ordered slowest to fastest); I limited things to one-liners here. ‘foo.txt’ is a file with one line and one long string which contains 84922 matches.

## sed solution by [maxschlepzig]
$ time sed 's/[^"]//g' foo.txt | awk '{ print length }'
84922
real    0m1.207s
user    0m1.192s
sys     0m0.008s

## using grep
$ time grep -o '"' foo.txt | wc -w
84922
real    0m0.109s
user    0m0.100s
sys     0m0.012s

## using perl
$ time perl -ne '$x+=s/"//g; END {print "$x\n"}' foo.txt
84922
real    0m0.034s
user    0m0.028s
sys     0m0.004s

## the winner: updated tr solution by [maxschlepzig]
$ time tr -d -c '"\n' < foo.txt | awk '{ print length }'
84922
real    0m0.016s
user    0m0.012s
sys     0m0.004s
Answered By: josephwb

We can use grep with regex to make it more simple and powerful.

To count specific character.

$ grep -o '"' file.txt|wc -l

To count special characters including whitespace characters.

$ grep -Po '[\W_]' file.txt|wc -l

Here -P enables Perl-compatible regular expressions, and [\W_] matches any non-word character or underscore. With the -o option grep prints each match (that is, each character) on a separate line, and wc -l then counts those lines.

Answered By: Kannan Mohan

Another awk solution:

awk '{print gsub(/"/, "")}' < input-file
Answered By: Stéphane Chazelas

For a string, the simplest would be with tr and wc (no need to overkill with awk or sed) – but note the above comments about tr: it counts bytes, not characters –

echo "$x" | tr -d -c '"' | wc -m

where $x is the variable that contains the string (not a file) to evaluate.

Answered By: Ocumo

Maybe a more straightforward, purely awk answer would be to use split.
split takes a string and turns it into an array, and its return value is the number of fields generated; the number of separator occurrences is that return value minus 1.

The following code will print out the number of times " appears on each line.

awk '{ print (split($0, a, "\"") - 1) }' file_to_parse

more info on split http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_92.html

Answered By: bleurp

Here is a simple Python script to find the count of " in each line of a file:

#!/usr/bin/env python2
with open('file.txt') as f:
    for line in f:
        print line.count('"')

Here we have used the count method of built-in str type.

Answered By: heemayl

Replace a with the character to be counted. The output is the count for each line.

perl -nE 'say y!a!!'
Answered By: JJoao

Time comparison of the presented solutions (not an answer)

The efficiency of the answers is not important.
Nevertheless, following @josephwb approach, I tried to time all the answers presented.

I use as
input the Portuguese translation of Victor Hugo's “Les Misérables” (great book!) and count the occurrences of “a”. My edition has 5 volumes, many pages…

$ wc miseraveis.txt 
29331  304166 1852674 miseraveis.txt 

C answers were compiled with gcc, (no optimizations).

Each answer was run 3 times and the best time was kept.

Don’t trust these numbers too much (my machine is doing other tasks, etc.). I share these times with you because I got some unexpected results and I’m sure you will find some more…

  • 14 of 16 timed solutions took less than 1s; 9 took less than 0.1s, many of them using pipes
  • 2 solutions, processing the 30k lines in bash line by line, create new processes for each line;
    they compute the correct result but take 10s/20s.
  • grep -oP a is three times faster than grep -o a (10;11 vs 12)
  • The difference between C and the others is not as big as I expected. (7;8 vs 2;3)
  • (conclusions welcome)

(results in a random order)

=========================1 maxschlepzig
$ time sed 's/[^a]//g' mis.txt | awk '{print length}' > a2
real    0m0.704s ; user 0m0.716s
=========================2 maxschlepzig
$ time tr -d -c 'a\n' < mis.txt | awk '{ print length; }' > a12
real    0m0.022s ; user 0m0.028s
=========================3 jjoao
$ time perl -nE 'say y!a!!' mis.txt  > a1
real    0m0.032s ; user 0m0.028s
=========================4 Stéphane Gimenez
$ function countchar(){while read -r i; do echo "$i"|tr -dc "$1"|wc -c; done }

$ time countchar "a"  < mis.txt > a3
real    0m27.990s ; user    0m3.132s
=========================5 Loki Astari
$ time awk -Fa '{print NF-1}' mis.txt > a4
real    0m0.064s ; user 0m0.060s
Error: several -1 results (empty lines have NF=0)
=========================6 enzotib
$ time awk '{ gsub("[^a]", ""); print length }' mis.txt > a5
real    0m0.781s ; user 0m0.780s
=========================7 user606723
#include <stdio.h> #include <string.h> // int main(int argc, char *argv[]) ...  if(line) free(line); }

$ time a.out a < mis.txt > a6
real    0m0.024s ; user 0m0.020s
=========================8 maxschlepzig
#include <stdio.h> // int main(int argc, char **argv){if (argc < 2 || !*argv[1]) { ...  return 0; }

$ time a.out a < mis.txt > a7
real    0m0.028s ; user 0m0.024s
=========================9 Stéphane Chazelas
$ time awk '{print gsub(/a/, "")}'< mis.txt > a8
real    0m0.053s ; user 0m0.048s
=========================10 josephwb count total
$ time grep -o a < mis.txt | wc -w > a9
real    0m0.131s ; user 0m0.148s
=========================11 Kannan Mohan count total
$ time grep -o 'a' mis.txt | wc -l > a15
real    0m0.128s ; user 0m0.124s
=========================12 Kannan Mohan count total
$ time grep -oP 'a' mis.txt | wc -l > a16
real    0m0.047s ; user 0m0.044s
=========================13 josephwb Count total
$ time perl -ne '$x+=s/a//g; END {print "$x\n"}'< mis.txt > a10
real    0m0.051s ; user 0m0.048s
=========================14 heemayl
#!/usr/bin/env python2 // with open('mis.txt') as f: for line in f: print line.count('"')

$ time pyt > a11
real    0m0.052s ; user 0m0.052s
=========================15 enzotib
$ time  while IFS= read -r line; do   line="${line//[!a]/}"; echo "${#line}"; done < mis.txt  > a13
real    0m9.254s ; user 0m8.724s
=========================16 bleurp
$ time awk ' {print (split($0,a,"a")-1) }' mis.txt > a14
real    0m0.148s ; user 0m0.144s
Error: several -1 results (split returns 0 on empty lines)
Answered By: JJoao
grep -n -o '"' file | sort -n | uniq -c | cut -d : -f 1

where grep does all the heavy lifting: reports each character found at each line number. The rest is just to sum the count per line, and format the output.

Remove the -n and get the count for the whole file.
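For instance, with the question's sample text (the file name here is just an assumption for the sketch):

```shell
# Two quotes on the first line, none on the second.
printf '"hello!"\nThank you!\n' > file
# Without -n every match prints as a bare ", so uniq -c collapses them
# into a single line whose count column is the file-wide total (2 here).
grep -o '"' file | uniq -c
```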

Counting in a 1.5 MB text file in under 0.015 secs seems fast.
And it does work with characters (not bytes).

Answered By: user79743

Although most of the answers are really fascinating, I just want to mention another one with the aid of awk:

$ printf '"hello!"\nThank you!\n' > data

$ awk '{ counter=0; for (i=1; i<=length; i++) if (substr($0,i,1) == "\"") counter++; print counter }' data
2
0

For each line, the loop counts the number of " characters and then prints the counter.

Answered By: javadr

Using Raku (formerly known as Perl_6)

raku -ne 'put m:g/ \" /.elems;'

OR

raku -ne '.match(/ \" /, :global).elems.put;'

Sample Input (task is to count " doublequotes):

zero
"two"
"two","four"
"two","four","six"
"two","four","six","eight"

Sample Output:

0
2
4
6
8

FYI, I try very hard to stump Raku with Unicode characters and the language performs very well (it does NFC Normalization under-the-hood). It seems to have earned the moniker "Unicode-ready". Below, counting Bengali letters with Raku:

Sample Input (Bengali days-of-the-week from Wikipedia):

~$ cat  Bengali_DOW.txt
রবিবার/সূর্যবার Rabibār/Sūryabār
সোমবার/চন্দ্রবার Somabār/Chandrabār
মঙ্গলবার Mangalbār
বুধবার Budhabār
বৃহস্পতিবার/গুরুবার Brihaspatibār/Gurubār
শুক্রবার Shukrabār
শনিবার Shanibār

Sample Output (testing with first letter of each line):

~$ raku -ne 'put m:g/ <[র সো ম বু বৃ শু শ %]> /.elems;'  Bengali_DOW.txt
3
5
2
2
3
3
2

https://docs.raku.org/language/unicode#Normalization
https://raku.org

Answered By: jubilatious1

Everyone is complicating things so much. I’ll give the cleanest, simplest and probably the most performant answer for you:

grep -onF '"' input.txt | uniq -c

Add cut -d: -f1 if you would like a cleaner format.

Answered By: Weihang Jian