How to count the times a specific character appears in a file?

For example, we want to count all quote (") characters; we are worried that some files contain more quotes than they should.

Sample input:

cluster-env,"manage_dirs_on_root","true"
cluster-env,"one_dir_per_partition","false"
cluster-env,"override_uid","true"
cluster-env,"recovery_enabled","false"

Expected result:

16

Asked By: yael


You can combine tr (translate or delete characters) with wc (count words, lines, characters):

tr -cd '"' < yourfile.cfg | wc -c

-c takes the complement of the set ", and -d deletes those characters, so only the quotes are left; wc -c then counts the remaining characters (bytes). Some versions of wc support the -m (or --chars) flag, which counts characters rather than bytes and is better suited to non-ASCII text.
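
Since the underlying goal is to spot files that contain more quotes than expected, the same tr | wc pipeline can be wrapped in a loop over several files. A minimal sketch, assuming a hypothetical helper name count_char and a hypothetical per-file maximum max (16, matching the sample); it also assumes the character is an ordinary one, because tr treats characters such as -, \ and [ specially:

```sh
#!/bin/sh
# count_char CHAR FILE: print how many times CHAR occurs in FILE.
# Hypothetical helper; it only wraps the tr -cd | wc -c pipeline.
# The final tr strips the padding some wc implementations add.
count_char() {
    tr -cd "$1" < "$2" | wc -c | tr -d ' '
}

# Assumption: every file is expected to contain at most this many quotes.
max=16

for f in "$@"; do
    n=$(count_char '"' "$f")
    if [ "$n" -gt "$max" ]; then
        printf '%s: %s quotes (expected at most %s)\n' "$f" "$n" "$max"
    fi
done
```

Saved as a script, it is run with the files to check as arguments and prints nothing for files within the limit.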

Answered By: Ulrich Schwarz

grep approach:

grep -o '"' file | wc -l
16 
  • -o – output only matched substrings
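
A common pitfall is using grep -c instead: -c counts matching lines, not matches, so it undercounts whenever a line holds more than one quote. A quick comparison on the sample data (the file name sample.csv is only an example):

```sh
#!/bin/sh
# Recreate the sample input from the question (file name is arbitrary).
cat > sample.csv <<'EOF'
cluster-env,"manage_dirs_on_root","true"
cluster-env,"one_dir_per_partition","false"
cluster-env,"override_uid","true"
cluster-env,"recovery_enabled","false"
EOF

# -c counts lines containing at least one quote: 4 for the sample.
grep -c '"' sample.csv

# -o prints every match on its own line, so wc -l counts matches: 16.
grep -o '"' sample.csv | wc -l
```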

Or with a single gawk command:

awk -v RS='' -v FPAT='"' '{print NF}' file
16
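  • RS='' – paragraph mode: records are separated by blank lines instead of newlines, so a file without blank lines is read as one record (a file with blank lines prints one count per paragraph)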

  • FPAT='"' – regular expression describing what a field looks like, so every " is one field and NF is the number of quotes in the record
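
FPAT is a gawk extension. For other awk implementations, a sketch that assumes only POSIX awk can use " as the field separator instead: a line with n quotes splits into n + 1 fields, so summing NF - 1 per line gives the total. Empty lines have NF = 0 and must be skipped, otherwise each would subtract one. The function name count_quotes is hypothetical:

```sh
#!/bin/sh
# count_quotes FILE...: total number of double quotes in the input.
# FS is a literal ", so a line with n quotes has n + 1 fields.
# Lines with NF == 0 (empty lines) contain no quotes and are skipped.
count_quotes() {
    awk -F'"' 'NF > 0 { n += NF - 1 } END { print n + 0 }' "$@"
}
```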

Answered By: RomanPerekhrest

If two lines in the file each have an odd number of double quotes, the total number of double quotes will still be even, so counting alone will not detect unbalanced quotes (this is what I presume you’d actually like to do, but I might be wrong).

This awk script reports any line in the input that has an odd number of quotes:

awk -F'"' 'NF % 2 == 0 { printf("Line %d has odd quoting: %s\n", NR, $0) }'

We set the field separator (FS) to " with -F'"'. A line with n quotes is split into n + 1 fields, so a line with an even number of fields has an odd number of quotes. NF is the number of fields in the current record, and NR is the ordinal number of the current record (“the line number”).

Given the following input:

$ cat file
cluster-env,"manage_dirs_on_root","true"
cluster-env,"one_dir_per_partition","false"
cluster-env,override_uid","true"
cluster-env,recovery_enabled","false"

we get

$ awk -F'"' 'NF % 2 == 0 { printf("Line %d has odd quoting: %s\n", NR, $0) }' file
Line 3 has odd quoting: cluster-env,override_uid","true"
Line 4 has odd quoting: cluster-env,recovery_enabled","false"
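
To use the parity check in a script (for example as a pre-commit or CI step), it helps to include the file name and to set a non-zero exit status. A minimal sketch, assuming a hypothetical function name check_quotes; FNR and FILENAME are standard awk variables for the per-file line number and file name. The extra NF > 0 guard keeps empty lines (NF = 0, which is even) from being reported, which the one-liner above would do:

```sh
#!/bin/sh
# check_quotes FILE...: report lines with an odd number of double quotes.
# Exit status is 1 if any such line is found, 0 otherwise.
check_quotes() {
    awk -F'"' '
        NF > 0 && NF % 2 == 0 {
            printf("%s:%d has odd quoting: %s\n", FILENAME, FNR, $0)
            bad = 1
        }
        END { exit bad }
    ' "$@"
}
```

Because it returns a status, it can be used directly in a condition, for example `check_quotes *.csv || echo "unbalanced quotes found"`.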

Something like

$ grep -o '"' file | wc -l

would return “14” for this file.

Answered By: Kusalananda

Pure BASH:

var="$(< file.txt)"
tmp="${var//[^\"]/}"
echo "${#tmp}"
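
The same parameter expansion also works line by line, which lets pure bash flag lines with an odd number of quotes, mirroring the awk parity check. A sketch (bash-specific; odd_quote_lines is a hypothetical name):

```bash
#!/usr/bin/env bash
# odd_quote_lines FILE: print each line whose number of " is odd.
odd_quote_lines() {
    local line tmp n=0
    # IFS= and -r keep each line exactly as written; the [[ -n ]] test
    # also processes a final line that lacks a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        n=$((n + 1))
        tmp=${line//[^\"]/}          # keep only the double quotes
        if (( ${#tmp} % 2 == 1 )); then
            printf 'Line %d has odd quoting: %s\n' "$n" "$line"
        fi
    done < "$1"
}
```

A read loop is slow on large files, so this is mainly useful when no external tools are available.
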
Answered By: Thunderbeef

Another single awk approach:

awk '{ count+=gsub(/"/, "") } END{ print count+0 }' file
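
This works because gsub returns the number of substitutions it made. Replacing each match with "&" (awk's placeholder for the matched text) leaves the line unchanged, so the same idea can print a per-line count, for example to find the line that carries an extra quote. A sketch with a hypothetical function name quotes_per_line:

```sh
#!/bin/sh
# quotes_per_line FILE...: print "line-number: quote-count" for every line.
# "&" re-inserts the match, so $0 is unchanged while gsub still counts.
quotes_per_line() {
    awk '{ printf("%d: %d\n", NR, gsub(/"/, "&")) }' "$@"
}
```
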
Answered By: αғsнιη

Eccentric double GNU grep method:

grep -o \" file | grep -c .

Answered By: agc

If you want to count all characters and list frequency in ascending order, this works for me:

cat <filename> | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -h
# adds a newline before every character (GNU sed), sorts, counts, and sorts the results

(note that the newline character does not appear as such: each input line shows up as one empty line in the output, and wc -l can be used to count lines)
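
A variant that avoids the sed escapes: fold -b -w1 breaks every line into one byte per line, which sort | uniq -c then tallies. This sketch assumes single-byte (ASCII) text, since -b splits multibyte characters; newlines are not counted, and empty input lines show up as a count for an empty entry. The function name char_freq is hypothetical:

```sh
#!/bin/sh
# char_freq FILE: print how often each byte occurs, least frequent first.
# Assumes ASCII input: fold -b counts bytes, not multibyte characters.
char_freq() {
    fold -b -w1 "$1" | sort | uniq -c | sort -n
}
```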

Answered By: Ben Oliver