How can I count the number of times a specific character appears in a file?
For example, I want to count all double-quote (") characters; I'm worried that some files contain more quotes than they should.
For example:
cluster-env,"manage_dirs_on_root","true"
cluster-env,"one_dir_per_partition","false"
cluster-env,"override_uid","true"
cluster-env,"recovery_enabled","false"
expected results:
16
You can combine tr
(translate or delete characters) with wc
(count words, lines, characters):
tr -cd '"' < yourfile.cfg | wc -c
-c takes the complement of the set "; -d deletes all characters in that complement, leaving only the quotes; wc -c then counts the characters (bytes). Some versions of wc support the -m or --chars flag, which is better suited to counting non-ASCII characters.
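As a quick sanity check, here is a sketch using the sample data from the question (the filename sample.cfg is just a placeholder):

```shell
# Recreate the sample file from the question (placeholder name sample.cfg)
cat > sample.cfg <<'EOF'
cluster-env,"manage_dirs_on_root","true"
cluster-env,"one_dir_per_partition","false"
cluster-env,"override_uid","true"
cluster-env,"recovery_enabled","false"
EOF

# Keep only the double quotes, then count the remaining bytes
tr -cd '"' < sample.cfg | wc -c    # prints 16
```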
grep approach:
grep -o '"' file | wc -l
16
-o – output only the matched substrings, one per line
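For instance, with some throwaway input on stdin (the strings here are made up):

```shell
# Each match is printed on its own line, so counting lines counts quotes
printf '%s\n' 'one "two"' '"three"' | grep -o '"' | wc -l    # prints 4
```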
Or with single gawk:
awk -v RS='' -v FPAT='"' '{print NF}' file
16
RS='' – empty record separator (instead of newline), so the whole file is read as one record
FPAT='"' – pattern defining a field value, so each quote becomes a field
If two lines in the file each have an odd number of double quotes, the total sum of double quotes will still be even, and you will not detect the unbalanced quotes (which is what I presume you actually want to do, but I might be wrong).
This awk script reports any line in the input that has an odd number of quotes:
awk -F'"' 'NF % 2 == 0 { printf("Line %d has odd quoting: %s\n", NR, $0) }'
We set the field separator (FS) to " with -F'"', which means that a line with an even number of fields has an odd number of quotes. NF is the number of fields in the current record, and NR is the ordinal number of the current record ("the line number").
Given the following input:
$ cat file
cluster-env,"manage_dirs_on_root","true"
cluster-env,"one_dir_per_partition","false"
cluster-env,override_uid","true"
cluster-env,recovery_enabled","false"
we get
$ awk -F'"' 'NF % 2 == 0 { printf("Line %d has odd quoting: %s\n", NR, $0) }' file
Line 3 has odd quoting: cluster-env,override_uid","true"
Line 4 has odd quoting: cluster-env,recovery_enabled","false"
Something like
$ grep -o '"' file | wc -l
would return “14” for this file.
Pure Bash:
var="$(< file.txt)"
tmp="${var//[^"]/}"
echo ${#tmp}
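The same idea can be wrapped in a small function; count_char is a hypothetical name, and the character to count is passed as the first argument:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count occurrences of one character in a file
count_char() {
  local char=$1 file=$2 data tmp
  data=$(< "$file")            # slurp the file (trailing newline dropped)
  tmp=${data//[^"$char"]/}     # delete every character except $char
  printf '%s\n' "${#tmp}"      # the remaining length is the count
}
```

Called as, e.g., count_char '"' file.txt. Note this is Bash-only: $(< file) and the ${var//pattern/} expansion are not POSIX sh features.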
Another single awk
approach:
awk '{ count+=gsub(/"/, "") } END{ print count+0 }'
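Here gsub() returns the number of substitutions it made on the current line, so summing it per line gives the file total, and the + 0 forces a numeric 0 to be printed on empty input. A quick check with throwaway input:

```shell
# gsub() returns the per-line replacement count; END prints the running sum
printf '%s\n' 'no quotes' '"a" "b"' |
  awk '{ count += gsub(/"/, "") } END { print count + 0 }'
```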
Eccentric double GNU grep
method:
grep -o '"' file | grep -c .
If you want to count all characters and list frequency in ascending order, this works for me:
cat <filename> | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -h
# adds a newline before every character, sorts, counts, and sorts the results
(note that the count for the newline character will be doubled; use wc -l if you need the actual line count)