Count number of lines with length condition
I am trying to count the lines in large files that are at most 300 characters long.
My current approach uses the following command, but it is slow:
awk "length<=300" *.log | wc -l
Is there a better way to get only the count of the lines?
Use awk to count the lines:
awk 'length<=300{c++} END { print c }' *.log
where c++ increments the counter, and END { print c } is executed after the last line and prints the value of c.
I am not sure this will be faster, but at least wc -l won't have to read and count the lines a second time.
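A quick way to check the 300/301 boundary is to run the one-liner on a throwaway file. This is only a sketch: the scratch directory, the file name sample.log and its contents are made up for illustration. It also uses `c+0`, which makes awk print 0 rather than an empty line when no line qualifies.

```shell
# Scratch directory with one sample file (hypothetical name and contents).
tmpdir=$(mktemp -d)
{
  printf 'short line\n'      # well under the limit
  printf '%0300d\n' 0        # exactly 300 characters
  printf '%0301d\n' 0        # exactly 301 characters
} > "$tmpdir/sample.log"

# Count lines of at most 300 characters; c+0 forces numeric output.
awk_count=$(awk 'length <= 300 { c++ } END { print c+0 }' "$tmpdir"/*.log)
echo "$awk_count"            # 2
```

To compare speed on real data, prefixing each variant with `time` is usually enough.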
To get a subtotal per file (this can also be written on one line; ENDFILE needs GNU awk):
awk 'length<=300{t++;s++}
ENDFILE { printf "%s:%d\n",FILENAME,s ; s=0 ; }
END { printf "TOTAL:%d\n",t }' *.log
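For awk implementations without ENDFILE, the subtotals can be collected in an array instead. The following is a minimal sketch, not a drop-in replacement: the sample files are invented, and completely empty files are not listed because awk never reads a record from them.

```shell
# Two small sample files (hypothetical names and contents).
tmpdir=$(mktemp -d)
printf 'a\nbb\n' > "$tmpdir/one.log"
printf 'ccc\n'   > "$tmpdir/two.log"

subtotals=$(awk '
  # Remember each file in input order and start its subtotal at 0.
  FNR == 1      { order[++n] = FILENAME; s[FILENAME] = 0 }
  # Count qualifying lines per file and overall.
  length <= 300 { s[FILENAME]++; t++ }
  END {
    for (i = 1; i <= n; i++) printf "%s:%d\n", order[i], s[order[i]]
    printf "TOTAL:%d\n", t
  }' "$tmpdir"/*.log)
echo "$subtotals"
```

Keeping the file names in a separate `order` array preserves the order in which the files were read, which `for (f in s)` would not guarantee.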
With grep:
cat *.log | grep -vc '^.\{301\}'
To match lines with length <=300 we grep with -v (invert match) for any 301 characters; grep's search pattern is limited to a single line. The pattern is anchored at the beginning of the line with ^, and -c counts the matching lines.
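The inverted match can be checked on a small file with lines on both sides of the limit. This is a sketch with invented sample data. It assumes a grep that accepts a repetition count of 301, as GNU grep does; POSIX only guarantees counts up to RE_DUP_MAX, which may be as low as 255. The same pattern is shown in basic syntax (`\{301\}`) and extended syntax (`-E` with `{301}`).

```shell
# Sample file (hypothetical contents): one short line, one of 300 and one of 301 characters.
tmpdir=$(mktemp -d)
{
  printf 'short line\n'
  printf '%0300d\n' 0
  printf '%0301d\n' 0
} > "$tmpdir/sample.log"

# Basic regular expression: the braces must be escaped.
bre_count=$(grep -vc '^.\{301\}' "$tmpdir/sample.log")
# Extended regular expression: the braces are written unescaped.
ere_count=$(grep -Evc '^.{301}' "$tmpdir/sample.log")
echo "$bre_count $ere_count"   # 2 2
```

Without the backslashes or `-E`, the braces are treated as literal characters and nearly every line would be counted.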
If you want a basic progress indicator, you can use pv (on most distributions it ships in its own package, pv):
pv *.log | grep -vc '^.\{301\}'
If you want the line count per file:
grep -vc '^.\{301\}' *.log
and if you want to get the total from the above command:
grep -vc '^.\{301\}' *.log | awk -F':' '{c+=$NF} END {print c}'
Although we don't usually pipe grep into awk, depending on the data this could be faster than cat & grep when there are many very long input lines: the pipe here carries only a small amount of data (the counts and the file names).
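To see what actually travels through the pipe, the per-file counts can be captured before summing them. This is a sketch on invented sample files. Using `$NF` keeps the sum correct even when a file name itself contains a colon.

```shell
# Two sample files (hypothetical names and contents); two.log has one over-long line.
tmpdir=$(mktemp -d)
printf 'a\nbb\n' > "$tmpdir/one.log"
{ printf 'ccc\n'; printf '%0301d\n' 0; } > "$tmpdir/two.log"

# One "name:count" line per file is all that awk has to read.
per_file=$(grep -vc '^.\{301\}' "$tmpdir"/*.log)
echo "$per_file"

# Sum the last colon-separated field of each line.
grep_total=$(printf '%s\n' "$per_file" | awk -F':' '{ c += $NF } END { print c+0 }')
echo "$grep_total"   # 3
```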
Using Raku (formerly known as Perl_6)
Dependent on shell-globbing (pass the files as arguments, e.g. *.log):
raku -ne 'state $i; $i++ if .chars <= 300; END say $i // 0;'
#OR
raku -ne 'state $i; if .chars <= 300 {$i++}; END say $i // 0;'
Files determined via regex (independent of shell-globbing):
raku -e 'for dir(test => / .+ \.log $ /) {state $i; $i++ if .chars <= 300 for .lines; END say $i // 0};'
https://docs.raku.org/syntax/state
https://docs.raku.org/routine/dir
https://raku.org