Find the total size of certain files within a directory branch

Assume there’s an image storage directory, say ./photos/john_doe, within which there are multiple subdirectories containing many files of a certain type (say, *.jpg). How can I calculate the total size of those files below the john_doe branch?

I tried du -hs ./photos/john_doe/*/*.jpg, but this shows individual files only. Also, it reaches only the first nesting level below the john_doe directory, such as john_doe/june/, and skips john_doe/june/outrageous/.

So, how could I traverse the entire branch, summing up the sizes of the matching files?

Asked By: mbaitoff

du -ch public_html/images/*.jpg | grep total
20M total

gives me the total usage of my .jpg files in this directory.

To deal with multiple directories you’d probably have to combine this with find somehow.
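
One way to combine them (a sketch, reusing the public_html/images path from above):

find public_html/images -type f -name '*.jpg' -print0 | xargs -0 du -ch | grep total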

You might find du command examples useful (it also includes find)

Answered By: Levon
find ./photos/john_doe -type f -name '*.jpg' -exec du -ch {} + | grep total$

If more than one invocation of du is required because the file list is very long, multiple totals will be reported and need to be summed.
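
One way to handle that automatically is to ask du for plain kilobyte figures and let awk add up the per-chunk total lines, for example:

find ./photos/john_doe -type f -name '*.jpg' -exec du -kc {} + |
  awk '/total$/ { sum += $1 } END { print sum " KiB" }'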

Answered By: SHW

Primarily, you need two things: a glob that recurses into subdirectories (**) and du -c to produce a grand total:

du -ch -- **/*.jpg | tail -n 1
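
Note that the ** recursive glob is available by default in zsh; in bash (4.0 or later) it has to be enabled first:

shopt -s globstar
du -ch -- **/*.jpg | tail -n 1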

If the list of files is so big that it can’t be passed to a single invocation of du -c, on a GNU system you can do:

find . -iname '*.jpg' -type f -printf '%b\t%D:%i\n' |
  sort -u | cut -f1 | paste -sd+ - | bc

(the size is expressed in number of 512-byte blocks). Like du, it tries to count hard links only once. If you don’t care about hard links, you can simplify it to:

(printf 0; find . -iname '*.jpg' -type f -printf +%b) | bc

If you want the size instead of disk usage, replace %b with %s. The size will then be expressed in bytes.
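
With that substitution, the simplified hard-link-agnostic variant becomes, for example:

(printf 0; find . -iname '*.jpg' -type f -printf '+%s') | bc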

Answered By: Stéphane Chazelas

The answers given so far do not take into account that the file list passed from find to du may be so long that find automatically splits it into chunks, resulting in multiple occurrences of total.

You can either grep for total (mind the locale: du may translate that word) and sum up manually, or use a different command. AFAIK there are only two ways to get a grand total (in kilobytes) of all files found by find:
find . -type f -iname '*.jpg' -print0 | xargs -r0 du -a | awk '{sum+=$1} END {print sum}'

Explanation
find . -type f -iname '*.jpg' -print0: Find all files with the extension jpg regardless of case (i.e. *.jpg, *.JPG, *.Jpg…) and output them (null-terminated).
xargs -r0 du -a:
-r prevents xargs from calling the command when no arguments are passed at all, and -0 makes it read null-terminated strings (rather than newline-terminated ones).
awk '{sum+=$1} END {print sum}': Sum up the file sizes output by the previous command

And for reference, the other way would be
find . -type f -iname '*.jpg' -print0 | du -c --files0-from=-
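
Since --files0-from makes a single du invocation read the whole list, only one total line is printed (as the last line), so you can extract it with tail, e.g.:

find . -type f -iname '*.jpg' -print0 | du -c --files0-from=- | tail -n 1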

Answered By: Jan

The solutions mentioned so far are inefficient (exec is expensive), require additional manual summing when the file list is long, or don’t work on Mac OS X. The following solution is very fast, should work on any system, and yields the total in GB (remove one of the /1024 divisions if you want the total in MB):

find . -iname "*.jpg" -ls |perl -lane '$t += $F[6]; print $t/1024/1024/1024 . " GB"'
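
Note that the perl part prints a running total after every file, so the last line holds the grand total; to show only that line, append tail, e.g.:

find . -iname "*.jpg" -ls | perl -lane '$t += $F[6]; print $t/1024/1024/1024 . " GB"' | tail -n 1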

Answered By: hobbydad

Improving SHW’s great answer to make it work with any locale, as Zbyszek already pointed out in a comment:

LC_ALL=C find ./photos/john_doe -type f -name '*.jpg' -exec du -ch {} + | grep total$
Answered By: lbo

du naturally traverses the directory hierarchy, and awk can perform the filtering, so something like this may be sufficient:

du -ak | awk 'BEGIN {sum=0} /.jpg$/ {sum+=$1} END {print sum}'

This works without GNU.
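
A slightly stricter variant (a sketch) also matches upper-case extensions and escapes the dot, so that only names actually ending in .jpg are counted:

du -ak | awk 'tolower($0) ~ /\.jpg$/ {sum += $1} END {print sum}'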

Answered By: GeoffP

The ultimate answer is:

{ find <DIR> -type f -name "*.<EXT>" -printf "%s+"; echo 0; } | bc

and an even faster version, not limited by RAM, which requires GNU AWK with bignum support:

find <DIR> -type f -name "*.<EXT>" -printf "%s\n" | gawk -M '{t+=$1}END{print t}'

This version has the following features:

  • all capabilities of find to specify the files you’re looking for
  • supports millions of files
    • other answers here are limited by the maximum length of the argument list
  • spawns only 3 simple processes with a minimal pipe throughput
    • many answers here spawn C+N processes, where C is some constant and N is the number of files
  • doesn’t bother with string manipulation
    • this version doesn’t do any grepping, or regexing
    • well, find does a simple wildcard matching of filenames
  • optionally formats the sum into a human-readable form (eg. 5.5K, 176.7M, …)
    • to do that append | numfmt --to=si
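
For example, put together with the question’s directory and extension and a human-readable total:

{ find ./photos/john_doe -type f -name "*.jpg" -printf "%s+"; echo 0; } | bc | numfmt --to=si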
Answered By: rindeal

Another would be

ls -al <directory> | awk '{t += $5} END {print t}'

Assuming you’re looking in a single directory. If you want to look at a directory and everything beneath it:

ls -Ral <directory> | awk '{t += $5} END {print t}'
Answered By: chris bird

This is what worked for me.

find . -type f -iname '*.jpg' -print0 | du -ch --files0-from=- | grep total$
Answered By: serendrewpity

Another alternative, using stat rather than du:

stat -L -c %s ** | awk '{s+=$1} END {printf "%.0f\n", s}'

See Gilles’ answer about using **.
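
To restrict it to the .jpg files of the question (a sketch, assuming bash with globstar enabled):

shopt -s globstar
stat -L -c %s **/*.jpg | awk '{s+=$1} END {printf "%.0f\n", s}'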

Answered By: Peter Frost

This is a mashup of several answers and comments that do what I need.

find . \( -iname "*.jpg" -o -iname "*.png" \) -type f -exec du -bc {} + | grep total$ | cut -f1 | awk '{ total += $1 }; END { print total }' | numfmt --to=iec

  • find will get all the files recursively
  • -iname is for case INsensitive
  • -o and parenthesis to look for multiple patterns
  • du -bc will get the files’ size, sometimes in more than one call if there are many files
  • grep total will get only the total line as given by du
  • cut -f1 will take only the actual integer values
  • awk will sum them all
  • numfmt will convert it to a human-readable format
Answered By: Gabriel

Using the modern fd (AKA fd-find; the binary is fdfind on Ubuntu):

fdfind -e jpg -X du -ch | tail -1

I found fd easier to work with than find, and there is no need to enable globstar.

The trick is to use the uppercase -X (--exec-batch), which executes the command just once, and not the lowercase -x, which does a normal exec, running the command once per file.
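
For instance, to cover several extensions in one batch (fd’s -e option can be repeated):

fdfind -e jpg -e jpeg -X du -ch | tail -1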

To install on Ubuntu:

sudo apt install fd-find

See more

Answered By: Janghou