How do you sort du output by size?
How do you sort du -sh /dir/* by size? I read one site that said to use | sort -n, but that’s obviously not right. Here’s an example that is wrong.
[~]# du -sh /var/* | sort -n
0 /var/mail
1.2M /var/www
1.8M /var/tmp
1.9G /var/named
2.9M /var/run
4.1G /var/log
8.0K /var/account
8.0K /var/crash
8.0K /var/cvs
8.0K /var/games
8.0K /var/local
8.0K /var/nis
8.0K /var/opt
8.0K /var/preserve
8.0K /var/racoon
12K /var/aquota.user
12K /var/portsentry
16K /var/ftp
16K /var/quota.user
20K /var/yp
24K /var/db
28K /var/empty
32K /var/lock
84K /var/profiles
224M /var/netenberg
235M /var/cpanel
245M /var/cache
620M /var/lib
748K /var/spool
If you have GNU coreutils (common in most Linux distributions), you can use
du -sh -- * | sort -h
The -h option tells sort that the input is in human-readable format (a number with a unit; 1024-based, so that 1023 is considered less than 1K, which happens to match what GNU du -h does).
This feature was added in GNU Core Utilities 7.5 in August 2009.
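To make the difference concrete, here is a small sketch that feeds three made-up lines (taken from the listing in the question) through both sort modes. The sample function is only a stand-in for real du -sh output, and sort -h assumes a sort that supports it.

```shell
# Sample lines in the shape of `du -sh` output (size, tab, path).
# The function name `sample` is just for this illustration.
sample() {
    printf '1.9G\t/var/named\n2.9M\t/var/run\n8.0K\t/var/crash\n'
}

# Numeric sort: compares only the leading number and ignores the unit suffix,
# so 1.9G ends up before 8.0K.
sample | sort -n

# Human-numeric sort: takes the K/M/G suffix into account.
sample | sort -h
```

With sort -n the gigabyte entry comes first; with sort -h it comes last.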
Note: If you are using an older version of Mac OS X, you need to install coreutils with
brew install coreutils
and then use gsort as a drop-in replacement for sort. Newer versions of macOS (verified on Mojave) support sort -h natively.
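If the same command has to run on machines with and without a -h capable sort, one option is to probe for it. The sketch below is only an assumption about how you might do that; pick_hsort is a hypothetical helper name, not part of coreutils.

```shell
# Print the name of a sort binary that accepts -h: plain sort first, then gsort.
# `pick_hsort` is a made-up helper for this example.
pick_hsort() {
    for candidate in sort gsort; do
        # The candidate must exist and must accept -h on a trivial input.
        if command -v "$candidate" >/dev/null 2>&1 &&
           printf '1K\n' | "$candidate" -h >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage: du -sh -- * | "$(pick_hsort)" -h
```

If neither binary qualifies, the function returns a non-zero status and prints nothing, so the caller can fall back to one of the other approaches.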
This little Perl script does the trick. Save it as duh
(or whatever you want) and call it with duh /dir/*
#!/usr/bin/perl -w
use strict;
my @line;
sub to_human_readable {
my ($number) = @_;
my @postfix = qw( k M G T P );
my $post;
my $divide = 1;
foreach (@postfix) {
$post = $_;
last if (($number / ($divide * 1024)) < 1);
$divide = $divide * 1024;
}
$number = int($number/$divide + 0.5);
return $number . $post;
}
sub trimlengthright {
my ($txt, $len) = @_;
if ( length($txt) >= $len ) {
$txt = substr($txt,0,$len - 1) . " ";
} else {
$txt = $txt . " " x ($len - length($txt));
}
return $txt;
}
sub trimlengthleft {
my ($txt, $len) = @_;
if ( length($txt) >= $len ) {
$txt = substr($txt,0,$len - 1) . " ";
} else {
$txt = " " x ($len - length($txt)) . $txt;
}
return $txt;
}
open(DF,"du -ks @ARGV | sort -n |");
while (<DF>) {
@line = split;
print &trimlengthleft(&to_human_readable($line[0]),5)," "; # size
print &trimlengthright($line[1],70),"\n"; # directory
}
close DF;
Try using the -k flag to count 1K blocks instead of using human-readable output. Then you have a common unit and can easily do a numeric sort.
du -ck | sort -n
You don’t explicitly require human units, but if you did, there are a bunch of ways to do it. Many seem to use the 1K block technique above and then make a second call to du.
https://serverfault.com/questions/62411/how-can-i-sort-du-h-output-by-size
If you want to see the KB units added, use:
du -k | sed -e 's_^\([0-9]*\)_\1 KB_' | sort -n
If you don’t have sort -h, you can do this:
du -sh * | sed 's/\([[:digit:]]\)\t/\1B\t/' | sed 's/\(.\t\)/\t\1/' | sed 's/G\t/Z\t/' | sort -n -k 2d,2 -k 1n,1 | sed 's/Z\t/G\t/'
This gets the du list, separates the suffix, and sorts using that. Since there is no suffix for <1K, the first sed adds a B (for byte). The second sed adds a delimiter between the digit and the suffix. The third sed converts G to Z so that it’s bigger than M; if you have terabyte files, you’ll have to convert G to Y and T to Z. Finally, we sort by the two columns, then we replace the G suffix.
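A different way to get the same effect, shown here only as a sketch and not as the method above, is to prepend a byte count computed from the suffix, sort on that, and strip it again. The function name sort_du_h is hypothetical, and the sketch assumes tab-separated du -h style input with 1024-based K, M, G, T suffixes (or no suffix, or B, for bytes).

```shell
# Sort `du -h` style lines (size, tab, path) without sort -h.
# `sort_du_h` is a made-up name; it reads lines on stdin.
sort_du_h() {
    awk -F '\t' '
        BEGIN {
            mult["B"] = 1; mult["K"] = 1024; mult["M"] = 1024^2
            mult["G"] = 1024^3; mult["T"] = 1024^4
        }
        {
            size = $1
            unit = toupper(substr(size, length(size)))
            if (unit ~ /[0-9]/) { unit = "B"; num = size }      # bare number: bytes
            else                { num = substr(size, 1, length(size) - 1) }
            # Prefix each line with its approximate size in bytes as a sort key.
            printf "%.0f\t%s\n", num * mult[unit], $0
        }' | sort -n | cut -f2-
}

printf '1.9G\t/a\n748K\t/b\n224M\t/c\n0\t/d\n' | sort_du_h
```

Because the key is a plain number, this needs no suffix swapping and extends to larger units by adding entries to mult.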
If you don’t have a recent version of GNU coreutils, you can call du without -h to get sortable output, and produce human-friendly output with a little postprocessing. This has the advantage of working even if your version of du doesn’t have the -h flag.
du -k | sort -n | awk '
    function human(x) {
        s="kMGTPEZY";
        while (x>=1000 && length(s)>1)
            {x/=1024; s=substr(s,2)}
        return int(x+0.5) substr(s,1,1)
    }
    {gsub(/^[0-9]+/, human($1)); print}'
If you want SI suffixes (i.e. multiples of 1000 rather than 1024), change 1024 to 1000 in the while loop body. (Note that the 1000 in the condition is intended, so that you get e.g. 1M rather than 1000k.)
If your du has an option to display sizes in bytes (e.g. -b or -B 1; note that this may have the side effect of counting actual file sizes rather than disk usage), add a space to the beginning of s (i.e. s=" kMGTPEZY";), or add if (x<1000) {return x} else {x/=1024} at the beginning of the human function.
Displaying a decimal digit for numbers in the range 1–10 is left as an exercise to the reader.
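For what it’s worth, here is one possible take on that exercise, offered as a sketch and not as the definitive answer. It assumes du -k style input (sizes in 1K blocks) on stdin; humanize_kb is a hypothetical name, and the 9.95 threshold is my own choice so that values never round up to a four-character 10.0.

```shell
# Rewrite the leading size (in 1K blocks) of each line into a human-readable form,
# with one decimal digit for values below 10. `humanize_kb` is a made-up name.
humanize_kb() {
    awk '
        function human(x,    s) {
            s = "kMGTPEZY"
            while (x >= 1000 && length(s) > 1) { x /= 1024; s = substr(s, 2) }
            if (x < 9.95) return sprintf("%.1f%s", x, substr(s, 1, 1))
            return int(x + 0.5) substr(s, 1, 1)
        }
        { sub(/^[0-9]+/, human($1)); print }'
}

# Usage: du -k | sort -n | humanize_kb
printf '4\t./a\n1536\t./b\n250000\t./c\n' | humanize_kb
```

The three sample lines come out with the sizes 4.0k, 1.5M and 244M.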
Here’s what I use on Ubuntu 10.04, CentOS 5.5, FreeBSD and Mac OS X.
I borrowed the idea from www.geekology.co.za/ and earthinfo.org, as well as the infamous ducks from “Linux Server Hacks” by O’Reilly. I am still adapting it to my needs. This is still a work in progress (As in, I was working on this on the train this morning.):
#! /usr/bin/env bash
ducks () {
du -cks -x | sort -n | while read size fname; do
for unit in k M G T P E Z Y; do
if [ $size -lt 1024 ]; then
echo -e "${size}${unit}\t${fname}"
break
fi
size=$((size/1024))
done
done
}
ducks > .ducks && tail .ducks
Here’s the output:
stefan@darwin:~ $ ducks
32M src
42M .cpan
43M .macports
754M doc
865M Work
1G .Trash
4G Library
17G Downloads
30G Documents
56G total
stefan@darwin:~ $
Go crazy with this script –
$ du -k ./* |
> sort -nr |
> awk '
> {split("KB,MB,GB",size,",");}
> {x = 1;while ($1 >= 1024) {$1 = $1 / 1024;x = x + 1} $1 = sprintf("%-4.2f%s", $1, size[x]); print $0;}'
To sort by size in MB
du --block-size=MiB --max-depth=1 path | sort -n
This script is even easier:
for i in G M K; do du -h -d1 / | grep "[0-9]$i" | sort -n; done
On OS X, you can install the needed coreutils via Homebrew:
brew install coreutils
With this you’ll have gsort, which supports the -h command-line parameter.
This one handles filenames with whitespace or apostrophes, and works on systems which do not support xargs -d or sort -h:
du -s * | sort -n | cut -f2 | tr '\n' '\0' | xargs -0 -I {} du -sh "{}"
which results in:
368K diskmanagementd
392K racoon
468K coreaudiod
472K securityd
660K sshd
3.6M php-fpm
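If a second du pass is not needed, a single-pass variant can hand the names to du through find -exec, so they never go through the shell’s word splitting. This is only a sketch: du_sorted is a hypothetical name, sizes are printed in 1K blocks and not in human-readable form, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do; they are not in POSIX).

```shell
# List the size (in 1K blocks) of every entry directly inside a directory,
# smallest first. `du_sorted` is a made-up helper name.
du_sorted() {
    dir=${1:-.}
    # find passes each name to du as a single argument, so spaces and
    # apostrophes in names are preserved as-is.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + | sort -n
}

# Usage: du_sorted /var
```

Names containing newlines would still break the line-based sort, which is also true of the pipeline above.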
Mac OS X didn’t have the -h option for sort (I was probably using Mavericks or Yosemite), so I learned some sed and awk and made a first attempt:
du -sk * | sort -g | awk '{ numBytes = $1 * 1024; numUnits = split("B K M G T P", unit); num = numBytes; iUnit = 0; while(num >= 1024 && iUnit + 1 < numUnits) { num = num / 1024; iUnit++; } $1 = sprintf( ((num == 0) ? "%6d%s " : "%6.1f%s "), num, unit[iUnit + 1]); print $0; }'
It is a long line. Expanded, it is:
du -sk * | sort -g | awk '{
numBytes = $1 * 1024;
numUnits = split("B K M G T P", unit);
num = numBytes;
iUnit = 0;
while(num >= 1024 && iUnit + 1 < numUnits) {
num = num / 1024;
iUnit++;
}
$1 = sprintf( ((num == 0) ? "%6d%s " : "%6.1f%s "), num, unit[iUnit + 1]);
print $0;
}'
I tried it on Mac OS X Mavericks, Yosemite, and Ubuntu 14.04, with awk being either the default awk (which is the same as nawk there, because both awk and nawk point to /usr/bin/mawk) or gawk, and they all worked.
Here is a sample of the output on a Mac:
0B bar
0B foo
4.0K wah
43.0M Documents
1.2G Music
2.5G Desktop
4.7G Movies
5.6G VirtualBox VMs
9.0G Dropbox
11.7G Library
21.2G Pictures
27.0G Downloads
Instead of du -sk *, you can use du -skcx *, which I saw in @Stefan’s answer: it also displays the grand total and does not traverse filesystem mount points.
For OS X:
du -h -k {PATH} | sort -n
This will sort the output in decreasing order of size:
du -sh /var/* | sort -k 1rn
This will sort the output in increasing order of size:
du -sh /var/* | sort -k 1n
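PS: This can be used to sort by any column, but the values in that column must all be in the same format. With the mixed K, M and G units of du -h, a numeric sort ignores the suffix, so use du -sk if you want every size in the same unit.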
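As a small illustration of that caveat, the sketch below sorts lines that are already in a single unit (1K blocks, as du -sk prints them). The sample sizes are made up, and largest_first is a hypothetical name.

```shell
# Sort du -sk style lines (size in 1K blocks, tab, path) largest first.
# -k1,1 limits the key to the first column; n = numeric, r = reverse.
largest_first() {
    sort -k1,1nr
}

# Usage: du -sk /var/* | largest_first
printf '748\t/var/spool\n4299161\t/var/log\n8\t/var/crash\n' | largest_first
```

Writing the key as -k1,1 instead of -k 1 stops sort from extending the key to the end of the line, which matters once paths contain digits.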
Tested on Solaris!
du -kh | sort -nk1 | grep "[0-9]K" && du -kh | sort -nk1 | grep "[0-9]M" && du -kh | sort -nk1 | grep "[0-9]G"
This will output all directory sizes recursively, at the bottom will be largest directory in Gigabytes and at the top smallest in Kilobytes.
The biggest is at the bottom:
du -sh * | sort -h
In the absence of GNU sort -h, this should work in most UNIX environments:
join -1 2 -2 2 <(du -sk /dir/* 2>/dev/null | sort -k2,2) <(du -sh /dir/* 2>/dev/null | sort -k2,2) | sort -nk2,2 | awk '{ print $3 "\t" $1 }'
Command:
du -ah . | sort -k1 -h | tail -n 50
Explanation:
- du -ah . lists the size of all files and folders recursively in the current directory in human-readable form.
- sort -k1 -h | tail -n 50 sorts by the human-readable size in the first column and keeps the largest 50.