How can I get the size of a file in a bash script?


How do I assign this to a bash variable so I can use it later?

Asked By: haunted85


ls -l filename will give you lots of information about a file, including its file size, permissions and owner.

The file size is in the fifth column and is displayed in bytes. In the example below, the file size is just under 2 KB:

-rw-r--r-- 1 user owner 1985 2011-07-12 16:48 index.php

Edit: This is apparently not as reliable as the stat command.

Answered By: Druckles

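du filename will tell you the file's disk usage. Note that this is reported in blocks (1024-byte units by default with GNU du), not bytes, and can differ from the size of the file's contents.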

I prefer du -h filename, which gives you the size in a human readable format.
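If you want bytes from du, GNU du can report the apparent size instead of disk usage. This is a minimal sketch that assumes GNU coreutils, where -b is shorthand for --apparent-size --block-size=1. BSD du has no -b, and the function name du_bytes is hypothetical.

```shell
# Hypothetical helper: apparent size in bytes via GNU du (not portable to BSD du).
du_bytes() {
    # du prints "<size><TAB><name>"; keep only the first tab-separated field
    du -b -- "$1" | cut -f1
}
```

Usage: size=$(du_bytes index.php)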

Answered By: Teddy

Your best bet if on a GNU system:

stat --printf="%s" file.any

From man stat:

%s total size, in bytes

In a bash script:

#!/bin/bash
FILENAME=/home/heiko/dummy/packages.txt
FILESIZE=$(stat -c%s "$FILENAME")
echo "Size of $FILENAME = $FILESIZE bytes."

NOTE: see @chbrown’s answer for how to use stat on BSD or macOS systems.
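If one script has to run on both GNU and BSD/macOS systems, one option is to try the GNU syntax first and fall back to the BSD one. This is a sketch, assuming a stat that accepts either GNU's -c or BSD's -f format option. The name stat_size is hypothetical.

```shell
# Hypothetical helper: print a file's size in bytes with whichever stat is installed.
stat_size() {
    # GNU coreutils/BusyBox syntax first; BSD/macOS stat rejects -c and fails here
    stat -c %s -- "$1" 2>/dev/null ||
        # BSD/macOS syntax: -f takes a format string, %z is st_size
        stat -f %z -- "$1"
}
```

Usage: FILESIZE=$(stat_size "$FILENAME")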

Answered By: b01
file_size_kb=`du -k "$filename" | cut -f1`

The problem with using stat is that it is a GNU (Linux) extension. du -k and cut -f1 are specified by POSIX and are therefore portable to any Unix system.

Solaris, for example, ships with bash but not with stat. So this is not entirely hypothetical.

ls has a similar problem in that the exact format of the output is not specified, so parsing its output cannot be done portably. du -h is also a GNU extension.

Stick to portable constructs where possible, and you will make somebody’s life easier in the future. Maybe your own.

Answered By: Nemo

You could also use the “word count” command (wc):

wc -c "$filename" | awk '{print $1}'

The problem with wc is that it’ll add the filename and indent the output. For example:

$ wc -c somefile.txt
    1160 somefile.txt

If you would like to avoid chaining a full interpreted language or stream editor just to get a file size count, just redirect the input from the file so that wc never sees the filename:

wc -c < "$filename"

This last form can be used with command substitution to easily grab the value you were seeking as a shell variable, as mentioned by Gilles below.

size="$(wc -c <"$filename")"
Answered By: Eugéne

I like the wc option myself. Paired with ‘bc,’ you can get decimals to as many places as you please.

I was looking to improve a script I had that awk’ed out the ‘file size’ column of an ‘ls -alh’ command. I didn’t want just integer file sizes, and two decimals seemed to suit, so after reading this discussion, I came up with the code below.

I suggest breaking the line at the semicolons if you include this in a script.

file=$1; bite=$(wc -c < "$file"); okay=$(echo "scale=2; $bite/1024" | bc); echo "$file $okay kb"

My script is called gpfl, for “get picture file length.” I use it after doing a mogrify on a file in ImageMagick, before opening or reloading a picture in a GUI JPEG viewer.

I don’t know how this rates as an “answer,” as it borrows much from what’s already been offered and discussed. So I’ll leave it there.
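If bc is not installed, awk can do the same floating-point division. This is a minimal sketch that mirrors the divide-by-1024 above. The helper name kb_size is made up for illustration.

```shell
# Hypothetical helper: print "<file> <size in KiB, 2 decimals> kb" using awk instead of bc.
kb_size() {
    local bytes kb
    # Reading from stdin keeps the file name out of wc's output
    bytes=$(wc -c < "$1") || return
    # awk converts the count (ignoring any leading blanks) and rounds to 2 decimals
    kb=$(awk -v b="$bytes" 'BEGIN { printf "%.2f", b / 1024 }')
    printf '%s %s kb\n' "$1" "$kb"
}
```

Quoting "$1" throughout means file names containing spaces are handled correctly.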


Answered By: BZT

BSD’s (macOS’s) stat has a different format argument flag, and different field specifiers. From man stat(1):

  • -f format: Display information using the specified format. See the FORMATS section for a description of valid formats.
  • … the FORMATS section …
  • z: The size of file in bytes.

So all together now:

stat -f%z myfile1.txt

NOTE: see @b01’s answer for how to use the stat command on GNU/Linux systems. 🙂

Answered By: chbrown

This script combines many ways to calculate the file size:

(
  du --apparent-size --block-size=1 "$file" 2>/dev/null ||
  gdu --apparent-size --block-size=1 "$file" 2>/dev/null ||
  find "$file" -printf "%s" 2>/dev/null ||
  gfind "$file" -printf "%s" 2>/dev/null ||
  stat --printf="%s" "$file" 2>/dev/null ||
  stat -f%z "$file" 2>/dev/null ||
  wc -c <"$file" 2>/dev/null
) | awk '{print $1}'

The script works on many Unix systems including Linux, BSD, OSX, Solaris, SunOS, etc.

The file size shows the number of bytes. It is the apparent size, i.e. the number of bytes of content the file holds. This can differ from the space the file occupies on disk because of compression, sparse areas, unallocated blocks, etc.
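The gap between apparent size and disk usage is easy to see with a sparse file. This sketch assumes GNU coreutils (truncate, stat -c, du -k) and a filesystem that supports sparse files. The function name show_sizes is hypothetical.

```shell
# Hypothetical helper: print apparent size (bytes) and disk usage (KiB) side by side.
show_sizes() {
    # st_size: the number of bytes a reader would get from the file
    printf 'apparent: %s bytes\n' "$(stat -c %s -- "$1")"
    # du -k: space actually allocated on disk, in 1024-byte units
    printf 'on disk:  %s KiB\n' "$(du -k -- "$1" | cut -f1)"
}
```

For example, after truncate -s 1M sparse.img, show_sizes sparse.img reports 1048576 bytes of apparent size, while the on-disk figure is typically at or near 0 on filesystems that support sparse files.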

This script has a production version with more help and more options here:
https://github.com/SixArm/file-size

Answered By: joelparkerhenderson

stat appears to do this with the fewest system calls:

$ set debian-live-8.2.0-amd64-xfce-desktop.iso

$ strace stat --format %s "$1" 2>&1 | wc
    282    2795   27364

$ strace wc --bytes "$1" 2>&1 | wc
    307    3063   29091

$ strace du --bytes "$1" 2>&1 | wc
    437    4376   41955

$ strace find "$1" -printf %s 2>&1 | wc
    604    6061   64793

Answered By: user150821

Depends what you mean by size.

size=$(wc -c < "$file")

will give you the number of bytes that can be read from the file. In other words, it’s the size of the contents of the file. It will, however, read the contents of the file (except, in most wc implementations, when the file is a regular file or a symlink to a regular file, as an optimisation). That may have side effects. For instance, for a named pipe, what has been read can no longer be read again, and for things like /dev/zero or /dev/random, which are of infinite size, it’s going to take a while. That also means you need read permission to the file, and the last access timestamp of the file may be updated.

That’s standard and portable; note, however, that some wc implementations may include leading blanks in that output. One way to get rid of them is to use:

size=$(($(wc -c < "$file")))

or to avoid an error about an empty arithmetic expression in dash or yash when wc produces no output (like when the file can’t be opened):

size=$(($(wc -c < "$file") +0))
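Putting those points together, a small function along these lines could be used. It is only a sketch. The name content_size is hypothetical, and the regular-file check is an added assumption meant to avoid reading from pipes or devices like /dev/zero.

```shell
# Hypothetical helper: portable byte count of a regular file using only POSIX tools.
content_size() {
    local n
    # Refuse pipes, devices, etc.: reading them may block, never end, or consume data
    [ -f "$1" ] || { printf '%s: not a regular file\n' "$1" >&2; return 1; }
    # Redirecting stdin keeps the file name out of wc's output
    n=$(wc -c < "$1") || return
    # Arithmetic expansion drops any leading blanks; +0 guards against empty output
    echo "$(($n + 0))"
}
```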

ksh93 has a wc builtin (which you need to enable; you can also invoke it as command /opt/ast/bin/wc), which makes it the most efficient option for regular files in that shell.

Various systems have a command called stat that’s an interface to the stat() or lstat() system calls.

Those report information found in the inode. One piece of that information is the st_size attribute. For regular files, that’s the size of the content (how much data could be read from it in the absence of error; that’s what most wc -c implementations use in their optimisation). For symlinks, that’s the size in bytes of the target path. For named pipes, depending on the system, it’s either 0 or the number of bytes currently in the pipe buffer. The same goes for block devices, where, depending on the system, you get 0 or the size in bytes of the underlying storage.

You don’t need read permission to the file to get that information, only search permission to the directory it is linked to.

In chronological¹ order, there are:

  • IRIX stat (90’s):

    stat -qLs -- "$file"
    

    returns the st_size attribute of $file (lstat()) or:

    stat -s -- "$file"
    

    same except when $file is a symlink in which case it’s the st_size of the file after symlink resolution.

  • zsh stat builtin (now also known as zstat) in the zsh/stat module (loaded with zmodload zsh/stat) (1997):

    stat -L +size -- $file # st_size of file
    stat +size -- $file    # after symlink resolution
    

    or to store in a variable:

    stat -L -A size +size -- $file
    

    obviously, that’s the most efficient in that shell.

  • GNU stat (2001); also in BusyBox stat since 2005 and Toybox stat since 2013 (both copying the GNU stat interface):

    stat -c %s -- "$file"  # st_size of file
    stat -Lc %s -- "$file" # after symlink resolution
    

    (note the meaning of -L is reversed compared to IRIX or zsh stat).

  • BSDs stat (2002):

    stat -f %z -- "$file"  # st_size of file
    stat -Lf %z -- "$file" # after symlink resolution
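The difference between the two forms shows up with a symlink. This is a sketch assuming GNU stat; on BSD, substitute -f %z and -Lf %z. The helper name compare_link_sizes is hypothetical.

```shell
# Hypothetical helper: compare lstat() and stat() sizes for a path (GNU stat).
compare_link_sizes() {
    # Without -L: lstat(), so for a symlink this is the length of the target path
    printf 'lstat: %s\n' "$(stat -c %s -- "$1")"
    # With -L: stat() follows the link and reports the target's size
    printf 'stat:  %s\n' "$(stat -Lc %s -- "$1")"
}
```

For example, with printf hello > target.txt and ln -s target.txt link, compare_link_sizes link prints 10 (the length of the string "target.txt") for lstat and 5 for stat.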
    

Or you can use the stat()/lstat() function of some scripting language like perl:

perl -le 'print((lstat shift)[7])' -- "$file"

AIX also has an istat command which will dump all the stat() (not lstat(), so won’t work on symlinks) information and which you could post-process with, for example:

LC_ALL=C istat "$file" | awk 'NR == 4 {print $5}'

(thanks @JeffSchaller for the help figuring out the details).

In tcsh:

@ size = -Z $file:q

(size after symlink resolution)

Long before GNU introduced its stat command, the same could be achieved with the GNU find command and its -printf predicate (already in 1991):

find -- "$file" -prune -printf '%s\n'    # st_size of file
find -L -- "$file" -prune -printf '%s\n' # after symlink resolution

One issue, though, is that it doesn’t work if $file starts with - or is a find predicate (like !, (…).

Since version 4.9 of GNU find, that can be worked around by passing the file path through its stdin rather than as an argument:

printf '%s' "$file" |
  find -files0-from - -prune -printf '%s\n'
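With older versions of GNU find, another common workaround is to make sure the path can never look like an option or a predicate by prefixing relative paths with ./. This is a minimal sketch assuming GNU find (for -printf). The function name find_size is hypothetical.

```shell
# Hypothetical helper: st_size via GNU find, safe for names like "-x" or "!".
find_size() {
    local path=$1
    # Absolute paths already start with "/"; prefix relative ones with "./"
    case $path in
        (/*) ;;
        (*) path=./$path ;;
    esac
    # -prune stops find from descending if the path is a directory
    find "$path" -prune -printf '%s\n'
}
```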

The standard command to get the stat()/lstat() information is ls.

POSIXly, you can do:

LC_ALL=C ls -dln -- "$file" | awk '{print $5; exit}'

(-n is required to imply -l so the latter should not be necessary, but you’ll find that on some BSDs, it is).

Add -L for the same after symlink resolution. That doesn’t work for device files, though, where the 5th field is the device major number instead of the size.
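The POSIX ls approach can be wrapped in a function with a guard for the device-file caveat. This is a sketch: the name ls_size is hypothetical, it keeps the lstat() semantics of the plain command, and the existence check is an added assumption so that a missing file produces an error instead of empty output.

```shell
# Hypothetical helper: size in bytes from the 5th field of POSIX ls output.
ls_size() {
    # ls would fail and awk would print nothing, so report missing paths explicitly
    if [ ! -e "$1" ] && [ ! -L "$1" ]; then
        printf '%s: no such file\n' "$1" >&2
        return 1
    fi
    # For device files the 5th field holds the major number, not a size
    if [ -b "$1" ] || [ -c "$1" ]; then
        printf '%s: device file, 5th field is not a size\n' "$1" >&2
        return 1
    fi
    # -n: numeric uid/gid (implies -l); -d: do not list directory contents
    LC_ALL=C ls -dln -- "$1" | awk '{print $5; exit}'
}
```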

For block devices, systems where stat() returns 0 for st_size usually have other APIs to report the size of the block device. For instance, Linux has the BLKGETSIZE64 ioctl(), and most Linux distributions now ship with a blockdev command that can make use of it:

blockdev --getsize64 -- "$device_file"

However, you need read permission to the device file for that. It’s usually possible to derive the size by other means. For instance (still on Linux):

lsblk -bdno size -- "$device_file"

That should work except for empty devices.

An approach that works for all seekable files (so includes regular files, most block devices and some character devices) is to open the file and seek to the end:

  • With zsh (after loading the zsh/system module):

    {sysseek -w end 0 && size=$((systell(0)))} < $file
    
  • With ksh93:

    < "$file" <#((size=EOF))
    

    or

    { size=$(<#((EOF))); } < "$file"
    
  • with perl:

    perl -le 'seek STDIN, 0, 2 or die "seek: $!"; print tell STDIN' < "$file"
    

For named pipes, we’ve seen that some systems (AIX, Solaris, HP/UX at least) make the amount of data in the pipe buffer available in stat()’s st_size. Some (like Linux or FreeBSD) don’t.

On Linux at least, you can use the FIONREAD ioctl() after having opened the pipe (in read+write mode to avoid it hanging):

fuser -s -- "$fifo_file" && 
  perl -le 'require "sys/ioctl.ph";
            ioctl(STDIN, &FIONREAD, $n) or die$!;
            print unpack "L", $n' <> "$fifo_file"

However note that while it doesn’t read the content of the pipe, the mere opening of the named pipe here can still have side effects. We’re using fuser to check first that some process already has the pipe open to alleviate that but that’s not foolproof as fuser may not be able to check all processes.

Now, so far we’ve only been considering the size of the primary data associated with the files. That doesn’t take into account the size of the metadata and all the supporting infrastructure needed to store that file.

Another inode attribute returned by stat() is st_blocks. That’s the number of 512-byte (1024 on HP/UX) blocks that are used to store the file’s data (and sometimes some of its metadata, like the extended attributes on ext4 filesystems on Linux). That doesn’t include the inode itself, or the entries in the directories the file is linked to.

Size and disk usage are not necessarily tightly related: compression, sparseness, sometimes some metadata, and extra infrastructure like indirect blocks in some filesystems all have an influence on the latter.

That’s typically what du uses to report disk usage. Most of the commands listed above will be able to get you that information.

  • POSIXLY_CORRECT=1 ls -sd -- "$file" | awk '{print $1; exit}'
  • POSIXLY_CORRECT=1 du -s -- "$file" (not for directories where that would include the disk usage of the files within).
  • GNU find -- "$file" -printf '%b\n'
  • zstat -L +block -- $file
  • GNU stat -c %b -- "$file"
  • BSD stat -f %b -- "$file"
  • perl -le 'print((lstat shift)[12])' -- "$file"
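To turn st_blocks into bytes, you need the block unit. This is a sketch assuming GNU stat, whose %B format gives the size in bytes of each block counted by %b. The function name disk_bytes is hypothetical.

```shell
# Hypothetical helper: disk usage in bytes derived from st_blocks (GNU stat).
disk_bytes() {
    local blocks unit
    # %b: number of blocks allocated
    blocks=$(stat -c %b -- "$1") || return
    # %B: size in bytes of each block reported by %b
    unit=$(stat -c %B -- "$1") || return
    echo "$((blocks * unit))"
}
```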

¹ Strictly speaking, early versions of UNIX in the 70s, from v1 to v4 had a stat command. It was just dumping information from the inode and didn’t take options. It apparently disappeared in v5 (1974) presumably because it was redundant with ls -l.

Answered By: Stéphane Chazelas

Create small utility functions in your shell scripts that you can delegate to.

Example

#! /bin/sh -
# vim: set ft=sh:

# size utility that works on GNU and BSD systems
size(){
    case $(uname) in
        (Darwin | *BSD*)
            stat -Lf %z -- "$1";;
        (*) stat -Lc %s -- "$1"
    esac
}

for f do
    printf '%s\n' "$f : $(gzip < "$f" | wc -c) bytes (versus $(size "$f") bytes)"
done

Based on info from @Stéphane Chazelas’ answer.

Answered By: oligofren

I found an AWK one-liner that had a bug, which I fixed. I also added petabytes after terabytes.

FILE_SIZE=234234 # FILESIZE IN BYTES
FILE_SIZE=$(echo "${FILE_SIZE}" | awk '{ split( "B KB MB GB TB PB" , v ); s=1; while( $1>=1024 && s<6 ){ $1/=1024; s++ } printf "%.2f %s", $1, v[s] }')

Since stat is not available on every system, while awk almost always is, you can nearly always use the AWK solution. Note that it only formats a byte count you already have; the size itself still has to come from one of the other methods, such as wc -c.
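The AWK formatting can be paired with wc -c so that everything stays POSIX. This is a sketch: the name human_size is hypothetical, and the loop stops at PB so the unit index never runs past the table.

```shell
# Hypothetical helper: human-readable size of a file, e.g. "228.74 KB".
human_size() {
    local bytes
    # wc -c reading stdin prints only the count (possibly with leading blanks)
    bytes=$(wc -c < "$1") || return
    # Divide by 1024 until the value is below 1024 or the largest unit is reached
    awk -v n="$bytes" 'BEGIN {
        split("B KB MB GB TB PB", unit, " ")
        s = 1
        n += 0
        while (n >= 1024 && s < 6) { n /= 1024; s++ }
        printf "%.2f %s\n", n, unit[s]
    }'
}
```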

Answered By: findrbot_admin

The fastest and simplest method (IMO) is:

bash_var=$(stat -c %s /path/to/filename)
Answered By: WinEunuuchs2Unix