bzip2: Check file's decompressed size without actually decompressing it

I have a big bzip2-compressed file and I need to check its decompressed size without actually decompressing it (similar to gzip -l file.gz or xz -l file.xz). How can this be done using bzip2?

Asked By: manifestor


This question has already been answered here; that answer is pasted below:

As noted by others, bzip2 doesn’t provide much information. But this technique works — you will have to decompress the file, but you won’t have to write the decompressed data to disk, which may be a “good enough” solution for you:

$ ls -l foo.bz2
-rw-r--r-- 1 ~quack ~quack 2364418 Jul  4 11:15 foo.bz2

$ bzcat foo.bz2 | wc -c         # bzcat decompresses to stdout, wc -c counts bytes
2928640                         # number of bytes of decompressed data

You can pipe that output into something else to give you a human-readable form:

$ ls -lh foo.bz2
-rw-r--r-- 1 quack quack 2.3M Jul  4 11:15 foo.bz2

$ bzcat foo.bz2 | wc -c | perl -lne 'printf("%.2fM\n", $_/1024/1024)'
2.79M
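If you'd rather do this from a script, here is a minimal Python sketch of the same idea (the helper names bz2_decompressed_size and human_mib are made up for illustration). It still decompresses everything, just like bzcat | wc -c, but keeps only a running byte count, so nothing is written to disk and memory use stays small:

```python
import bz2
from typing import BinaryIO, Union


def bz2_decompressed_size(source: Union[str, BinaryIO], chunk_size: int = 1 << 20) -> int:
    """Count the decompressed bytes of a .bz2 file without storing them.

    `source` is a filename or an open binary file object (bz2.open accepts both).
    The whole file is still decompressed; each chunk is counted and discarded.
    """
    total = 0
    with bz2.open(source, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of the (last) stream
                break
            total += len(chunk)
    return total


def human_mib(n: int) -> str:
    # Same formatting as the perl one-liner: bytes / 1024 / 1024, two decimals.
    return f"{n / 1024 / 1024:.2f}M"
```

Call it as bz2_decompressed_size("foo.bz2") and pass the result to human_mib for the MiB form. Python's bz2 module also reads files made of several concatenated bzip2 streams, as bzcat does.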
Answered By: NATI0N

As mentioned in the comments and the linked answer, the only reliable way is to decompress the file (in a pipe) and count the bytes.

$ bzcat file.bz2 | wc -c
1234

Alternatively, use a tool that reports the size itself, without the extra pipe (this may be slightly more efficient):

$ 7z t file.bz2
[...]
Everything is Ok
Size:       1234

This also applies to gzip and other formats. Although gzip -l file.gz prints a size, that size can be wrong. Once the uncompressed data reaches 4 GiB or more, you get output like:

$ gzip --list foobar.gz 
         compressed        uncompressed  ratio uncompressed_name
           97894400            58835168 -66.4% foobar
$ gzip --list foobar.gz 
         compressed        uncompressed  ratio uncompressed_name
         4796137936                   0   0.0% foobar
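Why those numbers? The gzip format (RFC 1952) stores the uncompressed length in a 4-byte trailer field, ISIZE, as the size modulo 2^32, and that field is what gzip -l prints. Assuming an intact single-member file with true uncompressed size S, the two listings only pin S down to a family of values:

```latex
\begin{aligned}
\text{ISIZE} &= S \bmod 2^{32}, \qquad 2^{32} = 4\,294\,967\,296 \\
\text{first listing:}\quad S &\in \{\, 58\,835\,168 + k \cdot 2^{32} : k \ge 0 \,\} \\
\text{second listing:}\quad S &\in \{\, k \cdot 2^{32} : k \ge 0 \,\}
\end{aligned}
```

In the first case the compressed file (97 894 400 bytes) is larger than the reported 58 835 168, hence the negative ratio; k ≥ 1 is the likely explanation. In the second, k = 0 would mean empty input, which a 4.8 GB compressed file makes implausible.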

Or if the file consists of concatenated gzip streams, or was simply not created correctly:

$ truncate -s 1234 foobar
$ gzip foobar
$ cat foobar.gz foobar.gz > barfoo.gz
$ gzip -l barfoo.gz 
         compressed        uncompressed  ratio uncompressed_name
                 74                1234  96.0% barfoo
$ zcat barfoo.gz | wc -c
2468

The sizes do not match, so gzip -l is not reliable in any way.
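To see where the wrong number comes from, here is a small Python sketch of the barfoo.gz experiment (gzip_trailer_size and gzip_actual_size are hypothetical helper names). The first function reads the last 4 bytes of the file, the ISIZE trailer from RFC 1952, which, judging by the listing above, is all gzip -l looks at; the second decompresses every member and counts the bytes, like zcat | wc -c:

```python
import gzip
import os
import struct
from typing import BinaryIO


def gzip_trailer_size(f: BinaryIO) -> int:
    """Return the ISIZE field from the final 4 bytes of a gzip file.

    ISIZE is the uncompressed length modulo 2**32, little-endian, and it
    describes only the last member of a multi-member (concatenated) file.
    """
    f.seek(-4, os.SEEK_END)
    (isize,) = struct.unpack("<I", f.read(4))
    return isize


def gzip_actual_size(f: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Decompress all members and count the bytes, discarding the data."""
    f.seek(0)
    total = 0
    # GzipFile reads every concatenated member; closing it leaves f open.
    with gzip.GzipFile(fileobj=f, mode="rb") as g:
        while chunk := g.read(chunk_size):
            total += len(chunk)
    return total
```

With two copies of a member holding 1234 zero bytes (what truncate -s 1234 produces), the trailer says 1234 while the real size is 2468, matching the gzip -l and zcat results above.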

Sometimes you can cheat, depending on what’s inside the archive. For example, if it’s a compressed filesystem image with a metadata header at the start, you can decompress just that header and read the total filesystem size from it.

$ truncate -s 1234M foobar.img
$ mkfs.ext2 foobar.img
$ bzip2 foobar.img
$ bzcat foobar.img.bz2 | head -c 1M > header.img
$ tune2fs -l header.img
tune2fs 1.45.4 (23-Sep-2019)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          95b64880-c4a7-4bea-9b63-6fdcc86d0914
[...]
Block count:              315904
Block size:               4096

So by extracting a tiny part you learn that this is 315904 blocks of 4096 bytes, which comes out as 1234 MiB.
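The same trick in Python, as a rough sketch: decompress only the first few KiB and read the two fields tune2fs showed straight from the ext2 superblock, which starts 1024 bytes into the image. The helper name ext2_size_from_bz2 is hypothetical, and it assumes a plain ext2 image with a 32-bit block count (it ignores the ext4 64bit feature's high word):

```python
import bz2
import struct

SUPERBLOCK_OFFSET = 1024  # primary ext2 superblock starts 1024 bytes into the image
EXT2_MAGIC = 0xEF53       # s_magic value, stored at offset 56 of the superblock


def ext2_size_from_bz2(source, header_bytes: int = 4096) -> int:
    """Estimate the size of a bzip2-compressed ext2 image from its superblock.

    Only the first `header_bytes` of decompressed data are requested, so just
    the first bzip2 block(s) get decompressed, similar to
    `bzcat img.bz2 | head -c ...` followed by `tune2fs -l`.
    """
    with bz2.open(source, "rb") as f:
        header = f.read(header_bytes)
    sb = header[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 1024:
        raise ValueError("too little data to contain an ext2 superblock")
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("no ext2 superblock magic found")
    (blocks_count,) = struct.unpack_from("<I", sb, 4)     # "Block count"
    (log_block_size,) = struct.unpack_from("<I", sb, 24)  # block size = 1024 << n
    return blocks_count * (1024 << log_block_size)
```

For the image above, 315904 blocks × 4096 bytes = 1 293 942 784 bytes, i.e. exactly 1234 MiB.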

There’s no guarantee that this is the actual size of the decompressed data (it could be larger or smaller), but barring anything unusual it’s more trustworthy than gzip -l in any case.

Last but not least, if you create these files in the first place, just record the size.
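One way to do that, sketched in Python: count the bytes while compressing and store the count next to the archive. The helper compress_and_record and the ".size" sidecar file are a hypothetical convention for illustration; bzip2 itself knows nothing about them:

```python
import bz2


def compress_and_record(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> int:
    """Compress src_path into dst_path and record the uncompressed size.

    The byte count is written to dst_path + ".size" (hypothetical sidecar
    naming), so later you can read that file instead of decompressing.
    """
    total = 0
    with open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            total += len(chunk)
    with open(dst_path + ".size", "w") as side:
        side.write(f"{total}\n")
    return total
```

Counting during the write (instead of calling os.path.getsize on the input) keeps the approach usable when the data arrives as a stream rather than as a regular file.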

Answered By: frostschutz