How to uncompress zlib data in UNIX?
I have created zlib-compressed data in Python, like this:
import zlib
s = '...'
z = zlib.compress(s)
with open('/tmp/data', 'w') as f:
    f.write(z)
(or one-liner in shell: echo -n '...' | python2 -c 'import sys,zlib; sys.stdout.write(zlib.compress(sys.stdin.read()))' > /tmp/data)
Now I want to uncompress the data in the shell. Neither zcat nor uncompress works:
$ cat /tmp/data | gzip -d -
gzip: stdin: not in gzip format
$ zcat /tmp/data
gzip: /tmp/data.gz: not in gzip format
$ cat /tmp/data | uncompress -
gzip: stdin: not in gzip format
It seems that I have created a gzip-like file, but without any headers. Unfortunately, I don’t see any option to uncompress such raw data in the gzip man page, and the zlib package does not contain any executable utility.
Is there a utility to uncompress raw zlib data?
zlib implements the compression used by gzip, but not the file format. Instead, you should use the gzip module, which itself uses zlib.
import gzip
s = '...'
with gzip.open('/tmp/data', 'w') as f:
    f.write(s)
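Why gzip -d rejects the file from the question can be seen by comparing the first bytes each Python module writes. A minimal Python 3 sketch (the payload is an arbitrary example; Python 3 needs bytes rather than str here):

```python
import gzip
import zlib

payload = b"hello, world"  # arbitrary example data

zlib_data = zlib.compress(payload)  # zlib container (RFC 1950)
gzip_data = gzip.compress(payload)  # gzip container (RFC 1952)

# gzip -d looks for the magic bytes 1f 8b; a zlib stream starts with a
# CMF byte instead (0x78 = deflate with a 32 KiB window)
print("zlib starts with", zlib_data[:2].hex())
print("gzip starts with", gzip_data[:2].hex())

# Both containers wrap the same deflate format; each module reads its own
assert zlib.decompress(zlib_data) == payload
assert gzip.decompress(gzip_data) == payload
```

So the data itself is fine; only the wrapper around it differs, which is what the tricks further down work around.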
This might do it:
import sys
import zlib
# Decompress each file named on the command line (sys.argv[0] is the script itself)
for filename in sys.argv[1:]:
    with open(filename, 'rb') as compressed:
        with open(filename + '-decompressed', 'wb') as expanded:
            data = zlib.decompress(compressed.read())
            expanded.write(data)
Then run it like this:
$ python expander.py data/*
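For large inputs, reading the whole file into memory may be undesirable. A possible streaming variant, sketched in Python 3 under the assumption that each input file holds a single zlib stream, feeds the data to zlib.decompressobj() in chunks (the 64 KiB chunk size and the helper names are arbitrary choices, not part of the answer above):

```python
import sys
import zlib

CHUNK_SIZE = 64 * 1024  # arbitrary read size


def inflate_stream(src, dst):
    """Decompress a zlib stream read from binary file src into dst.

    Returns True if the end of the zlib stream was reached, False if the
    input ended early (dst then holds whatever could be recovered).
    """
    d = zlib.decompressobj()
    while not d.eof:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:
            break
        dst.write(d.decompress(chunk))
    dst.write(d.flush())
    return d.eof


def main(paths):
    # Same interface as expander.py: each FILE is written to FILE-decompressed
    for path in paths:
        with open(path, "rb") as src, open(path + "-decompressed", "wb") as dst:
            if not inflate_stream(src, dst):
                print(f"{path}: warning: zlib stream is truncated", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv[1:])
```

It is run the same way (python3 expander.py data/*). A truncated stream is decompressed as far as possible and reported on stderr instead of raising an exception, which zlib.decompress() would do.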
I have found one possible solution, using openssl:
$ openssl zlib -d < /tmp/data
or
$ openssl zlib -d -in /tmp/data
NOTE: zlib functionality is apparently available in recent OpenSSL versions (>= 1.0.0). OpenSSL has to be configured/built with the zlib or zlib-dynamic option (the latter is the default).
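It is also possible to decompress it using a standard shell script + gzip, if you don’t have (or don’t want to use) openssl or other tools.
The trick is to prepend the gzip magic number and compression method to the actual data from zlib.compress. This works because these are only the first 8 of the 10 gzip header bytes: the 2-byte zlib header then fills the last two gzip header fields (XFL and OS), which gzip does not check.
printf "\x1f\x8b\x08\x00\x00\x00\x00\x00" |cat - /tmp/data |gzip -dc >/tmp/out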
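The same header trick can be reproduced with Python's zlib module to see what is going on. This is only an illustrative Python 3 sketch (the function name is hypothetical). Unlike the gzip program, zlib's gzip mode verifies the trailer checksum as soon as it has four bytes, so the sketch drops the 4-byte Adler-32 trailer first; otherwise it would be read as a CRC-32 and rejected:

```python
import zlib

# First 8 of the 10 gzip header bytes: magic 1f 8b, CM=8 (deflate), FLG=0,
# MTIME=0. The last two header fields (XFL, OS) are left out on purpose.
GZIP_PREFIX = b"\x1f\x8b\x08\x00\x00\x00\x00\x00"


def inflate_with_gzip_prefix(zdata: bytes) -> bytes:
    """Decompress zlib data by reading it as a trailer-less gzip member."""
    # The 2-byte zlib header (e.g. 78 9c) lands in the XFL and OS slots,
    # which the gzip decoder reads but does not validate.
    # zdata[:-4] drops the Adler-32 trailer (see the lead-in).
    stream = GZIP_PREFIX + zdata[:-4]
    d = zlib.decompressobj(wbits=31)  # 16 + 15: expect a gzip wrapper
    # The gzip trailer is missing, so d.eof stays False, but all of the
    # decompressed data is still returned without an error.
    return d.decompress(stream) + d.flush()
```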
Edits:
@d0sboots commented: For raw deflate data, you need to add 2 more null bytes:
→ "\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00"
This Q on SO gives more information about this approach. An answer there suggests that there is also an 8 byte footer.
Users @Vitali-Kushner and @mark-bessey reported success even with truncated files, so a gzip footer does not seem strictly required.
@tobias-kienzler suggested this function for the bashrc:
zlibd() (printf "\x1f\x8b\x08\x00\x00\x00\x00\x00" | cat - "$@" | gzip -dc)
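In Python, raw deflate data (the case in @d0sboots' comment) can be handled directly by passing a negative wbits value to zlib. A minimal Python 3 sketch with hypothetical helper names; the test below also shows that stripping the 2-byte header and 4-byte trailer from zlib.compress() output leaves raw deflate data, which is why the raw variant of the printf header needs the two extra bytes (nothing is left to fill XFL and OS):

```python
import zlib


def deflate_raw(data: bytes, level: int = 6) -> bytes:
    """Compress to raw deflate: no zlib header, no Adler-32 trailer."""
    c = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    return c.compress(data) + c.flush()


def inflate_raw(raw: bytes) -> bytes:
    """Decompress raw deflate data (negative wbits means 'no wrapper')."""
    return zlib.decompress(raw, -zlib.MAX_WBITS)
```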
zlib-flate -uncompress < IN_FILE > OUT_FILE
I tried this and it worked for me.
zlib-flate can be found in the package qpdf (in Debian Squeeze, Fedora 23, and brew on macOS, according to comments in other answers).
(Thanks to user @tino, who provided this as a comment below the OpenSSL answer. Made into a proper answer for easy access.)
You can use this to compress with zlib:
openssl enc -z -none -e < /file/to/deflate
And this to decompress (inflate):
openssl enc -z -none -d < /file/to/inflate
I recommend pigz from Mark Adler, co-author of the zlib compression library. Execute pigz to see the available flags.
You will notice:
-z --zlib Compress to zlib (.zz) instead of gzip format.
You can uncompress using the -d flag:
-d --decompress --uncompress Decompress the compressed input.
Assuming a file named ‘test’:
pigz -z test
– creates a zlib-compressed file named test.zz
pigz -d -z test.zz
– converts test.zz back to the decompressed file test
On macOS you can install it with brew install pigz
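zcat -f infile > outfile
works for me on Fedora 25.
(Note: according to the gzip manual, with -f and output to stdout, input that is not in a format gzip recognizes is copied to the output unchanged, so this does not inflate raw zlib data.)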
The example program zpipe.c by Mark Adler himself (it comes with the source distribution of the zlib library) is very useful for these scenarios with raw zlib data. Compile with cc -o zpipe zpipe.c -lz and decompress with zpipe -d < raw.zlib > decompressed. Without the -d flag it compresses instead.
On macOS, which is a fully POSIX-compliant UNIX (formally certified!), OpenSSL has no zlib support and there is no zlib-flate either. The first solution works, as do all the Python solutions, but the first one requires the zlib data to be in a file and all the others force you to create a Python script.
Here’s a Perl-based solution that can be used as a command-line one-liner, gets its input via a STDIN pipe, and works out of the box on a freshly installed macOS:
cat file.compressed | perl -e 'use Compress::Raw::Zlib;my $d=new Compress::Raw::Zlib::Inflate();my $o;undef $/;$d->inflate(<>,$o);print $o;'
Nicer formatted, the Perl script looks like this:
use Compress::Raw::Zlib;
my $decompressor = new Compress::Raw::Zlib::Inflate();
my $output;
undef $/;
$decompressor->inflate(<>, $output);
print $output;
Optimized version from Marco d’Itri (see comments):
cat file.compressed | perl -MCompress::Zlib -E 'undef $/;print uncompress(<>)'
During development of eIDAS-related code, I came up with a bash script that decodes the SSO (Single Sign-On) SAMLRequest parameter, which is usually encoded with base64 and raw deflate (PHP gzdeflate):
#!/bin/bash
# file decode_saml_request.sh
urldecode() { : "${*//+/ }"; echo -e "${_//%/\\x}"; }
# read the file given as the first argument, or stdin if there is none
contents=$(cat "${1:-/dev/stdin}")
if [[ $contents == *"SAMLRequest="* ]]; then
  # extract param SAMLRequest from URL, strip all following params
  contents=$(echo "$contents" | awk -F 'SAMLRequest=' '{print $2}' | awk -F '&' '{print $1}')
fi
# otherwise, work with the raw base64-encoded string as it is
# add gzip raw-deflate header bytes and gunzip (`gzip -dc` can be replaced by `gunzip`)
printf "\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00" | cat - <(urldecode "$contents" | base64 -d) | gzip -dc
You can use it like this:
$ decode_saml_request.sh /path/to/file_with_sso_url
# or
$ echo "y00tLk5MT1VISSxJBAA%3D" | decode_saml_request.sh
The script is also published as a gist: https://gist.github.com/smarek/77dacb9703ac8b715b5eced5314d5085. I may not maintain this answer, but I will maintain the gist.
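For comparison, the same decoding steps (URL decoding, base64, raw inflate) can be sketched in Python 3. This is not the gist's code; the function name is hypothetical, and unlike the bash urldecode it leaves '+' alone in a bare base64 string, since '+' is a valid base64 character there:

```python
import base64
import zlib
from urllib.parse import parse_qs, unquote, urlsplit


def decode_saml_request(text: str) -> bytes:
    """Decode a SAMLRequest value: URL-encoded base64 of raw-deflated XML."""
    text = text.strip()
    if "SAMLRequest=" in text:
        # Full URL or bare query string: parse_qs does the URL decoding
        # (like the bash urldecode, it turns a literal '+' into a space)
        query = urlsplit(text).query or text
        value = parse_qs(query)["SAMLRequest"][0]
    else:
        # Bare base64 string, possibly percent-encoded (e.g. %3D for '=')
        value = unquote(text)
    raw = base64.b64decode(value)
    # Negative wbits: raw deflate, the format PHP's gzdeflate() produces
    return zlib.decompress(raw, -zlib.MAX_WBITS)
```

For example, sys.stdout.buffer.write(decode_saml_request(open(path).read())) prints the decoded XML for a file containing the SSO URL.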
The simple inflate program pufftest.c, found in contrib/puff of the zlib source distribution and written by Mark Adler himself, can handle raw deflate data (zlib data without the header bytes and the Adler-32 checksum). Compile with cc -o pufftest puff.c pufftest.c and inflate with pufftest -w < raw.zlib > decompressed (the -w flag makes it write the decompressed data to stdout). Note that it can only inflate, not deflate.
I have an addition to @Alex Stragies’ conversion for those who need a proper header and footer (an actual conversion from zlib to gzip).
It would probably be easier to use one of the above methods; however, if the reader has a case like mine, which requires converting zlib to gzip without decompressing and recompressing, this is the way to do it.
According to RFC 1950/1952, a zlib file can only have a single stream or member. This is different from gzip, in that:
A gzip file consists of a series of "members" (compressed data
sets). … The members simply appear one after another in the file,
with no additional information before, between, or after them.
This means that while a single zlib file can always be converted to a single gzip file, the converse is not strictly true. Something to keep in mind.
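The single-stream versus multi-member difference can be checked quickly with Python 3’s standard library (the member contents are arbitrary):

```python
import gzip
import zlib

first, second = b"first member\n", b"second member\n"

# gzip (RFC 1952): concatenated members form one valid file
multi_gzip = gzip.compress(first) + gzip.compress(second)
assert gzip.decompress(multi_gzip) == first + second

# zlib (RFC 1950): the decoder stops at the end of the single stream;
# anything after it is left in unused_data rather than decompressed
d = zlib.decompressobj()
out = d.decompress(zlib.compress(first) + zlib.compress(second))
assert out == first
assert d.eof and d.unused_data == zlib.compress(second)
```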
zlib has both a header (2 bytes) and a footer (4 bytes), which must be removed from the data so that the gzip header and footer can be added. One way of doing that is as follows:
# Remove zlib 4 byte footer
trunc_size=$(ls -l infile.z | awk '{print $5 - 4}')
truncate -s $trunc_size infile.z
# Remove zlib 2 byte header
dd bs=1M iflag=skip_bytes skip=2 if=infile.z of=tmp1.z
Now we have just the raw deflate data and may prepend the gzip header (from @Alex Stragies):
printf "\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00" | cat - tmp1.z > tmp2.z
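The gzip footer is 8 bytes long. It consists of the CRC32 of the uncompressed file followed by the size of the uncompressed file mod 2^32, both in little-endian byte order (RFC 1952 stores multi-byte numbers least significant byte first, which is why the functions below reverse the hex digits in pairs). If you don’t know these values but have a means of getting the uncompressed file:
generate_crcbig() {
  # crc32 prints 8 hex digits, most significant first; reverse the byte order
  crc=$(crc32 "$uncompressedfile")
  crcbig="\x${crc:6:2}\x${crc:4:2}\x${crc:2:2}\x${crc:0:2}"
}
generate_lbig () {
  leng=$(ls -l "$uncompressedfile" | awk '{print $5}')
  lmod=$(expr $leng % 4294967296) # mod 2^32
  lhex=$(printf "%08x" $lmod) # zero-padded to 8 hex digits
  lbig="\x${lhex:6:2}\x${lhex:4:2}\x${lhex:2:2}\x${lhex:0:2}"
}
And then, with $uncompressedfile pointing at the uncompressed file, the footer may be appended as such:
generate_crcbig
generate_lbig
printf "$crcbig$lbig" | cat tmp2.z - > outfile.gz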
Now you have a file in the gzip format! It can be verified with gzip -t outfile.gz and uncompressed with any application complying with the gzip specification.
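If Python is available, the same zlib-to-gzip re-wrapping can be sketched in a few lines. This is an illustrative Python 3 version, not the shell method above: the function name is hypothetical, it assumes a single zlib stream with no preset dictionary and nothing after it, and it still inflates the data once, only to compute the CRC-32 and length for the footer; the deflate bytes themselves are copied unchanged:

```python
import struct
import zlib

# Full 10-byte gzip header: magic, CM=8 (deflate), FLG=0, MTIME=0, XFL=0, OS=0
GZIP_HEADER = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00"


def zlib_to_gzip(zdata: bytes) -> bytes:
    """Re-wrap a single zlib stream as a gzip member, reusing its deflate data."""
    # CMF low nibble must be 8 (deflate); FLG bit 0x20 (FDICT) must be clear
    if (zdata[0] & 0x0F) != 8 or (zdata[1] & 0x20):
        raise ValueError("expected a deflate zlib stream without a preset dictionary")
    body = zdata[2:-4]  # strip 2-byte zlib header and 4-byte Adler-32 trailer
    plain = zlib.decompress(zdata)  # only needed for the footer values
    # gzip footer: CRC-32 and length mod 2^32, both little-endian ("<II")
    footer = struct.pack("<II", zlib.crc32(plain) & 0xFFFFFFFF, len(plain) & 0xFFFFFFFF)
    return GZIP_HEADER + body + footer
```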
I get that the author doesn’t want to use Python, but I believe a Python 3 one-liner is a natural choice for most Linux users, so here it is:
python3 -c 'import sys,zlib; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))' < "$COMPRESSED_FILE_PATH"