How to XZ a directory with TAR using maximum compression?

So I need to compress a directory with max compression.

How can I do it with xz? I mean I will need tar too because I can’t compress a directory with only xz. Is there a oneliner to produce e.g. foo.tar.xz?

Asked By: LanceBaynes


With a recent GNU tar on bash or derived shell:

XZ_OPT=-9 tar cJf tarfile.tar.xz directory

tar’s lowercase j switch uses bzip2; the uppercase J switch uses xz.

The XZ_OPT environment variable lets you set xz options that cannot be passed via calling applications such as tar.

With -9, compression is already at the highest standard preset.

See man xz for other options you can set (-e/--extreme might give you some additional compression benefit for some datasets).

XZ_OPT=-e9 tar cJf tarfile.tar.xz directory
Answered By: bsd

Assuming xz honors the standard set of command-line flags, including the compression-level flags, you could try:

tar -cf - foo/ | xz -9 -c - > foo.tar.xz 
Answered By: Shadur

The tar command uses the J flag for xz files. An example:

tar -cJvf foo.tar.xz foo/

Answered By: leonardoav
XZ_OPT=-9e tar cJf tarfile.tar.xz directory

is even better than

XZ_OPT=-9 tar cJf tarfile.tar.xz directory

The -e, --extreme option modifies the compression preset (-0 … -9) so that a slightly better compression ratio can be achieved without increasing the memory usage of the compressor or decompressor (exception: compressor memory usage may increase a little with presets -0 … -2). The downside is that the compression time increases dramatically (it can easily double).

Answered By: Evandro Jr

You might try different options; for me, -4e works better:

tar cf - wam_GG_${dir}.nc | xz -4e > wam_GG_${dir}.nc.tar.xz 

I tested by running:

$ tar -cf - wam_GG.nc | xz -4e > wam_GG.nc.xz
$ tar -cf - wam_GG.nc | xz -9e > wam_GG.nc.xz.2

So, it seems that for this data the -4e option works a little better than -9e.

$ ll wam_GG.nc.xz*
-rw-rw-r--. 1 504 504 2707596 Jan 16  2015 wam_GG.nc.xz
-rw-rw-r--. 1 504 504 2708416 Jan 16  2015 wam_GG.nc.xz.2
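
To run this kind of preset sweep on your own data, a quick loop can compare the sizes (a sketch; sample.dat is generated here purely for illustration):

```shell
# Generate a small sample input, then print the compressed size
# for several presets to see which one wins for this data.
yes "sample data line" | head -n 100000 > sample.dat
for p in 1 4 6 9 4e 9e; do
  printf '%-4s %8s bytes\n' "-$p" "$(xz -"$p" -c sample.dat | wc -c)"
done
```

Which preset wins is data-dependent, so a sweep like this is the only reliable way to pick one.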
Answered By: Szymon Roziewski

This is not an exact answer to your question, but you could use one command instead of two:

7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on archive.7z dir1

This adds all files from directory “dir1” to the archive archive.7z using “ultra settings”.

Other supported formats are zip, gzip, bzip2, and tar; to use one, just replace the 7z after -t.

(source: man 7z)

NOTE: don’t use this command to back up system files (personal files are fine), because the 7z format doesn’t store filesystem permissions.

Answered By: Alex Jones

If you have 16 GiB of RAM (and nothing else running), you can try:

tar -cf - foo/ | xz --lzma2=dict=1536Mi,nice=273 -c - > foo.tar.xz 

This will need 1.5 GiB for decompression, and about 11x that for compression. Adjust accordingly for lesser amounts of memory.

This will only help if the data is actually that big, and in any case it won’t help THAT much, but still…

If you’re compressing binaries, add --x86 as the first xz option. If you’re playing with “multimedia” files (uncompressed audio or bitmaps), you can try --delta=dist=2 (experiment with the value; good values to try are 1..4).
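
As a sketch of the binaries case (bin/ and the copied ls binary are just stand-ins for a real directory of executables):

```shell
# --x86 applies the BCJ branch-conversion filter before LZMA2;
# filters run in the order given on the command line.
mkdir -p bin && cp "$(command -v ls)" bin/
tar -cf - bin/ | xz --x86 --lzma2=preset=9e > bin.tar.xz
```

The BCJ filter rewrites relative branch targets into absolute ones, which makes repeated call patterns in machine code more compressible for LZMA2.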

If you’re feeling very adventurous, you can try playing with more LZMA options, like

--lzma2=dict=1536Mi,nice=273,lc=3,lp=0,pb=2

(these are the default settings, you can try values between 0 and 4, and lc+lp must not exceed 4)

In order to see how the default presets map to these values, you can check the source file src/liblzma/lzma/lzma_encoder_presets.c. Nothing of much interest there though (-e sets the nice length to 273 and also adjusts the depth).

Answered By: Anonymous

For those interested: compared with -9 on a typical laptop, -e9 is 0.4% smaller, 20% slower to compress, and 3% slower to decompress. Here are the timing runs on the Python source code directory structure.

Compression:

$ Tbefore=`date +%s%3N` && XZ_OPT=-9 tar cJf python3.6.tar.9xz Python-3.6.0 && Tafter=`date +%s%3N`
$ python -c "print((float($Tafter) - float($Tbefore)) / 1000.)"
43.87
$ Tbefore=`date +%s%3N` && XZ_OPT=-e9 tar cJf python3.6.tar.e9xz Python-3.6.0 && Tafter=`date +%s%3N`
$ python -c "print((float($Tafter) - float($Tbefore)) / 1000.)"
53.861

Decompression:

$ Tbefore=`date +%s%3N` && tar xf python3.6.tar.9xz && Tafter=`date +%s%3N`
$ python -c "print((float($Tafter) - float($Tbefore)) / 1000.)"  && rm -rf Python-3.6.0
1.395
$ rm -rf Python-3.6.0
$ Tbefore=`date +%s%3N` && tar xf python3.6.tar.e9xz && Tafter=`date +%s%3N`
$ python -c "print((float($Tafter) - float($Tbefore)) / 1000.)"  && rm -rf Python-3.6.0
1.443

File Size:

$ rm -rf Python-3.6.0
$ Tbefore=`date +%s%3N` && tar xf Python-3.6.0.tar.xz && Tafter=`date +%s%3N`
$ python -c "print((float($Tafter) - float($Tbefore)) / 1000.)" && rm -rf Python-3.6.0
1.49
$ ls -al ?ython*
-rw-rw-r-- 1 hobs hobs 16378500 Dec 23 13:06 python3.6.tar.9xz
-rw-rw-r-- 1 hobs hobs 16314420 Dec 23 13:05 python3.6.tar.e9xz
-rw-rw-r-- 1 hobs hobs 16805836 Dec 23 12:24 Python-3.6.0.tar.xz
Answered By: hobs

If you would like this to complete faster using multiple threads, without slowing down your system while you do other work, add -Tn (where n is the number of threads you want to use) and prefix the compressor with nice to demote it to idle priority.

Model (for 4 threads):

tar c foo/ | nice -n19 xz -9 -T4 > foo.tar.xz

Try watching in top or htop when you do this in a big directory (several GB). You should hopefully see several xz threads with Nice value of 19 (lowest priority).

I’ve also stripped this down to be as terse as sensible: for example, the -f - in other answers is simply not needed, since tar’s default output is stdout.

You can nice the tar process too, but I’ve never found it necessary, as xz is always the CPU bottleneck in the pipeline.

As a practical note, I rarely use xz -9 for anything, not so much due to CPU or time, but because of the high memory demands. Take a look at https://catchchallenger.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO#Memory_requirements_on_compression. Like bzip2, but unlike gzip, the xz compressor uses more memory at higher compression levels; and since xz uses far more memory than the other common compressors, you can easily use up 600+ MB of memory. If you use -T to enable threaded compression, the memory demands go up even further. Just something to be aware of: if you’re running a small service on a small VM with 1–2 GB of memory, you could inadvertently cause an impact.
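
One mitigation is xz’s --memlimit-compress option, which caps compressor memory (a sketch; foo/ is a placeholder and the 50% figure is an arbitrary choice):

```shell
# Same idle-priority threaded pipeline, with compressor memory capped
# at 50% of physical RAM; xz lowers the thread count (or scales the
# preset down) as needed to stay under the limit.
mkdir -p foo && echo "sample" > foo/file.txt
tar c foo/ | nice -n19 xz -9 -T4 --memlimit-compress=50% > foo.tar.xz
```

The limit also accepts absolute values (e.g. --memlimit-compress=600MiB), which is handy on small VMs.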

Answered By: Joshua Huber

On Mac OS X, an alternative way to pass the parameter to tar is the --options= flag. For example,

tar Jcvf targetFileName.tar.xz --options='compression-level=9' directoryName
Answered By: Samuel Li

From tar --help: -I, --use-compress-program=PROG

tar -I 'xz -9' -cvf foo.tar.xz foo/  
tar -I 'gzip -9' -cvf foo.tar.gz foo/    

Compression with other external compressors:

tar -I 'lz4 -9' -cvf foo.tar.lz4 foo/
tar -I 'zstd -19' -cvf foo.tar.zst foo/

Decompression with external compressors:

tar -I lz4 -xvf foo.tar.lz4  
tar -I zstd -xvf foo.tar.zst  

Listing an archive with external compressors:

tar -I lz4 -tvf foo.tar.lz4
tar -I zstd -tvf foo.tar.zst
Answered By: Goran Dragic

On a multicore machine, with xz-utils version 5.2.0 or later, check:

-T, --threads=NUM   use at most NUM threads; the default is 1; set to 0 to use as many threads as there are processor cores

If you wish to use the maximum number of cores and maximum compression:

export XZ_DEFAULTS="-9 -T0"

Or set -T to the number of cores you wish to use.

Then:

tar cJf target.tar.xz source

This may also be useful when choosing a compression level:

https://catchchallenger.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

Answered By: mirix

The maximum compression you can achieve depends on the capabilities of the machine you run it on. Maximum compression dramatically extends the run time and puts a heavy load on hardware resources. For that reason, it is not recommended to max out a server’s resources (CPU/RAM/disk), so as not to slow down other services running on it. It is worth weighing the trade-off between the degree of compression and its duration/system load.

In my case, I used xz on a laptop (so I could push the hardware to its limits), with the parameters chosen to make maximal use of CPU threads, the RAM limit, and disk performance. I chose the compression level experimentally, and it worked best (for me) with the DictSize = 32 MiB option.
Below is the command used:

xz -k -8e -M 7000MB -T 8 -v sd-dump-rpi3b+-strech.img

where:

  • -k  – keep (do not delete) the input file
  • -8e – compression level (preset 8 with the --extreme modifier)
  • -M  – memory usage limit
  • -T  – number of CPU threads to use
  • -v  – verbose mode; show progress while compressing
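
A scaled-down sketch of the round trip (demo.img is a hypothetical stand-in; the real command above used a larger memory limit and more threads):

```shell
# Create a small stand-in image, compress it keeping the original,
# then decompress while keeping the .xz file.
dd if=/dev/zero of=demo.img bs=1M count=4 2>/dev/null
xz -k -8e -M 800MiB -T 2 -v demo.img    # produces demo.img.xz
rm demo.img
xz -d -k -T 2 -v demo.img.xz            # restores demo.img
```

Because of -k, both the input and the output survive each step, so nothing is lost if a step is interrupted.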


I deliberately did not compress on the fly (no pipe) because of the limited read speed of the SD card in my laptop (max ~28 MB/s).
I dumped the system image from the SD card to the SSD with the dd command:

sudo dd bs=4M if=/dev/mmcblk0 > ~/Desktop/sd-dump-rpi3b+-strech.img

or optionally, fully using dd syntax:

sudo dd bs=4M if=/dev/mmcblk0 of=~/Desktop/sd-dump-rpi3b+-strech.img

and then compressed it.
In this way, I bypassed the data-transfer bottleneck of the SD card and made full use of the CPU threads, RAM, and the SSD (read/write ~540 MB/s).

It is worth noting that the SD card has a capacity of 32 GB, of which the system uses ~3.6 GB.
The card dump weighs ~29 GB before compression and ~1.7 GB after.
The ~28.4 GB of empty space was compressed along with the ~3.6 GB of data (mainly binary files); taking 3.6 GB down to 1.7 GB is a reduction of a little over 50%, a satisfying result for a compression time of ~15 minutes.
The empty space itself compressed almost instantly: while it was being processed, I watched the estimated compression time drop rapidly from the initially calculated ~45 minutes, with momentary SSD reads spiking up to ~266 MB/s.

It is worth mentioning that at a high compression level with many CPU threads (e.g., 8 threads at -9e in my case), if the declared memory limit does not leave room for all of them, xz reduces its thread count so as not to exceed the limit.

Choosing the memory limit and the number of CPU threads appropriately lets you maintain good performance and fast compression without exhausting hardware resources (CPU and RAM).

This is the hardware I used:

IdeaPad Z580

  • i7-3632QM
  • 2 x 4 GB SODIMM DDR3 Synchronous PC3-12800 (1600 MHz)
  • SSD IRSSDPRS25A120

Software:

  • Debian Stretch (x86_64)
  • Kernel 4.9.0-11-amd64
  • xz (XZ Utils) 5.2.2
  • liblzma 5.2.2

You can find more info about xz by doing man xz.

Answered By: Adam Wądołkowski