Pipe find list of files into xargs gzip and pipe again into pigz
I need to find files newer than x days and turn them into a gzipped tar archive, but I want to do the compression with pigz.
For now I'm doing it the slow way; this works:
find /path/to/src -type f -mtime -90 | xargs tar -zcf archive.tar.gz
But pigz is tremendously faster, so I want to do the gzip compression with pigz instead. I tried this, but it isn't working:
find /path/to/src -type f -mtime -90 | xargs tar -zcf | pigz > archive.tar.gz
It returns an error because I just guessed at the syntax (and tried a couple of ways):
tar (child): /path/to/src: Cannot open: Is a directory
tar (child): Error is not recoverable: exiting now
How can I take the first command, which works, and pipe its output into pigz?
Assuming GNU or libarchive's tar:
find /path/to/src -type f -mtime -90 -print0 |
tar -cf - --no-recursion --null -T - |
pigz > archive.tar.gz
(--no-recursion is not strictly necessary here, as the files reported by find should never be of type directory.)
Don't use xargs (which in any case can only be used on find's output if you combine xargs's -0 with find's -print0), as it could end up running more than one tar, and you'd be left with an archive containing only the last batch of files.
Here, we're passing the list of files to tar directly via a pipe with -T -, so there is no limit on how many files may be passed that way. It also means tar can start archiving the files as soon as they are found.
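To see the batching problem concretely, here is a small sketch (the /tmp path is just for the demo) that uses xargs -n 2 to force tiny batches, the same way xargs splits a long list once it exceeds the argument-length limit:

```shell
# Each xargs batch runs a fresh "tar -zcf archive.tar.gz", which truncates
# the previous archive; only the final batch survives.
mkdir -p /tmp/xargs-batch-demo && cd /tmp/xargs-batch-demo
touch a b c d e
printf '%s\0' a b c d e | xargs -0 -n 2 tar -zcf archive.tar.gz
tar -ztf archive.tar.gz    # lists only "e", the last batch
```

With five files and batches of two, tar runs three times, and the archive ends up holding only the file from the last run.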
star (@schily's (RIP) tar) also has built-in find functionality:
star cf - -find /path/to/src -type f -mtime -90 |
pigz > archive.tar.gz
Though you can also take the same approach as with the other implementations above, using this syntax:
find /path/to/src -type f -mtime -90 -print0 |
star cf - -read0 list=- |
pigz > archive.tar.gz
tar is a very unportable command. Even the tar formats are unportable. X/Open / SUSv2 used to specify a tar command (and cpio), but they eventually gave up on it, as it was impossible to reconcile the tars from different vendors; instead, POSIX / SUS came up with pax as a replacement for both.
pax takes the list of files from stdin, but unfortunately newline-delimited rather than NUL-delimited, which means it can't archive arbitrary file names, though some pax implementations support a -0 extension for that (find's -print0 is also not POSIX, though it can be replaced with -exec printf '%s\0' {} +). So, with those:
find /path/to/src -type f -mtime -90 -print0 |
pax -0w |
pigz > archive.tar.gz
(Note that the default output format is unspecified per POSIX, which is another weakness of pax. Its worst weakness is its very low adoption.)
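If your pax supports the -0 extension, you can at least avoid the unspecified default by pinning the output format yourself with the standard -x option (ustar is the POSIX tar interchange format); a sketch:

```shell
# Same pipeline, but with an explicitly requested archive format:
find /path/to/src -type f -mtime -90 -print0 |
pax -0w -x ustar |
pigz > archive.tar.gz
```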
With GNU tar on any shell that supports process substitution (e.g. bash, ksh, zsh):
tar cf archive.tar.gz -I pigz --null -T <(find /path/to/src -type f -mtime -90 -print0)
This uses pigz to do the compression, and takes the NUL-separated list of files to include in the archive from the output of find ... -print0, via the -T (--files-from=FILE) option and process substitution.
Alternatively, if you are using a minimalist POSIX-features-only shell (e.g. ash or dash, or bash running as /bin/sh, with --posix, with set -o posix, or with the POSIXLY_CORRECT environment variable set), you can pipe a NUL-separated list of filenames into GNU tar. The - following the -T option tells tar to read the file list from stdin.
find /path/to/src -type f -mtime -90 -print0 | tar cf archive.tar.gz -I pigz --null -T -
Either of these works with any valid filename, even names containing spaces, newlines, or shell metacharacters. They also avoid the too-many-filenames problem mentioned by @Kusalananda in his comment.
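As a quick check of the arbitrary-filename claim, here is a small demo (the /tmp directory name is arbitrary) using plain gzip via tar's z option; substitute -I pigz as above if you have pigz installed:

```shell
# Create files whose names contain a space and a newline, then archive them
# via a NUL-separated list; both names survive intact.
mkdir -p /tmp/nul-name-demo/src && cd /tmp/nul-name-demo
touch 'src/has space' 'src/has
newline'
find src -type f -print0 | tar -czf archive.tar.gz --null -T -
tar -tzf archive.tar.gz
```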
BTW, you may want to investigate using pixz instead of pigz. It does xz compression (which generally compresses much better than gzip, but is slower), and pixz will add an index to speed up extraction of specific files if it detects tar-like input. Both pixz and xz-utils are packaged for most common Linux distributions, so either should be easy to install.