How to get gzip operating recursively at all depths?

I want to gzip all *.vtu files at any depth below a given directory, in bash.
I have such files at depths 1 and 2 below ./.
I managed to do so with

$ gzip -v $(find . -name "*.vtu")

I could also use find ... -exec, and other combinations (see below).

Is there any way of doing it only with a capability of gzip (-r was my candidate)?

I expected

$ gzip -r -v "*.vtu"

to work, with the pattern expanded by gzip rather than by the shell (and in a way that produces my intended result), but I get gzip: ...: No such file or directory with every combination I tried.
What I found is the following:

  1. With shopt -s globstar, the command gzip -v **/*.vtu seems to do exactly what I want.
  2. If shopt | grep globstar gives globstar off, the command above does not work. In this case, I can use gzip -v */*.vtu, but it only works with files at depth=1. Likewise with gzip -v */*/*.vtu at depth=2.

In any case, I could not work out what the -r flag actually does or what it is useful for.

Related:

  1. gzip all files with specific extensions
  2. https://stackoverflow.com/questions/10363921/how-to-gzip-all-files-in-all-sub-directories-in-bash

No, gzip can’t do this. -r just means "descend into subdirectories", but there is no option for "descend into subdirectories and then look for files matching this glob". An unquoted *.vtu glob is expanded by the shell before gzip is launched, so gzip is given a specific list of files: those matching *.vtu in the current directory. A quoted "*.vtu" is passed to gzip literally, and since gzip does no pattern matching of its own, it looks for a file that is actually named *.vtu, which is why you get No such file or directory.

So yes, globstar is your best bet. As for the use of -r, that is explained in man gzip:

-r --recursive
       Travel the directory structure recursively.  If any of the file
       names  specified on the command line are directories, gzip will
       descend into the directory and compress all the files it  finds
       there (or decompress them in the case of gunzip ).

So gzip -r foo means "descend into foo if foo is a directory and gzip any files in it". If the arguments include both files and directories, for example if you had both file.vtu and my.vtu/ in the directory where you ran gzip -r *.vtu, then the contents of my.vtu would also be compressed. Without -r, you would get my.vtu is a directory -- ignored.

Other options include:

  • find . -name "*.vtu" -exec gzip {} + to compress all matching files.
  • gzip **/*.vtu with globstar set.
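  • find . -name "*.vtu" | xargs gzip (as long as your names are sane and don’t contain whitespace, quotes or backslashes)
  • find . -name "*.vtu" -print0 | xargs -0 gzip (safe for any file name, including names that contain newlines; needs a find and xargs that support -print0 and -0, such as the GNU versions)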
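
A minimal sketch of the first option wrapped in a function, in case it helps to see it in a reusable form. The function name compress_vtu is made up for illustration, and -type f is an addition that was not in the command above: it keeps find from handing gzip a directory that happens to be named something.vtu.

```shell
# compress_vtu DIR: gzip every regular file named *.vtu at any depth below DIR.
# (compress_vtu is a hypothetical helper name; find and gzip do the real work.)
compress_vtu() {
    # -type f     : only regular files, so a directory called x.vtu is skipped
    # -name '...' : quoted so that find, not the shell, matches the pattern
    # {} +        : pass many file names to each gzip invocation
    find "$1" -type f -name '*.vtu' -exec gzip -v {} +
}
```

It would be called as compress_vtu . to process the current directory.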
Answered By: terdon

After the answer by terdon, and upon tinkering a bit, I came to the conclusion that the way -r works is the following:

  1. If what is matched is a file (only in the present directory) do gzip.
  2. If what is matched is a directory, enter that directory, and down there execute gzip -r *.

For me, this is extremely weird (and therefore I would have never imagined this is how it works).
For instance, if in ./ I have

foo
foo.vtk
test.vtk/
test.vtk/another.vtk/
test.vtk/another.vtk/cake.vtk
test.vtk/another.vtk/dow.txt
test.vtk/cake.vtk
test.vtk/dow.txt
test.vtk/this/
test.vtk/this/cake.vtk
test.vtk/this/dow.txt

the command gzip -r -v *.vtk would gzip every file except ./foo.
That is, inside the matched directory test.vtk (depth=1) all files are gzipped, not only *.vtk, and the same holds for every subdirectory below it (depth>1), whatever its name.
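
The experiment above can be reproduced with a short script along these lines. It is only a sketch: it builds the same tree in a scratch directory created with mktemp -d (the variable name demo is arbitrary), runs the command, and then lists whatever was left uncompressed.

```shell
# Build the example tree in a scratch directory.
demo=$(mktemp -d)
(
    cd "$demo" || exit 1
    mkdir -p test.vtk/another.vtk test.vtk/this
    for f in foo foo.vtk \
             test.vtk/cake.vtk test.vtk/dow.txt \
             test.vtk/another.vtk/cake.vtk test.vtk/another.vtk/dow.txt \
             test.vtk/this/cake.vtk test.vtk/this/dow.txt
    do
        echo data > "$f"
    done

    # The shell expands *.vtk to "foo.vtk test.vtk" before gzip starts;
    # -r then makes gzip descend into the directory test.vtk.
    gzip -r -v *.vtk

    # List the regular files that were not compressed.
    find . -type f ! -name '*.gz'
)
```

If -r behaves as described, the final find should list only ./foo.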

Not an exact answer to your question, but you can use xargs for that, which allows you to run multiple gzip processes in parallel, like

find . -name '*.vtk' -print0 | xargs -r0n1 -P"$(nproc)" gzip
  • look for files
    • matching *.vtk, quoted so it is not expanded by the shell
    • print file names separated by NUL bytes (to have an unambiguous separator)
  • give the list of files to xargs
    • do not run if the list is empty (-r) because gzip would then use stdin
    • use NUL as separator (-0)
    • use one file name per gzip invocation (-n1)
    • run as many processes in parallel (-P) as the output of the nproc command says we have CPUs
    • run the gzip command for each input
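
The same pipeline can be wrapped in a small function, sketched here under a few assumptions: the name gzip_parallel and its two arguments are invented for illustration, -type f is added so that only regular files are passed on, and the job count falls back to 1 when nproc is not installed. It relies on an xargs that supports -r, -0 and -P, as GNU xargs does.

```shell
# gzip_parallel DIR PATTERN: compress matching files below DIR, one gzip
# process per file, with as many processes at once as there are CPUs.
# (gzip_parallel is a hypothetical helper name.)
gzip_parallel() {
    # Use the CPU count reported by nproc, or 1 if nproc is unavailable.
    njobs=$(nproc 2>/dev/null || echo 1)
    find "$1" -type f -name "$2" -print0 |
        xargs -r -0 -n 1 -P "$njobs" gzip
}
```

For the case discussed here it would be called as gzip_parallel . '*.vtk', with the pattern quoted so that the shell does not expand it.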
Answered By: Simon Richter