Retrieve a large number of files from a remote server with wildcards

I’m trying to download a large number of files from a remote server. Part of the path is known, but there’s a folder name that’s randomly generated, so I think I have to use wildcards. The path is something like this:

/home/myuser/files/<random folder name>/*.ext

So I tried this:

rsync -av myuser@myserver.com:~/files/**/*.ext ./

This gives me the following error:

bash: /usr/bin/rsync: Argument list too long

I also tried scp instead of rsync but got the same error. It seems bash interprets the wildcard as the full list of files.

What’s the right way to achieve this?

Instead of letting the remote shell expand a glob into an argument list that is too long, use --include and --exclude filters to transfer only the files that you want:

rsync -aim --include='*/' --include='*.ext' --exclude='*' \
    myuser@myserver.com:files ./

This would give you a directory called files in the current directory. Beneath it, you will find the parts of the remote directory structure that contain the .ext files, including the files themselves. Directories without .ext files would not appear on the target side, because we use -m (--prune-empty-dirs).

With the --include and --exclude filters, we include any directory (needed for recursion) and any name matching *.ext. We then exclude everything else. These filters work on a "first match wins" basis, which means the --exclude='*' filter must be last. The rsync utility evaluates the filters as it traverses the source directory structure.
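To see the filters at work without touching the remote server, the same rules can be tried on a throwaway local tree. This is only a sketch: it assumes rsync is installed locally, and the function name demo_filters and the directory names demo_src, demo_dst, abc123 and empty are made up for the illustration.

```shell
# demo_filters: build a small temporary tree, copy it with the same
# include/exclude rules as above, and list what arrived.
# Assumes a local rsync; all directory and file names are hypothetical.
demo_filters() {
    tmp=$(mktemp -d) || return 1
    mkdir -p "$tmp/demo_src/files/abc123" \
             "$tmp/demo_src/files/empty" \
             "$tmp/demo_dst"
    touch "$tmp/demo_src/files/abc123/a.ext" \
          "$tmp/demo_src/files/abc123/b.txt"

    # Directories first (for recursion), then *.ext, then exclude the rest.
    rsync -am --include='*/' --include='*.ext' --exclude='*' \
        "$tmp/demo_src/files" "$tmp/demo_dst/"

    # Show the files that were copied, relative to the destination.
    (cd "$tmp/demo_dst" && find . -type f | sort)
    rm -rf "$tmp"
}
```

Calling demo_filters should list only ./files/abc123/a.ext: b.txt is caught by the final exclude, and the directory called empty is pruned by -m.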

If you then want to move all the synced files into the current directory (ignoring the possibility of name clashes), you could use find like so:

find files -type f -name '*.ext' -exec mv {} . \;

This looks for regular files in or beneath files, whose names match the pattern *.ext. Matching files are moved to the current directory using mv.
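If name clashes are a possibility, note that a plain mv silently overwrites an existing file of the same name. A more cautious variant might look like the sketch below; flatten_ext is a hypothetical helper name, and skipping clashing files (rather than renaming them) is just one possible policy.

```shell
# flatten_ext SRC DEST: move every *.ext file found under SRC into DEST.
# A file whose name already exists in DEST is left where it is and
# reported on standard error. DEST should not be located inside SRC.
flatten_ext() {
    src=$1
    dest=$2
    find "$src" -type f -name '*.ext' -exec sh -c '
        dest=$1; shift
        for f do
            target=$dest/${f##*/}    # destination path: DEST + base name
            if [ -e "$target" ]; then
                printf "skipped (name clash): %s\n" "$f" >&2
            else
                mv "$f" "$target"
            fi
        done' sh "$dest" {} +
}
```

For the layout in this answer, it would be called as flatten_ext files . and anything reported as skipped stays under files for manual handling.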

Answered By: Kusalananda