Convert glob to `find`

I have run into this problem again and again: I have a glob that matches exactly the right files, but using it fails with "Command line too long". Each time I have converted it into some combination of find and grep that works for that particular situation, but which is not 100% equivalent.

For example:

./foo*bar/quux[A-Z]{.bak,}/pic[0-9][0-9][0-9][0-9]?.jpg

Is there a tool for converting globs into find expressions that I am not aware of? Or is there an option for find to match the glob without matching the same glob in a subdir (e.g. foo/*.jpg is not allowed to match bar/foo/*.jpg)?

Asked By: Ole Tange

||

You could write a regex for find matching your requirements:

find . -regextype egrep -regex '\./foo[^/]*bar/quux[A-Z](\.bak)?/pic[0-9][0-9][0-9][0-9][^/]\.jpg'
Answered By: sebasth

If the problem is that you get an argument-list-is-too-long error, use a loop, or a shell built-in. While command glob-that-matches-too-much can error out, for f in glob-that-matches-too-much does not, so you can just do:

for f in foo*bar/quux[A-Z]{.bak,}/pic[0-9][0-9][0-9][0-9]?.jpg
do
    something "$f"
done

The loop might be excruciatingly slow, but it should work.

Or:

printf '%s\0' foo*bar/quux[A-Z]{.bak,}/pic[0-9][0-9][0-9][0-9]?.jpg |
  xargs -r0 something

(printf being builtin in most shells, the above works around the limitation of the execve() system call)

$ cat /usr/share/**/* > /dev/null
zsh: argument list too long: cat
$ printf "%s\n" /usr/share/**/* | wc -l
165606

Also works with bash. I’m not sure exactly where this is documented though.
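To see that the NUL-delimited variant keeps awkward file names intact, here is a small self-contained sketch. The temporary directory and the file names are made up for the illustration, and it assumes an xargs that supports -0 (GNU and BSD ones do).

```shell
# Throwaway directory with one plain name and one containing a space.
tmp=$(mktemp -d) || exit 1
: > "$tmp/plain.jpg"
: > "$tmp/with space.jpg"

# printf is a builtin, so the expanded glob never goes through execve().
# xargs -0 splits on NUL only, so each file name arrives as one argument.
count=$(cd "$tmp" && printf '%s\0' ./*.jpg | xargs -0 sh -c 'echo "$#"' sh)

echo "arguments received: $count"
rm -rf "$tmp"
```

With a newline-delimited list and plain xargs, the name containing a space would have been split into two arguments.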


Both Vim’s glob2regpat() and Python’s fnmatch.translate() can convert globs to regexes, but both also use .* for *, matching across /.
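For simple patterns, a translation that keeps * and ? from crossing / is short enough to write by hand. This is only a sketch: glob2regex is a hypothetical helper name, and it assumes the glob contains nothing beyond *, ?, literal dots and plain bracket expressions such as [A-Z] (no backslash escapes, no [!...], no special characters inside brackets, no brace expansion).

```shell
# glob2regex (hypothetical helper): turn a simple glob into a regex
# in which * and ? cannot match a slash.
glob2regex() {
  printf '%s\n' "$1" | sed \
    -e 's|\.|\\.|g' \
    -e 's|\*|[^/]*|g' \
    -e 's|?|[^/]|g'
}

# Print the regex for one of the two globs from the question.
glob2regex './foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg'
```

The output could then be handed to a find that supports -regex (GNU find does; it is not in POSIX), for instance find . -regex "$(glob2regex 'some/glob')".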

Answered By: muru

find (for the -name/-path standard predicates) uses wildcard patterns just like globs (note that {a,b} is not a glob operator; after expansion, you get two globs). The main difference is the handling of slashes (and dot files and dirs not being treated specially in find). * in globs won’t span several directories. */*/* will cause up to 2 levels of directories to be listed. Adding a -path './*/*/*' will match any files that are at least 3 levels deep and won’t stop find from listing the contents of any directory at any depth.
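The difference in slash handling is easy to check on a throwaway tree. The sketch below uses made-up directory and file names; it counts what the shell glob ./*/x.jpg expands to and what find -path selects with the same pattern.

```shell
# Small scratch tree: one x.jpg at depth 2, one at depth 3.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/a/b"
: > "$tmp/a/x.jpg"
: > "$tmp/a/b/x.jpg"

# The shell glob: * stops at /, so only ./a/x.jpg is produced.
glob_count=$(cd "$tmp" && set -- ./*/x.jpg && echo "$#")

# find -path with the same pattern: * may also stand for "a/b".
find_count=$(cd "$tmp" && find . -path './*/x.jpg' | wc -l | tr -d ' ')

echo "glob: $glob_count, find -path: $find_count"
rm -rf "$tmp"
```

The glob yields one file while find -path selects two, which is why a depth restriction is needed on the find side.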

For that particular

./foo*bar/quux[A-Z]{.bak,}/pic[0-9][0-9][0-9][0-9]?.jpg

couple of globs, it’s easy to translate: you want files at depth 3, so you can use:

find . -mindepth 3 -maxdepth 3 \
       \( -path './foo*bar/quux[A-Z].bak/pic[0-9][0-9][0-9][0-9]?.jpg' -o \
          -path './foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg' \) \
       -exec cmd {} +

Or POSIXly:

find . -path './*/*/*' -prune \
       \( -path './foo*bar/quux[A-Z].bak/pic[0-9][0-9][0-9][0-9]?.jpg' -o \
          -path './foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg' \) \
       -exec cmd {} +

Which would guarantee that those * and ? could not match / characters.
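As a sanity check, the POSIX form can be compared against the two globs on a scratch tree. This is a sketch with invented file names; it includes a decoy at depth 4 (under foo/xbar) that the -path patterns alone would accept because * can match "/x".

```shell
LC_ALL=C; export LC_ALL   # keep [A-Z] and the sort order predictable

tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/foo1bar/quuxA" "$tmp/foo1bar/quuxB.bak" "$tmp/foo/xbar/quuxA"
: > "$tmp/foo1bar/quuxA/pic0001a.jpg"
: > "$tmp/foo1bar/quuxB.bak/pic0002b.jpg"
: > "$tmp/foo/xbar/quuxA/pic0004d.jpg"   # depth 4: the globs cannot match it

# What the two globs expand to (brace expansion written out by hand).
from_glob=$(cd "$tmp" && printf '%s\n' \
  ./foo*bar/quux[A-Z].bak/pic[0-9][0-9][0-9][0-9]?.jpg \
  ./foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg | sort)

# The POSIX find form, with -print in place of -exec cmd {} +.
from_find=$(cd "$tmp" && find . -path './*/*/*' -prune \
  \( -path './foo*bar/quux[A-Z].bak/pic[0-9][0-9][0-9][0-9]?.jpg' -o \
     -path './foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg' \) -print | sort)

# Without the depth restriction, the decoy is selected as well.
without_prune=$(cd "$tmp" && find . \
  -path './foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg' -print |
  wc -l | tr -d ' ')

[ "$from_glob" = "$from_find" ] && echo "glob and find agree"
echo "matches for the second pattern without -prune: $without_prune"
rm -rf "$tmp"
```

The two lists agree only because of the -path './*/*/*' -prune part; dropping it lets the second pattern pick up the file under foo/xbar.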

(find, contrary to globs, would read the contents of directories other than the foo*bar ones in the current directory¹, and would not sort the list of files. But leaving aside that what [A-Z] matches, and how * and ? behave with regard to invalid characters, is unspecified, you’d get the same list of files.)

But in any case, as @muru has shown, there’s no need to resort to find if it’s just for splitting the list of files into several runs to work around the limit of the execve() system call. Some shells like zsh (with zargs) or ksh93 (with command -x) even have builtin support for that.

With zsh (whose globs also have the equivalent of -type f and most other find predicates), for instance:

autoload zargs # if not already in ~/.zshrc
zargs ./foo*bar/quux[A-Z](|.bak)/pic[0-9][0-9][0-9][0-9]?.jpg(.) -- cmd

((|.bak) is a glob operator, contrary to {,.bak}; the (.) glob qualifier is the equivalent of find’s -type f; add oN in there to skip the sorting like with find, and D to include dot-files (which doesn’t apply to this glob).)


¹ For find to crawl the directory tree like globs would, you’d need something like:

find . ! -name . \( \
  \( -path './*/*' -o -name 'foo*bar' -o -prune \) \
  -path './*/*/*' -prune -name 'pic[0-9][0-9][0-9][0-9]?.jpg' -exec cmd {} + -o \
  \( ! -path './*/*' -o -name 'quux[A-Z]' -o -name 'quux[A-Z].bak' -o -prune \) \)

That is prune all directories at level 1 except the foo*bar ones, and all at level 2 except the quux[A-Z] or quux[A-Z].bak ones, and then select the pic... ones at level 3 (and prune all directories at that level).

Answered By: Stéphane Chazelas

Generalising on the note on my other answer, as a more direct answer to your question, you could use this POSIX sh script to convert the glob to a find expression:

#! /bin/sh -
glob=${1#./}
shift
n=$#
p='./*'

while true; do
  case $glob in
    (*/*)
      set -- "$@" \( ! -path "$p" -o -path "$p/*" -o -name "${glob%%/*}" -o -prune \)
      glob=${glob#*/} p=$p/*;;
    (*)
      set -- "$@" -path "$p" -prune -name "$glob"
      while [ "$n" -gt 0 ]; do
        set -- "$@" "$1"
        shift
        n=$((n - 1))
      done
      break;;
  esac
done
find . "$@"

To be used with one standard sh glob (so not the two globs of your example which uses brace expansion):

glob2find './foo*bar/quux[A-Z].bak/pic[0-9][0-9][0-9][0-9]?.jpg' \
  -type f -exec cmd {} +

(that doesn’t ignore dot-files or dot-dirs except . and .. and doesn’t sort the list of files).

That one only works with globs relative to the current directory, with no . or .. components. With some effort, you could extend it to any glob, or to more than one glob. It could also be optimised so that glob2find 'dir/*' doesn’t look for dir the same way as it would for a pattern.

Answered By: Stéphane Chazelas