How to individually process each path from a list of paths output from ripgrep

I’m on Linux Ubuntu 18.04 and 20.04.

Ripgrep (rg) can output a list of paths to files containing matches like this:

# search only .txt files
rg 'my pattern to match' -g '*.txt' -l
# long form
rg 'my pattern to match' --glob '*.txt' --files-with-matches

Output will be:

path/to/file1.txt
path/to/file2.txt
path/to/file3.txt

etc.

I’d like to then run another command on each path, such as tree $(dirname $PATH), to get a list of all files in the directory containing the matching file. How can I do that?

I feel like xargs might be part of the answer. But piping to xargs like this as a start seems to only handle the last-printed file:

rg 'my pattern to match' -g '*.txt' -l | xargs -0 -I {} dirname {}

Note: if you can demo with grep too, that might be useful for those without ripgrep, albeit ripgrep is super easy to install.

References:

  1. ripgrep: print only filenames matching pattern
Asked By: Gabriel Staples


On a GNU system, that could be like:

rg -g '*.txt' -l0 'my pattern to match' | # list files NUL-delimited
  xargs -r0 dirname -z -- |               # takes dirnames
  LC_ALL=C sort -zu |                     # remove duplicates
  xargs -r0 tree --

Note that if both dir/file.txt and dir/subdir/file.txt match, you’ll end up running tree on both dir and dir/subdir, so you’ll be seeing the contents of dir/subdir twice.
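Since the question also asked about plain grep, here is a hedged sketch of the same pipeline with GNU grep standing in for rg. It assumes GNU grep (`-r`, `--include`, `-l`, and `-Z`/`--null`, which ends each printed file name with a NUL byte) plus the GNU xargs, dirname and sort used above. The function name `tree_matching_dirs` is made up for illustration, and the command to run (e.g. `tree`) is passed as its remaining arguments. Unlike rg, grep does not honour .gitignore or skip hidden files, so the file list may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command on the de-duplicated set of
# directories that contain *.txt files matching a pattern.
# Assumes GNU grep (-Z), GNU coreutils (dirname -z, sort -z) and GNU xargs.
tree_matching_dirs() {
  local pattern=$1; shift                        # "$@" is now the command, e.g. tree
  grep -rlZ --include='*.txt' -e "$pattern" . |  # NUL-terminated list of matching files
    xargs -r0 dirname -z -- |                    # NUL-terminated parent directories
    LC_ALL=C sort -zu |                          # remove duplicate directories
    xargs -r0 "$@" --                            # one call (or as few as needed) with all dirs
}

# usage:
#   tree_matching_dirs 'my pattern to match' tree
```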

You had the right idea by using xargs, which is the command that converts a string of bytes into a list of arguments to pass to a command, and by using -0, which is the most reliable way to pass an arbitrary list of arguments, but:

  • xargs -0 expects the input in a format where the list of arguments are separated by NUL characters (0 bytes)¹. You need the -0 / --null option to rg for it to print the file list in that format.
  • GNU dirname can process more than one argument per invocation, so instead of using -I{}, we just pass them all². We also want -r so as not to invoke dirname at all if the file list is empty, and the (also GNU specific) -z option to dirname for dirname itself to print the directories NUL-delimited.
  • as rg doesn’t add a ./ prefix to each file, it’s important to use the -- option delimiter for commands to which we pass the file list as arguments, to avoid problems with file names that start with -.
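The difference between batching and -I{} is easy to see by counting arguments. In this small sketch, printf stands in for rg -l0, and `sh -c 'echo "$#"' sh` is a throwaway command that only reports how many arguments it received (assumes GNU xargs for -0):

```shell
#!/usr/bin/env bash
# Without -I: xargs packs all three paths into a single invocation,
# so the command reports 3 arguments once.
printf '%s\0' a/1.txt b/2.txt c/3.txt | xargs -0 sh -c 'echo "$#"' sh

# With -I{}: one invocation per path, each receiving exactly 1 argument.
printf '%s\0' a/1.txt b/2.txt c/3.txt | xargs -0 -I{} sh -c 'echo "$#"' sh {}
```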

In short, for lists whose values can be any sequence of non-NUL bytes, such as file paths or arbitrary command arguments, you want to use NUL-delimited records as the interchange format to pass lists programmatically between tools, and keep the human-readable format only for the tool that gives feedback to the user (here the tree-like output of tree).
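To make the point about delimiters concrete, a small sketch (assuming bash for the $'...' quoting and GNU xargs for -0, with printf standing in for rg): a single file name containing a newline is one NUL-delimited record, but two newline-delimited ones.

```shell
#!/usr/bin/env bash
# One (legal, if unusual) file name that contains a newline.
name=$'dir/odd\nname.txt'

# Newline-delimited: xargs -I{} splits it into two bogus items.
printf '%s\n' "$name" | xargs -I{} printf '[%s]\n' {}

# NUL-delimited: the name survives as a single item.
printf '%s\0' "$name" | xargs -0 -I{} printf '[%s]\n' {}
```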


On a non-GNU system, but with the zsh shell, you could do:

files=( ${(0)"$(rg -g '*.txt' -l0 'my pattern to match')"} )
typeset -U unique_dirs=( $files:h )
(( $#unique_dirs )) && tree -- $unique_dirs

Or in one go (assuming there’s at least one matching file):

tree -- ${(u)${(0)"$(rg -g '*.txt' -l0 'my pattern to match')"}:h}

The u (for unique) is what replaces typeset -U. The 0 parameter expansion flag is how we tell zsh to split on NULs. Alternatively, we could set IFS=$'\0' and rely on word splitting (done upon unquoted parameter expansion) with:

IFS=$'\0'
tree -- ${(u)$(rg -g '*.txt' -l0 'my pattern to match'):h}

If you have neither GNU utilities nor zsh, you can always resort to perl:

rg -g '*.txt' -l0 'my pattern to match' |
  perl -MFile::Basename -MList::Util=uniq  -0 -e '
    @dirs = uniq(map {dirname$_} <>);
    exec "tree", "--", @dirs if @dirs'
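If the shell at hand is bash rather than zsh, a rough equivalent of the zsh array approach is sketched below, again without needing GNU dirname or sort. It assumes bash 4.4 or later (for mapfile -d '', which splits on NUL), strips the last path component with ${f%/*} in place of zsh's :h, and de-duplicates with an associative array in place of typeset -U. The helper name unique_dirs and the global array dirs are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (bash >= 4.4): read NUL-terminated paths on stdin and
# fill the global array "dirs" with their unique parent directories,
# in first-seen order.
unique_dirs() {
  local -a files
  local -A seen=()
  local f d
  mapfile -d '' -t files            # split stdin on NUL bytes, drop the NULs
  dirs=()
  for f in "${files[@]}"; do
    if [[ $f == */* ]]; then
      d=${f%/*}                     # strip the last component, like zsh's :h
      [[ -n $d ]] || d=/            # "/file" -> "/"
    else
      d=.                           # bare "file" -> "."
    fi
    if [[ -z ${seen[$d]+set} ]]; then
      seen[$d]=1                    # remember it, like typeset -U
      dirs+=("$d")
    fi
  done
}

# usage (process substitution keeps "dirs" in the current shell):
#   unique_dirs < <(rg -g '*.txt' -l0 'my pattern to match')
#   (( ${#dirs[@]} )) && tree -- "${dirs[@]}"
```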

¹ That’s the only character / byte value that cannot occur in a command argument (as the arguments are passed as NUL-delimited strings in the execve() system call), but it can occur in a byte stream as fed through a pipe, so it’s a simple and obvious way to separate arbitrary arguments there. -0 is a non-standard extension from the GNU implementation of xargs, but it’s now found in many other implementations.

² or at least as many as can fit in one invocation, calling dirname several times only if needed.

Answered By: Stéphane Chazelas

UPDATE: NEW FINAL ANSWER:

Note that sort -zu sorts and removes duplicates on a null-separated (-z) list.

rg 'my pattern to match' -0 -g '*.txt' -l \
  | xargs -0 -I{} dirname -z -- {} \
  | sort -zu \
  | xargs -0 -I{} tree -- {}

OLDER ANSWER DETAILS:

See the comments below this answer. My answer here isn’t as robust as the other answer by @Stéphane Chazelas.

My answer below originally wouldn’t properly handle any filenames with spaces or other whitespace in them, nor would it handle filenames which begin with a dash (-). Here is my response comment below:

@StéphaneChazelas, all of your comments make sense. Your answer is more robust. Using --null (-0) with rg and with xargs would for sure be more robust. Using -- would too. I guess I wasn’t too concerned about those things because I’m running this command in a git repo where not one file has spaces in it nor begins with dash (-). As for the multiple dirname & tree calls instead of one call with multiple paths, I was aware of that, but was okay with that too in part because I wanted an answer I could easily expand and add more pipes and commands to w/out drastically changing it.

So, look at both answers. His is technically better, but for my purposes, mine is "good enough" for now and points out that my original example in the question could have worked with super minimal changes. Ex:

# I should have done this (add `-0` to `rg`, and `--` before the path passed to `dirname`):
rg 'my pattern to match' -0 -g '*.txt' -l | xargs -0 -I {} dirname -- {}

# instead of this:
rg 'my pattern to match' -g '*.txt' -l | xargs -0 -I {} dirname {}

The answer by @Stéphane Chazelas and the comments under my question (including one by the maker of ripgrep himself!) are all useful and helped me figure out the following, which I think is the best answer because it’s the simplest:

The output path strings from rg are NOT null-terminated strings, so remove -0 from the xargs command (or, conversely, add it to the rg command as well). That’s it! This now works:

# THESE WORK to get the dirnames!
# (`--null`/`-0` are removed from both `rg` and `xargs`)

rg 'my pattern to match' -g '*.txt' -l | xargs -I {} dirname {}
# OR (same thing; the only difference is no space after `-I`):
rg 'my pattern to match' -g '*.txt' -l | xargs -I{} dirname {}

OR, you can force the path strings to be null-terminated by adding -0 or --null to the rg command, so this works too:

# ALSO WORKS
# (`--null`/`-0` are ADDED to both `rg` and `xargs`; note that for
# both `rg` and `xargs`, `--null` is the long form of `-0`)

rg 'my pattern to match' -g '*.txt' -l --null | xargs --null -I{} dirname {}

Now, by extension, we can pass all the paths to tree as well like this:

FINAL ANSWER:

rg 'my pattern to match' -0 -g '*.txt' -l \
  | xargs -0 -I{} dirname -z -- {} \
  | xargs -0 -I{} tree -- {}

That’s it! I simply needed to either add or remove -0 or --null from both rg and all xargs calls, to keep them all consistent and expecting the same delimiters when parsing the multiple paths.
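To see why the mismatched version in the question appeared to handle only the last file, here is a small sketch with printf standing in for rg (assumes GNU xargs and dirname). When the list is newline-delimited but xargs is told to split on NUL, the whole list becomes one argument, so dirname only trims the text after the final slash, which belongs to the last path.

```shell
#!/usr/bin/env bash
# Mismatch: newline-delimited input, NUL-delimited xargs.
# dirname receives ONE argument containing all three lines.
printf '%s\n' path/to/file1.txt path/to/file2.txt path/to/file3.txt |
  xargs -0 -I{} dirname {}

# Consistent: NUL-delimited on both sides, so dirname runs once per path.
printf '%s\0' path/to/file1.txt path/to/file2.txt path/to/file3.txt |
  xargs -0 -I{} dirname {}
```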

Adding -0 or --null, however, is better, because it allows paths with spaces or other whitespace in them, and adding -- before the path arguments is also good because it allows paths that begin with a dash (-). So, that’s what I’ve done above.

Again though, see the other answer too. It also sorts, removes duplicates, and handles other intricacies.

See also

  1. More of my xargs learning and examples:
    1. How to recursively run dos2unix (or any other command) on your desired directory or path using multiple processes
    2. See xargs examples in my README, here: https://github.com/ElectricRCAircraftGuy/FatFs/tree/main

Keywords: how to use xargs properly; parse grep or ripgrep rg output paths with xargs

Answered By: Gabriel Staples