How to individually process each path from a list of paths output from ripgrep
I’m on Linux Ubuntu 18.04 and 20.04.
Ripgrep (rg) can output a list of paths to files containing matches like this:
# search only .txt files
rg 'my pattern to match' -g '*.txt' -l
# long form
rg 'my pattern to match' --glob '*.txt' --files-with-matches
Output will be:
path/to/file1.txt
path/to/file2.txt
path/to/file3.txt
etc.
I’d like to then run another command on each path, such as tree $(dirname $PATH), to get a list of all files in the directory containing the matching file. How can I do that?
I feel like xargs might be part of the answer, but piping to xargs like this as a start seems to only handle the last-printed file:
rg 'my pattern to match' -g '*.txt' -l | xargs -0 -I {} dirname {}
Note: if you can demo with grep too, that might be useful for those without ripgrep, albeit ripgrep is super easy to install.
On a GNU system, that could be like:
rg -g '*.txt' -l0 'my pattern to match' | # list files NUL-delimited
xargs -r0 dirname -z -- | # takes dirnames
LC_ALL=C sort -zu | # remove duplicates
xargs -r0 tree --
Note that if both dir/file.txt and dir/subdir/file.txt match, you’ll end up running tree on both dir and dir/subdir, so you’ll be seeing the contents of dir/subdir twice.
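For those without ripgrep, here is a rough grep equivalent. It is only a sketch: it assumes GNU grep (for -Z/--null and --include) alongside GNU xargs, dirname and sort, and dirs_with_matches is a hypothetical helper name. The demo builds a scratch directory so it can be run as-is.

```shell
# Hypothetical helper: print the unique directories (NUL-delimited) that
# contain a .txt file matching pattern $1, searching under $2 (default: .).
# Assumes GNU grep, xargs, dirname, and sort.
dirs_with_matches() {
    grep -rlZ --include='*.txt' -e "$1" -- "${2:-.}" |  # matching files, NUL-delimited
        xargs -r0 dirname -z -- |                       # file paths -> directories
        LC_ALL=C sort -zu                               # drop duplicate directories
}

# Small self-contained demo in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/dir one" "$demo/dir2"
printf 'my pattern to match\n' > "$demo/dir one/a.txt"
printf 'my pattern to match\n' > "$demo/dir one/b.txt"
printf 'nothing here\n'        > "$demo/dir2/c.txt"

# Convert NULs to newlines only at the very end, for display.
found=$(dirs_with_matches 'my pattern to match' "$demo" | tr '\0' '\n')
printf '%s\n' "$found"
rm -rf "$demo"
```

In real use the last stage would be tree rather than tr, as in: dirs_with_matches 'my pattern to match' | xargs -r0 tree --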
You had the right idea by using xargs, which is the command to convert a string of bytes to a list of arguments to pass to a command, and to use -0, which is the most reliable way to pass an arbitrary list of arguments, but:
- xargs -0 expects the input in a format where the arguments are separated by NUL characters (0 bytes)¹. You need the -0/--null option to rg for it to print the file list in that format.
- GNU dirname can process more than one argument per invocation, so instead of using -I{}, we just pass them all². We also want -r so as not to invoke dirname at all if the file list is empty, and the (also GNU specific) -z option to dirname for dirname itself to print the directories NUL-delimited.
- As rg doesn’t add a ./ prefix to each file, it’s important to use the -- option delimiter for commands to which we pass the file list as arguments, to avoid problems with leading -s in file names.
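The first point can be checked without rg. Here is a small sketch that uses a hypothetical count_args helper (not part of any tool) to report how many arguments xargs -0 builds from its input. The sample paths are made up.

```shell
# Hypothetical helper: report how many arguments xargs -0 builds from stdin.
count_args() {
    xargs -0 sh -c 'echo "$#"' sh
}

# Newline-separated input: no NUL anywhere, so everything is ONE argument.
newline_count=$(printf 'dir1/a.txt\ndir2/b.txt\n' | count_args)

# NUL-separated input: one argument per path.
nul_count=$(printf 'dir1/a.txt\0dir2/b.txt\0' | count_args)

echo "newline-delimited: $newline_count argument(s)"
echo "NUL-delimited:     $nul_count argument(s)"
```

With newline-separated input, the whole list (embedded newlines included) reaches the command as a single string, which is consistent with the odd dirname result seen in the question.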
In short, for lists whose values can be any sequence of non-NUL bytes, such as file paths or arbitrary command arguments, you want to use NUL-delimited records as the interchange format to pass lists programmatically between tools, and only leave human format for the tool that gives feedback to the user (here the tree-like output of tree).
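If the per-path work is more than a single command, one option (a sketch, not the only way) is to have xargs start a small sh script once per path with -n1. The sample paths below are made up. In real use the leading printf would be replaced by rg -g '*.txt' -l0 'my pattern to match'.

```shell
# Two made-up NUL-delimited paths, one containing a space.
# -n1 runs the inline script once per path; the path arrives as "$1".
result=$(printf 'dir one/a.txt\0dir2/b.txt\0' |
    xargs -r0 -n1 sh -c '
        dir=$(dirname -- "$1")
        printf "%s is in %s\n" "$(basename -- "$1")" "$dir"
    ' sh)
printf '%s\n' "$result"
```

The trailing sh fills $0 of the inline script, so each path lands in "$1" rather than being swallowed as the script name.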
On a non-GNU system, but with the zsh shell, you could do:
files=( ${(0)"$(rg -g '*.txt' -l0 'my pattern to match')"} )
typeset -U unique_dirs=( $files:h )
(( $#unique_dirs )) && tree -- $unique_dirs
Or in one go (assuming there’s at least one matching file):
tree -- ${(u)${(0)"$(rg -g '*.txt' -l0 'my pattern to match')"}:h}
The u (for unique) is what replaces typeset -U. The 0 parameter expansion flag is how we tell zsh to split on NULs. Alternatively, we could set IFS=$'\0' and rely on word splitting (done upon unquoted command substitution) with:
IFS=$'\0'
tree -- ${(u)$(rg -g '*.txt' -l0 'my pattern to match'):h}
If you have neither GNU utilities nor zsh, you can always resort to perl:
rg -g '*.txt' -l0 'my pattern to match' |
perl -MFile::Basename -MList::Util=uniq -0 -e '
@dirs = uniq(map {dirname$_} <>);
exec "tree", "--", @dirs if @dirs'
¹ That’s the only character / byte value that cannot occur in a command argument (as the arguments are passed as NUL-delimited strings in the execve() system call), but it can occur in a byte stream as fed through a pipe, so it’s a simple and obvious way to separate arbitrary arguments there. -0 is a non-standard extension from the GNU implementation of xargs, but it’s now found in many other implementations.
² Or at least as many as can fit in one invocation, calling dirname several times only if needed.
UPDATE: NEW FINAL ANSWER:
Note that sort -zu sorts and removes duplicates on a null-separated (-z) list.
rg 'my pattern to match' -0 -g '*.txt' -l \
| sort -zu \
| xargs -0 -I{} -- dirname {} \
| xargs -0 -I{} -- tree {}
OLDER ANSWER DETAILS:
See the comments below this answer. My answer here isn’t as robust as the other answer by @Stéphane Chazelas.
My answer below originally wouldn’t properly handle any filenames with spaces or other whitespace in them, nor would it handle filenames which begin with dash (-). Here is my response comment below:
@StéphaneChazelas, all of your comments make sense. Your answer is more robust. Using --null (-0) with rg and with xargs would for sure be more robust. Using -- would too. I guess I wasn’t too concerned about those things because I’m running this command in a git repo where not one file has spaces in it nor begins with dash (-). As for the multiple dirname & tree calls instead of one call with multiple paths, I was aware of that, but was okay with that too in part because I wanted an answer I could easily expand and add more pipes and commands to w/out drastically changing it.
So, look at both answers. His is technically better, but for my purposes, mine is "good enough" for now and points out that my original example in the question could have worked with super minimal changes. Ex:
# I should have done this (add `-0` to `rg` and add `--` to `xargs`):
rg 'my pattern to match' -0 -g '*.txt' -l | xargs -0 -I {} -- dirname {}
# instead of this:
rg 'my pattern to match' -g '*.txt' -l | xargs -0 -I {} dirname {}
The answer by @Stéphane Chazelas and the comments under my question (including one by the maker of ripgrep himself!) are all useful and helped me figure out the following, which I think is the best answer because it’s the simplest:
The output path strings from rg are NOT null-terminated strings, so remove -0 from the xargs command (or, conversely, add it to the rg command as well). That’s it! This now works:
# THESE WORK to get the dirnames!
# (`--null`/`-0` are removed from both `rg` and `xargs`)
rg 'my pattern to match' -g '*.txt' -l | xargs -I {} dirname {}
# OR (same thing--remove the space after `-I` is all):
rg 'my pattern to match' -g '*.txt' -l | xargs -I{} dirname {}
OR, you can force the path strings to be null-terminated by adding -0 or --null to the rg command, so this works too:
# ALSO WORKS
# (`--null`/`-0` are ADDED to both `rg` and `xargs`; note that for
# both `rg` and `xargs`, `--null` is the long form of `-0`)
rg 'my pattern to match' -g '*.txt' -l --null | xargs --null -I{} dirname {}
Now, by extension, we can pass all the paths to tree as well like this:
FINAL ANSWER:
rg 'my pattern to match' -0 -g '*.txt' -l \
| xargs -0 -I{} -- dirname {} \
| xargs -0 -I{} -- tree {}
That’s it! I simply needed to either add or remove -0 (or --null) on both rg and all xargs calls, to keep them all consistent and expecting the same delimiters when parsing the multiple paths.
Adding -0 or --null, however, is better, because then it allows paths with spaces or other whitespace in them, and adding -- as well is good because then it allows paths which begin with dash (-). So, that’s what I’ve done above.
Again though, see the other answer too. It also sorts, removes duplicates, and handles other intricacies.
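When several xargs -0 stages are chained, each intermediate command also has to emit NUL-terminated output, and plain dirname ends each result with a newline. Here is a minimal sketch of the difference, assuming GNU dirname's -z option (the one used in the other answer). It uses printf as a stand-in for tree, and the sample paths are made up. The -- is placed after the command name so that it reaches dirname itself.

```shell
# Plain dirname ends each result with a newline, so the second xargs -0
# receives both directories as a single argument.
plain=$(printf 'dir one/a.txt\0dir2/b.txt\0' |
    xargs -0 -I{} dirname -- {} |
    xargs -0 -I{} printf '<%s>' {})

# GNU dirname -z ends each result with NUL, so each directory stays separate.
nul=$(printf 'dir one/a.txt\0dir2/b.txt\0' |
    xargs -0 -I{} dirname -z -- {} |
    xargs -0 -I{} printf '<%s>' {})

printf 'plain dirname: %s\n' "$plain"
printf 'dirname -z:    %s\n' "$nul"
```

Each pair of angle brackets marks one invocation of the final command, so the second line shows one call per directory.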
Keywords: how to use xargs properly; parse grep or ripgrep rg output paths with xargs