Just as there is "locate" for "find": is there a database for a faster "grep"?

locate (or rather, updatedb) is somewhat simple: it takes the output of find for the required paths (usually ‘/’), sorts it, and then compresses it with a front-compression tool (frcode), in which the prefix each entry shares with the previous one is replaced by the number of shared characters.
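
For illustration, here is a toy version of that front-coding step in awk; it mimics only the principle, not frcode's actual output format (paths.txt stands in for find's output):

    # Emit each sorted line as <number of characters shared with the
    # previous line> <remaining suffix>.
    sort paths.txt | awk '{
        n = 0
        max = length(prev) < length($0) ? length(prev) : length($0)
        while (n < max && substr(prev, n + 1, 1) == substr($0, n + 1, 1))
            n++
        print n, substr($0, n + 1)
        prev = $0
    }' > paths.frcoded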

So I’m wondering: what’s stopping anyone from creating something similar for full-text search? Say, how about concatenating every file in the system, sorting every line in the format line:filename:linenumber, and applying front-compression? I guess you would end up with a faster grep, with the tradeoff of being outdated until the daily/weekly cron job runs, just like locate.
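
As a rough sketch of what I have in mind, using only standard tools (the paths are placeholders; look(1) could then binary-search the sorted index):

    # Index step: record every line of a tree as line:filename:linenumber,
    # sorted so that identical lines end up adjacent. -I skips binary
    # files; LC_ALL=C keeps sort's ordering compatible with look(1).
    grep -rnI '' ~/project |
        awk -F: '{ file = $1; nr = $2; sub(/^[^:]*:[^:]*:/, ""); print $0 ":" file ":" nr }' |
        LC_ALL=C sort > ~/.cache/linedb

    # Query step: look(1) does a binary search on the sorted file, so a
    # prefix lookup never scans the whole index.
    look 'some_function(' ~/.cache/linedb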

Maybe locategrep would be overkill for the entire system, but I can see it being useful to speed up a large project which won’t change much for the rest of the day.

Does something like this exist already, or is it trivial to implement with some known tools?

Note: I would rather avoid enterprise-like solutions that include features beyond plain-text searching (but I appreciate regex support).

Asked By: Sebastian Carlos


Often, GNU grep and its BSD counterparts are just pretty slow.

People like ag (aka the_silver_searcher), rg (aka ripgrep), or ack; these tools don’t try to build an index of the text, they just search it anew for every query, but in a more efficient manner than grep. I’m using (mostly) rg these days, and it really makes searching the complete Linux source tree quite manageable (a "search every file, even if it’s not a C header" rg FOOBAR takes ~3s when I’ve warmed the filesystem caches; GNU grep takes >10s).
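
For reference, a couple of rg invocations along those lines (the source-tree path is a placeholder):

    # -uuu disables .gitignore handling, hidden-file skipping and binary
    # detection, i.e. a grep -r equivalent over absolutely every file:
    rg -uuu FOOBAR ~/src/linux

    # Or restrict the search to one file type instead:
    rg -t c FOOBAR ~/src/linux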

There are also full-text search engines (most notably xapian), which I use as a plugin on my IMAP server to speed up full-text searching. That’s the only use case where this has proven to actually make a difference for me.

(Besides, I still maintain that mandb must be destroyed: our search tools are so fast that taking 30s to rebuild a friggin’ index of 190 MB of man pages is simply not acceptable; and the idea that gzip is a good compressor for really uniform data such as man pages, where a single shared compression dictionary would make these files incredibly small, is another annoyance of mine. But things are intertwined enough that I can’t be moved to get rid of mandb.)
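
As a sketch of that shared-dictionary idea: zstd can train a single dictionary over a corpus of similar files and then compress each file against it (the paths are illustrative, and this assumes the pages are not already gzipped):

    # Train one dictionary on a corpus of uncompressed man pages:
    zstd --train /usr/share/man/man1/*.1 -o manpages.dict

    # Compress and decompress individual pages against that dictionary:
    zstd -19 -D manpages.dict foo.1 -o foo.1.zst
    zstd -d -D manpages.dict foo.1.zst -o foo.1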

Answered By: Marcus Müller