Using grep to search for line that begins with a variable whose value is a dollar sign

I found a lot of very similar problems, but not exactly this.
I have a text file with the following contents (no repeats, fixed 4 characters per line):


I’m trying to use grep to search for a line that begins with the character $char that’s been input by using $char to define the search terms, and assign the four numbers after it to $numbr. Here’s what it looks like:

numbr=`grep ^"${char}" file.txt | cut -c 2,3,4,5`

This works for any character I need, except for dollar signs, which leave $numbr empty.

The input character $char is defined beforehand as follows, if that helps:

char="`dd if=text.txt ibs=1 skip=$skipcount count=1`"

($skipcount is an integer)

I tried with and without -E flag, and every way of escaping the value of $char I could find. I don’t need to use grep specifically, I just had the most success with it so far.

I’ve been stuck on this for quite a while, so any help would be greatly appreciated. Apologies if I managed to duplicate a post, and thanks to all who contribute here, I’ve found a solution to almost every problem within a minute or two.

Edit: Sorry to the guy whose comment I accidentally deleted. The gist was he suggested echo and pipe, something like this (I assumed this was partial and I needed to fill in the rest):

echo "${char}" | grep '^$'

Wasn’t working for me. I also wasn’t clear enough – in each string in the .txt (ex. A1234), only the first character is what’s assigned to $char. The 4 numbers after $char is what I need assigned to $numbr, and that string can occur at any line in the .txt file. There will be no other bytes on each line.

Asked By: Aylox


$ has a special meaning in regular expressions (end of the line) … So, to match it literally, it needs to be stripped of that special meaning, e.g. by escaping it with a backslash … To do that dynamically, you can use Bash’s parameter expansion like ${char/$/\\$}:

grep -Po "^${char/$/\\$}\K[[:digit:]]+" file.txt

… where:

  • grep’s options -Po enable Perl-compatible regular expressions (-P, needed for \K) and print only the matched part of each line (-o).

  • ^ will match the beginning of a line.

  • ${char/$/\\$} inside the double quotes " ... " will be expanded by the shell (Bash) to the value of char, with the first occurrence (from the left) of $, if found, replaced by \$, i.e. it is passed to grep with an escape character before it so that grep matches it literally (see note 1).

  • Perl’s \K resets the start of the match at that point, so whatever matched on its left is excluded from the output and only the part matched on its right, i.e. [[:digit:]]+, is printed … Still, the whole expression, left side and right side, is evaluated and has to match in an input line in that sequence.

  • [[:digit:]]+ will match a digit [[:digit:]] one or more times +. Depending on the locale and the grep version, that may include non-ASCII Unicode digits.

1) For other shells that do not support Bash’s ${var/find/replace} kind of parameter expansion, you can use normal parameter expansion inside square brackets, like [${char}]: when the parameter expands inside [], the resulting character, e.g. $, is treated literally.

… and use it in your variable assignment like so:

numbr=$(grep -Po "^${char/$/\\$}\K[[:digit:]]+" file.txt)

Notice that the old backtick notation for command substitution, `…`, is kept mainly as a legacy compatibility feature, and the current notation $(...) is preferred … So, use the latter.
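To see the two escaping strategies above side by side, here is a minimal sketch. It needs Bash for the ${char/$/\\$} form. The sample lines stand in for file.txt, whose real contents are not shown in the question, and the names sample, escaped, via_escape and via_bracket are only illustrative. It uses plain grep with cut instead of -P so that it does not depend on PCRE support.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: one leading character followed by four digits.
sample=$(mktemp)
printf '%s\n' 'A1234' '$5678' 'B9012' > "$sample"

char='$'

# Bash-only: put a backslash in front of a literal $ before grep sees it.
escaped="${char/$/\\$}"
printf 'escaped prefix: %s\n' "$escaped"

# Escaped form used as an extended regular expression.
via_escape=$(grep -E "^${escaped}[[:digit:]]+" "$sample" | cut -c 2-5)

# Portable form from note 1: inside [] the $ is an ordinary character.
via_bracket=$(grep "^[${char}]" "$sample" | cut -c 2-5)

printf 'via_escape:  %s\n' "$via_escape"
printf 'via_bracket: %s\n' "$via_bracket"

rm -f "$sample"
```

Both variables should end up holding the four digits from the line that starts with the dollar sign. The bracket form is the one to reach for in shells without Bash-style substitution.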

Other non-grep solutions (for more portability, as grep’s -P option might not be present or supported in all implementations) include:

With awk:

awk -F"${char}" '$2~"^[[:digit:]]+$" {print $2}' file.txt

… where the field separator is set to ${char} and then, if the second field is all digits, i.e. $2~"^[[:digit:]]+$", it is printed with print $2.

With sed:

sed -nE "s/^([${char}])([[:digit:]]+)$/\2/p" file.txt

… where -nE will default to no printing and enable extended regular expressions to handle e.g. [] and capture groups (). The double quotes "..." around the script string allow parameter expansion by the shell, so that ${char} gets expanded to its value. If the regular expression matches in a line, the matched digits are captured by the second group, i.e. ([[:digit:]]+), which is referenced as \2 to replace the whole match, and the result is printed with the p command.

With perl:

export char; perl -lne 'print $1 if /^\Q$ENV{char}\E(\d+)$/' file.txt

… where -n defaults to no printing and print $1 will print the first capturing group in (...), i.e. (\d+) (roughly short for ([[:digit:]]+)), if the regex matches.

Notice the export char: it lets the Perl script read that variable from the environment as $ENV{char}, and placing it between \Q and \E makes Perl treat its value literally. That is needed to handle the regex properly when char expands to $, because the bracket form would then reach Perl as $], which might not work … Otherwise, something like the following should work as long as $ isn’t part of the regex:

perl -lne "/(?<=^[${char}])([[:digit:]]+)$/ and print $&" file.txt

… where -n defaults to no printing, the double quotes "..." allow shell parameter expansion to happen, the lookbehind (?<=...) has to match but is zero-width, so it is not included in the matched text, and print $& prints the whole match, which here is just the digits.

Answered By: Raffa

Some people, when confronted with a problem, think "I know, I’ll use regular expressions." Now they have two problems. [1]

In your case, the two problems are:

  1. the $ character is special in some contexts in your shell

  2. the $ character is special in some contexts in regular expressions

In the case of perl, there’s a third problem – again that $ is special in some contexts – which is why the trick of wrapping the shell variable expansion in brackets [${char}] works in a double-quoted sed expression, but not in a similar double-quoted perl expression (since the latter results in perl expanding $] as the revision, version, and subversion of the Perl interpreter).

So you want your shell to expand ${char} (or $char) to its value $ but for neither the shell nor the tool you’re using to further expand $. The comprehensive answer by @Raffa shows you some ways to achieve that.
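As a rough illustration of how the two layers interact, the sketch below shows what grep actually receives once the shell has expanded the variable. The sample lines are hypothetical stand-ins for file.txt, and the variable names are only illustrative.

```shell
# Hypothetical sample data standing in for file.txt.
sample=$(mktemp)
printf '%s\n' 'A1234' '$5678' 'B9012' > "$sample"

char='$'

# The shell expands ${char}, so grep -E receives the two-character
# pattern ^$, which means "empty line", not "line starting with $".
# grep -c exits non-zero when nothing matches, hence the || true.
empty_line_hits=$(grep -cE "^${char}" "$sample" || true)

# Inside a bracket expression the dollar sign stays literal for grep.
literal_hits=$(grep -cE "^[${char}]" "$sample" || true)

printf 'pattern ^%s   matched %s line(s)\n' "$char" "$empty_line_hits"
printf 'pattern ^[%s] matched %s line(s)\n' "$char" "$literal_hits"

rm -f "$sample"
```

Since the sample contains no empty lines, the first count stays at zero, which is the same symptom as the empty $numbr in the question.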

One quirk of GNU grep is that in Basic Regular Expression mode (i.e. without either the -E or -P command line switch), the $ end-of-line anchor is only special when it occurs at the end of an expression. So whereas ^$ matches only empty lines, ^$[[:digit:]]\{4\} or even ^$. will match $ literally. So given your example file.txt, either

$ grep "^${char}[[:digit:]]\{4\}" file.txt | cut -c 2-

or more simply

$ grep ^"${char}". file.txt | cut -c 2-

would have given you the desired output. However, since you don’t appear to need to check that the string after the $ is actually 4 digits, you could equally well use cut alone [2]:

$ cut -sd "${char}" -f2 file.txt

This avoids both problems by doing away with regular expressions altogether and treating the task as a simple string splitting one. Similarly with awk [3]:

$ awk -F "${char}" 'NF>1 {print $2}' file.txt

Note that neither of these anchors the $ match to the start of the line – if you need to do that, then a non-regex way to do so in awk might be

awk 'index($0,ENVIRON["char"]) == 1 {print substr($0,2)}' file.txt

where the ENVIRON array, similar to Perl’s %ENV hash, requires you to export char, but allows you to single-quote the expression, thus avoiding "problem #1" altogether.
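The difference between the unanchored split and the anchored index() test can be sketched as follows. The sample data is hypothetical, and the last line (C12$4) is deliberately malformed, purely to show what happens when the character appears somewhere other than the start of a line.

```shell
# Hypothetical sample data; the last line is malformed on purpose.
sample=$(mktemp)
printf '%s\n' 'A1234' '$5678' 'B9012' 'C12$4' > "$sample"

char='$'
export char   # awk reads it through ENVIRON, so it must be exported

# Plain string splitting: prints field 2 of every line containing the
# delimiter, wherever in the line the delimiter occurs.
with_cut=$(cut -s -d "$char" -f 2 "$sample")

# Anchored, regex-free test: only lines whose first character is $char.
with_awk=$(awk 'index($0, ENVIRON["char"]) == 1 { print substr($0, 2) }' "$sample")

printf 'cut:\n%s\n' "$with_cut"
printf 'awk:\n%s\n' "$with_awk"

rm -f "$sample"
```

With the file format described in the question (the character only ever appears first), both approaches give the same result. The malformed line only matters if the input cannot be trusted.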

  1. see What is meant by "Now you have two problems"?

  2. this assumes the GNU implementation of cut, with its -s, --only-delimited command line option

  3. a single character field separator is not treated as a regular expression in awk

Answered By: steeldriver