Unable to grep foreign language in shell script

I am new to shell scripting. I have a text file whose lines are in the following format:

"some foreign language",'corresponding ID to text'

for example:-

"Назад",IDC_SSB_DLG_BACK_BTN

I need to find the text related to an ID and save it in a text file.

Here is my sample script:

#!/bin/sh
target_file=$1
output=$2
translationID=IDC_SSB_DLG_BACK_BTN
translation=$(cat $target_file | grep $translationID)
translationValue=$(echo "$translation" | awk -F',' '{print $1}')
translationValueFinal=$(echo "$translationValue" | tr -d '"')
echo "$translationValueFinal" >> $output

When I run this script I get the error: grep: (standard input): binary file matches

Please suggest a way to grep and save foreign-language text in a shell script. Thanks.

Asked By: tabish

||

If you use GNU grep, you can tell it to treat the input as text no matter what bytes it encounters by using the -a (--text) option:

grep -a

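However, the message suggests there are some non-text bytes in the input (for example NUL bytes, or bytes that are invalid in the current locale's encoding), so it is worth checking the input file.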
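
As a small illustration of the -a suggestion, here is a sketch that assumes GNU grep. The sample file and the NUL byte in it are invented purely to trigger the binary-file detection; they are not from the original question.

```shell
#!/bin/sh
# Hypothetical sample file: one valid line plus a line containing a NUL byte,
# which is enough for GNU grep to classify the whole input as binary.
sample=$(mktemp)
printf '"Назад",IDC_SSB_DLG_BACK_BTN\nbad\000line,OTHER_ID\n' > "$sample"

translationID=IDC_SSB_DLG_BACK_BTN

# -a (--text) makes grep print the matching line instead of
# reporting "binary file matches".
match=$(grep -a "$translationID" "$sample")
printf '%s\n' "$match"

rm -f "$sample"
```

Note that this only suppresses the binary-file check; it still does a partial-line regexp match.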

Answered By: choroba

Don’t use grep plus a bunch of extra code for this. You want to do a literal string match on a specific field, which grep cannot do on its own, and the tools that can do it don’t require help from other tools.

Your existing command:

translationID=IDC_SSB_DLG_BACK_BTN
grep $translationID

even with the missing double quotes added to make it grep "$translationID", would produce wrong output if any of these conditions were true:

  1. The string in the first field matched the ID, e.g. IDC_SSB_DLG_BACK_BTN,any, or
  2. The string in either field contained the ID as a substring of a longer string, e.g. any,FOOIDC_SSB_DLG_BACK_BTNBAR or FOOIDC_SSB_DLG_BACK_BTNBAR,any, or
  3. The ID variable contained a regexp metachar, e.g. with translationID=foo.bar both any,foo.bar and any,foodbar would match.

and probably others. See how-do-i-find-the-text-that-matches-a-pattern for more info on some of these types of issue.

Using this input file, for example:

$ cat file
any1,foodbar
foo.bar,any2
foofoo.barbar,any3
any4,foofoo.barbar
"Назад",foo.bar

where we want to print the value from the first field when the second field is the string foo.bar (i.e. just the last line above):

$ translationID=foo.bar

here’s your grep command finding the expected line but also making many false matches and so outputting undesirable lines:

$ grep "$translationID" file
any1,foodbar
foo.bar,any2
foofoo.barbar,any3
any4,foofoo.barbar
"Назад",foo.bar

vs this awk command only matching the correct line (as well as outputting just the desired field):

$ awk -F',' -v id="$translationID" '$2==id{print $1}' file
"Назад"

or if you want the quotes removed there are lots of options including:

$ awk -F'[,"]+' -v id="$translationID" '$3==id{print $2}' file
Назад

That awk command is doing a full-field literal* string comparison of just the target field, so it will be accurate, whereas your grep command is doing a partial-line regexp comparison, which will give wrong results sooner or later unless you’re lucky with your input values.

*slight caveat – if translationID contains backslashes that you want treated literally then you need to do:

$ id="$translationID" awk -F',' '$2==ENVIRON["id"]{print $1}' file
"Назад"

or similar instead, see how-do-i-use-shell-variables-in-an-awk-script.
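
To make the backslash caveat concrete, here is a minimal sketch. The ID foo\tbar and the sample line are made up for the demonstration. It relies on awk expanding escape sequences in -v assignments, so \t becomes a TAB there, while ENVIRON hands the value over unchanged.

```shell
#!/bin/sh
sample=$(mktemp)
# %s keeps the backslash literal when writing the demo line.
printf '%s\n' '"Назад",foo\tbar' > "$sample"

# Hypothetical ID containing a backslash.
translationID='foo\tbar'

# -v expands escape sequences: \t turns into a TAB, so nothing matches.
via_v=$(awk -F',' -v id="$translationID" '$2==id{print $1}' "$sample")

# ENVIRON passes the value through as-is, so the comparison succeeds.
via_env=$(id="$translationID" awk -F',' '$2==ENVIRON["id"]{print $1}' "$sample")

printf 'with -v:      [%s]\n' "$via_v"
printf 'with ENVIRON: [%s]\n' "$via_env"

rm -f "$sample"
```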

If your input file can contain NUL chars then use GNU awk or some other awk that documents support for them. awk is a text processing tool, so it is only required to work with text files as input, and per the POSIX definition a text file cannot contain NUL chars. With GNU awk you MAY need to set BINMODE, e.g.:

awk -v BINMODE=3 -F',' -v id="$translationID" '$2==id{print $1}' file
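
Putting the pieces together, the original script could be reduced to a single awk call. This is only a sketch: the function name lookup_translation and the temporary sample file are invented for the demonstration, and it assumes the translated text is always double-quoted and contains no commas.

```shell
#!/bin/sh
# lookup_translation ID FILE (hypothetical helper):
# print the unquoted first field of every line whose second field is exactly ID.
lookup_translation() {
    awk -F'[,"]+' -v id="$1" '$3==id{print $2}' "$2"
}

# Demo input: the sample file used earlier, recreated in a temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
any1,foodbar
foo.bar,any2
foofoo.barbar,any3
any4,foofoo.barbar
"Назад",foo.bar
EOF

result=$(lookup_translation foo.bar "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

In the original script this would be called as lookup_translation "$translationID" "$target_file" >> "$output", with all three variables quoted.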
Answered By: Ed Morton