Unable to grep foreign language in shell script
I am new to shell scripting. I have a text file whose lines are in the following format:
"some foreign language",'corresponding ID to text'
For example:
"Назад",IDC_SSB_DLG_BACK_BTN
I need to find the text related to an ID and save it in a text file.
Here is my sample script:
#!/bin/sh
target_file=$1
output=$2
translationID=IDC_SSB_DLG_BACK_BTN
translation=$(cat $target_file | grep $translationID)
translationValue=$(echo "$translation" | awk -F',' '{print $1}')
translationValueFinal=$(echo "$translationValue" | tr -d '"')
echo "$translationValueFinal" >> $output
When I run this script I get this error:
grep: (standard input): binary file matches
Please suggest a way to grep for foreign-language text and save it from a shell script. Thanks.
If you use GNU grep, you can tell it to treat the input as text no matter what bytes it encounters:
grep -a
However, the message suggests there are some non-text bytes in the input, so it is better to check the input file.
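As a rough sketch of one way to check the input file: GNU grep generally reports "binary file matches" when the data contains NUL bytes or bytes that are not valid in the current locale's encoding, and a file saved as UTF-16 instead of UTF-8 is one possible source of NUL bytes. The helper name count_nul below is made up for illustration.

```shell
#!/bin/sh
# count_nul FILE: print how many NUL bytes FILE contains.
# (count_nul is a hypothetical helper name, not a standard tool.)
count_nul() {
    # tr -cd '\000' deletes every byte except NUL, wc -c counts what is left;
    # the final tr strips the padding some wc implementations add.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

Running count_nul "$target_file" and getting a non-zero number would suggest the file is not plain UTF-8 text. If it turns out to be UTF-16, converting it first (for example with iconv -f UTF-16 -t UTF-8) may be a cleaner fix than grep -a.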
Don’t use grep and a bunch of extra code for this. You want to do a literal string match on a specific field, which grep cannot do on its own, and the tools that can do it don’t require help from other tools.
Your existing command:
translationID=IDC_SSB_DLG_BACK_BTN
grep $translationID
even if we added the missing "s to make it grep "$translationID", would fail if any of these conditions were true:
- The string in the first field matched the ID, e.g. IDC_SSB_DLG_BACK_BTN,any, or
- The string in either field contained a different string which that ID was a substring of, e.g. any,FOOIDC_SSB_DLG_BACK_BTNBAR or FOOIDC_SSB_DLG_BACK_BTNBAR,any, or
- The string in the second field and the ID variable contained a regexp metachar, e.g. any,foo.bar and any,foodbar would both match translationID=foo.bar
and probably others. See how-do-i-find-the-text-that-matches-a-pattern for more info on some of these types of issue.
Using this input file, for example:
$ cat file
any1,foodbar
foo.bar,any2
foofoo.barbar,any3
any4,foofoo.barbar
"Назад",foo.bar
where we want to print the value from the first field when the second field is the string foo.bar (i.e. just the last line above):
$ translationID=foo.bar
here’s your grep command finding the expected line but also making many false matches and so outputting undesirable lines:
$ grep "$translationID" file
any1,foodbar
foo.bar,any2
foofoo.barbar,any3
any4,foofoo.barbar
"Назад",foo.bar
vs this awk command only matching the correct line (as well as outputting just the desired field):
$ awk -F',' -v id="$translationID" '$2==id{print $1}' file
"Назад"
or if you want the quotes removed there are lots of options including:
$ awk -F'[,"]+' -v id="$translationID" '$3==id{print $2}' file
Назад
That awk command is doing a full-field literal* string comparison of just the target field so it will be accurate whereas your grep command is doing a partial-line regexp comparison which will fail some time unless you’re lucky with your input values.
*slight caveat: if translationID contains backslashes that you want treated literally then you need to do:
$ id="$translationID" awk -F',' '$2==ENVIRON["id"]{print $1}' file
"Назад"
or similar instead, see how-do-i-use-shell-variables-in-an-awk-script.
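To make the backslash caveat concrete, here is a small sketch comparing the two ways of passing the value. The ID foo\tbar and the function names len_via_v and len_via_env are invented for illustration. With -v, awk processes escape sequences in the assignment, so \t becomes a tab; ENVIRON hands the value over unchanged.

```shell
#!/bin/sh
# Both helpers print the length of the string as awk sees it.
# (len_via_v and len_via_env are hypothetical names for this demo.)

# Passed with -v: escape sequences such as \t are interpreted by awk.
len_via_v() {
    awk -v id="$1" 'BEGIN { print length(id) }'
}

# Passed through the environment: the value arrives byte for byte.
len_via_env() {
    id="$1" awk 'BEGIN { print length(ENVIRON["id"]) }'
}

len_via_v   'foo\tbar'   # prints 7: foo, one tab character, bar
len_via_env 'foo\tbar'   # prints 8: the backslash and the t are kept
```

Since the two strings differ, a $2==id comparison against a field that literally contains foo\tbar would only succeed with the ENVIRON form.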
If your input file can contain NUL chars then use GNU awk or some other awk that documents support for them. awk is a text processing tool, so it is only required to work with text files as input, and per the POSIX definition a text file cannot contain NUL chars. With GNU awk you MAY need to set BINMODE, e.g.:
awk -v BINMODE=3 -F',' -v id="$translationID" '$2==id{print $1}' file
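Putting the pieces together, the script from the question could probably be reduced to a single awk call along these lines. This is only a sketch: the function name extract_translation is made up, and it assumes the quoted text never contains a comma and that the file is valid text for your awk.

```shell
#!/bin/sh
# extract_translation ID FILE
# Print the first field, with double quotes removed, of every line in FILE
# whose second field is exactly ID (literal, full-field comparison).
# (extract_translation is a hypothetical name used for this sketch.)
extract_translation() {
    id="$1" awk -F',' '
        $2 == ENVIRON["id"] {
            gsub(/"/, "", $1)   # strip the surrounding quotes
            print $1
        }
    ' "$2"
}

# Usage mirroring the original script (input file in $1, output file in $2):
#   extract_translation IDC_SSB_DLG_BACK_BTN "$1" >> "$2"
```

Passing the ID through ENVIRON rather than -v keeps any backslashes in it literal, and quoting "$1" and "$2" protects file names that contain spaces.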