Using awk to search for a string that contains special characters

I am having trouble searching through lines of text in a text file.

Currently, I am using this command:

check=`awk -F : -v "title=$title" 'tolower($1) ~ tolower(title)' test.txt`

It works fine when the strings are purely alphabetic. Assume the text file contains these 3 lines:

C++ Programming in 21 Days
C## Programming in 21 Days
C Programming in 21 Days

When I do a partial search for just the letter C, all 3 lines are displayed, which is what I want. However, if I key in C++ P, my program reports that the text was not found. And if I key in C++, all 3 lines are displayed as well.

The funny thing is, if I search for C## P, my program correctly finds C## Programming in 21 Days.

I can’t work out what is causing this. Please help.

Asked By: Zac

||

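The “+” is being treated as a regular expression quantifier (“one or more of the preceding item”) rather than as a literal character, because the right-hand side of ~ is interpreted as a regular expression.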

$ title="C++ P"
$ awk -F: -v "title=$title" 'tolower($1) ~ tolower(title)' test.txt
C Programming in 21 Days
$ title="C.. P"
$ awk -F: -v "title=$title" 'tolower($1) ~ tolower(title)' test.txt
C++ Programming in 21 Days
C## Programming in 21 Days

If you’re only interested in matching the start, you could use

$ awk -F: -v "title=$title" 'tolower(substr($0,1,length(title))) == tolower(title)' test.txt

Or to match anywhere within the line

$ title="C"
$ awk -F: -v "title=$title" 'index(tolower($0),tolower(title))' test.txt
C++ Programming in 21 Days
C## Programming in 21 Days
C Programming in 21 Days
$ title="C++ P"
$ awk -F: -v "title=$title" 'index(tolower($0),tolower(title))' test.txt
C++ Programming in 21 Days
$ title="C## P"
$ awk -F: -v "title=$title" 'index(tolower($0),tolower(title))' test.txt
C## Programming in 21 Days
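
The index() approach can be wrapped in a small shell function. The following is a minimal sketch, not part of the original answer: the function name find_title and the environment variable term are hypothetical, and it matches against the first ':'-separated field as in the question rather than the whole line. It passes the search term through ENVIRON instead of -v, because awk processes backslash escape sequences in -v assignments, which would alter a term that contains backslashes.

```shell
# Hypothetical helper: print the lines of a file whose first ':'-separated
# field contains the search term as a literal, case-insensitive substring.
#   $1 = search term, $2 = file to search
find_title() {
    # The term is exported to awk only, and read via ENVIRON so that
    # backslashes in it are not interpreted as escape sequences.
    term=$1 awk -F: 'index(tolower($1), tolower(ENVIRON["term"]))' "$2"
}

# Sample data: the three lines from the question, in a temporary file.
sample=$(mktemp)
printf '%s\n' 'C++ Programming in 21 Days' \
              'C## Programming in 21 Days' \
              'C Programming in 21 Days' > "$sample"

find_title 'c++ p' "$sample"   # prints: C++ Programming in 21 Days
find_title 'C## P' "$sample"   # prints: C## Programming in 21 Days

rm -f "$sample"
```

Because index() compares plain strings, no character in the search term needs escaping.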
Answered By: steve

tolower(title) is handled as a regular expression:

  • In C++, C matches the character C literally and each + is a quantifier (“one or more times”), so the pattern matches one or more C characters, not the literal text C++
  • C matches the character C literally
  • C## matches the characters C## literally, because # has no special meaning in a regular expression

To match a literal C++ you need to escape each +, for example C\+\+ in a regex literal, or C[+][+], which also survives being passed with -v.
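
Because + is a quantifier in a regular expression, a search term has to be escaped before awk can match it literally with ~. The following is a minimal sketch of doing that automatically, not something taken from the original answer: the names escape_ere, search_title and pattern are hypothetical. It backslash-escapes the extended-regex metacharacters with sed and hands the result to awk through ENVIRON, since a -v assignment would process the backslashes as escape sequences before the regex engine sees them.

```shell
# Hypothetical helper: put a backslash in front of every ERE metacharacter
# so the string matches itself literally when used as a dynamic regex.
escape_ere() {
    printf '%s' "$1" | sed 's/[][\.*^$+?(){}|]/\\&/g'
}

# Hypothetical helper: case-insensitive literal search in the first
# ':'-separated field.  $1 = search term, $2 = file to search
search_title() {
    pattern=$(escape_ere "$1") \
        awk -F: 'tolower($1) ~ tolower(ENVIRON["pattern"])' "$2"
}

# Sample data: the three lines from the question, in a temporary file.
sample=$(mktemp)
printf '%s\n' 'C++ Programming in 21 Days' \
              'C## Programming in 21 Days' \
              'C Programming in 21 Days' > "$sample"

escape_ere 'C++ P'; echo            # prints: C\+\+ P
search_title 'C++ P' "$sample"      # prints: C++ Programming in 21 Days

rm -f "$sample"
```

The escaping step is only needed when the regex operator ~ is kept; a plain substring comparison with index() avoids it entirely.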


Examples

% title='C[+][+]'
% awk -F : -v "title=$title" 'tolower($1) ~ tolower(title)' foo
C++ Programming in 21 Days

or shorter

% awk '/[Cc]\+\+/' foo
C++ Programming in 21 Days

% awk '/[Cc]##/' foo  
C## Programming in 21 Days

% awk '/[Cc] /' foo
C Programming in 21 Days

or with an external variable

% title='C## P'
% awk '/'"$title"'/' foo   
C## Programming in 21 Days

% title='C\+\+ P'
% awk '/'"$title"'/' foo 
C++ Programming in 21 Days

% title='C\+\+ P'
% check=$(awk '/'"$title"'/' foo)
% echo "$check"
C++ Programming in 21 Days

and so on

Answered By: A.B.