Awk: Comparing passed Bash variables with column values

Question

Using this example:

#submission,date
"test1","22 April 2024"
"test2","24 April 2024"
"test3","25 March 2024"
"test6","01 April 2023"
"test7","02 April 2022"
"test8","03 April 2021"

I’d like to print only the tests from the current month, which as of writing is April 2024. I tried this command:

awk -F, -v date="$(date +'%B %Y')" '/^[^#]/ && $2 ~ /'$date'"$/{print $1}' tests.csv

It prints all of the tests. How are Bash variables supposed to be compared using Awk?

Asked By: T145

Answer 1

Regarding "How are variables ...": be very clear about what you mean by "variables" in your thinking, code, and questions. You are calling awk from a shell, bash. Bash is not awk; they are two completely separate tools, each with its own syntax, semantics, scopes, and variables.

In your code you have:

awk -v date="$(date +'%B %Y')"

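which populates an awk variable, not a shell variable, with the output of a call to the separate Unix tool date. Assuming no shell variable named date is set in your shell, the $date that sits between the single-quoted parts of your script expands to an empty string, so awk actually receives the regexp /"$/, which matches every data line. That would explain why all of the tests are printed.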
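A quick way to see what awk actually receives is to have the shell print the script argument instead of running awk. This is a minimal sketch that assumes, as in the question, that no shell variable named date is set:

```shell
# Assumption: no shell variable named "date" exists (as in the question).
unset date

# printf shows the exact script text awk would receive: the unquoted $date
# between the single-quoted pieces is expanded by the shell to nothing.
printf '%s\n' '/^[^#]/ && $2 ~ /'$date'"$/{print $1}'

# By contrast, -v sets an awk variable, which awk reads by its bare name.
awk -v date='April 2024' 'BEGIN{print "awk sees: " date}'
```

The first command prints /^[^#]/ && $2 ~ /"$/{print $1}, and the second prints awk sees: April 2024.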

As in C, in awk you get the value of a variable simply by using its name, unlike in the shell, where you have to put a $ in front of the variable’s name to get its value. So in this next part of your code:

$2 ~ /'$date'"$/

you would just use date to get the value of the awk variable date. But then you have a second problem: you’re using it inside literal regexp delimiters /.../, whereas you need to build your regexp from the variable date plus, apparently, the string "$, so you need a dynamic regexp there, not a literal regexp. That part of your code should therefore be:

$2 ~ (date "\"$")
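The difference between a literal and a dynamic regexp can be seen in isolation with a small sketch; the variable re and the string "abc" are made up purely for illustration:

```shell
# /re/ is a literal regexp: it looks for the two characters "re" and ignores
# the awk variable re. A plain expression such as re is used as a dynamic
# regexp built from the variable's string value.
awk -v re='b' 'BEGIN {
    s = "abc"
    lit = (s ~ /re/)   # 0: "abc" does not contain the text "re"
    dyn = (s ~ re)     # 1: "abc" contains "b"
    print lit, dyn
}'

# The dynamic regexp used above is plain string concatenation:
# date followed by the two characters "$ (the escaped \" becomes ").
awk -v date='April 2024' 'BEGIN { print date "\"$" }'
```

This prints 0 1 and then April 2024"$, the string that $2 is matched against.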

Putting it together, your script would be:

$ awk -F, -v date="$(date +'%B %Y')" '/^[^#]/ && $2 ~ (date "\"$"){print $1}' tests.csv
"test1"
"test2"
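Because date +'%B %Y' changes every month, the output above only holds in April 2024. To try the fix against the sample data at any time, one option is to hardcode the month and feed the question’s sample rows through a here-document instead of reading tests.csv:

```shell
# Hardcode the month instead of calling date(1) so the result is repeatable,
# and read the question's sample rows from a here-document instead of tests.csv.
awk -F, -v date='April 2024' '/^[^#]/ && $2 ~ (date "\"$") {print $1}' <<'EOF'
#submission,date
"test1","22 April 2024"
"test2","24 April 2024"
"test3","25 March 2024"
"test6","01 April 2023"
"test7","02 April 2022"
"test8","03 April 2021"
EOF
```

This prints "test1" and "test2", matching the output shown above.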

If it were me, though, I’d do the concatenation that forms the regexp just once, either where date is first initialized:

awk -F, -v date="$(date +'%B %Y')\"$" '/^[^#]/ && $2 ~ date{print $1}' tests.csv

or in the BEGIN section:

awk -F, -v date="$(date +'%B %Y')" 'BEGIN{date=date "\"$"} /^[^#]/ && $2 ~ date{print $1}' tests.csv

so it happens once instead of once per input line, since string concatenation is a relatively slow operation.

Another option if you have GNU awk is to store a strongly typed regexp constant instead of a dynamic regexp string in the date variable:

awk -F, -v date="@/$(date +'%B %Y')\"$/" '/^[^#]/ && $2 ~ date{print $1}' tests.csv

or:

awk -F, -v d="$(date +'%B %Y')" 'BEGIN{date=@/x"$/; sub(/x/,d,date)} /^[^#]/ && $2 ~ date{print $1}' tests.csv

but that second one is getting pretty arcane and a plain old dynamic regexp works fine in your particular case.

If you ever do want to use the value of a shell variable inside an awk script, see the question "How do I use shell variables in an awk script?" for how to do that.
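For reference, one common alternative to -v is to put the value in awk’s environment and read it through the ENVIRON array, which POSIX awk provides. A minimal sketch, where the shell variable name month is a hypothetical choice for illustration:

```shell
# Hypothetical shell variable; in the question this would come from date +'%B %Y'.
month='April 2024'

# Prefixing the command with month="$month" exports it to awk's environment
# for this one command only; awk then reads it as ENVIRON["month"].
printf '%s\n' '"test1","22 April 2024"' '"test3","25 March 2024"' |
    month="$month" awk -F, '$2 ~ (ENVIRON["month"] "\"$") {print $1}'
```

This prints "test1". Unlike a -v assignment, a value read through ENVIRON is not subject to awk’s escape-sequence processing, so any backslashes in it reach awk unchanged.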

Answered By: Ed Morton