awk get column value from file 2 for file 1 if value matches in 2 columns

I have 2 files, trying to set column value that is taken from file2 into file1 when their columns match

file1:
signup||20200620|A3332|en|LA||ACTIVE
signup||20200620|B4443|en|CA|66001|ACTIVE
signup||20200620|C2221|en|WC||ACTIVE
signup||20200620|D1110|en|LA||ACTIVE
signup||20200620|E5554|en|WC|66003|ACTIVE

file2:
A3332||99001
B4443|66001|99003
D1110|66004|99007
E5554||99004

CRITERIA:
$4 of file1 must match $1 of file2, after matching it should set $7 of file1 equal to $3 of file2


DESIRED:

signup||20200620|A3332|en|LA|99001|ACTIVE
signup||20200620|B4443|en|CA|99003|ACTIVE
signup||20200620|C2221|en|WC||ACTIVE
signup||20200620|D1110|en|LA|99007|ACTIVE
signup||20200620|E5554|en|WC|99004|ACTIVE

what I’m trying

awk 'BEGIN{ FS=OFS="|" } NR==FNR{a[NR]=$1; b[NR]=$3; next} {if (a[FNR] = $4); $7=b[FNR]; print}' file2 file1

What I’m getting

signup||20200620|A3332|en|LA|99001|ACTIVE
signup||20200620|B4443|en|CA|99003|ACTIVE
signup||20200620|C2221|en|WC|99007|ACTIVE
signup||20200620|D1110|en|LA|99004|ACTIVE
signup||20200620|E5554|en|WC||ACTIVE

I’m scratching head if it’s a logical error or something else

Asked By: Sollosa

||

There are two problems. First, you have a typo:

if (a[FNR] = $4)

That will always be true since you’re using =, the assignment operator instead of == for comparison. You need:

if (a[FNR] == $4)

The next problem is that you don’t have the same number of lines in your file. So even if you fix the typo, it won’t work as expected:

$ awk 'BEGIN{ FS=OFS="|" } NR==FNR{a[NR]=$1; b[NR]=$3; next} {if (a[FNR] == $4){$7=b[FNR]}; print}' file2 file1
signup||20200620|A3332|en|LA|99001|ACTIVE
signup||20200620|B4443|en|CA|99003|ACTIVE
signup||20200620|C2221|en|WC||ACTIVE
signup||20200620|D1110|en|LA||ACTIVE
signup||20200620|E5554|en|WC|66003|ACTIVE

The D1110 line isn’t working because D1110 is on line 4 in file but line 3 in file2. So you need a different approach, something like this:

$ awk 'BEGIN{ FS=OFS="|" } NR==FNR{a[$1]=$3; next} {if ($4 in a){$7=a[$4]}; print}' file2 file1
signup||20200620|A3332|en|LA|99001|ACTIVE
signup||20200620|B4443|en|CA|99003|ACTIVE
signup||20200620|C2221|en|WC||ACTIVE
signup||20200620|D1110|en|LA|99007|ACTIVE
signup||20200620|E5554|en|WC|99004|ACTIVE
Answered By: terdon

Perhaps formatting the script will help the logic:

awk -F| '
NR == FNR       {A[$1] = $3; next}
($4 in A)       {$7 = A[$4]}
1' file2 OFS=| file1

signup||20200620|A3332|en|LA|99001|ACTIVE
signup||20200620|B4443|en|CA|99003|ACTIVE
signup||20200620|C2221|en|WC||ACTIVE
signup||20200620|D1110|en|LA|99007|ACTIVE
signup||20200620|E5554|en|WC|99004|ACTIVE

If you still need to match on the second field, then you can combine the fields:

awk -F| '
NR == FNR       {A[$1,$2] = $3; next}
($4,$7) in A    {$7 = A[$4,$7]}
1' file2 OFS=| file1

signup||20200620|A3332|en|LA|99001|ACTIVE
signup||20200620|B4443|en|CA|99003|ACTIVE
signup||20200620|C2221|en|WC||ACTIVE
signup||20200620|D1110|en|LA||ACTIVE
signup||20200620|E5554|en|WC|66003|ACTIVE
Answered By: nezabudka
Categories: Answers Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.