Make the 2nd column unique and sum its respective values in the 1st column
I have input as shown below, and I need to keep only unique strings from the second column, where the first column will become the sum of all values for each unique string.
For example: take OIA in the 2nd column, add up the respective values of OIA from the first column, and print the sum before OIA.
INPUT:
1079 OIA
1079 OIA
975 OIA
975 OIA
372 CLN
243 TLN
224 TLN
224 TLN
224 TLN
224 TLN
224 TLN
224 TLN
224 TLN
224 TLN
224 TLN
224 TLN
Expected Output:
4108 OIA
372 CLN
2483 TLN
awk '{m[$2]+=$1}END{for(i in m)print m[i],i}' file
This should work fine for the majority of cases with whitespace-separated columns.
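Spread over multiple lines with comments, the same program reads as follows (purely a readability sketch; note that the order in which for (i in m) visits the keys is unspecified in awk, so the output lines may appear in any order):

awk '
    { m[$2] += $1 }          # accumulate the $1 totals keyed by the $2 string
    END {
        for (i in m)         # key order here is unspecified
            print m[i], i    # print the total first, then the unique string
    }
' file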
Assuming your input is always grouped by the 2nd column values as shown in your sample input:
$ awk '$2!=prev{if (NR>1) print sum, prev; sum=0; prev=$2} {sum+=$1} END{print sum, prev}' file
4108 OIA
372 CLN
2483 TLN
The above keeps almost nothing in memory, so it works for arbitrarily large files, and it produces the output in the same order as the input $2 values.
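For reference, here is the same script laid out with comments (functionally identical, just easier to read):

awk '
    $2 != prev {               # the group key in column 2 just changed
        if (NR > 1)            # nothing to print before the first record
            print sum, prev    # emit the finished group: total, then key
        sum = 0; prev = $2     # start a new running total for the new key
    }
    { sum += $1 }              # add the current value to the running total
    END { print sum, prev }    # emit the final group
' file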
Assuming the data is sorted on the second column, using GNU datamash:
datamash -W groupby 2 sum 1 <file
This reads the input as whitespace-delimited fields, groups the data by the second field and sums the first field for each group.
The output given the data in the question will be tab-delimited:
CLN 372
OIA 4108
TLN 2483
Note that the fields are swapped from what’s expected in the question. To fix this, and also sort the input data on the second field (in case it may not be as neatly sorted as in the question):
sort -b -k 2,2 file | datamash -W groupby 2 sum 1 | awk -v OFS='\t' '{ print $2, $1 }'
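With the sample data from the question, that pipeline should print the totals in the expected column order, with the groups sorted by the second column and the fields tab-separated:

372	CLN
4108	OIA
2483	TLN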