Make 2nd column unique and add the respective value of them in 1st columns

I have input as shown below, and I need to keep only unique strings from the second column, where the first column will become the sum of all values for each unique string.

For ex: take the OIA in the 2nd column and add the respective values of OIA from the first column and print before OIA.


1079 OIA
1079 OIA
975  OIA
975  OIA
372  CLN
243  TLN
224  TLN
224  TLN
224  TLN
224  TLN
224  TLN
224  TLN
224  TLN
224  TLN
224  TLN
224  TLN

Expected Output:

4108    OIA
372     CLN
2483    TLN
Asked By: Renga

awk '{m[$2]+=$1}END{for(i in m)print m[i],i}' file

this should work fine for the majority of cases with whitespace-separated columns.

Answered By: DanieleGrassini

Assuming your input is always grouped by the 2nd column values as shown in your sample input:

$ awk '$2!=prev{if (NR>1) print sum, prev; sum=0; prev=$2} {sum+=$1} END{print sum, prev}' file
4108 OIA
372 CLN
2483 TLN

The above will keep almost nothing in memory and so will work for arbitrarily large files and will produce the output in the same order as the input $2 values.

Answered By: Ed Morton

Assuming the data is sorted on the second column, using GNU datamash:

datamash -W groupby 2 sum 1 <file

This reads the input as whitespace-delimited fields, groups the data by the second field and sums the first field for each group.

The output given the data in the question will be tab-delimited:

CLN     372
OIA     4108
TLN     2483

Note that the fields are swapped from what’s expected in the question. To fix this, and also sort the input data on the second field (in case it may not be as neatly sorted as in the question):

sort -b -k 2,2 file | datamash -W groupby 2 sum 1 | awk -v OFS='t' '{ print $2, $1 }'
Answered By: Kusalananda
Categories: Answers Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.