# Make 2nd column unique and add the respective value of them in 1st columns

I have input as shown below, and I need to keep only unique strings from the second column, where the first column will become the sum of all values for each unique string.

For ex: take the `OIA` in the 2nd column and add the respective values of OIA from the first column and print before `OIA`.

INPUT:

``````1079 OIA
1079 OIA
975  OIA
975  OIA
372  CLN
243  TLN
224  TLN
224  TLN
224  TLN
224  TLN
224  TLN
224  TLN
224  TLN
224  TLN
224  TLN
224  TLN
``````

Expected Output:

``````4108    OIA
372     CLN
2483    TLN
``````
``````awk '{m[\$2]+=\$1}END{for(i in m)print m[i],i}' file
``````

this should work fine for the majority of cases with whitespace-separated columns.

Assuming your input is always grouped by the 2nd column values as shown in your sample input:

``````\$ awk '\$2!=prev{if (NR>1) print sum, prev; sum=0; prev=\$2} {sum+=\$1} END{print sum, prev}' file
4108 OIA
372 CLN
2483 TLN
``````

The above will keep almost nothing in memory and so will work for arbitrarily large files and will produce the output in the same order as the input \$2 values.

Assuming the data is sorted on the second column, using GNU `datamash`:

``````datamash -W groupby 2 sum 1 <file
``````

This reads the input as whitespace-delimited fields, groups the data by the second field and sums the first field for each group.

The output given the data in the question will be tab-delimited:

``````CLN     372
OIA     4108
TLN     2483
``````

Note that the fields are swapped from what’s expected in the question. To fix this, and also sort the input data on the second field (in case it may not be as neatly sorted as in the question):

``````sort -b -k 2,2 file | datamash -W groupby 2 sum 1 | awk -v OFS='t' '{ print \$2, \$1 }'
``````