How can aggregate lines in CSV based on value of one column

My input is the following sorted CSV file (using : as the field delimiter instead of the usual comma):

version:device
1.0.0:dev1
1.0.0:dev2
1.2.3:dev3
1.3.4:dev4
1.3.4:dev5

I want to aggregate it so that each version is in 1 line:

version:devices
1.0.0:dev1,dev2
1.2.3:dev3
1.3.4:dev4,dev5
Asked By: Gavriel

||

Something like this should do what you want.

awk -F':' 'OFS=""; NR == 1; NR > 1 {x=$1;$1="";a[x]=a[x]$0","}END{for(x in a)print x":"a[x]}' file
Answered By: ralz

try

awk -F: '$1 == before { printf ",%s",$2 ; } 
         $1 != before { printf "%s%s",nl,$0 ; before=$1 ; nl="n" } 
         END {printf nl ; }' 

where

  • -F: tell awk to use : as separator
  • printf do not print newline, unless tol ("n")
  • this can be onelined (e.g.awk -F: '....' file), I split line for readability.

this use two variables to hold new line (nl) and previous $1‘s value.

Answered By: Archemar

is a pretty amazing tool:

mlr --csv --fs : group-by version then nest --ivar , -f device file.csv
version:device
1.0.0:dev1,dev2
1.2.3:dev3
1.3.4:dev4,dev5
Answered By: glenn jackman
awk -F: -v OFS=: '
    NR == 1{print;next} 
    !v {v=$1;d=$2;next} 
    v == $1 {d = d","$2}
    v != $1 {print v,d;v=$1;d=$2}
    END{print v,d}
' file
Answered By: DanieleGrassini

Using GNU datamash to group by the first colon-separated field, collapsing the second field:

$ datamash -t : groupby 1 collapse 2 <file
version:device
1.0.0:dev1,dev2
1.2.3:dev3
1.3.4:dev4,dev5
Answered By: Kusalananda
$ cat tst.awk
BEGIN { FS=OFS=":" }
$1 != prev {
    if ( NR == 1 ) {
        devs = $2 "s"
    }
    else {
        print prev, devs
        devs = $2
    }
    prev = $1
    next
}
{ devs = devs "," $2 }
END { print prev, devs }

$ awk -f tst.awk file
version:devices
1.0.0:dev1,dev2
1.2.3:dev3
1.3.4:dev4,dev5
Answered By: Ed Morton

Yet another solution with csvkit:

csvsql -d ':' --query 
    'SELECT version, GROUP_CONCAT(device) as devices FROM file GROUP BY version' file.txt 
    | csvformat -D ':'
version:devices
1.0.0:dev1,dev2
1.2.3:dev3
1.3.4:dev4,dev5
Answered By: Robert Smith
Categories: Answers Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.