awk add a column if it doesn't exist

I would like to add a column 3 (description) if it does not exist in a ‘;’ delimited file. example;

#Gene;Transcripts;Description;Group
gene1;G1a,G1b,G1c;gene1 is a good gene;6
gene2;G2a,G2b,G2c;gene2 is a funny gene;3
gene3;G3a,G3b;4
gene4;G4a;gene4 description;5
gene5;G5a,G5b;6

Expected output

#Gene;Transcripts;Description;Group
gene1;G1a,G1b,G1c;gene1 is a good gene;6
gene2;G2a,G2b,G2c;gene2 is a funny gene;3
gene3;G3a,G3b;No description;4
gene4;G4a;gene4 description;5
gene5;G5a,G5b;No description;6
Asked By: alex kiarie

||

A slightly different way of looking at this problem might be "if there are not 4 columns of data then move column 3 to column 4 and set column 3 to No description".

This leads to the following code

awk -F';' 'BEGIN {OFS=";"} NF!=4 {$4=$3; $3="No description"} {print}'
Answered By: Stephen Harris

Using miller:

$ mlr --nidx --fs ';' put 'if (NF != 4) {$4 = $3; $3 = "No description"}' file

This is same approach used in awk answer but written in miller.

If file’s third column is empty as in

Gene;Transcripts;Description;Group
gene1;G1a,G1b,G1c;gene1 is a good gene;6
gene2;G2a,G2b,G2c;gene2 is a funny gene;3
gene3;G3a,G3b;;4
gene4;G4a;gene4 description;5
gene5;G5a,G5b;;6

Then the following command can be used.

# With headers 
$ mlr --csv --fs ';' put 'is_null($Description) {$Description = "No description" }' file 

# Without headers
$ mlr --csv  -N --fs ';' put 'is_null($3) {$3 = "No description"}' file
Answered By: Prabhjot Singh

The awk answer given by @Stephen Harris is the best imho. But since perl is an option too, therse another couple of ways.

If there are just 2 ; ( y/;/;/==2 ) then the third column is missing, so ad the the description No description:

perl -pe's/(;[^;]+$)/;No description$1/ if y/;/;/==2' data

or basically a translation to perl of the @Stephen Harris awk answer:

perl -F'/;/' -spe'$_=join $,,@F[0..1],q^No description^,$F[2] if $#F==2' -- -,=';' data.csv
Answered By: DanieleGrassini

For completeness and since the question is tagged sed as well, here is a not column-based but rather RegEx based sed solution (Notice: this is as good as its RegEx groups are … So, tweak them as needed):

$ sed -E 's/^([^;]+);([^;]+);([0-9]+)$/1;2;No description;3/' file
#Gene;Transcripts;Description;Group
gene1;G1a,G1b,G1c;gene1 is a good gene;6
gene2;G2a,G2b,G2c;gene2 is a funny gene;3
gene3;G3a,G3b;No description;4
gene4;G4a;gene4 description;5
gene5;G5a,G5b;No description;6
Answered By: Raffa
Categories: Answers Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.