Delete a whole word from a CSV file that is not part of another word using SED

I’ve searched for an answer to this and have come close, but not close enough.
We receive a CSV file that contains the text "NULL" which needs to be replaced by nothing. Example:

  • Input
    12345,George,MCNULLMAN,NULL,green,NULL
    
  • Replacement should result in:
    12345,George,MCNULLMAN,,green,
    

I tried this as a test, but obviously the word boundaries don’t account for commas.

echo "MCNULLMAN,HELLO,NULL,NULL" | sed 's|bNULL/b||g'

Using sed 's|NULL||g' worked great for a while until we received a person’s name that contained NULL in the middle of it. Any suggestions?

Asked By: aRBee

||

When working with tabulated data, I would recommend using awk:

awk 'BEGIN{FS=OFS=","}{for (i=1;i<=NF;i++) if ($i=="NULL") $i=""}1' input.csv

This would set the input and output field separators to ,. It then iterates over all fields of the line, and if they are exactly equal to NULL, replace them with the empty string. The 1 instructs awk to print the line including all modifications (if any).

If it needs to be sed, I would recommend to hard-encode the field separators (while allowing the preceding comma to be the start-of-line instead, and the trailing one the end-of-line):

sed -E 's/(^|,)(NULL)(,|$)/13/g' input.csv 

This makes use of capture groups to record the actual value of the preceding and trailing field separator (which may be the comma or start-of-line/end-of-line depending on which field is modified), and replaces the entire "preceding-separator+field+trailing-separator" combination by just "preceding-separator+trailing-separator".

Please note that this only works for "simple CSV" files where it is guaranteed that the NULL will not be quoted.

Answered By: AdminBee

Using Miller (mlr) to empty each field that is the exact string NULL in the header-less CSV input file:

$ cat file.csv
12345,George,MCNULLMAN,NULL,green,NULL
$ mlr --csv -N put 'for (k,v in $*) { v == "NULL" { $[k] = "" } }' file.csv
12345,George,MCNULLMAN,,green,

This will also work with CSV files that contain complex quoting:

$ cat file.csv
12345,"George
NULL,MacGregor",MCNULLMAN,"NULL,NULL","NULL",green,"""NULL"""
$ mlr --csv -N put 'for (k,v in $*) { v == "NULL" { $[k] = "" } }' file.csv
12345,"George
NULL,MacGregor",MCNULLMAN,"NULL,NULL",,green,"""NULL"""

In the example record above, there is only a single field that contains the string NULL only, and that’s the 4th field (before green). It additionally has a set of unnecessary quotes. (The last filed is "NULL", including the literal quotes, so it is not emptied. The first NULL on the second line is part of the 1st field, which contains a literal newline character. Likewise, the field whose value is NULL,NULL is not touched.)

Answered By: Kusalananda

Using awk:

awk '{sub(/^NULL,/, ",");
gsub(/,NULL,/, ",,"); 
sub(/,NULL$/, ",")}1' file

Using csvsql:

file.csv as simple CSV file.

12345,George,MCNULLMAN,NULL,green,NULL
$ csvsql -H -I --query 'select * from file' file.csv | csvformat -K 1
12345,George,MCNULLMAN,,green,

file.csv with complex quoting.

12345,"George
NULL,MacGregor",MCNULLMAN,"NULL,NULL","NULL",green,"""NULL"""
$ csvsql -H -I --query 'select * from file' file.csv | csvformat -K 1
12345,"George
NULL,MacGregor",MCNULLMAN,"NULL,NULL",,green,"""NULL"""
  • -H for --no-header-row.

  • -I for --no-inference.
    Without this option the command changes 12345 to 12345.0.

  • -K n for --skip-lines. Skips first n lines.

This is used because csvsql -H command adds a header row. - K 1 removes that.

Answered By: Prabhjot Singh

Using Raku (formerly known as Perl_6)

~$ raku -ne '.split(",").map(*.subst: :global, /^NULL$/ ).join(",").put;'  file

OR

~$ raku -ne '.split(",")>>.subst( :global, /^NULL$/ ).join(",").put;'  file

Above is an answer coded in Raku, a member of the Perl family of programming languages. The file is read linewise using the -ne non-autoprinting linewise flags. These flags instruct Raku to run the code that follows the flags over each line.

The input data (i.e. line) loads into Raku’s $_ topic variable. From here the text is $_.split on commas (note above: you can drop the leading $_ and just write .split).

Next the resultant elements are mapped over to apply a substitution to each. Here every match to /^NULL$/ an element that starts-and-ends with NULL and nothing else in between is replaced with nothing (the :global adverb/parameter is superfluous but may prove useful in other situations).

Finally the modified elements are joined together on commas and output.

Sample Input:

12345,George,MCNULLMAN,NULL,green,NULL
12345,George,MCNULLMAN,NULL,green,nail
NULL,George,MCNULLMAN,NULL,green,neal

Sample Output:

12345,George,MCNULLMAN,,green,
12345,George,MCNULLMAN,,green,nail
,George,MCNULLMAN,,green,neal

https://raku.org

Answered By: jubilatious1
Categories: Answers Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.