How can I blank the nth to mth field using the awk command?

I would like to solve the problem below using AWK.

If any other solutions are possible using languages such as sed or
Perl, that would also be much appreciated.

Below is the input:

U,N,UNIX,000,A,5
N,P,SHELL,111,B,6
I,M,UNIX,222,C,7
X,Y,BASH,333,D,8
P,R,SCRIPT,444,E,9

I want the output as below:

U,N,,,A,5
N,P,,,B,6
I,M,,,C,7
X,Y,,,D,8
P,R,,,E,9

Please also note that the total number of fields per line is
unknown to me. I only know that fields 3 and 4 are to be blanked.

Asked By: PriB

</path/to/in_file awk -v 'FS=,' -v 'OFS=,' '{$3=$4=""; print}'

Explanation

  • </path/to/in_file: read the file on standard input.
  • -v 'FS=,' -v 'OFS=,': set the input field separator and the output field separator to ,.
  • '{$3=$4=""; print}': set the 3rd and 4th fields to blank, then print the entire line (shortened form courtesy of jasonwryan).
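
To see why OFS matters here, the sketch below runs the same program with and without it. Assigning to a field makes awk rebuild the line from its fields, joined by OFS, whose default is a single space. The function names blank34 and blank34_no_ofs are hypothetical, chosen only for this illustration.

```shell
# blank34: the one-liner from the answer, reading standard input.
blank34() { awk -v 'FS=,' -v 'OFS=,' '{$3=$4=""; print}'; }

# blank34_no_ofs: the same program without OFS, for comparison.
blank34_no_ofs() { awk -v 'FS=,' '{$3=$4=""; print}'; }

printf 'U,N,UNIX,000,A,5\n' | blank34          # prints U,N,,,A,5
printf 'U,N,UNIX,000,A,5\n' | blank34_no_ofs   # prints U N   A 5
```
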
Answered By: Sparhawk

sed 's/\([^,]*,\)\{2\}/,,/2' <in >out

U,N,,,A,5
N,P,,,B,6
I,M,,,C,7
X,Y,,,D,8
P,R,,,E,9

That replaces the second occurrence of a group of two consecutive comma-delimited fields with two commas.

You could also do it like:

sed 's/[^,]*//4;s///3' <in >out

…which replaces the 4th and 3rd occurrences of a sequence of any number of non-comma characters with nothing.
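
As a quick check of that second form, here is a sketch that feeds it one line from the question. The wrapper name blank_with_sed is made up for this example, and the result shown is what I would expect from GNU sed; other sed implementations may count empty matches differently.

```shell
# The numeric flag picks which match to replace. The empty pattern in
# s///3 reuses the previous regular expression, [^,]*.
blank_with_sed() { sed 's/[^,]*//4;s///3'; }

printf 'U,N,UNIX,000,A,5\n' | blank_with_sed   # prints U,N,,,A,5
```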

To do it as @Wildcard did – with a scalable loop:

sed -e:t -e'/\n\{2\}/!s/\(\n*\)[^,]*./\n\1/3;/\n$/!tt' -e's///;y/\n/,/'

…or…

sed -e:t -e's/\n$//;s/\n/&/2;to' \
    -e's/\(\n*\)[^,]*./\1\n/3;tt' \
    -e:o -ey/\n/,/

…where 3 is the field number at which you would start blanking, , is the delimiter, and 2 is the total number of fields you would blank.

Either way you write it…

sed "$script" <<""
U
N,P
I,M,UNIX
X,Y,BASH,333
P,R,SCRIPT,444,E,9

U
N,P
I,M,
X,Y,,
P,R,,,E,9

…though you may need to use a literal newline in place of \n in … /\1\n/3.

Answered By: mikeserv

To scalably blank all fields from the nth to the mth in an awk command, you shouldn’t hardcode the values; you should use a “for” loop:

awk 'BEGIN { FS = ","; OFS = ","} {for (i = 3; i <= 4; i++) { $i = "" }; print}' inputfile

If you want to blank out a different range, adjust the values “3” and “4” in the above code.


Explanation:

The BEGIN { ... } block is processed before looking at any of the lines of the file.

OFS sets the output field separator, and FS sets the field separator for input. We want them both to be commas.

The for loop uses the same syntax as C. Here it runs the { code block } that follows twice: once with i set to 3 and once with i set to 4.

The $i deserves mention because it is entirely unlike shell syntax. In shell scripting, the name of a variable must be prefixed with $ to expand to the value of the variable. Not so in awk. In awk, i by itself expands to its value—3 or 4 in this case—and the $ followed by a number means the field in that numbered position. So $i = "" sets the ith field to a blank string.

Then the print command, given without arguments, prints the entire line. More precisely, because a field has been assigned to, awk rebuilds the line from all of its fields (as split on FS and as modified by the previous commands), joins them with OFS, and print outputs the result followed by a newline.
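
One consequence of field assignment is worth checking against your own data: assigning to a field past the end of a line makes awk create it, so a line with fewer than four fields comes out padded with extra commas. The sketch below compares the loop as written with a variant that stops at NF. It assumes you would rather leave missing fields alone, and the function names are hypothetical.

```shell
# blank_range: the loop from the answer, reading standard input.
blank_range() {
    awk 'BEGIN { FS = ","; OFS = "," } { for (i = 3; i <= 4; i++) { $i = "" }; print }'
}

# blank_existing: the same loop, but it never goes past the last
# field (NF), so no new fields are created on short lines.
blank_existing() {
    awk 'BEGIN { FS = ","; OFS = "," }
         { for (i = 3; i <= 4 && i <= NF; i++) { $i = "" }; print }'
}

printf 'N,P\n' | blank_range      # prints N,P,,
printf 'N,P\n' | blank_existing   # prints N,P
```

Which behavior is right depends on whether downstream tools expect a fixed number of columns.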


An equivalent shorter command:

I feel that the above command is the cleanest and most easily extensible if you are going to include it in a script. It is very explicit about what it is doing and very readable. Plus, the entire thing can be broken out to a standalone awk script without change; something that can’t be done automatically when using -v and -F switches to your awk invocation. (That’s no reason not to use them, of course. Just something to be aware of.)

For a one-off usage especially, I would use the following:

awk -F, -v OFS=, '{for (i = 3; i <= 4; i++) { $i = "" }; print}' inputfile

The -F switch sets the value of FS. The -v switch allows you to set values of awk variables on the command line.

On a more general note, the -v switch can be extremely useful for passing shell variables in as awk variables (-v myawkvar="$myshellvar") and for changing the runtime behavior of a standalone awk script that you run from a script file with the -f scriptname option.
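
Putting the loop and the -v switch together, here is a minimal sketch of a shell function that blanks any range of fields. The names blank_fields, first, and last are illustrative and not part of the original answer.

```shell
# blank_fields FIRST LAST: blank fields FIRST through LAST of
# comma-separated lines read from standard input.
blank_fields() {
    awk -F, -v OFS=, -v first="$1" -v last="$2" \
        '{ for (i = first; i <= last; i++) { $i = "" }; print }'
}

printf 'P,R,SCRIPT,444,E,9\n' | blank_fields 3 4   # prints P,R,,,E,9
printf 'P,R,SCRIPT,444,E,9\n' | blank_fields 2 5   # prints P,,,,,9
```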

Answered By: Wildcard

I’d use perl

perl -F, -lane '@F[2,3]=""; print join ",", @F'

This uses -a to autosplit each line into @F, with -F setting the field separator to a comma. -n iterates over STDIN line by line. -e specifies the script, which blanks elements 2 and 3 of @F (Perl counts from zero, so these are the 3rd and 4th fields) and prints the result.

-l implicitly removes and adds line endings.

Answered By: Sobrique