UNIX command for replacing within delimiter based on position of the delimiter
I have an input string delimited by | (pipe) and would like to replace the space in the 3rd and 5th columns with an & character.
Input File:
a a|b b|c c|d d|e e
f f|g g|h h|i i|j j
Output File:
a a|b b|c&c|d d|e&e
f f|g g|h&h|i i|j&j
You can see that the space in c c, e e, h h and j j is replaced with &.
I have an alternative solution that reads the file in a while loop, splits each line on the delimiter with cut, stores each column in a variable by position, replaces the space with & using sed, then joins the variables back into one line and appends it to a new file. Is there a single command which can be used to achieve this?
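Use awk for this:
awk -F\| '{gsub(" ","\\&",$3); gsub(" ","\\&",$5)}1' OFS=\| infile.txt
- The -F\| tells awk that fields are delimited by the | (pipe) character. It is escaped with a backslash so that the shell doesn't interpret it as a pipeline; we could also use -F"|" or -F'|'.
- The gsub("regexp","replacement"[, INDEX]) syntax is used to replace " " (space) with a literal & in fields (columns) $3 and $5. The following shows each field position based on the | delimiter:
a a|b b|c c|d d|e e
^^^|^^^|^^^|^^^|^^^
$1 |$2 |$3 |$4 |$5
Why is & escaped, and with two backslashes? An unescaped & in the replacement stands for the matched text, so it has to reach gsub as \&; inside an awk string literal, \\ is what produces that single backslash.
- What is the 1 at the end of awk '{...}1' for? It is a pattern that is always true and has no action of its own, so awk performs its default action, which is to print the record.
- The OFS=\| sets the output field separator. Modifying a field makes awk rebuild the record using OFS, so this prints the fields with the | delimiter again instead of the default space.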
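To try this without creating a file, here is a minimal sketch that feeds the sample lines from the question through the same awk program on stdin. The function name amp_fields is only an illustrative wrapper, not part of awk, and it uses the quoted -F'|' form mentioned above.

```shell
# Hypothetical helper: read pipe-delimited lines on stdin and replace
# spaces with & in fields 3 and 5 only.
amp_fields() {
  awk -F'|' '{gsub(" ","\\&",$3); gsub(" ","\\&",$5)}1' OFS='|'
}

# Sample lines from the question.
printf '%s\n' 'a a|b b|c c|d d|e e' 'f f|g g|h h|i i|j j' | amp_fields

# Why the ampersand is escaped: a bare & re-inserts the matched text.
echo 'c c' | awk '{gsub(" ","&")}1'     # prints: c c
echo 'c c' | awk '{gsub(" ","\\&")}1'   # prints: c&c
```

The first command should print the two lines shown under Output File in the question.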
You could do
sed 's/\(|[^| ]*\) */\1\&/4;s//\1\&/2'
for your example. Explained: the pattern |[^| ]* matches your field separator followed by all the non-space characters after it in that column. It's grouped with \( \) so it can later be copied into the replacement with \1. The spaces that follow (the trailing " *" matches zero or more of them) are then replaced by the &, which needs to be escaped in the replacement string. The 4 means to apply this to the fourth occurrence, which is in the fifth column. Then repeat it with 2 for the third column. An empty pattern reuses the previous regular expression, so you don't need to repeat it.
It gets more complicated if there can be more than a single group of spaces in the column, or none at all. In that case it is better to use a different tool like awk.
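To make that caveat concrete, here is a small sketch comparing the sed command with an awk equivalent on a line whose third column has no space. The wrapper names with_sed and with_awk are made up for this illustration, and the second input line is an invented variation of the sample data.

```shell
# Hypothetical wrappers around the two commands; both read lines on stdin.
with_sed() { sed 's/\(|[^| ]*\) */\1\&/4;s//\1\&/2'; }
with_awk() { awk -F'|' '{gsub(" ","\\&",$3); gsub(" ","\\&",$5)}1' OFS='|'; }

# Regular input from the question: both print a a|b b|c&c|d d|e&e
echo 'a a|b b|c c|d d|e e' | with_sed
echo 'a a|b b|c c|d d|e e' | with_awk

# Third column without a space: the sed pattern still matches "|cc"
# followed by zero spaces, so an & is appended to the column.
echo 'a a|b b|cc|d d|e e' | with_sed   # prints: a a|b b|cc&|d d|e&e
echo 'a a|b b|cc|d d|e e' | with_awk   # prints: a a|b b|cc|d d|e&e
```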
On the other hand, if you know that there is always exactly one space in each column, a simpler command will do:
sed 's/ /\&/5;s//\&/3'
perl -aF'(\|)' -lne 's/\h/&/ for @F[2*2,2*4]; print @F' input_file
Results
a a|b b|c&c|d d|e&e
f f|g g|h&h|i i|j&j
Working
Split the current record on the pipe | and, because the pattern is wrapped in a capture group, also keep the delimiters as elements of @F. Hence, the 3rd and 5th fields end up at the zero-based indices 2*2 and 2*4 of @F.
In both of these elements, we replace the horizontal whitespace \h with a literal &. When done, just print the fields.