A set of paragraphs of 4 lines to manage with AWK
I have a file composed of several paragraphs (more than 2000) of 4 lines.
For each paragraph, I need to match the content between brackets like the example below.
So for each paragraph:
 the first two lines are the inputs and stay unchanged;
 in the third line, the content between the brackets is replaced by the content between the brackets of the second line;
 in the fourth line, the content between the brackets is replaced by the content between the brackets of the first line.
I hope it’s clear enough.
–Inputs–
A1 [A3 A4 A5] A2
B1 [B3 B4 B5] B2
C1 [C3 C4] C2
D1 [D3 D4] D2

E1 [E3 E4 E5] E2
F1 [F3 F4 F5] F2
G1 [G3 G4] G2
H1 [H3 H4] H2
–Outputs–
A1 [A3 A4 A5] A2
B1 [B3 B4 B5] B2
C1 [B3 B4 B5] C2
D1 [A3 A4 A5] D2

E1 [E3 E4 E5] E2
F1 [F3 F4 F5] F2
G1 [F3 F4 F5] G2
H1 [E3 E4 E5] H2
Do you have a solution? With awk and gsub, I guess, but how is the problem.
With GNU awk, assuming that there are no regex-special characters between the brackets:
$ gawk -v RS= '
BEGIN{OFS=FS="\n"}
match($1,/\[[^]]*\]/,x) && match($2,/\[[^]]*\]/,y) {
    sub(/\[[^]]*\]/,y[0],$3);
    sub(/\[[^]]*\]/,x[0],$4);
    printf "%s%s", $0, RT
}
' file
A1 [A3 A4 A5] A2
B1 [B3 B4 B5] B2
C1 [B3 B4 B5] C2
D1 [A3 A4 A5] D2

E1 [E3 E4 E5] E2
F1 [F3 F4 F5] F2
G1 [F3 F4 F5] G2
H1 [E3 E4 E5] H2
The same is essentially doable in non-GNU awk, except that you need substr($1,RSTART,RLENGTH) etc. to obtain the replacements, and you won’t be able to use RT to restore the original input record separators:
awk '
BEGIN{RS=""; ORS="\n\n"; OFS=FS="\n"}
match($1,/\[[^]]*\]/) {x = substr($1,RSTART,RLENGTH)}
match($2,/\[[^]]*\]/) {y = substr($2,RSTART,RLENGTH)}
{
    sub(/\[[^]]*\]/,y,$3);
    sub(/\[[^]]*\]/,x,$4);
    print
}
' file
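To see what the paragraph-mode settings in the two programs above do, here is a minimal sketch on made-up two-line paragraphs (the sample data and the report format are assumptions for illustration only). With RS set to the empty string, awk reads one blank-line-separated paragraph per record, and with FS set to a newline each line of the paragraph becomes one field.

```shell
# Two hypothetical paragraphs separated by a blank line.
out=$(printf 'a1\na2\n\nb1\nb2\n' |
  awk 'BEGIN { RS = ""; FS = "\n" }
       { printf "record %d: NF=%d first=%s last=%s\n", NR, NF, $1, $NF }')
printf '%s\n' "$out"
```

Each paragraph arrives as a single record, which is why the programs above can address the four lines as $1 through $4.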
awk -F'[][]' -v OFS= '++i==1 {a=$2} i==2 {b=$2} i==3 {$2="[" b "]"} i==4 {$2="[" a "]"} !NF {i=0} 1' input.txt
With square brackets as the field separators, your replacement sources/targets are in $2.
We increment i on each line, and reset it to zero between paragraphs. The value of i (1 through 4) tells us what to do with $2.
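As a small illustration of the field splitting described above (the sample line and the replacement text are taken from the question, the variable names are made up), the following sketch prints the fields awk sees when either bracket is a separator, then rebuilds a line after assigning to $2 with an empty OFS.

```shell
# Show the fields produced by splitting on "[" or "]".
fields=$(printf '%s\n' 'C1 [C3 C4] C2' |
  awk -F'[][]' '{ for (k = 1; k <= NF; k++) printf "$%d=<%s>\n", k, $k }')
printf '%s\n' "$fields"

# Assigning to $2 makes awk rebuild the line, joining the fields with OFS (empty here),
# so the brackets have to be put back by hand.
rebuilt=$(printf '%s\n' 'C1 [C3 C4] C2' |
  awk -F'[][]' -v OFS= '{ $2 = "[" "B3 B4 B5" "]"; print }')
printf '%s\n' "$rebuilt"
```

The surrounding spaces stay attached to $1 and $3, which is why an empty OFS reproduces the original spacing.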
$ cat tst.awk
match($0,/\[.*\]/) {
    idx = (NR - 1) % 5 + 1
    sect[idx] = substr($0,RSTART,RLENGTH)
    if ( idx == 3 ) {
        $0 = $1 OFS sect[2] OFS $NF
    }
    else if ( idx == 4 ) {
        $0 = $1 OFS sect[1] OFS $NF
    }
}
{ print }
$ awk -f tst.awk file
A1 [A3 A4 A5] A2
B1 [B3 B4 B5] B2
C1 [B3 B4 B5] C2
D1 [A3 A4 A5] D2

E1 [E3 E4 E5] E2
F1 [F3 F4 F5] F2
G1 [F3 F4 F5] G2
H1 [E3 E4 E5] H2
The above does string replacement so it’ll work even if the sections inside brackets contain regexp metachars or backreferences.
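The difference matters when the bracketed text contains a character that is special in a sub() replacement, such as &. The sketch below uses a hypothetical replacement string (not from the question's data) to compare sub() with plain string concatenation around the match.

```shell
line='C1 [C3 C4] C2'
repl='[B3 & B5]'   # hypothetical bracket content containing "&"

# sub(): "&" in the replacement expands to the text that was matched.
with_sub=$(printf '%s\n' "$line" |
  awk -v r="$repl" '{ sub(/\[[^]]*\]/, r); print }')

# Concatenation around the match: the replacement is copied literally.
with_substr=$(printf '%s\n' "$line" |
  awk -v r="$repl" '
    match($0, /\[[^]]*\]/) {
      $0 = substr($0, 1, RSTART - 1) r substr($0, RSTART + RLENGTH)
    }
    { print }')

printf '%s\n%s\n' "$with_sub" "$with_substr"
```

With sub() the & pulls the old bracket content back into the line, while the concatenation version keeps the replacement exactly as given.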
And also with GNU awk for the 3rd argument to match(), using a second array indexed with NR and the default settings for RS and FS (updated):
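gawk '
{
    # a[1] = text before the brackets, a[2] = bracketed part, a[3] = text after
    match($0, /([^[]*)(\[.*\])([^]]*)/, a)
    b[NR] = a[2]
    # concatenate instead of using commas, so the original spacing is kept
    if (NR==3) {print a[1] b[NR-1] a[3]; next}
    if (NR==4) {print a[1] b[NR-3] a[3]; next}
    print a[1] a[2] a[3]
    if ($0 == "") {NR=0}    # blank line: restart the count for the next paragraph
}' file
A1 [A3 A4 A5] A2
B1 [B3 B4 B5] B2
C1 [B3 B4 B5] C2
D1 [A3 A4 A5] D2

E1 [E3 E4 E5] E2
F1 [F3 F4 F5] F2
G1 [F3 F4 F5] G2
H1 [E3 E4 E5] H2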
With GNU sed:
sed -n -E '
1~5 { p; s/.*(\[.*\]).*/\1/;h };
2~5 { p; s/.*(\[.*\]).*/\1/;
      N; s/\n//; s/^(\[.*?\])([^[]*)\[.*\]/\2\1/;p;x;
      N; s/\n//; s/^(\[.*?\])([^[]*)\[.*\]/\2\1/;p;
}; 5~5p' infile
TL;DR
1~5 { ... }: applies to every 5th line, starting from the first line; similarly,
2~5 { ... }: applies to every 5th line, starting from the second line; and
5~5 p: applies to every 5th line, starting from the fifth line.
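The first~step address form is a GNU sed extension. As a quick sketch of which line numbers each address selects (using seq to generate ten numbered lines, which is an assumption for illustration and not part of the answer):

```shell
# Each command prints only the lines selected by the first~step address.
first=$(seq 10 | sed -n '1~5p' | paste -sd' ' -)
second=$(seq 10 | sed -n '2~5p' | paste -sd' ' -)
fifth=$(seq 10 | sed -n '5~5p' | paste -sd' ' -)
printf '1~5: %s\n2~5: %s\n5~5: %s\n' "$first" "$second" "$fifth"
```

With five lines per block (four text lines plus the blank separator), 1~5 hits the first line of every paragraph, 2~5 the second, and 5~5 the blank line.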
Breaking each command down:

1~5 { p; s/.*(\[.*\]).*/\1/;h }:
 the p command prints the entire line that matched the 1~5 condition, so the first line of the first paragraph goes to the output unchanged; the output is now:
 A1 [A3 A4 A5] A2
 with s/.*(\[.*\]).*/\1/ we keep only the [ ... ] part of that line and remove everything else from the pattern space; then with the h command we copy that result into the hold space, so the hold space now contains [A3 A4 A5].


2~5 { p; s/.*(\[.*\]).*/\1/;
 the p command does almost the same as above, but for every 5th line starting from the second line; so it prints the second line, and the output is now:
 A1 [A3 A4 A5] A2
 B1 [B3 B4 B5] B2
 with s/.*(\[.*\]).*/\1/ we again keep only the [ ... ] part of the second line and remove everything else; the pattern space now contains [B3 B4 B5] (as a reminder, the hold space is unchanged and still contains [A3 A4 A5]).
in N; s/\n//; s/^(\[.*?\])([^[]*)\[.*\]/\2\1/; p; x;
 N reads the next line (now the 3rd line) and appends it to the pattern space with an embedded newline in between; so the pattern space is now:
 [B3 B4 B5]
 C1 [C3 C4] C2
 with s/\n//; we first delete that embedded newline; the pattern space is now:
 [B3 B4 B5]C1 [C3 C4] C2
 in s/^(\[.*?\])([^[]*)\[.*\]/\2\1/; p; x;
  ^(\[.*?\]) captures the [B3 B4 B5] part at the beginning of the line as backreference \1
  ([^[]*) captures the C1 part as backreference \2
  \[.*\] matches the [C3 C4] part without capturing it, so it is removed from the line
  the replacement \2\1 writes back only the two captured parts, in swapped order, so the pattern space is now:
  C1 [B3 B4 B5] C2
 the next command is p, which prints it; the output is now:
 A1 [A3 A4 A5] A2
 B1 [B3 B4 B5] B2
 C1 [B3 B4 B5] C2
 now the pattern space is C1 [B3 B4 B5] C2 and the hold space is still [A3 A4 A5]; with the next command, x, we exchange the pattern space with the hold space, so the pattern space becomes [A3 A4 A5]; we no longer need the hold space content and leave it as it is.


in N; s/\n//; s/^(\[.*?\])([^[]*)\[.*\]/\2\1/; p;
 N reads the next line (now the 4th line) and appends it to the pattern space with an embedded newline in between; so the pattern space is now:
 [A3 A4 A5]
 D1 [D3 D4] D2
 with s/\n//; we first delete that embedded newline; the pattern space is now:
 [A3 A4 A5]D1 [D3 D4] D2
 in s/^(\[.*?\])([^[]*)\[.*\]/\2\1/; p;
  ^(\[.*?\]) captures the [A3 A4 A5] part at the beginning of the line as backreference \1
  ([^[]*) captures the D1 part as backreference \2
  \[.*\] matches the [D3 D4] part without capturing it, so it is removed from the line
  the replacement \2\1 writes back only the two captured parts, in swapped order, so the pattern space is now:
  D1 [A3 A4 A5] D2
 the next command is p, which prints it; the output is now:
 A1 [A3 A4 A5] A2
 B1 [B3 B4 B5] B2
 C1 [B3 B4 B5] C2
 D1 [A3 A4 A5] D2



with 5~5p we print every 5th line starting from line 5, that is, the empty line between paragraphs.
The first paragraph is now processed, and sed repeats the same steps until all lines have been read and processed.
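The interplay of h, x, p and the pattern/hold spaces in the walkthrough can be tried in isolation. The following is a minimal sketch on two made-up lines (not the question's data): it saves line 1 in the hold space, prints line 2, then swaps the spaces and prints what was held.

```shell
# 1h      : copy line 1 into the hold space (nothing is printed because of -n)
# 2{p;x;p;}: print line 2, exchange pattern and hold space, print the saved line 1
out=$(printf 'one\ntwo\n' | sed -n '1h; 2{p;x;p;}')
printf '%s\n' "$out"
```

The two lines come out in reverse order, which is the same mechanism the answer uses to carry the first line's bracket content down to the fourth line.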
perl -00pe 's/(.*)(\[.*\])(.*\n)
              (.*)(\[.*\])(.*\n)
              (.*)(\[.*\])(.*\n)
              (.*)(\[.*\])(.*\n)
            /$1$2$3$4$5$6$7$5$9$10$2$12/x' input1
perl -00pe: process the input one paragraph at a time.
Each line of the regex matches one line of the input paragraph and splits it into the relevant parts (the x flag lets the regex span several lines).
In the substitution we just have to reorder the parts.
Sorry for the obfuscation…