Remove duplicates by adding numerical suffix
How do I append a numerical suffix to lines to remove duplicates?
Pseudo code:
if currLine.startsWith("tag:")
    x = numFutureLinesMatching(currLine)
    if (x > 0)
        currLine = currLine + "-" + sprintf("%02d", x)
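For illustration, here is a minimal Python sketch of that pseudocode taken literally. numFutureLinesMatching becomes a hypothetical helper num_future_lines_matching that rescans the rest of the input, and the suffix is assumed to be a hyphen plus a two-digit, zero-padded count, as in the desired output below. The sample lines are hard-coded for the example. Because every tag line rescans the remainder, this is O(n²) and only suitable for small inputs.

```python
def num_future_lines_matching(lines, i):
    """Count how many lines after index i are identical to lines[i]."""
    return sum(1 for later in lines[i + 1:] if later == lines[i])


def add_suffixes(lines):
    out = []
    for i, line in enumerate(lines):
        if line.startswith("tag:"):
            x = num_future_lines_matching(lines, i)
            if x > 0:  # a later duplicate exists, so this one gets a suffix
                line = f"{line}-{x:02d}"
        out.append(line)
    return out


# Hypothetical hard-coded copy of the input file shown below.
SAMPLE = [
    "tag:20230901-FAT", "val:1034",
    "tag:20230901-FAT", "val:1500",
    "tag:20230901-LAX", "val:8934",
    "tag:20230901-SMF", "val:2954",
    "tag:20230901-LAX", "val:1000",
    "tag:20230901-FAT", "val:1500",
]

for line in add_suffixes(SAMPLE):
    print(line)
```

Running it prints the desired output shown below.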
Input file
tag:20230901-FAT
val:1034
tag:20230901-FAT
val:1500
tag:20230901-LAX
val:8934
tag:20230901-SMF
val:2954
tag:20230901-LAX
val:1000
tag:20230901-FAT
val:1500
Desired output
tag:20230901-FAT-02
val:1034
tag:20230901-FAT-01
val:1500
tag:20230901-LAX-01
val:8934
tag:20230901-SMF
val:2954
tag:20230901-LAX
val:1000
tag:20230901-FAT
val:1500
Notes:
- The final duplicate must remain unchanged.
- The earlier duplicates can have any suffix to be unique, so I chose a countdown.
- Awk seems to be a good choice, but any common scripted language will work.
Here we go, exactly as required:
awk '
NR==FNR {
    if (/^tag:/) {
        a[$1]++
    }
    next
}
{
    c = --a[$1]
    if (c > 0) {
        printf "%s-%.2d\n", $1, c
    } else {
        print
    }
}
' file file
With explanations:
awk '
# first block: runs only while reading the first copy of the file
NR==FNR {                  # NR==FNR is true only for the first file
    if (/^tag:/)           # if the line starts with tag:
        a[$1]++            # count it in array a, keyed on column 1
    next                   # stop processing this line (skip the second block)
}
# second block: runs while reading the second copy of the file
{
    c = --a[$1]            # c = occurrences of this key still to come
    if (c > 0) {           # a later duplicate exists, so add a suffix
        printf "%s-%.2d\n", $1, c   # %s = string, %.2d = integer zero-padded to 2 digits
    } else {
        print              # last (or only) occurrence: print unchanged
    }
}
' file file
Output
tag:20230901-FAT-02
val:1034
tag:20230901-FAT-01
val:1500
tag:20230901-LAX-01
val:8934
tag:20230901-SMF
val:2954
tag:20230901-LAX
val:1000
tag:20230901-FAT
val:1500
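The same two-pass idea (count every tag first, then decrement while printing) can be sketched in Python as follows. This is an illustrative translation, not part of the answer: countdown_suffixes and SAMPLE are hypothetical names, a collections.Counter plays the role of the awk array a, and it keys on the whole line rather than awk's $1, which is equivalent here because the lines contain no whitespace.

```python
from collections import Counter


def countdown_suffixes(lines):
    # First pass (awk: NR==FNR): count how often each tag line occurs.
    remaining = Counter(line for line in lines if line.startswith("tag:"))
    out = []
    # Second pass: decrement per occurrence; a count still > 0 means
    # a later duplicate exists, so this occurrence gets a suffix.
    for line in lines:
        if line.startswith("tag:"):
            remaining[line] -= 1
            c = remaining[line]
            if c > 0:
                line = f"{line}-{c:02d}"
        out.append(line)
    return out


# Hypothetical hard-coded copy of the question's input file.
SAMPLE = [
    "tag:20230901-FAT", "val:1034",
    "tag:20230901-FAT", "val:1500",
    "tag:20230901-LAX", "val:8934",
    "tag:20230901-SMF", "val:2954",
    "tag:20230901-LAX", "val:1000",
    "tag:20230901-FAT", "val:1500",
]

for line in countdown_suffixes(SAMPLE):
    print(line)
```

Unlike awk reading file twice, this holds the whole input in memory, which is fine for files of this size.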
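awk can take arbitrary array indices, even a whole record ("line").
Match tag: lines with a regex and count each one in an array indexed by the whole line. The post-increment n[$0]++ yields 0 (false) on the first occurrence, so that line is printed unchanged; on later occurrences the counter has already been incremented, so subtract one to get the suffix: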
awk '$0 ~ /^tag:/ { n[$0]++?$0=sprintf("%s-%02d",$0,n[$0]-1):1 } 1'
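The post-increment trick in that one-liner can be sketched in Python like this. It is a hypothetical translation for illustration only (countup_suffixes and SAMPLE are made-up names): dict.get(line, 0) returns the count from before the increment, which equals awk's n[$0]-1 after the increment, so the second occurrence gets -01 and the first is left alone.

```python
def countup_suffixes(lines):
    seen = {}
    out = []
    for line in lines:
        if line.startswith("tag:"):
            previous = seen.get(line, 0)  # value before increment, like awk's n[$0]++
            seen[line] = previous + 1
            if previous:  # nonzero: this tag occurred earlier
                line = f"{line}-{previous:02d}"
        out.append(line)
    return out


# Hypothetical hard-coded copy of the question's input file.
SAMPLE = [
    "tag:20230901-FAT", "val:1034",
    "tag:20230901-FAT", "val:1500",
    "tag:20230901-LAX", "val:8934",
    "tag:20230901-SMF", "val:2954",
    "tag:20230901-LAX", "val:1000",
    "tag:20230901-FAT", "val:1500",
]

for line in countup_suffixes(SAMPLE):
    print(line)
```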
To make it a countdown, use tac twice:
tac infile |
awk '$0 ~ /^tag:/ { n[$0]++?$0=sprintf("%s-%02d",$0,n[$0]-1):1 } 1' |
tac
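Reversing the input, numbering, and reversing again turns the count-up into a countdown, because the last occurrence of each tag becomes the first one seen. A minimal Python sketch of that idea, assuming the whole input fits in memory; the slice lines[::-1] stands in for tac, and the function names and SAMPLE are hypothetical:

```python
def countup_suffixes(lines):
    """Leave the first occurrence of each tag alone, number later ones 01, 02, ..."""
    seen = {}
    out = []
    for line in lines:
        if line.startswith("tag:"):
            previous = seen.get(line, 0)
            seen[line] = previous + 1
            if previous:
                line = f"{line}-{previous:02d}"
        out.append(line)
    return out


def countdown_via_reverse(lines):
    # The inner [::-1] plays the role of the first tac, the outer one the second.
    return countup_suffixes(lines[::-1])[::-1]


# Hypothetical hard-coded copy of the question's input file.
SAMPLE = [
    "tag:20230901-FAT", "val:1034",
    "tag:20230901-FAT", "val:1500",
    "tag:20230901-LAX", "val:8934",
    "tag:20230901-SMF", "val:2954",
    "tag:20230901-LAX", "val:1000",
    "tag:20230901-FAT", "val:1500",
]

for line in countdown_via_reverse(SAMPLE):
    print(line)
```

This prints the question's desired output, with the final duplicate of each tag unchanged.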
With perl:
#!/usr/bin/perl
use strict; use warnings;
use feature qw/say/;
my (%h, $c);
while (<>) {
    chomp;
    if (/^tag:/) {
        $c = sprintf "%.2d", ++$h{$_};
        if ($c > 1) {
            say $_ . "-" . $c;
        } else {
            say;
        }
    } else {
        say $_;
    }
}
Usage:
./script file
Output:
tag:20230901-FAT
val:1034
tag:20230901-FAT-02
val:1500
tag:20230901-LAX
val:8934
tag:20230901-SMF
val:2954
tag:20230901-LAX-02
val:1000
tag:20230901-FAT-03
val:1500
Using Raku (formerly known as Perl 6)
~$ raku -ne 'BEGIN my %hash; put /^tag:/ && %hash{$_}++ ?? $_ ~ sprintf("-%02d", %hash{$_}-1) !! $_;' file
The above is the Raku version of an excellent awk answer posted by @EdMorton in a comment.
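Start by calling Raku at the command line with the -ne (non-autoprinting, linewise) flags. Before entering the linewise code, declare a %hash inside a BEGIN block so it is created only once. Then run the put … statement over each input line. If the line starts with tag: (/^tag:/), %hash{$_}++ increments the count stored for that line; because ++ is a postfix increment, it returns the value from before the increment, which is false the first time a tag is seen. This && conditional forms the Test of Raku's "Test ?? True !! False" ternary operator. If True (the tag has been seen before), the $_ line is output with the line's count minus one appended as a hyphen and a two-digit, zero-padded number (the count is looked up with %hash{$_}). If False, the line is output unchanged.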
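The key to the test is that postfix ++ returns the old value and && short-circuits, so non-tag lines never touch %hash. A rough Python analogue of that one expression, for illustration only: post_increment and render are hypothetical helpers, and since Python has no ++ operator it is emulated.

```python
def post_increment(counts, key):
    """Emulate postfix ++ on a hash entry: store old + 1, return the old value."""
    old = counts.get(key, 0)
    counts[key] = old + 1
    return old


def render(line, counts):
    # Mirrors: /^tag:/ && %hash{$_}++ ?? $_ ~ sprintf("-%02d", %hash{$_}-1) !! $_
    # `and` short-circuits like &&, so non-tag lines are never counted.
    if line.startswith("tag:") and post_increment(counts, line):
        # The count has already been incremented, hence the "- 1".
        return line + "-%02d" % (counts[line] - 1)
    return line


# Hypothetical hard-coded copy of the sample input.
SAMPLE = [
    "tag:20230901-FAT", "val:1034",
    "tag:20230901-FAT", "val:1500",
    "tag:20230901-LAX", "val:8934",
    "tag:20230901-SMF", "val:2954",
    "tag:20230901-LAX", "val:1000",
    "tag:20230901-FAT", "val:1500",
]

counts = {}
for line in SAMPLE:
    print(render(line, counts))
```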
Sample Input:
tag:20230901-FAT
val:1034
tag:20230901-FAT
val:1500
tag:20230901-LAX
val:8934
tag:20230901-SMF
val:2954
tag:20230901-LAX
val:1000
tag:20230901-FAT
val:1500
Sample Output:
tag:20230901-FAT
val:1034
tag:20230901-FAT-01
val:1500
tag:20230901-LAX
val:8934
tag:20230901-SMF
val:2954
tag:20230901-LAX-01
val:1000
tag:20230901-FAT-02
val:1500
The above implements a count-up suffix, leaving the earliest tag: lines unchanged. To implement a count-down suffix that leaves the final tag: lines unchanged, use tac twice as instructed in the accepted answer by @FelixJN. Below is that approach implemented on macOS, which uses tail -r instead of tac:
~$ tail -r Steve_suffix.txt | raku -ne 'BEGIN my %hash; put /^tag:/ && %hash{$_}++ ?? $_ ~ sprintf("-%02d", %hash{$_}-1) !! $_;' | tail -r
tag:20230901-FAT-02
val:1034
tag:20230901-FAT-01
val:1500
tag:20230901-LAX-01
val:8934
tag:20230901-SMF
val:2954
tag:20230901-LAX
val:1000
tag:20230901-FAT
val:1500
https://unix.stackexchange.com/a/114043
https://docs.raku.org/language/operators#infix_??_!!
https://raku.org