How to print duplicate records that occur more than 12 times?

How can I print duplicate records that are repeated more than 12 times using awk?

Input:

1|abc123
2|abc123
3|abc123
4|abc123
5|abc123
6|abc123
7|abc123 
8|abc123
9|abc123 
10|abc123
11|abc123
12|abc123
13|cde456
14|xyz321
15|jkl245
16|abc123
17|abc123
18|abc123
19|def567
20|abc123

Expected output:

1|abc123
2|abc123
3|abc123
4|abc123
5|abc123
6|abc123
7|abc123 
8|abc123
9|abc123 
10|abc123
11|abc123
12|abc123
15|abc123
16|abc123
17|abc123
18|abc123
20|abc123

I tried the command below, but I am not getting the expected output.

awk -F'|' 'NR==FNR{cnt[$2]++; next} cnt[$2]>12' input > output
Asked By: Joe

||

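Your command will work if you pass the file twice, so that awk reads it in two passes. With a single file argument, NR==FNR is true for every line, so the next always runs and nothing is ever printed. Pass the file twice like this: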

awk -F'|' 'FNR==NR{c[$2]++;next} c[$2]>12' input input > output

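In the first pass you count the occurrences of each second field, and in the second pass you print only the lines whose count is greater than 12. It is also memory efficient, because only the per-value counts are kept in memory, not the lines themselves. The line order of the original file is preserved as well, and you can pipe the output through sort if you want a different order.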
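
To see the two-pass idea in isolation, here is a minimal sketch on a made-up sample, with the threshold lowered to 2 so the data stays short. The temporary file, the sample records and the awk variable n are assumptions for illustration only and do not come from the question.

```shell
# Hypothetical sample data written to a temporary file.
tmp=$(mktemp)
printf '%s\n' '1|aaa' '2|bbb' '3|aaa' '4|ccc' '5|aaa' '6|bbb' > "$tmp"

# Pass 1 (FNR==NR is true only while reading the first file argument):
#   count how often each value of field 2 occurs.
# Pass 2 (same file again): print the lines whose value occurs more than n times.
result=$(awk -F'|' -v n=2 'FNR==NR{c[$2]++; next} c[$2]>n' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Here aaa occurs three times and is printed, while bbb occurs exactly twice and is not, which shows that c[$2]>n means "strictly more than n". For the question as asked, n would be 12.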

By the way, your current input example has no value appearing more than 12 times: abc123 appears exactly 12 times. Additionally, there is trailing whitespace in one of these occurrences, "7|abc123 ", which gives that line a different second field.
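
If trailing whitespace like this should not create a separate key, one option is to count on a cleaned copy of the second field while still printing the lines unchanged. This is only a sketch: the sample data, the threshold of 2 and the variable names k and n are assumptions for illustration.

```shell
# Hypothetical sample: the second record has a trailing space after "aaa".
tmp=$(mktemp)
printf '%s\n' '1|aaa' '2|aaa ' '3|bbb' '4|aaa' > "$tmp"

# For every line, copy field 2 into k and strip trailing blanks from the copy,
# so "aaa " and "aaa" share one counter. $0 itself is not modified.
result=$(awk -F'|' -v n=2 '
  { k = $2; sub(/[ \t]+$/, "", k) }
  FNR==NR { c[k]++; next }   # first pass: count the cleaned keys
  c[k] > n                   # second pass: print the original line
' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because sub() is applied to the copy k rather than to $2, the printed records keep whatever whitespace they had in the input.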

Answered By: thanasisp

Assuming you actually mean "2 or more" and not "more than 2", since that is the output you show, you can get your desired output using GNU core tools and a bit of sed:

$ sed 's/  *$//' file | tr '|' ' ' | sort -t ' ' -k 2 | uniq -Df1 | tr ' ' '|'
3|6W0Q3WKP3DZ
6|6W0Q3WKP3DZ
10|81TE22WWDEDCVXBAQ6F20Z86GFW
7|81TE22WWDEDCVXBAQ6F20Z86GFW
9|81TE22WWDEDCVXBAQ6F20Z86GFW
2|BWDY6IGYBDTMAVQA
5|BWDY6IGYBDTMAVQA
1|PTPX9L1Y31QEL55H
4|PTPX9L1Y31QEL55H
  • sed 's/  *$//' file: remove the extra spaces you have at the end of most lines, then
  • tr '|' ' ': replace the | with a space, then
  • sort -t ' ' -k 2 : sort on the second, space-delimited field, then
  • uniq -Df1: keep only duplicated lines (-D) and ignore the first field (-f1) when checking for dupes; then
  • tr ' ' '|': convert the spaces back to | again.
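
The steps above can be tried on a small made-up sample as follows. The sample records and the temporary file are assumptions for illustration, and LC_ALL=C is added here only to make the sort order predictable; uniq -D is a GNU extension, so this sketch assumes GNU coreutils.

```shell
# Hypothetical sample: some records carry trailing spaces.
tmp=$(mktemp)
printf '%s\n' '1|xx ' '2|yy' '3|xx' '4|zz' '5|yy  ' > "$tmp"

result=$(
  sed 's/  *$//' "$tmp" |          # strip trailing spaces
  tr '|' ' ' |                     # make the fields space-separated
  LC_ALL=C sort -t ' ' -k 2 |      # group equal second fields together
  uniq -Df1 |                      # keep all lines whose second field repeats
  tr ' ' '|'                       # restore the original delimiter
)

printf '%s\n' "$result"
rm -f "$tmp"
```

Note that the output is grouped by the second field rather than kept in input order, and that the round trip through tr only works when the records contain no other spaces.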
Answered By: terdon