How can I print records with difference in maximum and minimum value greater than a certain threshold?

I have data like this in two columns:

[id1] 09:51:07,175
[id1] 09:51:07,215
[id2] 10:09:47,550
[id2] 10:09:47,588
[id2] 10:09:47,942
[id2] 10:09:47,947
[id3] 10:05:25,945
[id3] 10:05:26,001

The first column is IDs and the other is time.
Now, I want to print IDs if the difference between their minimum and maximum times is greater than x.

If x = 100, the desired output would be:

id2

Because max(id2) = 10:09:47,947 and min(id2) = 10:09:47,550
and their difference is 397. If x=30, the desired output would be:

id1
id2
id3

because

max(id1) - min(id1) = 40
max(id2) - min(id2) = 397
max(id3) - min(id3) = 56

if x is 50 then output would be:

id2
id3

How can I approach this?

Asked By: akib

||

A typical job for perl:

perl -MList::Util=min,max -lsne '
  push @{$v{$1}}, $5 + 1000 * ($4 + 60 * ($3 + 60 * $2)) if
    m{^[(.*)] (dd):(dd):(dd),(d+)$};
  END{
    for (keys %v) {
      print if max(@{$v{$_}}) - min(@{$v{$_}}) > $threshold
    }
  }' -- -threshold=100 < your-file

If could be made more efficient and less memory at the expense of more complicated code hungry by computing the min and max as we read the lines rather than storing all the values and calling min() and max() in the end. But unless the input is very large, it’s not likely to make much difference.

Answered By: Stéphane Chazelas
Categories: Answers Tags:
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.