How to extract multiple values from a file in a single pass?

I have a huge log file (about 6 GB) from a simulation. Among the millions of lines in that file, two lines recur for every time step:

...
Max value of omega = 3.0355
Time = 0.000001
....
Max value of omega = 4.3644
Time = 0.000013
...
Max value of omega = 3.7319
Time = 0.000025
...
...
...
Max value of omega = 7.0695
Time = 1.32125
...
... etc.

I would like to extract both "Max value of omega" and "Time" and save them in a single file as columns:

#time max_omega
0.000001 3.0355
0.000013 4.3644
0.000025 3.7319
...etc.

I proceeded as follows:

# The following takes about 15 seconds
grep -F 'Max value of omega' logfile | cut -d "=" -f 2 > max_omega_file.txt  

and the same for "Time":

# This also takes about 15 seconds
# Very important: match exactly 'Time =' because there are other lines that contain the word 'Time'
grep -F 'Time =' logfile | cut -d "=" -f 2 > time.txt

Then I need to use the paste command to create a two-column file, with time.txt as the first column and max_omega_file.txt as the second.
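For completeness, the whole two-pass procedure looks roughly like this on a tiny made-up sample (the temporary directory, the sample lines and result.txt are only for illustration):

```shell
# Tiny made-up stand-in for the real log file
dir=$(mktemp -d)
cat > "$dir/logfile" <<'EOF'
unrelated line
Max value of omega = 3.0355
Time = 0.000001
Total Time elapsed so far
Max value of omega = 4.3644
Time = 0.000013
EOF

# Pass 1 and pass 2 (cut keeps the space that follows the "=")
grep -F 'Max value of omega' "$dir/logfile" | cut -d "=" -f 2 > "$dir/max_omega_file.txt"
grep -F 'Time =' "$dir/logfile" | cut -d "=" -f 2 > "$dir/time.txt"

# Join the two files line by line: time first, omega second (tab-separated)
{ echo '#time max_omega'; paste "$dir/time.txt" "$dir/max_omega_file.txt"; } > "$dir/result.txt"
cat "$dir/result.txt"
```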

As you can see, the time is doubled because the file is read twice. Is there a way to achieve the same result in a single pass, so that I save some time?

Asked By: adhrar_nmatrous

||

I can’t guarantee it will be faster, but you could do something like this in awk:

awk -F' = ' '$1=="Max value of omega" {omega = $2} $1=="Time" {print omega,$2}' file
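Here is a quick check of that idea on made-up input, as a sketch only. The exact comparison on $1 skips other lines that merely contain the word Time, and swapping the print arguments puts the time in the first column, as requested in the question. The header line and the sample data are my additions.

```shell
# Made-up sample input for illustration
log=$(mktemp)
cat > "$log" <<'EOF'
unrelated line
Max value of omega = 3.0355
Time = 0.000001
Time step continuity errors
Max value of omega = 4.3644
Time = 0.000013
EOF

# Remember the latest omega; when the Time line arrives, print "time omega"
result=$(awk -F' = ' '
    BEGIN                      { print "#time max_omega" }
    $1 == "Max value of omega" { omega = $2 }
    $1 == "Time"               { print $2, omega }
' "$log")
printf '%s\n' "$result"
rm -f "$log"
```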
Answered By: steeldriver

sed -n '/^Max/ { s/^.*=\s*//;h; };
        /^Time/{ s/^.*=\s*//;G; s/\n/ /;p; }' infile
  • match-run syntax /.../{ ... }:
    commands within {...} will only run on the lines that matched with regex/pattern within /.../;

  • s/^.*=\s*//:
    deletes everything up to and including the last =, plus any whitespace (\s*) that follows it.

  • h:
    copy the result into hold-space

  • G:
    append the hold-space to pattern-space with embedded newline

  • s/\n/ /:
    replace that embedded newline with space in the pattern-space

  • p:
    print the pattern-space; the P command would also work here, since no newline is left at this point.

    0.000001 3.0355
    0.000013 4.3644
    0.000025 3.7319
    1.32125 7.0695
    

A similar approach proposed by @stevesliva uses s//<replace>/, where the empty pattern is shorthand for reusing the last regular expression that was matched:

sed -n '/^Max.*=\s*/ { s///;h; };
        /^Time.*=\s*/{ s///;G; s/\n/ /;p; }' infile
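Note that \s is a GNU extension. A sketch of the same hold-space idea written without it, and with stricter patterns so that other lines starting with Time are ignored, might look like this (the sample input is made up):

```shell
result=$(printf '%s\n' \
    'unrelated line' \
    'Max value of omega = 3.0355' \
    'Time = 0.000001' \
    'Time step info, not a data line' \
    'Max value of omega = 4.3644' \
    'Time = 0.000013' |
sed -n '
/^Max value of omega = /{
    s///
    h
}
/^Time = /{
    s///
    G
    s/\n/ /
    p
}')
printf '%s\n' "$result"
```

Each s/// strips the label that the enclosing address has just matched, so only the number is held or printed.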
Answered By: αғsнιη

$ awk 'BEGIN{print "#time", "omega"} /^Max value of omega =/{omega=$NF; next} /^Time =/{print $NF, omega}' file
#time omega
0.000001 3.0355
0.000013 4.3644
0.000025 3.7319
1.32125 7.0695

but this will probably be faster:

$ grep -E '^(Max value of omega|Time) =' file |
    awk 'BEGIN{print "#time", "omega"} NR%2{omega=$NF; next} {print $NF, omega}'
#time omega
0.000001 3.0355
0.000013 4.3644
0.000025 3.7319
1.32125 7.0695
Answered By: Ed Morton

Something like

paste \
  <(<file awk -F= '$1 ~ /omega/ {print $2}') \
  <(<file awk -F= '$1 ~ /Time/ {print $2}')

I think even

<file grep -o '[[:digit:].]*' | paste - -

Or

<file cut -d= -f2 | paste - -

would do, provided the input has first been reduced to just those two kinds of lines.

Answered By: D. Ben Knoble

grep can search for several patterns in one go. From the man page:

-e PATTERNS, --regexp=PATTERNS
    Use PATTERNS as the patterns. If this option is used multiple times or is combined with the -f (--file) option, search for all patterns given. This option can be used to protect a pattern beginning with "-".

So

grep -F -e 'Max value of omega = ' -e 'Time = ' logfile

reduces the input to just the lines of interest in a single pass. You can then post-process the result with one of the other suggestions.
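For example, here is a sketch that chains this filter with the cut and paste idea from another answer, then swaps the columns so that time comes first. The sample input is made up, and the sketch assumes the two lines always appear in pairs with omega first.

```shell
# grep keeps only the two kinds of lines, cut keeps the value after "=",
# paste joins consecutive lines in pairs, and awk puts the time first.
result=$(printf '%s\n' \
    'unrelated line' \
    'Max value of omega = 3.0355' \
    'Time = 0.000001' \
    'Total Time elapsed so far' \
    'Max value of omega = 4.3644' \
    'Time = 0.000013' |
    grep -F -e 'Max value of omega = ' -e 'Time = ' |
    cut -d= -f2 |
    paste - - |
    awk '{print $2, $1}')
printf '%s\n' "$result"
```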

Answered By: Olaf Dietsche

An alternative, perhaps simpler, sed solution would be

sed -nr 'N;s/^Max value of omega = ([0-9.]+)\nTime = ([0-9.]+)$/\1 \2/p;D;' logfile

where N appends the next line to the pattern space, the s/pattern/string/p command looks for the two-line pattern and prints the two capture groups (\1 \2) separated by a space, and finally D deletes the first line from the pattern space.

One advantage of this approach, which I've used in the past when looking for multi-line patterns, is that you can print the capture groups in an arbitrary order, not necessarily the order in which they appear in the file.
So in your example, if you want "Time" in the first column, you can simply do this

sed -nr 'N;s/^Max value of omega = ([0-9.]+)\nTime = ([0-9.]+)$/\2 \1/p;D;' logfile

Note it now says "\2 \1" rather than "\1 \2".
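A quick way to try this on a made-up sample, as a sketch assuming GNU sed (here with -E, which GNU sed accepts as a synonym for -r):

```shell
result=$(printf '%s\n' \
    'unrelated line' \
    'Max value of omega = 3.0355' \
    'Time = 0.000001' \
    'another unrelated line' \
    'Max value of omega = 4.3644' \
    'Time = 0.000013' |
    sed -nE 'N;s/^Max value of omega = ([0-9.]+)\nTime = ([0-9.]+)$/\2 \1/p;D')
printf '%s\n' "$result"
```

Unrelated lines between the pairs are skipped because N and D together slide a two-line window over the input, one line at a time.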

Answered By: MNB