Get the value after a keyword in the middle of a line, from one given line only

Hey so I have the following line in the middle of a file and I only need to get the value following "energy=".
The line number is stored in a variable called "lineNumber".
There are other lines in the file that have the same structure, but with different values. I only want the value of energy from the line defined in "lineNumber".
Help would be much appreciated. Thank you!

Properties=species:S:1:pos:R:3:velocities:R:3:forces:R:3:local_energy:R:1:fix_atoms:S:3 Lattice="42.0000000000       0.0000000000    0.0000000000    0.0000000000   46.0000000000    0.0000000000    0.0000000000    0.0000000000   50.0000000000" temperature=327.11679001 pressure=14.24003276 time_step=5.0000 time=5000.0000 energy=-18.022194 virial="0.46990039            0.48760331     -0.77576961      0.48760331      0.78141847      0.59471844     -0.77576961      0.59471844      0.64787347" stress="-0.00000486          -0.00000505      0.00000803     -0.00000505     -0.00000809     -0.00000616      0.00000803     -0.00000616     -0.00000671" volume=96600.000000 step=1000
Asked By: karri104

awk -v lineNumber="$lineNumber" -v FS="energy=" 'NR == lineNumber {print $2}' FILE | awk '{print $1}'
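
The two awk calls above can probably be folded into one. This is a minimal sketch, assuming the value contains no whitespace and that the last "energy=" on the line is the one you want; the function name energy_at_line is made up for illustration.

```shell
# Hypothetical helper: print the value after "energy=" on one line of a file.
# Usage: energy_at_line FILE LINE_NUMBER
energy_at_line() {
    awk -v n="$2" '
        NR == n {
            # Delete everything up to and including the last "energy="
            sub(/.*energy=/, "")
            # Changing $0 re-splits the fields, so $1 is now the value
            print $1
            # Stop reading: no later line can match
            exit
        }
    ' "$1"
}
```

Called as energy_at_line FILE "$lineNumber", it reads the file once and stops at the requested line.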
Answered By: Arkadiusz Drabczyk

Since you’re using a Linux-based system you can almost certainly use GNU grep

grep -oP 'energy=\K[^\s]+'

For example

echo 'Properties=species:S:1:pos:R:3:velocities:R:3:forces:R:3:local_energy:R:1:fix_atoms:S:3 Lattice="…" temperature=327.11679001 … time=5000.0000 energy=-18.022194 virial="0.46990039 …" stress="…" volume=96600.000000 step=1000' |
    grep -oP 'energy=\K[^\s]+'

Output

-18.022194

You can pick out a particular line number using a tool such as sed

lineNumber=123
sed -n "${lineNumber}{p;q}" file

Putting these together,

sed -n "${lineNumber}{p;q}" file | grep -oP 'energy=\K[^\s]+'

You can use something like perl too:

perl -e '
    $lineNumber = shift;                                 # Arg 1 is line number
    $fieldName = shift;                                  # Arg 2 is field name
    while (defined($line = <>)) {                        # Read lines from file or stdin
        next unless $. == $lineNumber;                   # Skip until required line
        chomp $line;                                     # Discard newline
        %a =                                             # Create key/value array. Read the next lines upwards
            map { split(/=/, $_, 2) }                    # 3. Split into {key,value} tuples
            grep { /=/ }                                 # 2. Only interested in assignments
            split(/(\w+=(".*?"|[^"].*?)\s+)/, $line);  # 1. Split line into « key=value » and « key="several values" » fields
        print $a{$fieldName}, "\n";                     # Print chosen field value
        exit 0
    }
' "$lineNumber" 'energy' file
Answered By: roaima

A sed answer

sed -En "$lineNumber {s/.*energy=//; s/[[:blank:]].*//; p; q; }" file
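
Because $lineNumber is pasted into the sed script itself, it may be worth checking that it really is a number first. This is a sketch of the same one-liner wrapped in a function with that check; the name energy_sed and the error message are assumptions for illustration.

```shell
# Hypothetical wrapper around the sed one-liner above.
# Usage: energy_sed FILE LINE_NUMBER
energy_sed() {
    # Reject anything that is not a plain string of digits, since the
    # value becomes part of the sed script.
    case $2 in
        ''|*[!0-9]*)
            echo "energy_sed: line number must be an integer" >&2
            return 1
            ;;
    esac
    # On line $2 only: delete through "energy=", delete from the first
    # blank onward, print what is left, then quit without reading further.
    sed -n "$2 {s/.*energy=//; s/[[:blank:]].*//; p; q; }" "$1"
}
```

The function returns non-zero without running sed when the second argument is empty or contains a non-digit.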
Answered By: glenn jackman

May be overkill for your current need but you could create an array of tag to value mappings (stored in the array f[] below):

$ awk -v FPAT='[^=[:space:]]+=([^[:space:]]+|"[^"]*")' -v n="$lineNumber" '
    NR == n {
        delete f
        for (i=1; i<=NF; i++) {
            f[gensub(/=.*/,"",1,$i)] = gensub(/[^=]+=/,"",1,$i)
        }
        print f["energy"]
    }
' file
-18.022194

and then you can do anything else you like with any value or combination of values just by indexing f[] with the tags (names), e.g. you could write:

awk -v FPAT='[^=[:space:]]+=([^[:space:]]+|"[^"]*")' '
    {
        delete f
        for (i=1; i<=NF; i++) {
            f[gensub(/=.*/,"",1,$i)] = gensub(/[^=]+=/,"",1,$i)
        }
    }
    (f["time"] < 6) && (f["volume"] > 8) {
        print f["temperature"], f["energy"], f["step"] / f["time_step"]
    }
' file
327.11679001 -18.022194 200

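or anything else you might need to compare/calculate/print. Note that gensub() returns strings, so the two comparisons above are string comparisons (which is why "5000.0000" < 6 is true); add 0 to a value, e.g. f["time"]+0, when you need a numeric comparison.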

The above uses GNU awk for FPAT and gensub(). You can do the same with any POSIX awk that supports delete array (as most, if not all, now do) with a bit more code:

$ awk -v n="$lineNumber" '
    NR == n {
        delete f
        rec = $0
        while ( match(rec,/[^=[:space:]]+=([^[:space:]]+|"[^"]*")/) ) {
            tag = val = substr(rec,RSTART,RLENGTH)
            sub(/=.*/,"",tag)
            sub(/[^=]+=/,"",val)
            f[tag] = val
            rec = substr(rec,RSTART+RLENGTH)
        }
        print f["energy"]
    }
' file
-18.022194

If your awk complains about delete f then just change that line to split("",f) which will work in any awk.
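
If only one tag is needed at a time, the same match() loop can be wrapped in a small shell function that stops at the first hit instead of filling f[]. This is a sketch under stated assumptions: the name tag_at_line is invented here, and [ \t] is used in place of [[:space:]] so that it also runs on older awks without POSIX character classes.

```shell
# Hypothetical helper: print the value of one tag from one line of a file.
# Usage: tag_at_line FILE LINE_NUMBER TAG
tag_at_line() {
    awk -v n="$2" -v want="$3" '
        NR == n {
            rec = $0
            # Walk through every tag=value or tag="quoted value" pair
            while (match(rec, /[^= \t]+=("[^"]*"|[^ \t]+)/)) {
                pair = substr(rec, RSTART, RLENGTH)
                eq = index(pair, "=")              # position of the first "="
                if (substr(pair, 1, eq - 1) == want) {
                    print substr(pair, eq + 1)     # everything after that "="
                    exit
                }
                rec = substr(rec, RSTART + RLENGTH)
            }
            exit                                   # tag not found: print nothing
        }
    ' "$1"
}
```

Quoted values are printed with their surrounding double quotes, as in the f[] versions above.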

Answered By: Ed Morton

Using Raku (formerly known as Perl_6)

~$ raku -ne 'put ++$ => $/ if ++$ == 1 && m/ \s energy= <( <+ :N + [\-+.]>+  )> \s /;'   file

OR:

~$ raku -ne 'put ++$ => $0 if ++$ == 1 && m/ \s energy=  ( <+ :N + [\-+.]>+  )  \s /;'    file

OR:

~$ raku -ne 'put ++$ => $<val> if ++$ == 1 && m/ \s energy=  $<val>=<+ :N + [\-+.]>+   \s /;'   file

The Raku code above uses the -ne non-autoprinting linewise flags (the code is run once per line of the input file). Reading from right-to-left, a match m/.../ is sought for the desired value following energy=. This match is written in the Raku Regex dialect as <+ :N + [\-+.]>+ which is a composite character class consisting of any :N Unicode number and the characters -, +, ., taken one-or-more times (- is one of the few characters that need to be backslash escaped in this construct). This character class can also be written <+[0..9\-+.]>+, but you lose matches to Unicode numbers other than 0..9.

The easiest way to count line numbers in Raku is to run a ++$ auto-incrementing counter, which starts from one. Putting it all together (now reading left-to-right), the first example says: "put the ++$ line number followed by the built-in $/ match variable if ++$ == 1 (the line number numerically equals one) && the desired match is found, dropping everything outside the <( ... )> capture markers". Because && short-circuits, the match is attempted only on the requested line. (Remove the ++$ => from the output if all you want returned is the value.)

Sample Input (OP’s data taken three times):

Properties=species:S:1:pos:R:3:velocities:R:3:forces:R:3:local_energy:R:1:fix_atoms:S:3 Lattice="42.0000000000       0.0000000000    0.0000000000    0.0000000000   46.0000000000    0.0000000000    0.0000000000    0.0000000000   50.0000000000" temperature=327.11679001 pressure=14.24003276 time_step=5.0000 time=5000.0000 energy=-18.022194 virial="0.46990039            0.48760331     -0.77576961      0.48760331      0.78141847      0.59471844     -0.77576961      0.59471844      0.64787347" stress="-0.00000486          -0.00000505      0.00000803     -0.00000505     -0.00000809     -0.00000616      0.00000803     -0.00000616     -0.00000671" volume=96600.000000 step=1000
Properties=species:S:1:pos:R:3:velocities:R:3:forces:R:3:local_energy:R:1:fix_atoms:S:3 Lattice="42.0000000000       0.0000000000    0.0000000000    0.0000000000   46.0000000000    0.0000000000    0.0000000000    0.0000000000   50.0000000000" temperature=327.11679001 pressure=14.24003276 time_step=5.0000 time=5000.0000 energy=-18.022194 virial="0.46990039            0.48760331     -0.77576961      0.48760331      0.78141847      0.59471844     -0.77576961      0.59471844      0.64787347" stress="-0.00000486          -0.00000505      0.00000803     -0.00000505     -0.00000809     -0.00000616      0.00000803     -0.00000616     -0.00000671" volume=96600.000000 step=1000
Properties=species:S:1:pos:R:3:velocities:R:3:forces:R:3:local_energy:R:1:fix_atoms:S:3 Lattice="42.0000000000       0.0000000000    0.0000000000    0.0000000000   46.0000000000    0.0000000000    0.0000000000    0.0000000000   50.0000000000" temperature=327.11679001 pressure=14.24003276 time_step=5.0000 time=5000.0000 energy=-18.022194 virial="0.46990039            0.48760331     -0.77576961      0.48760331      0.78141847      0.59471844     -0.77576961      0.59471844      0.64787347" stress="-0.00000486          -0.00000505      0.00000803     -0.00000505     -0.00000809     -0.00000616      0.00000803     -0.00000616     -0.00000671" volume=96600.000000 step=1000

Sample Output (tab-delimited return):

1   -18.022194

Finally, if you’d rather input the lineNumber off the command-line rather than hard-coding it in the one-liner, env environment variables are available inside Raku as the dynamic %*ENV hash variable. So you can do the following (note the initial env is optional):

~$ env lineNumber="1" perl6 -ne 'put $0 if ++$ == %*ENV<lineNumber> && m/ \s energy=  ( <+[0..9\-+.]>+ )  \s /;'   file
-18.022194

https://docs.raku.org/language/regexes
https://raku.org

Answered By: jubilatious1