Return only the portion of a line after a matching pattern

Opening a file with cat and then using grep to get matching lines only gets me so far with the particular log set I am dealing with. I need a way to match lines against a pattern, but return only the portion of the line after the match. The portions before and after the match vary from line to line. I have played with sed and awk, but have not been able to figure out how to filter the line so as to either delete the part before the match or return just the part after it; either will work.
This is an example of a line that I need to filter:

2011-11-07T05:37:43-08:00 <0.4> isi-udb5-ash4-1(id1) /boot/kernel.amd64/kernel: [gmp_info.c:1758](pid 40370="kt: gmp-drive-updat")(tid=100872) new group: <15,1773>: { 1:0-25,27-34,37-38, 2:0-33,35-36, 3:0-35, 4:0-9,11-14,16-32,34-38, 5:0-35, 6:0-15,17-36, 7:0-16,18-36, 8:0-14,16-32,34-36, 9:0-10,12-36, 10-11:0-35, 12:0-5,7-30,32-35, 13-19:0-35, 20:0,2-35, down: 8:15, soft_failed: 1:27, 8:15, stalled: 12:6,31, 20:1 }

The portion I need is everything after “stalled”.

The background behind this is that I can find out how often something stalls:

cat messages | grep stalled | wc -l

What I need to do is find out how many times a certain node has stalled (indicated by the portion before each colon after “stalled”). If I just grep for that (i.e. 20:), it may return lines that have soft fails but no stalls, which doesn’t help me. I need to isolate the stalled portion so I can then grep for a specific node among those that have stalled.

For all intents and purposes, this is a FreeBSD system with standard GNU core utils, but I cannot install anything extra to assist.

Asked By: MaQleod


The canonical tool for that would be sed.

sed -n -e 's/^.*stalled: //p'

Detailed explanation:

  • -n means not to print anything by default.
  • -e is followed by a sed command.
  • s is the pattern replacement command.
  • The regular expression ^.*stalled: matches the pattern you’re looking for, plus any preceding text (.* meaning any text, with an initial ^ to say that the match begins at the beginning of the line). Note that if stalled: occurs several times on the line, this will match the last occurrence.
  • The match, i.e. everything on the line up to stalled: , is replaced by the empty string (i.e. deleted).
  • The final p means to print the transformed line.
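To tie this back to the goal of counting stalls for one node, here is a minimal sketch. The two sample lines, the node number 20, and the variable names are made up for illustration (the lines are shortened versions of the one in the question); with a real log you would run sed on the messages file instead.

```shell
# Two shortened sample log lines (hypothetical data modelled on the question).
# Node 20 is stalled only in the first line; in the second it is merely soft_failed.
log='new group: { down: 8:15, soft_failed: 1:27, 8:15, stalled: 12:6,31, 20:1 }
new group: { down: 8:15, soft_failed: 20:3, stalled: 12:6 }'

# Keep only the text after "stalled: " on each matching line.
stalled_parts=$(printf '%s\n' "$log" | sed -n -e 's/^.*stalled: //p')
printf '%s\n' "$stalled_parts"

# Count the lines in which node 20 appears in the stalled portion.
# The pattern requires "20:" at line start or after a space so that
# a node such as "120:" would not be counted.
count=$(printf '%s\n' "$stalled_parts" | grep -c -E '(^| )20:')
echo "node 20 stalled in $count line(s)"
```

Because the soft_failed entry for node 20 sits before stalled: on the second line, it is deleted before grep ever sees it, which is exactly the filtering the question asks for.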

If you want to retain the matching portion, use a backreference: \1 in the replacement part designates what is inside a group \(…\) in the pattern. Here, you could write stalled: again in the replacement part; this feature is useful when the pattern you’re looking for is more general than a simple string.

sed -n -e 's/^.*\(stalled: \)/\1/p'

Sometimes you’ll want to remove the portion of the line after the match. You can include it in the match by including .*$ at the end of the pattern (any text .* followed by the end of the line $). Unless you put that part in a group that you reference in the replacement text, the end of the line will not be in the output.
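As a small sketch of that variant, assume you want to keep everything up to and including stalled: and drop the rest. The sample line is a shortened, made-up version of the one in the question, and the variable names are only for illustration.

```shell
# Hypothetical, shortened sample line.
line='new group: { down: 8:15, soft_failed: 1:27, stalled: 12:6,31, 20:1 }'

# The group keeps "stalled:" itself; the trailing .*$ is matched but not
# referenced in the replacement, so the end of the line is dropped.
before=$(printf '%s\n' "$line" | sed -n -e 's/\(stalled:\).*$/\1/p')
printf '%s\n' "$before"
```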

As a further illustration of groups and backreferences, this command swaps the part before the match and the part after the match.

sed -n -e 's/^\(.*\)\(stalled: \)\(.*\)$/\3\2\1/p'

To get the part after the first occurrence of the string instead of last (for those lines where the string can occur several times), a common trick is to replace that string once with a newline character (which is the one character that won’t occur inside a line), and then remove everything up to that newline:

sed -n '
  /stalled: / {
    s//\
/
    s/.*\n//p
  }'

With some sed implementations, the first s command can be written s//\n/, though that is not standard or portable.
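The difference between the last and the first occurrence is easiest to see on a line that contains the string twice. The sample line below is invented for this purpose; real log lines in the question contain stalled: only once.

```shell
# Hypothetical line in which "stalled: " occurs twice.
line='x stalled: 1:2, stalled: 3:4 }'

# Greedy .* runs up to the LAST occurrence.
last=$(printf '%s\n' "$line" | sed -n -e 's/^.*stalled: //p')

# Newline trick: replace the FIRST occurrence with a newline,
# then delete everything up to and including that newline.
first=$(printf '%s\n' "$line" | sed -n '
  /stalled: / {
    s//\
/
    s/.*\n//p
  }')

printf 'after last:  %s\n' "$last"
printf 'after first: %s\n' "$first"
```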

The other canonical tool, which you already use, is grep.

For example:

grep -o 'stalled.*'

This gives the same result as Gilles’s second option:

sed -n -e 's/^.*\(stalled: \)/\1/p'

The -o flag (--only-matching) prints only the part of the line that matches the expression, rather than the entire line as grep normally does.

To remove the “stalled:” from the output, we can use a third canonical tool, cut:

grep -o 'stalled.*' | cut -f2- -d:

The cut command uses : as the delimiter and prints from field 2 to the end of the line. It’s a matter of preference of course, but I find the cut syntax very easy to remember.
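A quick sketch of both steps on a shortened, made-up version of the line from the question (the variable names are illustrative only). Note that cut splits exactly at the colon, so the space that followed stalled: stays at the front of the result.

```shell
# Hypothetical, shortened sample line.
line='new group: { down: 8:15, soft_failed: 1:27, stalled: 12:6,31, 20:1 }'

# Step 1: grep -o keeps only the matching part of the line.
matched=$(printf '%s\n' "$line" | grep -o 'stalled.*')

# Step 2: cut drops the first colon-delimited field ("stalled").
after=$(printf '%s\n' "$line" | grep -o 'stalled.*' | cut -f2- -d:)

printf '[%s]\n[%s]\n' "$matched" "$after"
```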

Answered By: Anne van Rossum

I used ifconfig | grep eth0 | cut -f3- -d: to take this

    [root@MyPC ~]# ifconfig
    eth0  Link encap:Ethernet  HWaddr AC:B4:CA:DD:E6:F8
          inet addr:192.168.0.2  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:78998810244 errors:1 dropped:0 overruns:0 frame:1
          TX packets:20113430261 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:110947036025418 (100.9 TiB)  TX bytes:15010653222322 (13.6 TiB)

and make it look like this

    [root@MyPC ~]# ifconfig | grep eth0 | cut -f3- -d:
    B4:CA:DD:E6:F8
Answered By: Luis Perez

Yet another canonical tool you considered, awk, could be used with the following line:

awk -F"stalled" '/stalled/{print $2}' messages

Detailed explanation:

  • -F defines the field separator for the line, here “stalled”. Everything before the separator is addressed with $1 and everything after it with $2.
  • /reg-ex/ selects the lines that match the regular expression, in this case “stalled”.
  • {print $<n>} prints the nth field. Since the separator is defined as stalled, everything after stalled is considered to be the second field.
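One detail worth seeing in practice: with “stalled” alone as the separator, the second field still begins with the colon and space. A hedged sketch, using a shortened, invented sample line and illustrative variable names, shows that and a variant that puts the colon and space into the separator:

```shell
# Hypothetical, shortened sample line.
line='new group: { down: 8:15, soft_failed: 1:27, stalled: 12:6,31, 20:1 }'

# Separator "stalled": the second field keeps the leading ": ".
with_colon=$(printf '%s\n' "$line" | awk -F"stalled" '/stalled/{print $2}')

# Separator "stalled: ": the second field starts right at the node list.
without_colon=$(printf '%s\n' "$line" | awk -F"stalled: " '/stalled: /{print $2}')

printf '[%s]\n[%s]\n' "$with_colon" "$without_colon"
```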
Answered By: robertm.tum

There seems to be a simpler way. Just do:

sed "s/installed.*//g"

which removes “installed” and everything after it on the line.

# Rename each file, dropping "---" and everything after it from the name.
for i in *
do
    se=$(echo "$i" | sed "s/---.*//g")
    echo "$se"
    mv "$i" "$se"
done
Answered By: minor hash

Using Perl (i.e. Perl5) and Raku (previously known as Perl6):

Perl:

perl -pe 's/^.*stalled: //;' #leaves non-matching and/or blank lines intact

Or:

perl -nE '/^.*stalled: (.*)/ and say $1;'  #removes non-matching lines

Raku:

raku -pe 's/^.*stalled\:\s//;' #leaves non-matching and/or blank lines intact

Or:

raku -ne '/^.*stalled\:\s (.*)/ and say ~$0;' #removes non-matching lines

OUTPUT (for 2nd Perl and 2nd Raku examples above):

12:6,31, 20:1 }

The code above is virtually identical between the two languages. The most significant difference is that in Raku all non-alnum/non-underscore characters must be escaped to be ‘understood literally’ by the Raku regex engine.

Other minor differences include the fact that:

  1. Raku changes capture numbering to start from $0 (Perl starts from $1),
  2. in Raku a leading ~ tilde is used to stringify the match object, and
  3. in Perl a -E commandline flag must be used to enable the say function.

http://www.wall.org/~larry/natural.html
https://www.perl.org/
https://www.raku.org/

Answered By: jubilatious1