Find all occurrences in a file with sed

Using OPEN STEP 4.2 OS…
I am currently using the following sed command:

sed -n '1,/141.299.99.1/p' TESTFILE | tail -3

This command finds the first instance of the IP 141.299.99.1 in the file and also includes the 3 lines before it, which is all good. However, I would like to find every instance of the IP, each with the 3 lines before it, not just the first.

Asked By: Dale


grep will do a better job of this:

grep -B 3 141.299.99.1 TESTFILE

The -B 3 means to print the three lines before each match. This will print -- between each group of lines. To disable that, use --no-group-separator as well.

The -B option is supported by GNU grep and most BSD versions as well (OSX, FreeBSD, OpenBSD, NetBSD), but it is technically not a standard option.
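
As a quick illustration of the behavior described above, here is a small sketch. The sample function and its data are made up for the demonstration, and it assumes a grep that supports -B:

```shell
# sample: hypothetical input, ten lines with the target on lines 5 and 10.
sample() {
    printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 141.299.99.1 \
                  10.0.0.5 10.0.0.6 10.0.0.7 10.0.0.8 141.299.99.1
}

# Print each match with up to three lines before it. Because line 6 sits
# between the two groups, grep puts a "--" line between them.
sample | grep -B 3 '141[.]299[.]99[.]1'
```

The bracketed dots make each dot literal, so the pattern cannot match other characters in those positions.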

Answered By: Michael Homer

Since you mention that you don’t have the -B option to grep, you can use Perl (for example) to make a sliding window of 4 lines:

perl -ne '
    push @window,$_;
    shift @window if @window > 4;
    print @window if /141.299.99.1/
' your_file

Ramesh’s answer does a similar thing with awk.

Answered By: Joseph R.

awk '/141.299.99.1/{for(i=1;i<=x;)print a[i++];print} {for(i=1;i<x;i++)
     a[i]=a[i+1];a[x]=$0;}'  x=3 filename

In this awk solution, an array always holds the 3 lines before the current line. Hence, when the pattern matches, the array contents are printed, followed by the matching line itself.
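
The same buffering idea can also be sketched with a ring buffer, which avoids shifting the array on every line and skips empty slots when a match falls within the first few lines. This is only an illustrative variant, not the command above: the function name before_context and its argument order are assumptions, and n must be at least 1.

```shell
# before_context PATTERN N [FILE...]: print each matching line preceded by up
# to N earlier lines. Name and argument order are illustrative only.
before_context() {
    pattern=$1 n=$2
    shift 2
    awk -v pat="$pattern" -v n="$n" '
        $0 ~ pat {
            # Replay the buffered lines, oldest first, skipping slots that
            # were never filled (a match within the first n lines).
            for (i = NR - n; i < NR; i++)
                if (i > 0) print buf[i % n]
            print
        }
        { buf[NR % n] = $0 }   # overwrite the oldest slot
    ' "$@"
}

printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 141.299.99.1 10.0.0.5 |
    before_context '141[.]299[.]99[.]1' 3
```

As with the original command, two matches that are close together will print some context lines twice, which grep -B would not do.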

Testing

-bash-3.2$ cat filename
10.0.0.1
10.0.0.2
10.0.0.3
10.0.0.4
141.299.99.1
10.0.0.5
10.0.0.6
10.0.0.7
10.0.0.8
10.0.0.9
10.0.0.10
141.299.99.1
10.0.0.11
10.0.0.12
10.0.0.13
10.0.0.14
10.0.0.15
10.0.0.16
141.299.99.1
10.0.0.17
10.0.0.18
10.0.0.19

After I execute the command, the output is:

10.0.0.2
10.0.0.3
10.0.0.4
141.299.99.1
10.0.0.8
10.0.0.9
10.0.0.10
141.299.99.1
10.0.0.14
10.0.0.15
10.0.0.16
141.299.99.1

Answered By: Ramesh

When available you can use pcregrep:

pcregrep -M '.*\n.*\n.*\n141.299.99.1' file

Answered By: chaos

You can implement the same basic approach as the other non-grep answers in the shell itself (this assumes a relatively recent shell that supports =~):

while IFS= read -r line; do
    # a, b and c hold the three previous lines, oldest first
    [[ $line =~ 141.299.99.1 ]] && printf "%s\n%s\n%s\n%s\n" "$a" "$b" "$c" "$line"
    a=$b; b=$c; c=$line
done < file

Alternatively, you could slurp the whole file into an array:

perl -e '@F=<>;
        for($i=0;$i<=$#F;$i++){
          # start the slice at 0 so early matches do not wrap to the end of @F
          $s = $i<3 ? 0 : $i-3;
          print @F[$s..$i] if $F[$i]=~/141.299.99.1/
        }' file

Answered By: terdon

With sed you can do a sliding window.

sed '1N;$!N;/141.299.99.1/P;D'

That does it. But beware: bash’s history expansion will expand ! into the command string from your command history, even inside double quotes (single quotes, as used above, normally protect it). If it makes things go a little crazy, prefix the command with set +H; to disable history expansion. To re-enable it (but why???), run set -H afterward.

That, of course, would only apply if you were using bash, though I don’t believe you are. I’m fairly certain you’re working with csh (which happens to be the shell whose insane behavior bash emulates with the history expansion, but maybe not to the extremes the C shell took it). So probably a \! should work. I hope.

It’s all portable code. POSIX describes its three operators thus (though it’s worth noting that I’ve only confirmed this description existed as early as 2001):

[2addr]N
Append the next line of input, less its terminating newline, to the pattern space, using an embedded newline to separate the appended material from the original material. Note that the current line number changes.

[2addr]P
Write the pattern space, up to the first newline, to standard output.

[2addr]D
Delete the initial segment of the pattern space through the first newline and start the next cycle.

So on the first line you add an extra line to pattern space, so it looks like this:

^line 1s contents\nline 2s contents$

Then on the first line and every line thereafter – excepting the very last – you add another line to pattern space. So it looks like this:

^line 1\nline 2\nline 3$

If your IP address is found within, you Print up to the first newline, so just line 1 here. At the end of every cycle you Delete the same and start over with what remains. So the next cycle looks like:

^line 2\nline 3\nline 4$

…and so on. If your IP is found on any one of those three lines, the oldest will print out, every time. So the window is always three lines: the matching line plus the two lines before it.
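
The window in that command holds three lines, so each match comes out with two lines before it. If three lines of context are wanted, as in the question, one possible adjustment is to preload one more line on the first cycle. This is a sketch rather than part of the original command: the helper name window4 is made up, and it assumes the input has at least three lines, since what N does when no next line exists differs between sed implementations.

```shell
# window4 PATTERN: hypothetical helper that reads standard input and prints
# each line matching PATTERN along with up to three lines before it.
# PATTERN is a basic regular expression and must not contain "/".
window4() {
    # 1{N;N;}  first cycle only: pull in two extra lines
    # $!N      every cycle except on the last line: pull in one more line
    # /.../P   if the window contains a match, print its oldest line
    # D        drop the oldest line and restart without reading input
    sed -e '1{N;N;}' -e '$!N' -e "/$1/P" -e 'D'
}

printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 141.299.99.1 10.0.0.5 |
    window4 '141[.]299[.]99[.]1'
```

Once the last input line has been read, $!N stops adding lines and the window simply drains, so a match on the final lines still gets its preceding context.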

Here’s a quick example. I’ll get a three line buffer printed for every number ending in zero:

seq 10 52 | sed '1N;$!N;/0\(\n\|$\)/P;D'

10
18
19
20
28
29
30
38
39
40
48
49
50

That one’s a little more complicated than your case because I had to alternate between 0\n (a zero before a newline) and 0$ (a zero at the end of pattern space) to more closely resemble your problem. The two cases are subtly different in that this one requires an anchor, which can be a little difficult to do since pattern space constantly shifts.

I used the odd cases of 10 and 52 to show that as long as the anchor is flexible then so is the output. Fully portably, I can achieve the same results by relying on the algorithm instead:

seq 10 52 | sed '1N;$!N;/[90]\n/P;D'

That widens the search while restricting the window: from matching 0 to matching 9 or 0, and from three lines to two.

Anyway, you get the idea.

Answered By: mikeserv

If your system doesn’t support grep context, you can try ack-grep instead:

ack -B 3 141.299.99.1 file

ack is a tool like grep, optimized for programmers.

Answered By: cuonglm

Here’s an attempt to emulate grep -B3 using a sed moving window, based on this GNU sed example (but hopefully POSIX-compliant, with acknowledgement to @StéphaneChazelas):

sed -e '1h;2,4{;H;g;}' -e '1,3d' -e '/141.299.99.1/P' -e '$!N;D' file

The first two expressions prime a multi-line pattern buffer and allow it to handle the edge case in which there are fewer than 3 lines of preceding context before the first match. The middle (regex match) expression prints a line off the top of the window until the desired match text has rippled up through the pattern buffer. The final $!N;D scrolls the window by one line except when it reaches the end of input.

Answered By: steeldriver

In most of these, /141.299.99.1/ will also match (e.g.) 141a299q99+1 or 141029969951 because . in a regular expression can represent any character.

Using /141[.]299[.]99[.]1/ is safer, and you can add additional context at the beginning and end of the whole regexp to make sure it doesn’t match 3141., .12, .104, etc.
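
One way to add that surrounding context is sketched below. The helper name match_ip and the sample lines are made up for illustration; the pattern uses only POSIX extended regular expression syntax (grep -E) and requires that the address not be preceded or followed by another digit or dot.

```shell
# match_ip: hypothetical helper; keeps lines that contain 141.299.99.1 as a
# whole address, not as part of a longer run of digits and dots.
match_ip() {
    grep -E '(^|[^0-9.])141[.]299[.]99[.]1([^0-9.]|$)' "$@"
}

# Only the first and last sample lines should be printed.
printf '%s\n' '141.299.99.1' '141a299q99+1' '3141.299.99.12' \
              'src=141.299.99.1 ok' | match_ip
```

Because extra arguments are passed through to grep, something like match_ip -B 3 file would combine this with the context option where grep supports it. The trade-off is that an address directly followed by a dot, such as at the end of a sentence, is rejected too.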

Answered By: user117529