egrep regular expression – same word in the beginning and end

I want to find all the lines that have the same word at the beginning and at the end of the line.

For example:

goodword         fgdlakj 3t sfkl 43lk fkl goodword
bad sfa;lk3t   dgk;gs    34;kl bad334
singleword

Desired output

goodword         fgdlakj 3t sfkl 43lk fkl goodword
singleword

My code is:

egrep "(^.+)([ ]+.*\1)$"

It works if the line has more than one word, but I want a line containing a single word to match too.

So I tried:

egrep "(^.+)($|([ ]+.*\1)$)"

but it no longer works, and I don't know why.

Asked By: Silas2033

||

I propose to use awk instead:

awk '$1==$NF' file

The advantage of this solution is that it is much simpler to read, and you can easily change the field separator (with the -F option), so that, e.g., even lines with the same number of spaces at the beginning and end can be made to match.
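
As a rough illustration of the command above, the following sketch recreates the question's sample lines in a temporary file and filters them. The temporary file, the variable names and the comma-separated input are assumptions made for this example, not part of the original answer.

```shell
# Recreate the sample input from the question in a temporary file.
tmpfile=$(mktemp)
printf '%s\n' \
  'goodword         fgdlakj 3t sfkl 43lk fkl goodword' \
  'bad sfa;lk3t   dgk;gs    34;kl bad334' \
  'singleword' > "$tmpfile"

# $1 is the first field and $NF the last one; on a line with a single
# word they are the same field, so such a line is printed as well.
awk_out=$(awk '$1==$NF' "$tmpfile")
printf '%s\n' "$awk_out"

# Same comparison with a comma as field separator (hypothetical input).
csv_out=$(printf '%s\n' 'a,b,a' 'a,b,c' | awk -F, '$1==$NF')
printf '%s\n' "$csv_out"

rm -f "$tmpfile"
```

With the default field separator, awk ignores leading and trailing blanks and treats any run of blanks as one separator, which is why the irregular spacing in the sample does not matter.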

Answered By: jimmij

jimmij’s answer is really nice, but if you insist on grep:

grep -Ex '(\S+)(.*\1)?' file
Answered By: Costas

With a POSIX grep, an equivalent of awk '$1 == $NF' would be:

grep -x '[[:blank:]]*\([^[:blank:]]\{1,\}\)\([[:blank:]]\(.*[[:blank:]]\)\{0,1\}\1\)\{0,1\}[[:blank:]]*'
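
A minimal sketch for trying this on the question's sample, assuming the pattern is written in basic regular expression syntax (groups and intervals escaped with backslashes, as grep without -E expects). The temporary file and variable names are made up for the example.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'goodword         fgdlakj 3t sfkl 43lk fkl goodword' \
  'bad sfa;lk3t   dgk;gs    34;kl bad334' \
  'singleword' > "$sample"

# -x anchors the pattern to the whole line.
# \([^[:blank:]]\{1,\}\) captures the first word; the optional group
# requires a blank, anything ending in a blank (optional), then the
# same word again via the backreference \1.
grep_out=$(grep -x '[[:blank:]]*\([^[:blank:]]\{1,\}\)\([[:blank:]]\(.*[[:blank:]]\)\{0,1\}\1\)\{0,1\}[[:blank:]]*' "$sample")
printf '%s\n' "$grep_out"

rm -f "$sample"
```

The blank required before the backreference is what keeps a word such as bad334 from counting as a repetition of bad.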
Answered By: Stéphane Chazelas