Backreference in Awk regex

Is it possible to do this in Awk?:

echo "eoe" | sed -nr '/^(.*)o\1$/p'
Asked By: Ignacio


Not in standard awk: POSIX awk uses POSIX EREs, which don’t support back-references, and \1 means the 0x1 character in awk (though there are some ambiguities). It is possible with busybox awk, though, using:

busybox awk '$0 ~ "^(.*)o\\1$"'

(What that may or may not do in a standard awk is unclear in the POSIX specification: that "\\1" could match a literal 1, match the 0x1 character, or be unspecified. My reading is that it should match a 0x1 character, but it doesn’t with /usr/xpg4/bin/sh on Solaris 11, for instance, which is a certified OS; there it matches a literal 1 instead.)

With any awk, for that particular regexp, you could take another approach like:

awk 'length % 2 && 
       substr($0, (length+1)/2, 1) == "o" && 
       substr($0, 1, (length-1)/2) == substr($0, (length+3)/2)'

As mentioned above, POSIX EREs don’t support back-references. GNU sed with -r does use EREs, but those are GNU EREs, which support back-references as an extension over the standard. What that means is that

grep -Ex '(.*)o\1'

(or same with egrep) is not portable. However:

grep -x '\(.*\)o\1'

is POSIX and portable. POSIX BREs do support back-references, as did historical implementations of grep. Perl regexps and PCREs support back-references as well, so you can do:

perl -lne 'print if /^(.*)o\1$/'
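
To compare the two portable alternatives side by side, the following sketch feeds the same sample lines to the BRE grep and to perl. The helper names sample, bre_match and perl_match are hypothetical, and it assumes grep and perl are both installed.

```shell
# Hypothetical sample input shared by both commands.
sample() { printf '%s\n' eoe abcoabc abcoabd oo o; }

# POSIX BRE: \( \) is the group, \1 the back-reference,
# and -x requires the match to span the whole line.
bre_match() { grep -x '\(.*\)o\1'; }

# Perl regexp: explicit ^ and $ anchors do the job of grep's -x.
perl_match() { perl -lne 'print if /^(.*)o\1$/'; }

# Both commands print eoe, abcoabc and o.
sample | bre_match
sample | perl_match
```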
Answered By: Stéphane Chazelas