Bash regex capture group

I’m trying to match multiple alphanumeric values (the number of values can vary) in a string and save them to the bash capture group array, BASH_REMATCH. However, I’m only getting the first match:

mystring1='<link rel="self" href="/api/clouds/1/instances/1BBBBBB"/> dsf <link rel="self" href="/api/clouds/1/instances/2AAAAAAA"/>'

regex='/instances/([A-Z0-9]+)'

[[ $mystring1 =~ $regex ]]

echo ${BASH_REMATCH[1]}
1BBBBBB

echo ${BASH_REMATCH[2]}

As you can see, it matches the first value I’m looking for, but not the second.

Asked By: Arthur Lyssenko


To get the second array value, you need to have a second set of parentheses in the regex:

mystring1='<link rel="self" href="/api/clouds/1/instances/1BBBBBB"/> dsf <link rel="self" href="/api/clouds/1/instances/2AAAAAAA"/>'

regex='/instances/([A-Z0-9]+).*/instances/([A-Z0-9]+)'

[[ $mystring1 =~ $regex ]]

$ echo ${BASH_REMATCH[1]}
1BBBBBB
$ echo ${BASH_REMATCH[2]}
2AAAAAAA
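
To see why the extra parentheses matter, it can help to dump the whole array after each match. This is only a sketch: `dump_rematch` is a hypothetical helper name, not a bash builtin. Index 0 holds the text of the whole match, and each further index holds one parenthesized group.

```bash
#!/usr/bin/env bash
mystring1='<link rel="self" href="/api/clouds/1/instances/1BBBBBB"/> dsf <link rel="self" href="/api/clouds/1/instances/2AAAAAAA"/>'

# dump_rematch (hypothetical helper): print every index of BASH_REMATCH.
dump_rematch() {
    local i
    for i in "${!BASH_REMATCH[@]}"; do
        printf '[%d] %s\n' "$i" "${BASH_REMATCH[i]}"
    done
}

# One pair of parentheses: whole match plus one group.
one_group='/instances/([A-Z0-9]+)'
[[ $mystring1 =~ $one_group ]]
one_count=${#BASH_REMATCH[@]}
dump_rematch

# Two pairs of parentheses: whole match plus two groups.
two_groups='/instances/([A-Z0-9]+).*/instances/([A-Z0-9]+)'
[[ $mystring1 =~ $two_groups ]]
two_count=${#BASH_REMATCH[@]}
dump_rematch
```

Note that this approach only fits when you know in advance how many values the string contains, since each one needs its own group in the regex.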
Answered By: Jeff Schaller

It’s a shame that you can’t do global matching in bash. You can do this:

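global_rematch() {
    local s=$1 regex=$2
    while [[ $s =~ $regex ]]; do
        # Print the first capture group of the current match.
        echo "${BASH_REMATCH[1]}"
        # Drop everything up to and including the whole match, so the
        # next iteration searches only the rest of the string.
        s=${s#*"${BASH_REMATCH[0]}"}
    done
}
global_rematch "$mystring1" "$regex"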
1BBBBBB
2AAAAAAA

This works by chopping the matched prefix off the string so the next part can be matched. It destroys the string, but in the function it’s a local variable, so who cares.
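
To make the chopping visible, here is a small tracing sketch. `trace_rematch` and the shortened `sample` string are hypothetical, made up for illustration. It strips through the end of the whole match (`BASH_REMATCH[0]`), which on this input leaves the same remainder as stripping through the captured text.

```bash
#!/usr/bin/env bash
# trace_rematch (hypothetical helper): show what is captured and what is
# left of the string after each pass of the loop.
trace_rematch() {
    local s=$1 regex=$2 pass=0
    while [[ $s =~ $regex ]]; do
        pass=$((pass + 1))
        # Remove the shortest prefix that ends with the whole match.
        s=${s#*"${BASH_REMATCH[0]}"}
        printf 'pass %d: captured=%s remaining=%s\n' \
            "$pass" "${BASH_REMATCH[1]}" "$s"
    done
}

sample='/instances/1BBBBBB"/> dsf /instances/2AAAAAAA"/>'
regex='/instances/([A-Z0-9]+)'
trace_rematch "$sample" "$regex"
```

The loop stops once the remainder no longer contains `/instances/` followed by an uppercase letter or digit. A regex that can match the empty string would never shrink the remainder, so this pattern assumes every match consumes at least one character.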

I would actually use that function to populate an array:

$ mapfile -t matches < <( global_rematch "$mystring1" "$regex" )
$ printf "%s\n" "${matches[@]}"
1BBBBBB
2AAAAAAA
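
If you would rather skip the process substitution, one possible variant fills the caller's array directly through a nameref. This is a sketch under assumptions: `collect_rematch` is a hypothetical name, and `local -n` needs bash 4.3 or newer.

```bash
#!/usr/bin/env bash
# collect_rematch (hypothetical helper): append every first-group match
# to the array whose name is passed as the third argument.
collect_rematch() {
    local s=$1 regex=$2
    local -n _out=$3    # nameref; the caller's array must not be named _out
    while [[ $s =~ $regex ]]; do
        _out+=("${BASH_REMATCH[1]}")
        # Continue searching after the end of the current match.
        s=${s#*"${BASH_REMATCH[0]}"}
    done
}

mystring1='<link rel="self" href="/api/clouds/1/instances/1BBBBBB"/> dsf <link rel="self" href="/api/clouds/1/instances/2AAAAAAA"/>'
regex='/instances/([A-Z0-9]+)'

matches=()    # start from an empty array; the function only appends
collect_rematch "$mystring1" "$regex" matches
printf '%s\n' "${matches[@]}"
```

Because nothing runs in a subshell here, the array is filled in the current shell, and values containing newlines would survive intact, which is not the case when reading lines with mapfile.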
Answered By: glenn jackman

Python implementation:

import re

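def getall(mysentence):
    # findall scans the whole string and returns the captured group of
    # every non-overlapping match, so no leading '.*?' is needed.
    regex = re.compile(r'/instances/([0-9A-Z]+)')
    return regex.findall(mysentence)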

print(getall('<link rel="self" href="/api/clouds/1/instances/1BBBBBB"/> dsf <link rel="self" href="/api/clouds/1/instances/2AAAAAAA"/>'))
Answered By: ADEMILOLA ALADETAN