How to match a pattern instead of a single letter/number using brackets?

I want to exclude MSG, PDF and DOC from my path with shell parameter expansion using brackets.

When I place MSG between brackets, it only deletes the M instead of MSG. I looked around on the internet and read the documentation, but I still fail to understand how to do it correctly. Maybe I don’t know the right keywords to search for.

My code to delete MSG only:

find "${INPUTPATH}" -mindepth 2 -maxdepth 2 -type d -print0 | while IFS= read -r -d '' file; do
    echo "${file}"
    casenumber="${file#${INPUTPATH}/[MSG]}"
    echo "${casenumber}"
done

input:

/home/user/output/test/PDF/2218-0
/home/user/output/test/PDF/2218-0
/home/user/output/test/DOC/2218-0
/home/user/output/test/DOC/2218-0
/home/user/output/test/MSG/2226-4
/home/user/output/test/MSG/2226-4
/home/user/output/test/MSG/2222 -2
/home/user/output/test/MSG/2222 -2
/home/user/output/test/MSG/2218-0
/home/user/output/test/MSG/2218-0

Current output to delete MSG:

/home/user/output/test/PDF/2218-0
/home/user/output/test/PDF/2218-0
/home/user/output/test/DOC/2218-0
/home/user/output/test/DOC/2218-0
/home/user/output/test/MSG/2226-4
SG/2226-4
/home/user/output/test/MSG/2222 -2
SG/2222 -2
/home/user/output/test/MSG/2218-0
SG/2218-0

Expected output:

/home/user/output/test/PDF/2218-0
/home/user/output/test/PDF/2218-0
/home/user/output/test/DOC/2218-0
/home/user/output/test/DOC/2218-0
/home/user/output/test/MSG/2226-4
/2226-4
/home/user/output/test/MSG/2222 -2
/2222 -2
/home/user/output/test/MSG/2218-0
/2218-0

I actually want to delete MSG, PDF and DOC in this way:

find "${INPUTPATH}" -mindepth 2 -maxdepth 2 -type d -print0 | while IFS= read -r -d '' file; do
    echo "${file}"
    casenumber="${file#${INPUTPATH}/[MSG][PDF][DOC]/}"
    echo "${casenumber}"
done

I understand why the above code doesn’t work, but I first need to make it work for MSG only.

Final expected output:

/home/user/output/test/PDF/2218-0
2218-0
/home/user/output/test/DOC/2218-0
2218-0
/home/user/output/test/MSG/2226-4
2226-4
/home/user/output/test/MSG/2222 -2
2222 -2
/home/user/output/test/MSG/2218-0
2218-0
Asked By: unixcandles

||

Not actually globbing, but …

Recent versions of Bash can do regular-expression matching with the =~ operator inside extended test brackets [[ ... ]]. It supports capture groups (...) and has a built-in BASH_REMATCH array: index zero, ${BASH_REMATCH[0]}, holds the whole match, ${BASH_REMATCH[1]} holds the first capture group, ${BASH_REMATCH[2]} the second, and so on.

So, you could possibly do something like this:

$ printf '%s\0' "/home/user/output/test/PDF/2218-0" "/home/user/output/test/DOC/2218-0" "/home/user/output/test/MSG/2226-4" |
while IFS= read -r -d '' file; do
  [[ "$file" =~ .*(DOC|MSG|PDF)(.*) ]] && printf '%s\n' "$file" "${BASH_REMATCH[2]}"
done
/home/user/output/test/PDF/2218-0
/2218-0
/home/user/output/test/DOC/2218-0
/2218-0
/home/user/output/test/MSG/2226-4
/2226-4
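
Because the pattern above is not anchored, a case number that itself contains DOC, MSG or PDF could be matched in the wrong place. A minimal sketch of a stricter variant, assuming the case number is always the last path component; the names re and type_dir are only illustrative, and the sample path is taken from the question:

```bash
#!/usr/bin/env bash
# Sketch: anchor the regex to the last two path components.
INPUTPATH=/home/user/output/test      # sample value from the question
file="$INPUTPATH/MSG/2222 -2"

# Keeping the regex in a variable avoids quoting surprises inside [[ ... ]].
re='/(DOC|MSG|PDF)/([^/]*)$'

if [[ $file =~ $re ]]; then
  type_dir=${BASH_REMATCH[1]}         # MSG, DOC or PDF
  casenumber=${BASH_REMATCH[2]}       # everything after the last slash
  printf '%s\n' "$type_dir" "$casenumber"
fi
```

Note that the regex must stay unquoted on the right-hand side of =~, otherwise Bash treats it as a literal string.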
Answered By: Raffa

[MSG] as a glob pattern matches any one of the characters M, S or G. To match any one of MSG, DOC or PDF, you use (MSG|DOC|PDF) in zsh or @(MSG|DOC|PDF) in ksh. bash doesn’t support zsh glob operators, but it supports a subset of ksh operators, including that one, after shopt -s extglob, so in bash:

shopt -s extglob
casenumber=${file#"${INPUTPATH}"/@(MSG|DOC|PDF)}

This assigns to casenumber the contents of $file stripped of the shortest leading part that matches the contents of $INPUTPATH (taken literally thanks to the quotes around it, which are needed in ksh/bash, contrary to zsh), followed by /, followed by one of MSG, DOC or PDF.
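
To make the difference concrete, here is a small bash sketch that runs both patterns against one of the sample paths from the question. The variable one_char is only illustrative, and the trailing / in the second pattern is added here so that the result matches the question's final expected output:

```bash
#!/usr/bin/env bash
# Sketch: bracket expression versus extglob alternation.
shopt -s extglob                      # enable @(...) before it is used

INPUTPATH=/home/user/output/test      # sample value from the question
file="$INPUTPATH/MSG/2226-4"

# [MSG] matches exactly one character out of M, S, G, so only "M" goes.
one_char=${file#"$INPUTPATH"/[MSG]}

# @(MSG|DOC|PDF) matches exactly one of the listed strings.
casenumber=${file#"$INPUTPATH"/@(MSG|DOC|PDF)/}

printf '%s\n' "$one_char" "$casenumber"
# prints:
# SG/2226-4
# 2226-4
```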

In ksh, just omit the shopt -s extglob which is bash specific and not needed in ksh. In zsh:

casenumber=${file#$INPUTPATH/(MSG|DOC|PDF)}
Answered By: Stéphane Chazelas

Not to answer your exact question, but some notes on this particular case.

Given you have find "${INPUTPATH}" -mindepth 2 -maxdepth 2 there, all the resulting paths should only have one slash after the initial $INPUTPATH, so you could disregard which particular three-letter string you have there and just remove anything up to the next /:

casenumber=${file#"${INPUTPATH}"/*/}

or, since it’s the last slash anyway, just remove everything up to the last /:

casenumber="${file##*/}"

Here, the doubled # means to take the longest match.
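
A quick sketch of the difference between # and ##, using one of the sample paths from the question (the variable names shortest and longest are only illustrative):

```bash
#!/usr/bin/env bash
# Sketch: shortest (#) versus longest (##) prefix removal.
file='/home/user/output/test/MSG/2222 -2'

shortest=${file#*/}     # removes only the leading "/"
longest=${file##*/}     # removes everything through the last "/"

printf '%s\n' "$shortest" "$longest"
# prints:
# home/user/output/test/MSG/2222 -2
# 2222 -2
```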

Also, you can drop the $INPUTPATH part from the output (replacing it with just .) if you cd there first, before running find:

(cd -P -- "${INPUTPATH}" && find . -mindepth 2 -maxdepth 2 -type d -print0) |
 while IFS= read -r -d '' file; do
    echo "${file}"
    casenumber="${file#./*/}"
    echo "${casenumber}"
done
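
If the case numbers are needed after the loop, keep in mind that in bash a while loop fed by a pipe normally runs in a subshell, so variables set inside it are lost. A self-contained sketch that feeds the loop through process substitution instead; the temporary directory tree and the casenumbers array are made up for the demonstration, and sort -z (as in GNU sort) is only there to make the order predictable:

```bash
#!/usr/bin/env bash
# Sketch: collect case numbers into an array that survives the loop.
INPUTPATH=$(mktemp -d)                               # throwaway test tree
mkdir -p "$INPUTPATH/MSG/2222 -2" "$INPUTPATH/PDF/2218-0"

casenumbers=()
while IFS= read -r -d '' file; do
    casenumbers+=("${file#./*/}")                    # strip "./MSG/" etc.
done < <(cd -P -- "$INPUTPATH" &&
         find . -mindepth 2 -maxdepth 2 -type d -print0 | LC_ALL=C sort -z)

printf '%s\n' "${casenumbers[@]}"
rm -rf -- "$INPUTPATH"                               # clean up the test tree
```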
Answered By: ilkkachu