How to replace both lower and uppercase extensions with Parameter Expansion?

I’m converting doc files to txt using catdoc on Linux. To keep the same file name for the output file, I’m replacing the .doc extension with .txt using parameter expansion. But many of the doc files end in .DOC. How can I make the .doc in ${filename%.doc}.txt case-insensitive while keeping the capitals in the filename itself? I can’t use ${filename%.*}.txt because some files have dots in the filename.

My current code:

find "${COMPANYPATH}" -iname '*.doc' | while read -r file; do
    echo "${file}"
    filename=$(basename "${file}")
    path="${file%/*}/"
    mkdir -p "${OUTPUTPATH}/DOC/${path#$COMPANYPATH/}"
    catdoc "${file}" >> "${OUTPUTPATH}/DOC/${path#$COMPANYPATH}${filename%.doc}.txt"
done

Input

/home/user/test/2218-0/test.doc
/home/user/test/2218-0/Test2.DOC

Expected output

/home/user/output/test/DOC/2218-0/test.txt
/home/user/output/test/DOC/2218-0/Test2.txt

There are no duplicated files.

Asked By: unixcandles

||

You don’t. Just remove the extension entirely instead:

find "${COMPANYPATH}" -iname '*.doc' | while read -r file; do
    echo "${file}"
    filename=$(basename "${file}")
    name="${filename%.*}"
    path="${file%/*}"
    noComPath="${path#$COMPANYPATH/}"
    mkdir -p "${OUTPUTPATH}/DOC/$noComPath"
    catdoc "${file}" >> "${OUTPUTPATH}/DOC/$noComPath/$name.txt"
done

The expression name="${filename%.*}" sets the variable name to the file name with everything from the last . to the end removed. If the name contains several ., only the part from the last one onwards is removed:

$ foo=file.foo.bar.DoC
$ echo "${foo%.*}"
file.foo.bar

And here is a more robust version that can deal with arbitrary file names (yours would fail if a file name contained a newline character, for instance):

LC_ALL=C find "${COMPANYPATH}" -iname '*.doc' -type f -print0 |
  while IFS= read -r -d '' file; do
    printf >&2 'Processing "%s"\n' "${file}"
    basename="${file##*/}"
    dirname="${file%/*}"
    rootname="${basename%.*}"
    targetdir=${OUTPUTPATH}/DOC/${dirname#"${COMPANYPATH}/"}
    mkdir -p -- "${targetdir}" &&
      catdoc -- "${file}" >> "${targetdir}/${rootname}.txt"
  done
Answered By: terdon

I don’t think you can make the pattern match in ${filename%.doc} case-insensitive in Bash. You could do it in zsh with ${filename%(#i).doc} (requires setopt extendedglob), or in ksh93 with ${filename%~(i:.doc)}. Bash’s nocasematch doesn’t help here: it applies to case and [[ .. ]] constructs, not to the % suffix removal.
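
Since nocasematch does apply inside [[ .. ]], one possible Bash-only workaround is to match the name against a regular expression there and rebuild it from the captured part. This is a minimal sketch, not a recommendation over the simpler options; doc_to_txt is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper (Bash only): replace a trailing ".doc" in any letter
# case with ".txt"; names without that extension are printed unchanged.
doc_to_txt() (
    # The body runs in a subshell, so nocasematch does not leak to the caller.
    shopt -s nocasematch
    re='^(.*)\.doc$'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1] holds the captured part with its original capitals.
        printf '%s\n' "${BASH_REMATCH[1]}.txt"
    else
        printf '%s\n' "$1"
    fi
)

doc_to_txt 'Test2.DOC'
doc_to_txt 'foo.bar.DoC'
```

This costs a subshell per call, so inside a loop over many files the pattern-based expansions are likely the cheaper choice.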

In any POSIX shell, there’s always the workaround of explicitly listing both uppercase and lowercase characters with ${filename%.[dD][oO][cC]}. Or just remove the dot and the last three characters with ${filename%.???}, knowing that find only gives you files with the right extension.

Then again, ${filename%.*} only removes the shortest matching part, so dots elsewhere in the filename should not be a problem either. (%% would remove the longest.)
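
The difference between the shortest and longest match can be seen side by side in a short sketch that should behave the same in any POSIX shell (the variable names are only illustrative):

```shell
filename=foo.bar.DoC

shortest=${filename%.*}     # % removes the shortest suffix matching ".*"
longest=${filename%%.*}     # %% removes the longest suffix matching ".*"
by_length=${filename%.???}  # a dot followed by exactly three characters

printf '%s\n' "$shortest" "$longest" "$by_length"
```

Only the %% form cuts at the first dot; the other two keep foo.bar intact.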

zsh:

% setopt extendedglob
% filename=foo.bar.DoC
% echo ${filename%.(#i)doc}.txt
foo.bar.txt

sh/Bash:

$ filename=foo.bar.DoC
$ echo "${filename%.[dD][oO][cC]}.txt"
foo.bar.txt
$ echo "${filename%.*}.txt"
foo.bar.txt
Answered By: ilkkachu