How to define multiple page ranges for pdftk with a bash variable
I’m using Arch Linux, the Openbox window manager, and bash.
Everything is up to date with the latest versions.
Can anyone tell me why the "$page_range" variable does not work within pdftk when I specify two page ranges, such as 3-5 7-9? When I specify only one page range, 3-5, in my yad pop-up box, everything works as it should.
pdftk does allow more than one page range in a single command. Indeed, when I type the command on the command line without any bash variables, pdftk works as expected and takes the page ranges 3-5 7-9. It only fails when that value is held in the variable "$page_range".
All I want to do is extract page ranges 3-5 and 7-9 from the file /home/$USER/my_file.pdf into another PDF file, using the variable $page_range to define my ranges.
Here is my simple script.
#!/bin/bash
# collect the values with yad
extract_values=$(yad --form --width=200 \
--title="Enter the page ranges you wish to extract" \
--text="\n\n Enter the page ranges you wish to extract\n as eg 301-302\n or 301-302 305-306\n for grouping" \
--field="Page range":text "11-13 21-23" \
--button="Cancel!gtk-close":2 \
--button="Edit script":1 \
--button="Submit":0)
# strip out the values from the string
page_range=$(echo $extract_values | cut -d '|' -f 1)
echo $page_range
# produce a unique file extender
page_range_slugify="$(echo "$page_range" | sed 's/ /_/g')"
echo;echo $page_range_slugify
echo
# specify the filename
f=/home/$USER/my_file.pdf
# get path and file name without pdf extension
fz="${f%.*}"
# check everything is as it should be
yad --text="\n page range = $page_range\n page_range_slugify = $page_range_slugify\n file + path without file extension = $fz\n\n"
# below works only for one range but will not expand for two page ranges
pdftk "$f" cat "$page_range" output "$fz"_"$page_range_slugify".pdf
# below takes one range only as above
#pdftk "$f" cat "$(printf %s "$page_range")" output "$fz"_"$page_range_slugify".pdf
# below takes both ranges when ranges are directly placed within the command
#pdftk "$f" cat 3-5 7-9 output "$fz"_"$page_range_slugify".pdf
The solution is to not double quote the variable $page_range.
At least that gets the script working functionally, i.e.
do this: $page_range
not this: "$page_range"
For some reason pdftk does not like the double-quoted expansion of that particular variable.
My first guess was that pdftk was eating one of the quotation marks but not the other at that position because of some bug, which caused it to fail.
But that can’t be it, because
page_range="3-5"
expands correctly as "$page_range" (double quoted, no space), but
page_range="3-5 7-9"
does not expand correctly as "$page_range" (double quoted).
So it must have something to do with the space between the page ranges when double quoting, and the way this is expanded or the way pdftk sees it.
Anyone any ideas?
Even though it all now works without the quotation marks around the $page_range variable, this is very odd, because quoting a variable in bash is normally the safe choice. We are all used to doing it in case a file path or name we are processing contains the dreaded spaces!
It’s thus very odd that the unquoted form handles the spaces and the quoted form does not.
How strange.
Another thought is that the space may expand to a particular type of space, in a particular character encoding, that pdftk does not like.
This is happening because you are doing the right thing: you are quoting your variables. However, because the variable is quoted, the two ranges are passed to pdftk as a single argument, "3-5 7-9", whereas pdftk expects each range as a separate argument. In this specific case, where you know and control the variable’s value, you might be able to get away with no quoting. But that is not true in all cases, and since you are asking users for input, they could pass anything to the script, which makes it a security risk. The clean solution is to use an array instead. Try this:
page_range=( $(printf '%s\n' "$extract_values" | cut -d '|' -f 1) )
You can then pass that as "${page_range[@]}" and have both the benefit of safely quoted variables and the convenience of holding multiple ranges in one variable.
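If it helps to see the difference, here is a minimal sketch that counts how many arguments a command receives for each form of expansion. The count_args function is a hypothetical helper made up for this illustration; it stands in for pdftk and is not part of the original script.

```bash
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

page_range="3-5 7-9"

# Quoted scalar: the whole value, space included, arrives as ONE argument.
quoted=$(count_args "$page_range")

# Unquoted scalar: the shell splits on whitespace into TWO arguments
# (but the words are also subject to glob expansion, hence the risk).
unquoted=$(count_args $page_range)

# Quoted array expansion: one argument per element, with no globbing.
ranges=(3-5 7-9)
array=$(count_args "${ranges[@]}")

printf 'quoted=%s unquoted=%s array=%s\n' "$quoted" "$unquoted" "$array"
# prints: quoted=1 unquoted=2 array=2
```

So the quoted scalar hands pdftk one argument with a space in it, while the quoted array hands it two separate ranges, just like typing them on the command line.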
So, the relevant lines in your script become:
page_range=( $(printf '%s\n' "$extract_values" | cut -d '|' -f 1) )
[ . . . ]
## With thanks to https://stackoverflow.com/a/9429887/1081936
page_range_slugify="$(IFS="_" ; printf '%s\n' "${page_range[*]}")"
[ . . . ]
pdftk "$f" cat "${page_range[@]}" output "${fz}_$page_range_slugify".pdf
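One caveat: filling the array from an unquoted $(...) still lets the shell expand glob characters such as * in whatever the user typed. A possible variation, sketched here under the assumption that yad returns the field followed by a | separator, is to fill the array with read -a instead. The hard-coded extract_values value below is only a stand-in for the yad output.

```bash
#!/bin/bash
# Stand-in for the string yad returns; the real script gets this from the form.
extract_values='3-5 7-9|'

# Take the first |-separated field with parameter expansion (no cut needed).
first_field=${extract_values%%|*}

# read -a splits on IFS whitespace into an array without glob expansion.
read -r -a page_range <<< "$first_field"

# Join the elements with "_" for the file name suffix.
# The command substitution runs in a subshell, so the IFS change stays local.
page_range_slugify=$(IFS=_; printf '%s' "${page_range[*]}")

printf '%s\n' "${#page_range[@]}" "$page_range_slugify"
```

The pdftk line stays the same, since it only needs "${page_range[@]}" and $page_range_slugify.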