Getting the word count of a PDF document in Evince
Is there any way I can get the word count of a PDF document that I’m viewing in Evince, Ubuntu’s default PDF viewer? I’m able to convert the documents to text files and get the word count from the terminal, but I’d quite like to be able to get at it quickly without having to use the terminal. Is there any plugin that can do this, or is it already built in and I’m just missing it?
P.S. I’d prefer not to change my viewer as Evince is the default PDF viewer in Ubuntu, and I’d quite like to do as much as possible using the default applications since a lot of them, Evince included, are really nice.
I don’t believe that is possible (well it’s technically possible but hasn’t been implemented).
You have to remember Evince is a document viewer and a word count is a feature more usually required in an editor (yes I know this isn’t always the case).
You might like to contact the Evince developers and ask if they would have any interest in implementing this feature.
You can do this via the command line:
pdftotext filename.pdf - | tr -d '.' | wc -w
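To unpack that one-liner a little: pdftotext writes the PDF's text to stdout (that is what the lone - means), tr -d '.' deletes every period, and wc -w counts what is left. My guess is that the tr step is there so that stand-alone dots, such as the leader dots in a table of contents, are not counted as words. Here is a minimal sketch of the same pipeline as two shell functions; the names count_words and pdf_word_count and the sample line are made up for illustration.

```shell
# count_words (hypothetical helper): read text on stdin, delete periods so
# stand-alone dots are not counted, and print the number of words.
# The final tr strips the padding that some wc implementations add.
count_words() {
    tr -d '.' | wc -w | tr -d ' '
}

# pdf_word_count (hypothetical wrapper): extract the text of the PDF given as
# $1 with pdftotext ("-" sends it to stdout) and count the words.
pdf_word_count() {
    pdftotext "$1" - | count_words
}

# Made-up sample line imitating a table-of-contents entry with leader dots.
sample='Introduction . . . . 3 pages'
with_dots=$(printf '%s\n' "$sample" | wc -w | tr -d ' ')
without_dots=$(printf '%s\n' "$sample" | count_words)
echo "plain wc -w: $with_dots, with dots stripped: $without_dots"
```

With these defined in your shell, pdf_word_count filename.pdf should print the same number as the one-liner above.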
How about a quick bash script requiring zenity, evince and pdftotext? When called without an argument, it’ll give you a dialogue box so you can choose a file. When called with an argument (or after said dialogue box), it’ll both open the file in evince and give you a dialogue box with a word count.
In other words, copy the following into a text file called evince-word-count.sh or something, save it somewhere in your path (e.g., ~/bin/), and make it executable (either through Nautilus’s right click and properties or with chmod +x ~/bin/evince-word-count.sh):
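#!/bin/bash
# Use the first argument as the PDF, or ask for one with a file chooser.
if [ "$#" -gt 0 ]; then
    filename="$1"
else
    filename="$(zenity --file-selection)"
fi
# Quit if the file chooser was cancelled or the file does not exist.
[ -f "$filename" ] || exit 1
# Open the PDF in Evince in the background, then show the word count.
evince "$filename" &
zenity --info --text "This PDF has $(pdftotext "$filename" - | tr -d '.' | wc -w) words"
exit 0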
Now, right click on some PDF in Nautilus, choose “Open with…” and have it open with evince-word-count.sh. From then on, opening a PDF will both open it in Evince and give you a word count.
A response from Olaf Leidinger on the Evince mailing list:
I think such a feature is better suited for document editors, as they have
more information on the document as a plain viewer and counting words is
trivial.

Take a PDF file as an example. What you see as text might actually be some kind
of vector graphic shape. Even if the text is contained as such in the PDF file,
those words you see might be composed of multiple “draw text at position
(y,x)”-commands — e.g. in case of umlauts or end of line. So a single word
might count as multiple words. Therefore I think it might be hard to implement
such a feature reliably. Have a look at pdftotext to see what I mean.
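
To make the end-of-line case concrete, here is a rough sketch of how a word broken across two lines inflates the count, together with one possible workaround. The sample text is invented to stand in for pdftotext output, and join_hyphenated is a hypothetical helper, not part of pdftotext or Evince. It is only a heuristic: it cannot tell a layout hyphen from a real one, although for counting purposes both end up as a single word.

```shell
# Invented sample standing in for pdftotext output: "implementation" was
# hyphenated across a line break in the PDF's layout.
sample='a reliable implemen-
tation is hard'

# join_hyphenated (hypothetical helper): when a line ends in "-", drop the
# hyphen and glue the line onto the next one.
join_hyphenated() {
    awk '/-$/ { sub(/-$/, ""); printf "%s", $0; next } { print }'
}

# Naive count: the two halves of the broken word are counted separately.
naive=$(printf '%s\n' "$sample" | wc -w | tr -d ' ')

# Count after rejoining words broken at the end of a line.
joined=$(printf '%s\n' "$sample" | join_hyphenated | wc -w | tr -d ' ')

echo "naive: $naive, after joining: $joined"
```

On the sample this gives 6 words for the naive count and 5 after joining. Text that is really a vector shape, or an umlaut drawn as a separate glyph, would need different handling, which is the point being made above.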