What Is poppler-utils?
As alluded to in the introduction to this article, we need to install a small set of utilities named poppler-utils to help us convert PDF files to images.
The poppler-utils package is a collection of command-line tools for working with PDF files; one of them, pdftoppm, converts PDF pages to images.
Installing poppler-utils
To install poppler-utils on a Debian/APT-based Linux distribution (such as Ubuntu or Mint), run:
sudo apt install poppler-utils
To install poppler-utils on a Red Hat/YUM-based Linux distribution (such as RHEL or Fedora), run the following (on newer releases that use dnf, substitute dnf for yum):
sudo yum install poppler-utils
Converting PDF to images
The command required is simple and straightforward:
pdftoppm -png test.pdf test
The pdftoppm command converts PDF pages to images. The -png option selects PNG as the output format, and test.pdf is the input file.
The final argument, test, is the output file root: pdftoppm automatically appends a page-number suffix (such as -1) and an extension matching the chosen format (.png, because of the -png option).
The output file name will thus be test-1.png, which we can confirm with ls test-1.png.
Subsequent pages are written as test-2.png, test-3.png and so on (for a document with ten or more pages, pdftoppm zero-pads the page number, e.g. test-01.png). Running eog test-1.png (if eog is installed) opens the image so you can review the output, though any other image viewer will do.
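To make the naming rule concrete, here is a small shell sketch that predicts the file name written for a given page. The function name expected_png_name is hypothetical, and it assumes the behavior of current poppler releases, where the page number is zero-padded to the number of digits in the document's total page count; older versions may differ.

```shell
#!/bin/sh
# Hypothetical helper: predict the PNG file name pdftoppm writes for a page.
# Assumption: the page number is zero-padded to the number of digits in the
# document's total page count (current poppler behavior).
expected_png_name() {
  root=$1    # output root given to pdftoppm, e.g. "test"
  page=$2    # page number to predict
  total=$3   # total number of pages in the PDF
  width=${#total}                            # digits in the page count
  printf "%s-%0${width}d.png\n" "$root" "$page"
}

expected_png_name test 1 1    # test-1.png
expected_png_name test 2 3    # test-2.png
expected_png_name test 3 12   # test-03.png
```
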
Batch Processing of PDF Files to Images
We can write a one-liner that batch-converts all PDF files whose names match a given pattern into images. We could then add this line to a small .sh script to automate it further, or simply run it at the command line whenever we need to convert a large number of PDF files to images.
ls --color=never test*.pdf | sed 's/\.pdf$//' | xargs -I {} pdftoppm -png {}.pdf {}
In this command, we first list all PDF files whose names start with test and end with .pdf, using ls --color=never test*.pdf.
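The --color=never option is a safeguard: if ls is set up to colorize its output even when piped (for example, through an alias that uses --color=always), the color escape codes would end up in the file names passed on to sed and xargs.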
Next, we use a simple sed substitute command to replace a literal dot followed by pdf at the end of each name with nothing; in other words, we strip the .pdf file extension.
This lets us add the extension back only where it is needed: in the input file name for pdftoppm, but not in its output file root, just as in our earlier example.
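To see why stripping only a trailing .pdf matters, we can try an unanchored and an anchored form of the substitution on a couple of sample names. The names below are made up for illustration, so no files are needed:

```shell
#!/bin/sh
# Compare an unanchored and an anchored sed substitution on sample names.
printf '%s\n' test1.pdf notes.pdf.old.pdf | sed 's/\.pdf//'
# test1
# notes.old.pdf   <- the first ".pdf" was removed, not the extension

printf '%s\n' test1.pdf notes.pdf.old.pdf | sed 's/\.pdf$//'
# test1
# notes.pdf.old   <- only a trailing ".pdf" is removed
```

The unanchored form removes the first .pdf it finds, which is usually, but not always, the extension; the $ anchor restricts the match to the end of the name.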
Finally, we use xargs to send each PDF file name (minus the .pdf) to pdftoppm, one at a time. The -I {} option to xargs lets us insert each input line (i.e. the shortened PDF file name) wherever {} appears in the command that follows.
As you can see, the pdftoppm command now closely resembles the first example: each PDF file name, with .pdf re-appended, is the input, and the same name without .pdf is the output root.
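Before running the real thing, we can preview the commands that the ls, sed and xargs pipeline generates by putting echo in front of pdftoppm. In this sketch, printf stands in for ls and the file names test1.pdf to test3.pdf are assumed, so it runs even when those files do not exist:

```shell
#!/bin/sh
# Dry run: echo prints each pdftoppm command instead of executing it.
printf '%s\n' test1.pdf test2.pdf test3.pdf \
  | sed 's/\.pdf$//' \
  | xargs -I {} echo pdftoppm -png {}.pdf {}
# pdftoppm -png test1.pdf test1
# pdftoppm -png test2.pdf test2
# pdftoppm -png test3.pdf test3
```

Replacing the printf with ls --color=never test*.pdf and removing echo turns the preview back into the actual conversion.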
Let’s execute it:
This worked fine: the three PDF files, each with a single page, were converted to three individual .png files (one image per page, which here means one image per PDF), all correctly named and suffixed.
As an alternative to the -png option, you can use -jpeg to generate JPEG files instead. Run pdftoppm --help or man pdftoppm to see the full list of options.
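Parsing ls output has its limits: GNU xargs, for instance, still treats quotes and backslashes in its input specially even with -I, so a file name such as test'3.pdf can make the pipeline fail. A shell glob loop sidesteps this. The sketch below is one possible alternative, not part of poppler-utils: the function name pdf_batch_convert and the DRY_RUN switch are hypothetical, and the function simply wraps pdftoppm's -png/-jpeg and -r (resolution in DPI) options:

```shell
#!/bin/sh
# Hypothetical helper: batch-convert PDFs with a glob loop instead of
# parsing ls output, so names with spaces or quotes reach pdftoppm intact.
# Set DRY_RUN=1 (a name chosen for this sketch) to print the commands only.
pdf_batch_convert() {
  dir=${1:-.}        # directory holding the PDFs
  prefix=${2:-test}  # file-name prefix, as in test*.pdf
  format=${3:-png}   # output format: png or jpeg (pdftoppm -png / -jpeg)
  dpi=${4:-150}      # resolution for pdftoppm -r (150 is pdftoppm's default)
  case $format in
    png|jpeg) ;;
    *) echo "unsupported format: $format" >&2; return 1 ;;
  esac
  for pdf in "$dir/$prefix"*.pdf; do
    [ -e "$pdf" ] || continue   # the glob matched nothing
    root=${pdf%.pdf}            # strip only the trailing .pdf
    if [ -n "${DRY_RUN:-}" ]; then
      printf 'pdftoppm -%s -r %s %s %s\n' "$format" "$dpi" "$pdf" "$root"
    else
      pdftoppm -"$format" -r "$dpi" "$pdf" "$root"
    fi
  done
}

# Usage, from the directory that holds the PDFs:
#   (DRY_RUN=1 pdf_batch_convert . test jpeg)   # preview; subshell keeps DRY_RUN local
#   pdf_batch_convert . test jpeg 300           # convert to JPEG at 300 DPI
```

Because the glob hands each file name to pdftoppm as a single quoted argument, no escaping is needed, and ${pdf%.pdf} removes only a trailing .pdf, much like the sed command above.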
Wrapping up
In this article, we saw how easy it is to convert PDF files to images directly from the Linux command line. We also looked at a straightforward way to automate this process. Enjoy!