The Portable Document Format

The Portable Document Format (PDF) solved a problem. When you created a document on a computer and wanted to share it with someone else, sending them the document didn’t always work.

Even if they had the same software package you’d used to create your document, they might not have the same fonts installed on their computer that you had on yours. They’d be able to open the document but it would look wrong.

If they didn’t have a copy of the software you used to create the document, they wouldn’t be able to open it at all. If you used software that was only available on Linux, it was pointless sending that document to someone who only used Windows.

Adobe created a new file format in the early 1990s and called it the Portable Document Format. It was later standardized as ISO 32000. Documents created to that standard can contain the images and fonts needed to correctly render the contents of the file. PDF files can be opened by PDF viewers on any platform. It was a cross-platform, simple, and elegant solution.

PDF files aren’t intended to be malleable like word-processor documents. They don’t readily lend themselves to editing. If you need to change the content of a PDF, it’s usually better to go back to the source material, edit that, and generate a new PDF. Structural manipulations, by contrast, can be performed on PDF files with relative ease.

Here are some ways to create PDF files on Linux, and how to perform some of the transforms that can be applied to them.

Creating PDF Files on Linux

Many of the applications available on Linux can generate PDF files directly. LibreOffice has a button right on the toolbar that generates a PDF of the current document. It couldn’t be easier.

For fine-grained control of PDF creation, the Scribus desktop publishing application is hard to beat.

If you need to create documents with scientific or mathematical content, perhaps for submission to academic journals, an application that uses LaTeX, such as Texmaker, will be perfect for you.

If you prefer a plain-text workflow, perhaps using Markdown, you can use pandoc to convert to, and from, a great many file formats, including PDF. We have a guide dedicated to pandoc but a simple example will show you how easy it is to use.

Install Texmaker first. pandoc relies on some LaTeX libraries for PDF generation. Installing Texmaker is a convenient way to meet those dependencies.

The -o (output) option names the file that will be created. pandoc works out the output format from the “.pdf” extension of that name. The “raw-notes.md” file is a plain-text Markdown file.
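As a minimal sketch of the conversion described above, assuming pandoc and a LaTeX engine are installed and that “raw-notes.md” is in the current directory, the command would look something like this:

```shell
# Convert the Markdown file from the text into a PDF.
# pandoc infers the PDF output format from the ".pdf" extension given to -o.
# The "||" branch is only there to flag a failed conversion.
pandoc -o new.pdf raw-notes.md || echo "pandoc could not create new.pdf" >&2
```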

If we open the “new.pdf” file in a PDF viewer we see that it is a correctly-formed PDF.

The qpdf Command

The qpdf command allows you to manipulate existing PDF files, whilst preserving their content. The changes you can make are structural. With qpdf you can perform tasks such as merging PDF files, extracting pages, rotating pages, and setting and removing encryption.

To install qpdf on Ubuntu use this command:

sudo apt install qpdf

The command on Fedora is:

sudo dnf install qpdf

On Manjaro you must type:

sudo pacman -S qpdf

Merging PDF Files

At first, some of the qpdf command line syntax may seem confusing. For example, many of the commands expect an input PDF file.

If a command doesn’t require one, you need to use the --empty option instead. This tells qpdf not to expect an input file. The --pages option lets you choose pages. If you just provide the PDF names, all pages are used.

To combine two PDF files to form a new PDF file, use this command format.
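A minimal sketch of that command format, assuming two files called “first.pdf” and “second.pdf” exist in the current directory:

```shell
# Merge first.pdf and second.pdf into a new file, combined.pdf.
# --empty: there is no primary input file.
# --pages ... --: the files (and optionally page ranges) to take pages from.
# The "||" branch is an addition for this sketch; it only flags a failure.
qpdf --empty --pages first.pdf second.pdf -- combined.pdf \
    || echo "merge failed" >&2
```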

This command is made up of:

qpdf: Calls the qpdf command.
--empty: Tells qpdf there is no input PDF. You could argue that “first.pdf” and “second.pdf” are input files, but qpdf considers them to be command line parameters.
--pages: Tells qpdf we’re going to be working with pages.
first.pdf second.pdf: The two files we’re going to extract the pages from. We’ve not used page ranges, so all pages will be used.
--: Marks the end of the list of files and page ranges that belong to the --pages option.
combined.pdf: The name of the PDF that will be created.

If we look for PDF files with ls, we’ll see our two original files—untouched—and the new PDF called “combined.pdf.”

There are two pages in “first.pdf” and one page in “second.pdf.” The new PDF file has three pages.

You can use wildcards instead of listing a great many source files. This command creates a new file called “all.pdf” that contains all the PDF files in the current directory.
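The wildcard form might look like the sketch below. It assumes that “all.pdf” does not exist yet. If it did, the shell would expand the wildcard to include it as one of the inputs.

```shell
# Merge every PDF in the current directory into all.pdf.
# The shell expands *.pdf in alphabetical order, which sets the page order.
# Assumption: all.pdf is not already present in this directory.
qpdf --empty --pages *.pdf -- all.pdf || echo "merge failed" >&2
```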

We can use page ranges by adding the page numbers or ranges after the name of each file the pages are to be extracted from.
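As a sketch, using the same two example files: “first.pdf” has two pages and “second.pdf” has only one, so we ask for pages 1-2 of the first file and page 1 of the second.

```shell
# Take pages 1 and 2 from first.pdf and page 1 from second.pdf.
# Each page range follows the file it applies to.
qpdf --empty --pages first.pdf 1-2 second.pdf 1 -- combined.pdf \
    || echo "merge failed" >&2
```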

This will extract pages one and two from “first.pdf” and page one from “second.pdf.” Note that if “combined.pdf” already exists, qpdf overwrites it with the new file rather than adding the selected pages to it.

Page ranges can be as detailed as you like. Here, we’re asking for a very specific set of pages from a large PDF file, and we’re creating a summary PDF file.
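A sketch of such a command follows. The input name “large.pdf” is a stand-in chosen for illustration, and the file is assumed to have at least 55 pages.

```shell
# Build summary.pdf from a hand-picked set of pages.
# Commas separate individual pages and ranges; a dash marks a range.
# "large.pdf" is a hypothetical input file name.
qpdf --empty --pages large.pdf 1-3,7,11,18-21,55 -- summary.pdf \
    || echo "could not create summary.pdf" >&2
```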

The output file, “summary.pdf”, contains pages 1 to 3, 7, 11, 18 to 21, and 55 from the input PDF file. This means there are 10 pages in “summary.pdf.”
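The page count is easy to check by hand: 3 + 1 + 1 + 4 + 1 = 10. If you want to check longer ranges, a small shell helper can do the arithmetic. The count_pages function below is our own illustration, not part of qpdf, and it only understands plain page numbers and ascending ranges.

```shell
# count_pages is a hypothetical helper, not a qpdf feature.
# It counts the pages named by a simple range such as 1-3,7,11,18-21,55.
# Only plain numbers and ascending "first-last" ranges are handled.
count_pages() {
    total=0
    old_ifs=$IFS
    IFS=','
    for part in $1; do
        case $part in
            *-*)
                first=${part%-*}
                last=${part#*-}
                total=$((total + last - first + 1))
                ;;
            *)
                total=$((total + 1))
                ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

count_pages "1-3,7,11,18-21,55"   # prints 10
```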

We can see that page 10 is page 55 from the source PDF.

Splitting PDF Files

The opposite of merging PDF files is splitting PDF files. To split a PDF into separate PDF files each holding a single page, the syntax is simple.
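A sketch of the split command, assuming the ten-page “summary.pdf” from the previous section is in the current directory:

```shell
# Split summary.pdf into one PDF per page.
# page.pdf is used as the base name for the numbered output files.
qpdf summary.pdf page.pdf --split-pages || echo "split failed" >&2
```

With a ten-page input, the output files should be named along the lines of “page-01.pdf” through “page-10.pdf”, because qpdf inserts a zero-padded number before the extension.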

The file we’re splitting is “summary.pdf”, and the output file is given as “page.pdf.” This is used as the base name. Each new file has a number added to the base name. The --split-pages option tells qpdf what type of action we’re performing.

The output is a series of sequentially numbered PDF files.

If you don’t want to split out every page, use page ranges to select the pages you want.

If we issue this next command, we’ll split out a collection of single-page PDF files. The page ranges are used to specify the pages or ranges we want, but each page is still stored in a single PDF.
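A sketch of that kind of command follows. The input name “large.pdf” and the page ranges are assumptions made for illustration.

```shell
# Split out only pages 1-5 and 11-14, one page per output file.
# --pages selects the pages; --split-pages writes each one separately.
# "large.pdf" and the ranges are hypothetical examples.
qpdf large.pdf section.pdf --pages large.pdf 1-5,11-14 -- --split-pages \
    || echo "split failed" >&2
```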

The extracted pages have names based on “section.pdf” with a sequential number added to them.

If you want to extract a page range and have it stored in a single PDF, use a command of this form. Note that we don’t include the --split-pages option. Effectively, what we’re doing here is a PDF merge, but we’re only “merging” pages from one source file.
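As a sketch, assuming the chapter we want occupies pages 8 to 13 of a file called “large.pdf” (both the name and the range are made up for this example):

```shell
# Copy pages 8-13 of large.pdf into a single new file, chapter2.pdf.
# There is no --split-pages option, so the pages stay together.
qpdf --empty --pages large.pdf 8-13 -- chapter2.pdf \
    || echo "extract failed" >&2
```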

This creates a single, multi-page PDF called “chapter2.pdf.”

Rotating Pages

To rotate a page, we create a new PDF that’s the same as the input PDF with the specified page rotated.
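A minimal sketch, assuming “summary.pdf” is the input. The output name “rotated1.pdf” is our own choice.

```shell
# Write a copy of summary.pdf with page 1 turned 90 degrees clockwise.
# Format: --rotate=<angle>:<pages>
qpdf --rotate=+90:1 summary.pdf rotated1.pdf || echo "rotate failed" >&2
```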

We use the --rotate option to do this. The +90 means rotate the page 90 degrees clockwise. You can rotate a page 90, 180, or 270 degrees. You can also specify the rotation in degrees anticlockwise, by using a negative number, but there’s little need to do so. A rotation of -90 is the same as a rotation of +270.
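The equivalence between anticlockwise and clockwise rotations is simple modular arithmetic. The normalize_rotation function below is a hypothetical helper of our own, not something qpdf provides. It converts any multiple of 90 degrees into the matching clockwise value.

```shell
# normalize_rotation is a hypothetical helper, not a qpdf feature.
# It maps a rotation in degrees to the equivalent clockwise rotation
# in the range 0 to 359.
normalize_rotation() {
    echo $(( (($1 % 360) + 360) % 360 ))
}

normalize_rotation -90    # prints 270
normalize_rotation +270   # prints 270
```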

The number separated from the rotation by a colon “:” is the number of the page you want to rotate. This could be a list of page numbers and page ranges, but we’re just rotating the first page. To rotate all pages use a page range of 1-z.
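Two sketches of the page selections just described, again assuming “summary.pdf” as the input. The output names are our own.

```shell
# Rotate every page: 1-z means "page 1 to the last page".
qpdf --rotate=+90:1-z summary.pdf rotated-all.pdf \
    || echo "rotate failed" >&2

# Rotate a list of pages and ranges: pages 1, 3, and 5 to 7.
qpdf --rotate=+180:1,3,5-7 summary.pdf rotated-some.pdf \
    || echo "rotate failed" >&2
```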

The first page has been rotated for us.

Encrypting and Decrypting

PDF documents can be encrypted so that they require a password to open them. That password is called the user password. There’s another password that’s required to change the security and other permission settings for a PDF. It’s called the owner password.

To encrypt a PDF we need to use the --encrypt option and provide both passwords. The user password comes first on the command line.

We also specify the strength of encryption to use. You’d only need to drop from 256-bit encryption to 128-bit if you want to support very old PDF file viewers. We suggest you stick with 256-bit encryption.

We’re going to create an encrypted version of the “summary.pdf” called “secret.pdf.”
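A sketch of the encryption command follows. The two passwords are placeholders that you should replace with your own. Bear in mind that passwords typed on the command line can end up in your shell history.

```shell
# Create secret.pdf, a 256-bit encrypted copy of summary.pdf.
# Order after --encrypt: user password, owner password, key length.
# "user-pass" and "owner-pass" are placeholder passwords.
qpdf --encrypt user-pass owner-pass 256 -- summary.pdf secret.pdf \
    || echo "encryption failed" >&2
```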

When we try to open the PDF, the PDF viewer prompts us for a password. Entering the user password authorizes the viewer to open the file.

Remember that qpdf doesn’t change the existing PDF. It creates a new one with the changes we’ve asked it to make. So if you make an encrypted PDF you’ll still have the original, unencrypted version. Depending on your circumstances you might want to delete the original PDF or safely store it away.

To decrypt a file, use the --decrypt option. You need to know the file’s password for this to work. Here we supply the owner password, using the --password option to identify it.
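As a sketch, assuming “secret.pdf” was encrypted with the placeholder owner password “owner-pass”:

```shell
# Write unlocked.pdf, an unencrypted copy of secret.pdf.
# "owner-pass" is a placeholder for the password set at encryption time.
qpdf --decrypt --password=owner-pass secret.pdf unlocked.pdf \
    || echo "decryption failed" >&2
```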

The “unlocked.pdf” can be opened without a password.

qpdf is an Excellent Tool

We’re deeply impressed with qpdf. It provides a flexible and richly featured toolset for working with PDF files. And it is very fast, too.

Check out their well-written and detailed documentation to see just how much more it can do.