PDF editing

PDF Arranger

Can make booklets, and arrange, delete, or add pages in PDFs.

pdfbook2

Use pdfbook2 from texlive-extra-utils to create a booklet.
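
A minimal sketch of a pdfbook2 call. The file name in.pdf is a placeholder, and the -book.pdf output name is an assumption about pdfbook2's default naming, so check the directory after running. The guard only prints the command if pdfbook2 or the input file is missing.

```shell
in=in.pdf                      # placeholder input file
out="${in%.pdf}-book.pdf"      # assumed default output name of pdfbook2

if command -v pdfbook2 >/dev/null 2>&1 && [ -f "$in" ]; then
    pdfbook2 "$in" && echo "booklet should be in $out"
else
    echo "would run: pdfbook2 $in"
fi
```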

pdfjam

If a PDF program complains about differently sized pages, use pdfjam to make them all the same size:

pdfjam --outfile out.pdf --paper a5paper in.pdf

jPDF Tweak

Swiss army knife for PDFs: watermarks, page numbering, bookmarks, attachments, document info, encryption, etc.

OCRmyPDF

Runs OCR on images and PDFs and adds a searchable text layer.

ocrmypdf no_ocr.pdf ocr.pdf --sidecar ocr.txt -l deu+eng --force-ocr

If --force-ocr is issued, then all pages will be rasterized to images, discarding any hidden OCR text, rasterizing any printable text, and flattening form fields or interactive objects into their visual representation. This is useful for redoing OCR, for fixing OCR text with a damaged character map (text is selectable but not searchable), and destroying redacted information.
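
Since --force-ocr is the most destructive choice, it may help to see it next to OCRmyPDF's two gentler modes, --skip-text and --redo-ocr. The sketch below is only an illustration: ocr_mode_flag and the mode names keep/redo/force are hypothetical helpers, and the file names are the placeholders used above.

```shell
# Hypothetical helper: map a short mode name to an OCRmyPDF flag.
ocr_mode_flag() {
    case "$1" in
        keep)  echo "--skip-text" ;;   # skip pages that already contain text
        redo)  echo "--redo-ocr" ;;    # replace an old OCR layer, keep real text
        force) echo "--force-ocr" ;;   # rasterize every page, then OCR it
        *)     echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

flag=$(ocr_mode_flag force)

if command -v ocrmypdf >/dev/null 2>&1 && [ -f no_ocr.pdf ]; then
    ocrmypdf "$flag" -l deu+eng --sidecar ocr.txt no_ocr.pdf ocr.pdf
else
    echo "would run: ocrmypdf $flag -l deu+eng --sidecar ocr.txt no_ocr.pdf ocr.pdf"
fi
```

Try keep or redo first on files that already have good text, and use force only when the existing text layer is broken or must be destroyed.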

Installing additional language packs

OCRmyPDF uses Tesseract for OCR, and relies on its language packs for all languages. On most platforms, English is installed with Tesseract by default, but not always.

Tesseract supports most languages. Languages are identified by standardized three-letter codes (called ISO 639-2 Alpha-3). Tesseract’s documentation also lists the three-letter code for your language. Some are anglicized, e.g. Spanish is spa rather than esp, while others are not, e.g. German is deu and French is fra.

After you have installed a language pack, you can use it with ocrmypdf -l <language>, for example ocrmypdf -l spa. For multilingual documents, you can specify all expected languages, e.g. ocrmypdf -l eng+fra for English and French. English is assumed by default unless other languages are specified.
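
A rough sketch for checking that the needed language packs are present before running OCR. The language list (deu, eng) and the file names are assumptions for illustration. On Debian/Ubuntu the packs are typically packaged as tesseract-ocr-<code>, e.g. tesseract-ocr-deu.

```shell
wanted="deu eng"                                   # assumed languages for this document
lang_arg=$(printf '%s' "$wanted" | tr ' ' '+')     # join with '+' for the -l option

if command -v tesseract >/dev/null 2>&1; then
    # Some Tesseract versions print the list on stderr, so capture both streams.
    installed=$(tesseract --list-langs 2>&1)
    for l in $wanted; do
        printf '%s\n' "$installed" | grep -qx "$l" || echo "missing language pack: $l"
    done
else
    echo "tesseract not found, cannot check language packs"
fi

echo "ocrmypdf -l $lang_arg in.pdf out.pdf"
```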

For more info, see: https://ocrmypdf.readthedocs.io/en/latest/languages.html#lang-packs

pdf.tocgen

Generates a table of contents (outline) for a PDF.

To import the table of contents from a toc file into doc.pdf:

pdftocio doc.pdf < toc
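
A small sketch of what a hand-written toc file might look like before it is imported. The layout shown (quoted title, page number, four spaces of indentation per nesting level) and the -o output option are recalled from the pdf.tocgen README, so treat them as assumptions and check the README. The titles, page numbers, and out.pdf are made up.

```shell
# Write a tiny, made-up table of contents to the file "toc".
cat > toc <<'EOF'
"Introduction" 1
"Chapter 1" 3
    "Section 1.1" 4
EOF

if command -v pdftocio >/dev/null 2>&1 && [ -f doc.pdf ]; then
    # -o names the output file (assumed option) so doc.pdf itself stays untouched.
    pdftocio -o out.pdf doc.pdf < toc
else
    echo "would run: pdftocio -o out.pdf doc.pdf < toc"
fi
```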

https://github.com/Krasjet/pdf.tocgen