PDF editing
PDF Arranger
Arranges, deletes, and adds pages in PDFs. Can also make booklets.
pdfbook
Use pdfbook2 from texlive-extra-utils to create a booklet.
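A minimal sketch of a booklet run, assuming pdfbook2 is on the PATH and writes its result next to the input with a -book suffix (worth checking against your installed version). The helper name make_booklet and the file name are made up for illustration.

```sh
# make_booklet is a hypothetical helper name, not part of pdfbook2.
# It runs pdfbook2 on one input file and prints the file name the booklet
# is expected to have (assumption: pdfbook2 writes <name>-book.pdf).
make_booklet() {
    src=$1
    pdfbook2 "$src" || return 1
    printf '%s\n' "${src%.pdf}-book.pdf"
}

# Usage: make_booklet thesis.pdf
```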
pdfjam
If a PDF program complains about differently sized pages, use pdfjam to make them all the same size:
pdfjam --outfile out.pdf --paper a5paper in.pdf
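For several files at once, the same call can be wrapped in a loop. This is only a sketch: the helper name normalize_pages and the -resized suffix are invented here, and the pdfjam options are the ones from the command above.

```sh
# normalize_pages is a hypothetical helper name.
# First argument: a pdfjam paper size such as a5paper.
# Remaining arguments: PDFs to resize. Each result is written next to its
# input as <name>-resized.pdf (suffix chosen for illustration).
normalize_pages() {
    paper=$1
    shift
    for f in "$@"; do
        pdfjam --outfile "${f%.pdf}-resized.pdf" --paper "$paper" "$f" || return 1
    done
}

# Usage: normalize_pages a5paper scan1.pdf scan2.pdf
```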
jPDF Tweak
Swiss army knife for PDFs: watermarks, page numbering, bookmarks, attachments, document info, encryption, etc.
OCRmyPDF
Adds an OCR text layer to scanned PDFs; also accepts images as input.
ocrmypdf no_ocr.pdf ocr.pdf --sidecar ocr.txt -l deu+eng --force-ocr
With --force-ocr, all pages are rasterized to images, discarding any hidden OCR text, rasterizing any printable text, and flattening form fields or interactive objects into their visual representation. This is useful for redoing OCR, for fixing OCR text with a damaged character map (text is selectable but not searchable), and for destroying redacted information.
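Because --force-ocr rewrites every page, it is safer to write to new files and keep the originals. A rough sketch of a batch run with the same flags as the command above; the helper name ocr_all and the -ocr suffix are assumptions for illustration.

```sh
# ocr_all is a hypothetical helper name.
# First argument: Tesseract language spec, e.g. deu+eng.
# Remaining arguments: PDFs to process. For each one it writes
# <name>-ocr.pdf and a plain-text sidecar <name>-ocr.txt, leaving the
# input file untouched.
ocr_all() {
    langs=$1
    shift
    for f in "$@"; do
        base=${f%.pdf}
        ocrmypdf "$f" "${base}-ocr.pdf" --sidecar "${base}-ocr.txt" \
            -l "$langs" --force-ocr || return 1
    done
}

# Usage: ocr_all deu+eng letter.pdf invoice.pdf
```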
Installing additional language packs
OCRmyPDF uses Tesseract for OCR, and relies on its language packs for all languages. On most platforms, English is installed with Tesseract by default, but not always.
Tesseract supports most languages. Languages are identified by standardized three-letter codes (called ISO 639-2 Alpha-3). Tesseract’s documentation also lists the three-letter code for your language. Some are anglicized, e.g. Spanish is spa rather than esp, while others are not, e.g. German is deu and French is fra.
After you have installed a language pack, you can use it with ocrmypdf -l <language>, for example -l deu+eng.
For more info, see: https://ocrmypdf.readthedocs.io/en/latest/languages.html#lang-packs
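Before running OCR it can help to check that every requested language is actually installed. The sketch below relies on tesseract --list-langs, which prints one installed code per line after a header line; the helper name missing_langs is made up.

```sh
# missing_langs is a hypothetical helper name.
# Given a language spec such as "deu+eng", it prints each code that
# `tesseract --list-langs` does not list. It prints nothing if all
# requested languages are installed.
missing_langs() {
    # 2>&1 because some Tesseract versions print the list to stderr.
    installed=$(tesseract --list-langs 2>&1)
    for code in $(printf '%s' "$1" | tr '+' ' '); do
        printf '%s\n' "$installed" | grep -qxF "$code" || printf '%s\n' "$code"
    done
}

# Usage: missing_langs deu+eng
```

On Debian and Ubuntu the packs are packaged as tesseract-ocr-<code>, so a missing deu would be installed with apt install tesseract-ocr-deu.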
pdf.tocgen
Generate table of contents.
To write a toc file back to doc.pdf:
pdftocio doc.pdf < toc
https://github.com/Krasjet/pdf.tocgen
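A sketch of the last two steps chained together, assuming a recipe file has already been put together (pdf.tocgen builds recipes from pdfxmeta output). The helper name add_toc and the .toc file name are illustrative; -o names the output file so the input PDF is left unchanged.

```sh
# add_toc is a hypothetical helper name.
# Arguments: input PDF, recipe file, output PDF.
# Step 1 generates the ToC text from the recipe and saves it as
# <name>.toc so it can be inspected or edited by hand.
# Step 2 writes that ToC into a new PDF.
add_toc() {
    doc=$1
    recipe=$2
    dest=$3
    tocfile="${doc%.pdf}.toc"
    pdftocgen "$doc" < "$recipe" > "$tocfile" || return 1
    pdftocio -o "$dest" "$doc" < "$tocfile"
}

# Usage: add_toc doc.pdf recipe.toml doc_with_toc.pdf
```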