AD Research Wiki:

Points of Interest

The following is a list of the common steps that pdftotext executes to extract text from a PDF file, together with the file in which the corresponding code is located. The stated locations refer to commit 065dca3 and may have changed since then.

HOWTOs

Create a PDF with human-readable objects + content streams

Put the following in the preamble of your TeX file (between \documentclass{} and \begin{document}):

   \pdfobjcompresslevel=0
   \pdfcompresslevel=0
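
For reference, a complete minimal file using these two settings might look like the sketch below. It assumes the file is compiled with pdfLaTeX, since \pdfobjcompresslevel and \pdfcompresslevel are pdfTeX primitives; the file name and the body text are placeholders.

```latex
% minimal-uncompressed.tex (hypothetical file name)
% Compile with: pdflatex minimal-uncompressed.tex
\documentclass{article}

% Do not pack PDF objects into compressed object streams.
\pdfobjcompresslevel=0
% Do not compress stream contents (e.g. the page content streams).
\pdfcompresslevel=0

\begin{document}
Hello, world!
\end{document}
```

Opening the resulting PDF in a text editor should then show the objects and the operators of the page content streams as plain text. Embedded font programs may still appear as binary data.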

Create a PDF with a specified crop box

Put the following in the preamble of your TeX file (between \documentclass{} and \begin{document}):

   \pdfpageattr{
     /CropBox [50 50 100 100]
   }
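
The four numbers of a /CropBox entry are the coordinates of the lower-left and upper-right corners of the box, [llx lly urx ury], in default user space units (1/72 inch unless the page says otherwise). A minimal sketch of a complete file, assuming pdfLaTeX (\pdfpageattr is a pdfTeX primitive) and placeholder body text:

```latex
\documentclass{article}

% \pdfpageattr adds extra entries to the dictionary of every page object.
% /CropBox takes [llx lly urx ury]: lower-left corner, then upper-right corner.
\pdfpageattr{
  /CropBox [50 50 100 100]
}

\begin{document}
Some text to extract.
\end{document}
```

With the values above, the crop box is a 50 x 50 unit square near the lower-left corner of the page, so most of the page content lies outside it. Adjust the coordinates to the region you want to test.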

Create a PDF without page numbering

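Put the following in the preamble of your TeX file (between \documentclass{} and \begin{document}). Note that \thispagestyle{empty} affects a single page only; to suppress the page number on every page of a longer document, use \pagestyle{empty} instead: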

   \thispagestyle{empty}
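
For a document with several pages or a title, a minimal sketch could look as follows. It assumes the standard article class; the title, author and body text are placeholders.

```latex
\documentclass{article}

% Use the empty page style (no header, no footer, hence no page number)
% on all pages.
\pagestyle{empty}

\title{Sample}
\author{A. Author}
\date{}

\begin{document}
\maketitle
% In the standard classes, \maketitle switches the title page to the
% plain style, so the empty style is set again for this one page.
\thispagestyle{empty}

First page.
\newpage
Second page.
\end{document}
```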

AD Research Wiki: Projects/pdftotext++ (last edited 2023-01-17 17:35:29 by adpult)