Points of Interest
The following lists common steps that pdftotext executes to extract text from a PDF file, together with the file in which the corresponding code is located. The stated locations refer to commit 065dca3 and may have changed since.
Opening and reading a PDF file
PDFDoc::PDFDoc(), line 144ff
HOWTOs
Create a PDF with human-readable objects and content streams
Put the following in the preamble of your TeX file (between \documentclass{} and \begin{document}):
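A minimal sketch, assuming the document is compiled with pdfLaTeX (these are pdfTeX primitives; XeLaTeX and LuaLaTeX use different settings). Setting both levels to 0 makes pdfTeX write page content streams uncompressed and keeps every object out of compressed object streams, so the resulting file can be read in a plain text editor:

```latex
% Assumes pdfLaTeX; must be set before the first page is shipped out.
\pdfcompresslevel=0    % write streams (e.g. page contents) without compression
\pdfobjcompresslevel=0 % do not pack objects into compressed object streams
```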
Create a PDF with a specified crop box
Put the following in the preamble of your TeX file (between \documentclass{} and \begin{document}):
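A minimal sketch, again assuming pdfLaTeX: the pdfTeX token register \pdfpageattr is copied into the dictionary of each page as it is shipped out, so a /CropBox entry placed there applies to every page. Coordinates are in PostScript points (1/72 inch) in the order [lower-left x, lower-left y, upper-right x, upper-right y]. The numbers below are illustrative only; on US Letter paper (612 x 792 pt) they trim a 1 inch margin from each side:

```latex
% Assumes pdfLaTeX. Illustrative values: adjust the four coordinates as needed.
% Order: [llx lly urx ury], in PostScript points (1/72 inch).
\pdfpageattr{/CropBox [72 72 540 720]}
```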
Create a PDF without page numbering
Put the following in the preamble of your TeX file:
\thispagestyle{empty}
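\thispagestyle only affects a single page. If a multi-page test document should carry no page numbers at all, a common alternative is to change the default page style instead. Note that in the standard classes, commands such as \maketitle or \chapter still switch their own page to the plain style via \thispagestyle{plain}:

```latex
% Sets the default page style for all subsequent pages (no header, no footer).
\pagestyle{empty}
```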