= Points of Interest =
The following list shows the common steps that ''pdftotext'' executes to extract text from a PDF file, together with the location of the corresponding code. The locations refer to commit [[https://github.com/freedesktop/poppler/tree/065dca3816db3979dfacdc2f8592abed2ff6859a|065dca3]] and may have changed since then.
* '''Opening and reading a PDF file''' <<BR>> [[https://github.com/freedesktop/poppler/blob/065dca3816db3979dfacdc2f8592abed2ff6859a/poppler/PDFDoc.cc#L144|PDFDoc::PDFDoc(), line 144ff]]
* '''Parsing the PDF version number from the PDF file header''' <<BR>> [[https://github.com/freedesktop/poppler/blob/065dca3816db3979dfacdc2f8592abed2ff6859a/poppler/PDFDoc.cc#L350|PDFDoc::checkHeader(), line 350]]
* '''Reading startxref''' <<BR>> [[https://github.com/freedesktop/poppler/blob/065dca3816db3979dfacdc2f8592abed2ff6859a/poppler/PDFDoc.cc#L1999|PDFDoc::getStartXRef(), line 1999ff]]
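To make the header and startxref steps above more concrete, here is a minimal Python sketch of what they amount to at the byte level. It is not poppler's code: the function names `parse_pdf_version` and `find_startxref` are hypothetical, and the 1024-byte search windows are an assumption of this sketch based on the usual PDF reader convention.

```python
import re
from typing import Tuple

SEARCH_WINDOW = 1024  # assumed size of the search window at file start / end


def parse_pdf_version(data: bytes) -> Tuple[int, int]:
    """Return (major, minor) from the '%PDF-x.y' header comment."""
    match = re.search(rb"%PDF-(\d+)\.(\d+)", data[:SEARCH_WINDOW])
    if match is None:
        raise ValueError("no %PDF-x.y header found")
    return int(match.group(1)), int(match.group(2))


def find_startxref(data: bytes) -> int:
    """Return the byte offset that follows the last 'startxref' keyword."""
    tail = data[-SEARCH_WINDOW:]
    pos = tail.rfind(b"startxref")  # the last occurrence wins
    if pos == -1:
        raise ValueError("no startxref keyword found")
    match = re.match(rb"startxref\s+(\d+)", tail[pos:])
    if match is None:
        raise ValueError("startxref is not followed by an offset")
    return int(match.group(1))


if __name__ == "__main__":
    # Tiny hand-written PDF skeleton, for illustration only.
    sample = (
        b"%PDF-1.5\n"
        b"1 0 obj\n<<>>\nendobj\n"
        b"xref\n0 1\n0000000000 65535 f \n"
        b"trailer\n<< /Size 1 >>\n"
        b"startxref\n29\n%%EOF\n"
    )
    print(parse_pdf_version(sample))
    print(find_startxref(sample))
```

The offset returned by `find_startxref` is the position of the cross-reference table (or cross-reference stream) that a reader would parse next.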
= HOWTOs =
== Create a PDF with human-readable objects + content streams ==
Put the following in the preamble of your TeX file (between `\documentclass{}` and `\begin{document}`):
{{{#!highlight tex
\pdfobjcompresslevel=0
\pdfcompresslevel=0
}}}
== Create a PDF with specified crop box ==
Put the following in the preamble of your TeX file (between `\documentclass{}` and `\begin{document}`):
{{{#!highlight tex
\pdfpageattr{
/CropBox [50 50 100 100]
}
}}}
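To check that the crop box ended up in the generated file, a rough Python sketch like the following can scan the raw PDF bytes for `/CropBox` entries. The function name `find_crop_boxes` is hypothetical, and the approach assumes the page objects are stored uncompressed (for example with `\pdfobjcompresslevel=0` as in the previous HOWTO); it is a text search, not a real PDF parser.

```python
import re
from typing import List


def find_crop_boxes(data: bytes) -> List[List[float]]:
    """Return every '/CropBox [a b c d]' array found in the raw bytes."""
    boxes = []
    for match in re.finditer(rb"/CropBox\s*\[([^\]]*)\]", data):
        try:
            values = [float(tok.decode("ascii")) for tok in match.group(1).split()]
        except ValueError:
            continue  # skip arrays with non-numeric entries
        if len(values) == 4:  # lower-left x, y and upper-right x, y
            boxes.append(values)
    return boxes


if __name__ == "__main__":
    # Hand-written page dictionary, for illustration only.
    page = b"<< /Type /Page /CropBox [50 50 100 100] /MediaBox [0 0 612 792] >>"
    print(find_crop_boxes(page))
```

For a real file, the bytes would be read with something like `open("output.pdf", "rb").read()`, where `output.pdf` is a placeholder name.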
== Create a PDF without page numbering ==
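Put the following in the preamble of your TeX file (between `\documentclass{}` and `\begin{document}`). `\pagestyle{empty}` removes headers and footers, including the page number, from all pages; to suppress them on a single page only, place `\thispagestyle{empty}` on that page instead.
{{{#!highlight tex
\pagestyle{empty}
}}}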