Points of Interest
The following list shows the common steps that pdftotext executes to extract text from a PDF file, and the file and function in which the corresponding code is located. The stated locations refer to Poppler commit 065dca3 (https://github.com/freedesktop/poppler/tree/065dca3816db3979dfacdc2f8592abed2ff6859a) and may have changed since.
* Opening and reading the PDF file
  PDFDoc::PDFDoc(), poppler/PDFDoc.cc, line 144ff
* Parsing the PDF version number from the PDF file header
  PDFDoc::checkHeader(), poppler/PDFDoc.cc, line 350
* Parsing startxref
  PDFDoc::getStartXRef(), poppler/PDFDoc.cc, line 1999ff
* Parsing the xref table and the trailer dictionary
  XRef::readXRefTable(), poppler/XRef.cc, line 535
* Parsing the document catalog
  Catalog::Catalog(), poppler/Catalog.cc, line 76
HOWTOs
Create a PDF with human-readable objects + content streams
Put the following in the preamble of your TeX file (between \documentclass{} and \begin{document}):
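A minimal sketch of such a preamble setting, assuming the document is compiled with pdfLaTeX. The pdfTeX primitives `\pdfcompresslevel` and `\pdfobjcompresslevel` turn off stream compression and compressed object streams, so that objects and page content streams can be read in a text editor. That these two settings are what this step intends is an assumption.

```tex
% Assumption: compiled with pdfLaTeX (both are pdfTeX primitives).
\pdfcompresslevel=0     % leave streams uncompressed, so page content stays readable
\pdfobjcompresslevel=0  % do not pack objects into compressed object streams
```

Other engines such as XeLaTeX or LuaLaTeX use different mechanisms for the same purpose, so this sketch may not carry over unchanged.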
Create a PDF with specified crop box
Put the following in the preamble of your TeX file (between \documentclass{} and \begin{document}):
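A sketch of the crop box setting, assuming pdfLaTeX. `\pdfpageattr` is a pdfTeX primitive that adds entries to the page dictionary of each page shipped out after it is set. The rectangle `[50 50 100 100]` is an example value and should be adapted to the box you need.

```tex
% Assumption: compiled with pdfLaTeX (\pdfpageattr is a pdfTeX primitive).
% Adds a /CropBox entry to the page dictionary of every page.
% Rectangle format: [llx lly urx ury], in default user space units (1/72 inch).
\pdfpageattr{/CropBox [50 50 100 100]}
```

Because the setting sits in the preamble, it applies to all pages of the document.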
Create a PDF without page numbering
Put the following in the preamble of your TeX file:
\thispagestyle{empty}
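`\thispagestyle{empty}` affects a single page only, so in the preamble it covers just the first page. For documents with more than one page, `\pagestyle{empty}` is probably the better fit. A minimal sketch, assuming the standard `article` class:

```tex
\documentclass{article}
\pagestyle{empty}   % empty header and footer, hence no page number, on every page
\begin{document}
First page.
\newpage
Second page, also without a page number.
\end{document}
```

For a one-page test file, both commands should produce the same result.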