How to use the command pdftotext (with examples)
The pdftotext command converts PDF files to plain text. It is a command-line tool with options for customizing the conversion, such as preserving the layout and extracting a specific range of pages.
Use case 1: Convert filename.pdf to plain text and print it to stdout
Code:
pdftotext filename.pdf -
Motivation: This use case is useful when you want to view the contents of a PDF file directly in the terminal or pipe them to another command for further processing.
Explanation:
pdftotext: The command name that invokes the tool.
filename.pdf: The input PDF file to be converted.
-: A hyphen in place of the output file name sends the extracted text to stdout (standard output, normally the terminal).
Example output:
This is an example text extracted from the PDF file.
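Because the text goes to stdout, it can be piped straight into other tools. The following is a minimal sketch of two such pipelines wrapped in shell functions; the names pdf_word_count and pdf_grep are hypothetical helpers invented for illustration and are not part of pdftotext.

```shell
# Hypothetical helpers (not part of pdftotext) built on "pdftotext FILE -".

# Print the number of words in the text extracted from a PDF.
pdf_word_count() {
    pdftotext "$1" - | wc -w
}

# Print the extracted lines that contain a fixed string, ignoring case.
# Usage: pdf_grep PATTERN FILE
pdf_grep() {
    pdftotext "$2" - | grep -i -F -- "$1"
}
```

For example, pdf_word_count filename.pdf prints a word count, and pdf_grep invoice filename.pdf prints every extracted line that mentions "invoice".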
Use case 2: Convert filename.pdf to plain text and save it as filename.txt
Code:
pdftotext filename.pdf
Motivation: This use case is useful when you want to convert the contents of a PDF file to plain text and save it as a separate file.
Explanation:
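pdftotext: The command name that invokes the tool.
filename.pdf: The input PDF file to be converted. Because no output file is given, the text is written to a file with the same name and a .txt extension.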
Example output:
This command creates a new file named filename.txt containing the plain text extracted from the PDF.
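The default output name makes batch conversion straightforward. The following sketch loops over every PDF in a directory and relies on that default; the function name pdf_dir_to_text is a hypothetical helper, and the sketch assumes each .txt file is written next to its source PDF.

```shell
# Hypothetical helper: convert every .pdf in a directory, relying on the
# default output name (foo.pdf -> foo.txt). Variables are left global so the
# function also runs in plain POSIX sh.
pdf_dir_to_text() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory has no PDFs.
        [ -e "$pdf" ] || continue
        pdftotext "$pdf" || echo "failed: $pdf" >&2
    done
}
```

Calling pdf_dir_to_text reports/ converts each PDF in reports/ and prints a message to stderr for any file that fails.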
Use case 3: Convert filename.pdf to plain text and preserve the layout
Code:
pdftotext -layout filename.pdf
Motivation: This use case is useful when you want to convert a PDF file while maintaining its original layout, including tables, columns, and other structural elements.
Explanation:
pdftotext: The command name that invokes the tool.
-layout: An option that tells pdftotext to preserve the physical layout of the text as closely as it can during conversion.
filename.pdf: The input PDF file to be converted.
Example output: The converted plain text approximates the original layout of the PDF, using whitespace to keep columns and tables aligned. Because no output file is given, the result is saved as filename.txt.
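To see what -layout actually changes for a given document, it can help to compare both conversions side by side. The following is a rough sketch that does so with diff; pdf_layout_diff is a hypothetical helper name, and the temporary file names are arbitrary.

```shell
# Hypothetical helper: show how -layout changes the extracted text by
# diffing the default conversion against the -layout conversion.
pdf_layout_diff() {
    tmp=$(mktemp -d) || return 1
    pdftotext "$1" "$tmp/plain.txt"
    pdftotext -layout "$1" "$tmp/layout.txt"
    diff "$tmp/plain.txt" "$tmp/layout.txt"
    status=$?          # diff exits 0 if identical, 1 if the files differ
    rm -rf "$tmp"
    return "$status"
}
```

Running pdf_layout_diff filename.pdf prints the lines that differ between the two conversions; no output means -layout made no difference for that file.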
Use case 4: Convert input.pdf to plain text and save it as output.txt
Code:
pdftotext input.pdf output.txt
Motivation: This use case is useful when you want to convert a specific PDF file to plain text format and save it with a custom output file name.
Explanation:
pdftotext: The command name that invokes the tool.
input.pdf: The input PDF file to be converted.
output.txt: The desired name of the output plain text file.
Example output:
The converted plain text will be saved as output.txt, containing the extracted text from the PDF file.
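An explicit output path is also handy for keeping converted text apart from the source PDFs. The following sketch writes the result into a separate directory and derives the file name from the input; pdf_to_dir is a hypothetical helper name, and the directory layout is only an assumption for illustration.

```shell
# Hypothetical helper: convert one PDF into a chosen directory, naming the
# result after the input (report.pdf -> OUTDIR/report.txt).
# Usage: pdf_to_dir FILE OUTDIR
pdf_to_dir() {
    pdf=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    name=$(basename "$pdf" .pdf)    # strip the directory and the .pdf suffix
    pdftotext "$pdf" "$outdir/$name.txt"
}
```

For example, pdf_to_dir input.pdf text/ creates text/ if needed and writes text/input.txt.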
Use case 5: Convert pages 2, 3 and 4 of input.pdf to plain text and save them as output.txt
Code:
pdftotext -f 2 -l 4 input.pdf output.txt
Motivation: This use case is useful when you want to extract specific pages from a PDF and convert them to plain text format.
Explanation:
pdftotext: The command name that invokes the tool.
-f 2: An option specifying the first page to convert (in this case, page 2).
-l 4: An option specifying the last page to convert (in this case, page 4).
input.pdf: The input PDF file to be converted.
output.txt: The desired name of the output plain text file.
Example output:
The converted plain text will include the content of pages 2, 3, and 4 from the PDF file, saved as output.txt.
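Setting -f and -l to the same number extracts a single page, which makes it possible to split a range into one text file per page. The following is a minimal sketch of that idea; pdf_pages_to_files and the "-page-N.txt" naming scheme are hypothetical choices, and the caller is assumed to pass page numbers that exist in the document.

```shell
# Hypothetical helper: write each page in FIRST..LAST to its own file in the
# current directory (input.pdf -> input-page-2.txt, input-page-3.txt, ...).
# Usage: pdf_pages_to_files FILE FIRST LAST
pdf_pages_to_files() {
    pdf=$1
    page=$2
    last=$3
    stem=$(basename "$pdf" .pdf)
    while [ "$page" -le "$last" ]; do
        # -f and -l set to the same page number select exactly that page.
        pdftotext -f "$page" -l "$page" "$pdf" "$stem-page-$page.txt" || return 1
        page=$((page + 1))
    done
}
```

For example, pdf_pages_to_files input.pdf 2 4 produces input-page-2.txt, input-page-3.txt, and input-page-4.txt.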
Conclusion:
The pdftotext command provides a convenient way to convert PDF files to plain text format with various customization options. Whether you need to view the contents of a PDF, extract specific pages, or preserve the original layout, this command offers flexibility for different use cases.