How to use the command 'mutool' (with examples)
MuPDF’s ‘mutool’ command is a versatile utility designed to work with PDF files. It facilitates various operations such as conversion, extraction, and querying information from PDF documents. Its capability to handle a broad range of PDF-related tasks makes it highly valuable for anyone working extensively with PDFs, whether for professional, academic, or personal purposes.
Use case 1: Convert a range of pages to PNGs
Code:
mutool convert -o path/to/output%nd.png path/to/input.pdf 1-10
Motivation: Converting a range of PDF pages into image files can be incredibly useful when you need to share a specific section of a document without giving access to the full content. This is particularly beneficial for presentation purposes or when dealing with sensitive information where only visual data is required. By converting PDF pages to PNGs, users can insert the resulting images into slides, websites, or other documents with ease.
Explanation:
mutool convert: Runs the convert subcommand, which writes the document out in another format.
-o path/to/output%nd.png: The -o option sets the output file path pattern. The %nd part is not meant literally: replace it with a printf-style modifier such as %d or %2d, which mutool fills in with the page number so that each image file gets a unique name.
path/to/input.pdf: The path to the input PDF file that contains the pages to convert.
1-10: The range of pages to convert into images, here pages 1 through 10.
Example Output:
Running this command with %d as the modifier converts pages 1 through 10 of the PDF into individual PNG files named output1.png, output2.png, ..., output10.png.
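For scripted use, the same conversion can be wrapped in a small shell function. This is a minimal sketch rather than anything mutool ships: the name pdf_to_pngs, its argument order, and the page%d.png naming are assumptions made for illustration, and it uses the plain %d form of the page-number modifier.

```shell
# pdf_to_pngs INPUT.pdf OUTDIR RANGE   (hypothetical helper, not part of mutool)
# Converts the pages in RANGE to OUTDIR/page1.png, OUTDIR/page2.png, ...
pdf_to_pngs() {
    # Create the output directory first; the sketch does not rely on mutool doing so.
    mkdir -p "$2" || return 1
    # mutool replaces %d in the output pattern with the page number.
    mutool convert -o "$2/page%d.png" "$1" "$3"
}
```

A call such as pdf_to_pngs path/to/input.pdf path/to/pngs 1-10 would then write one PNG per page into path/to/pngs.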
Use case 2: Convert one or more pages of a PDF to text on stdout
Code:
mutool draw -F txt path/to/input.pdf 2,3,5,...
Motivation: Extracting text from PDF files and displaying it in the terminal can be advantageous when performing quick searches, text analysis, or when integrating PDF text into scripts or back-end processing pipelines. It allows users to bypass GUI-based solutions, granting flexibility and efficiency for those who are comfortable with command-line utilities.
Explanation:
mutool draw: Runs the draw subcommand, which renders pages or exports their content.
-F txt: Sets the output format, in this case plain text.
path/to/input.pdf: The PDF file to read pages from.
2,3,5,...: A comma-separated list of the pages to convert to text and print to standard output.
Example Output: The extracted text from pages 2, 3, and 5 appears directly in the terminal, allowing you to view or further manipulate it instantly.
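Because the text goes to standard output, it can be piped straight into other tools. The sketch below searches the extracted text with grep; the function name pdf_grep and its argument order are hypothetical, and the page list is optional (without it, mutool draw processes the whole document).

```shell
# pdf_grep PATTERN INPUT.pdf [PAGES]   (hypothetical helper, not part of mutool)
# Prints the lines of extracted text that match PATTERN, ignoring case.
pdf_grep() {
    # ${3:+"$3"} passes the page list only when one was given.
    mutool draw -F txt "$2" ${3:+"$3"} | grep -i -- "$1"
}
```

For example, pdf_grep invoice path/to/input.pdf 2,3,5 would print only the matching lines from pages 2, 3, and 5. Like grep itself, the function returns a non-zero status when nothing matches.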
Use case 3: Concatenate multiple PDF files
Code:
mutool merge -o path/to/output.pdf path/to/input1.pdf path/to/input2.pdf ...
Motivation: Merging multiple PDF files into a single document streamlines the process of consolidating reports, presentations, or any document-heavy workflow. This is essential for creating comprehensive documents from several sources, ensuring uniformity and ease of access.
Explanation:
mutool merge: Runs the merge subcommand, which combines multiple PDFs.
-o path/to/output.pdf: Specifies the output file in which the merged PDFs will be stored.
path/to/input1.pdf path/to/input2.pdf ...: The paths to the PDF files to merge, in the order they should appear in the resulting document.
Example Output:
The result is a new PDF document named output.pdf containing the content of the input PDF files in the order they were listed.
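When the inputs all live in one directory, the shell can build the file list. The following is a minimal sketch under a few assumptions: the function name merge_dir is hypothetical, the files are merged in the order the shell sorts the *.pdf glob (so numeric prefixes such as 01-, 02- control the order), and the output file should be written outside that directory so it is not picked up as an input on a later run.

```shell
# merge_dir DIR OUTPUT.pdf   (hypothetical helper, not part of mutool)
# Merges every *.pdf directly inside DIR, in the shell's sorted glob order.
merge_dir() {
    merge_out=$2
    # Replace the function's arguments with the list of matching files.
    set -- "$1"/*.pdf
    # An unmatched glob is left as a literal pattern, so check that the first entry exists.
    if [ ! -e "$1" ]; then
        echo "merge_dir: no PDF files found" >&2
        return 1
    fi
    mutool merge -o "$merge_out" "$@"
}
```

A call such as merge_dir path/to/reports path/to/output.pdf would then produce a single document from everything in path/to/reports.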
Use case 4: Query information about all content embedded in a PDF
Code:
mutool info path/to/input.pdf
Motivation: Understanding the structure and embedded elements within a PDF file can be valuable for auditing, compliance, security checks, or when troubleshooting PDF content issues. It provides insights into metadata, fonts, images, or other resources that might affect the document’s behavior across different viewers.
Explanation:
mutool info: Runs the info subcommand, which queries detailed information about the PDF.
path/to/input.pdf: The PDF file to inspect.
Example Output: A comprehensive breakdown of the PDF’s internals is displayed, detailing elements like page count, embedded fonts, images, and metadata, similar to what you might find in a PDF’s properties section.
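When only one kind of resource matters, mutool info accepts option flags that narrow the report, for example -F to list fonts and -I to list images. The sketch below uses -F to audit the fonts of several files in one go; the function name list_fonts and the "==" separator lines are illustrative choices, not mutool output.

```shell
# list_fonts FILE.pdf...   (hypothetical helper, not part of mutool)
# Prints a separator line followed by the font report for each file.
list_fonts() {
    for f in "$@"; do
        echo "== $f"
        mutool info -F "$f"
    done
}
```

For example, list_fonts path/to/input1.pdf path/to/input2.pdf prints the font listing for each document in turn.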
Use case 5: Extract all images, fonts, and resources embedded in a PDF to the current directory
Code:
mutool extract path/to/input.pdf
Motivation: Extracting all embedded assets from a PDF can be crucial for design, editing, or archiving purposes. This allows reuse of high-quality, original materials like images or fonts independently of the original document, which can save time and maintain the integrity of these resources for integration into new projects.
Explanation:
mutool extract: Runs the extract subcommand, which writes out the resources embedded in the PDF.
path/to/input.pdf: The file from which resources will be extracted.
Example Output: The assets such as images and fonts are extracted from the PDF and saved as separate files in the current working directory, ready for use or cataloguing.
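Since the extracted files land in the current working directory, it is often tidier to run the command from a dedicated folder. This is a minimal sketch of that idea; the function name extract_into and its argument order are hypothetical.

```shell
# extract_into INPUT.pdf OUTDIR   (hypothetical helper, not part of mutool)
# Runs mutool extract from inside OUTDIR so the extracted files are written there.
extract_into() {
    # Resolve the input to an absolute path, because the working directory changes below.
    src_dir=$(cd "$(dirname "$1")" && pwd) || return 1
    src=$src_dir/$(basename "$1")
    mkdir -p "$2" || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$2" && mutool extract "$src" )
}
```

A call such as extract_into path/to/input.pdf path/to/assets would leave the images and fonts in path/to/assets instead of wherever the command was started.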
Use case 6: Show the outline (table of contents) of a PDF
Code:
mutool show path/to/input.pdf outline
Motivation: Viewing the outline of a PDF simplifies navigation through complex documents. It is especially beneficial for lengthy documents where quick access to specific sections is needed, such as academic theses or technical manuals.
Explanation:
mutool show: Runs the show subcommand, which displays details about a document.
path/to/input.pdf: The PDF file to inspect.
outline: Specifies that you want to see the outline (table of contents).
Example Output: Outputs a structured list representing the document’s outline or table of contents, facilitating easier navigation through sections or chapters.
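The outline is printed to standard output, so ordinary shell redirection is enough to keep a copy. The sketch below saves it next to the PDF; the function name save_outline and the .outline.txt suffix are illustrative assumptions.

```shell
# save_outline INPUT.pdf   (hypothetical helper, not part of mutool)
# Writes the outline of INPUT.pdf to INPUT.outline.txt in the same directory.
save_outline() {
    # ${1%.pdf} strips a trailing ".pdf" from the input path, if present.
    mutool show "$1" outline > "${1%.pdf}.outline.txt"
}
```

For example, save_outline path/to/input.pdf writes path/to/input.outline.txt.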
Conclusion:
MuPDF’s mutool command-line tool offers diverse functionality for handling and manipulating PDF files. By converting pages into images, extracting text, merging documents, and more, it provides powerful tools for managing digital documents in ways that enhance productivity and information accessibility.