Using the Tesseract Command for OCR (with examples)
Tesseract is an OCR (Optical Character Recognition) engine that extracts text from images. This article walks through common uses of the tesseract command, with a code example and an explanation for each.
1: Recognize text in an image and save it to output.txt
tesseract image.png output
Motivation: The tesseract command recognizes and extracts text from an input image. Here we pass the input image image.png and the output base name output; Tesseract writes the recognized text to output.txt.
Explanation: The command tesseract image.png output performs OCR on image.png and saves the recognized text to output.txt. Note that the .txt extension is added automatically to the output file name.
Example Output: Suppose the input image image.png contains the text “Hello, World!”. After running the command, output.txt will contain “Hello, World!”.
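The same command can be wrapped for batch use. The sketch below is one possible approach, not part of Tesseract itself: ocr_image is a hypothetical helper name, and it assumes every image name ends in a file extension.

```shell
# ocr_image is a hypothetical helper, not a Tesseract command.
# It derives the output base from the image name, runs OCR,
# and prints the path of the text file that tesseract writes.
ocr_image() {
    image=$1
    base=${image%.*}                  # strip the extension: scan.png -> scan
    tesseract "$image" "$base" || return 1
    printf '%s\n' "$base.txt"         # tesseract appends .txt to the base itself
}

# Usage (assumes PNG files in the current directory):
#   for f in *.png; do ocr_image "$f"; done
```

Deriving the base name from the image keeps each text file next to its source image, so scan.png produces scan.txt.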
2: Specify a custom language with an ISO 639-2 code
tesseract -l deu image.png output
Motivation: By default, Tesseract recognizes text in English, but it can handle many other languages when the matching language data is installed. In this example, we specify the language using an ISO 639-2 code (e.g. “deu” for German) so that OCR is performed with the data for that language.
Explanation: The -l flag specifies the language for OCR. Here we use the ISO 639-2 code deu to indicate German. The command tesseract -l deu image.png output performs OCR on image.png with German as the recognition language and saves the recognized text to output.txt.
Example Output: Suppose the input image image.png contains the German text “Guten Tag!”. After running the command, output.txt will contain “Guten Tag!”.
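The -l flag also accepts several language codes joined with a plus sign, which can help with images that mix languages. The following is a minimal sketch under that assumption; join_langs and ocr_multilang are hypothetical helper names, and the data for each listed language must be installed.

```shell
# join_langs and ocr_multilang are hypothetical helpers, not Tesseract commands.

# Join all arguments with "+", the separator that tesseract's -l flag accepts.
# The function body runs in a subshell so the IFS change does not leak out.
join_langs() (
    IFS=+
    printf '%s\n' "$*"
)

# Usage: ocr_multilang IMAGE OUTPUT_BASE LANG [LANG...]
ocr_multilang() {
    image=$1
    base=$2
    shift 2
    tesseract "$image" "$base" -l "$(join_langs "$@")"
}

# Example: ocr_multilang image.png output deu eng
# runs:    tesseract image.png output -l deu+eng
```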
3: List the ISO 639-2 codes of available languages
tesseract --list-langs
Motivation: Tesseract supports multiple languages for OCR. It is often useful to know the available languages and their corresponding ISO 639-2 codes. This allows us to specify the language correctly when performing OCR.
Explanation: The command tesseract --list-langs lists the languages whose data files are installed and available to Tesseract on this machine, not every language the project supports. It is a convenient way to look up the code to pass to -l.
Example Output: Running the command tesseract --list-langs prints a header line followed by one language code per line. Only the codes are printed, without language names, and the exact header text varies between Tesseract versions. For example:
List of available languages (3):
deu
eng
fra
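A script can use this listing to check for a language before starting a long OCR run. The sketch below is one way to do that; has_lang is a hypothetical helper name, and the check assumes each installed code is printed on a line of its own.

```shell
# has_lang is a hypothetical helper, not a Tesseract command.
# It succeeds when the given language code appears as a whole line
# in the output of "tesseract --list-langs".
has_lang() {
    # 2>&1: some older releases print the list on standard error.
    langs=$(tesseract --list-langs 2>&1) || return 2
    # -F fixed string, -x whole-line match, -q no output.
    printf '%s\n' "$langs" | grep -Fqx -- "$1"
}

# Usage:
#   if has_lang deu; then
#       tesseract -l deu image.png output
#   else
#       echo "German language data is not installed" >&2
#   fi
```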
4: Specify a custom page segmentation mode
tesseract --psm 6 image.png output
Motivation: Tesseract provides different page segmentation modes to handle various types of input images. By specifying a custom page segmentation mode, we can improve the accuracy of the OCR results for specific scenarios.
Explanation: The --psm flag specifies the page segmentation mode as a number; the valid values are listed by tesseract --help-psm. Here we use 6, which assumes a single uniform block of text. Tesseract 4 and later spell the flag --psm, while the 3.x releases used -psm. The command tesseract --psm 6 image.png output performs OCR on image.png using that mode and saves the recognized text to output.txt.
Example Output: The results depend on how well the chosen mode matches the input image. With mode 6, an image that contains a single uniform block of text is likely to be recognized accurately, whereas an image with multiple columns or an irregular layout may produce less accurate results.
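Because the best mode depends on the image, it can be useful to run several modes and compare the text side by side. The sketch below assumes Tesseract 4 or later, where the flag is spelled --psm, and uses stdout as the output base so the text is printed instead of saved. compare_psm is a hypothetical helper name.

```shell
# compare_psm is a hypothetical helper, not a Tesseract command.
# Usage: compare_psm IMAGE MODE [MODE...]
# Runs OCR once per mode and prints each result under a header line.
compare_psm() {
    image=$1
    shift
    for mode in "$@"; do
        printf '=== psm %s ===\n' "$mode"
        # "stdout" as the output base sends the text to standard output;
        # diagnostics on standard error are discarded.
        tesseract "$image" stdout --psm "$mode" 2>/dev/null
    done
}

# Example: compare_psm image.png 3 4 6
```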
5: List page segmentation modes and their descriptions
tesseract --help-psm
Motivation: Tesseract provides various page segmentation modes to handle different types of input images. It is useful to know the available page segmentation modes and their descriptions to choose the appropriate mode for specific OCR tasks.
Explanation: The command tesseract --help-psm lists all the available page segmentation modes along with their descriptions. This helps you understand what each mode assumes about the page and pick the one that fits your input.
Example Output: Running the command tesseract --help-psm displays a list of page segmentation modes and their descriptions. For example:
Page segmentation modes:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
...
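To look up a single mode without reading the whole list, the help output can be filtered. This is a minimal sketch that assumes each mode is printed as a number followed by its description, as in the listing above; psm_description is a hypothetical helper name.

```shell
# psm_description is a hypothetical helper, not a Tesseract command.
# It prints the description of one page segmentation mode by filtering
# the output of "tesseract --help-psm".
psm_description() {
    tesseract --help-psm 2>&1 |
        awk -v mode="$1" '
            # Match lines whose first field is the requested mode number,
            # then drop that field and print the rest of the line.
            $1 == mode { $1 = ""; sub(/^ +/, ""); print }
        '
}

# Example: psm_description 6
```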
By understanding the options of the tesseract command, such as the language and page segmentation settings, you can adapt OCR to your input and achieve better text recognition accuracy.