Using the Tesseract Command for OCR (with examples)

Tesseract is an OCR (Optical Character Recognition) engine that allows you to extract text from images. In this article, we will explore various use cases of the tesseract command by providing code examples and explanations for each case.

1: Recognize text in an image and save it to output.txt

tesseract image.png output

Motivation: The tesseract command recognizes and extracts text from an input image. In this case, we pass the input image image.png and the output base name output; Tesseract writes the recognized text to output.txt.

Explanation: The command tesseract image.png output performs OCR on the image image.png and saves the recognized text to the file output.txt. Note that the .txt extension is added automatically to the output file name.

Example Output: Suppose the input image image.png contains the text “Hello, World!”. After running the command, the recognized text will be saved in output.txt as “Hello, World!”.
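To see the output base name behavior in practice, the command can be wrapped in a small shell function that runs the OCR and then prints the resulting .txt file. This is a minimal sketch assuming a POSIX shell; the function name ocr_to_text and the default base name are illustrative, not part of Tesseract.

```shell
# Hypothetical helper (not part of Tesseract): OCR one image and print the text.
# Usage: ocr_to_text image.png [output_base]
ocr_to_text() {
    image=$1
    base=${2:-output}    # Tesseract appends .txt to this base name
    tesseract "$image" "$base" || return 1
    cat "$base.txt"
}
```

Calling ocr_to_text image.png result would leave the text in result.txt and also print it to the terminal.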

2: Specify a custom language with an ISO 639-2 code

tesseract -l deu image.png output

Motivation: By default, Tesseract recognizes text in English. It also supports many other languages, provided the trained data for that language is installed. In this example, we specify the language using an ISO 639-2 code (e.g. “deu” for German), which lets us perform OCR on images containing text in that language.

Explanation: The -l flag is used to specify the language for OCR. In this case, we use the ISO 639-2 code deu to indicate German. The command tesseract -l deu image.png output performs OCR on the image image.png with the specified language and saves the recognized text to output.txt.

Example Output: Suppose the input image image.png contains German text “Guten Tag!”. After running the command, the recognized text will be saved in output.txt as “Guten Tag!”.
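The -l flag also accepts several language codes joined with a plus sign (for example deu+eng) for images that mix languages. The sketch below builds that value from a list of codes; join_langs is a hypothetical helper name, and the example assumes the listed language data is installed.

```shell
# Hypothetical helper: build the value for -l from one or more language codes.
# join_langs deu eng prints deu+eng
join_langs() {
    # Inside the subshell, "$*" joins the arguments with the first character of IFS.
    ( IFS=+; printf '%s\n' "$*" )
}

# Example call (requires the deu and eng language data):
# tesseract -l "$(join_langs deu eng)" image.png output
```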

3: List the ISO 639-2 codes of available languages

tesseract --list-langs

Motivation: Tesseract supports multiple languages for OCR. It is often useful to know the available languages and their corresponding ISO 639-2 codes. This allows us to specify the language correctly when performing OCR.

Explanation: The command tesseract --list-langs lists the languages whose trained data is installed on the system, which is the set of values you can pass to -l. Each entry is the language code itself, so it can be copied directly into a command.

Example Output: Running tesseract --list-langs prints a header line followed by one language code per line. Language names are not shown, and the exact header text varies by version. For example:

List of available languages:
eng
deu
fra
...
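
Because the listing is one code per line, a script can check for a language before running OCR. This is a minimal sketch assuming the first line of output is the header; has_lang is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given language code is installed.
# Usage: has_lang deu && tesseract -l deu image.png output
has_lang() {
    # Merge stderr into stdout (some versions print the list there),
    # drop the header line, then require an exact whole-line match.
    tesseract --list-langs 2>&1 | tail -n +2 | grep -qxF -- "$1"
}
```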

4: Specify a custom page segmentation mode

tesseract --psm 6 image.png output

Motivation: Tesseract provides several page segmentation modes for different page layouts. Choosing a mode that matches the layout of the input image can improve the accuracy of the OCR results.

Explanation: The --psm flag sets the page segmentation mode as a number. Current releases list modes 0 through 13; older 3.x releases spelled the flag -psm. Here we use mode 6, which tells Tesseract to assume a single uniform block of text. The command tesseract --psm 6 image.png output performs OCR on the image image.png using that mode and saves the recognized text to output.txt.

Example Output: Results depend on how well the mode matches the image. With mode 6, an image that contains a single uniform block of text is usually recognized accurately. If the image contains multiple columns or an irregular layout, the text may come out jumbled or incomplete, and another mode such as the default (3) is likely a better fit.
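
Since the best mode depends on the layout, it can help to run the same image through a few modes and compare the text files. The sketch below assumes Tesseract 4 or later (the --psm spelling); compare_psm and the psm_N output names are hypothetical.

```shell
# Hypothetical helper: OCR one image under several page segmentation modes.
# Usage: compare_psm image.png 3 4 6   (writes psm_3.txt, psm_4.txt, psm_6.txt)
compare_psm() {
    image=$1
    shift
    for mode in "$@"; do
        # --psm is the Tesseract 4+ spelling; older 3.x releases used -psm
        tesseract --psm "$mode" "$image" "psm_$mode" || return 1
    done
}
```

The resulting files can then be compared side by side, for example with diff psm_3.txt psm_6.txt.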

5: List page segmentation modes and their descriptions

tesseract --help-psm

Motivation: Tesseract provides various page segmentation modes to handle different types of input images. It is useful to know the available page segmentation modes and their descriptions to choose the appropriate mode for specific OCR tasks.

Explanation: The command tesseract --help-psm lists all the available page segmentation modes along with their descriptions. This information helps users understand the behavior of each mode and select the most suitable mode for their OCR needs.

Example Output: Running the command tesseract --help-psm will display a list of page segmentation modes and their corresponding descriptions. For example:

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  ...

By understanding and utilizing the various features provided by the tesseract command, you can enhance your OCR tasks and achieve better text recognition accuracy.
