Using the GOCR Command for Optical Character Recognition (with examples)
GOCR is an Optical Character Recognition (OCR) tool designed to convert images of text into machine-encoded text. The tool recognizes characters using its OCR engine and can prompt the user to store unknown patterns in a database for future reference. This makes it particularly useful for digitizing printed documents, extracting data from images, and improving character recognition over time as its pattern database grows.
Use case 1: Recognize Characters and Output to a File
Code:
gocr -m 130 -p path/to/db_directory -i path/to/input_image.png -o path/to/output_file.txt
Motivation:
This use case is useful when you have a collection of images containing text that you need to convert into a digital format. With this command, the recognized text is stored in a file for easy access, editing, or sharing. Because GOCR can learn new characters and expand its recognition database, accuracy can improve over time.
Explanation:
gocr: This is the command to invoke the GOCR tool.
-m 130: The mode option. Mode 130 enables the GOCR engine to create a new database, use an existing one, and extend it with newly recognized patterns.
-p path/to/db_directory: Specifies the path where the pattern database is stored or is to be created. Ensure this path exists to avoid the database usage being silently skipped.
-i path/to/input_image.png: The path to the input image file containing the text that needs to be recognized.
-o path/to/output_file.txt: The path and name of the file where the recognized text will be saved.
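The options above can be combined into a small wrapper script. This is a minimal sketch, not a definitive recipe: the temporary database location and the guard around the gocr call are assumptions added for illustration, and the input and output paths are the article's placeholders. It also assumes, based on the gocr man page, that mode 130 is the sum of the bit values 2 (use database) and 128 (extend database), and that the database path should end with a slash.

```shell
#!/bin/sh
# Sketch only: db_dir is a hypothetical location chosen for this example.
db_dir="${TMPDIR:-/tmp}/gocr_db_example/"   # trailing slash assumed to be expected by -p
input="path/to/input_image.png"             # placeholder path from the article
output="path/to/output_file.txt"            # placeholder path from the article

# Assumption from the gocr man page: 2 = use database, 128 = extend database.
mode=$((2 + 128))

# Create the database directory up front so the database is not silently skipped.
mkdir -p "$db_dir"

# Run gocr only if it is installed and the input image actually exists.
if command -v gocr >/dev/null 2>&1 && [ -f "$input" ]; then
    gocr -m "$mode" -p "$db_dir" -i "$input" -o "$output"
else
    echo "skipping: gocr or $input not found" >&2
fi
```

Creating the directory before the call is the design choice worth keeping, since a missing database path does not produce an obvious error.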
Example Output:
Assuming the input image contains the text “Hello World”, the output file output_file.txt will contain:
Hello World
Use case 2: Recognize Only Numerical Characters
Code:
gocr -m 130 -p path/to/db_directory -i path/to/input_image.png -o path/to/output_file.txt -C "0..9"
Motivation:
This use case is particularly useful in scenarios where the image contains only numerical data, such as receipts, invoices, or any document where numbers are predominant and need extraction. By focusing exclusively on numerical characters, the command minimizes errors associated with misinterpreting non-numerical text.
Explanation:
gocr: Initiates the GOCR tool.
-m 130: Allows the use, extension, and creation of the database.
-p path/to/db_directory: Specifies the database directory.
-i path/to/input_image.png: Input image file path.
-o path/to/output_file.txt: Output file path.
-C "0..9": Assumes that all characters to be recognized in the image are numeric, ranging from 0 to 9.
Example Output:
If the image contains the text “09/30/2022 Invoice Total: 1830”, the output file will have:
09302022 1830
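If stray non-numeric characters still appear in the output file, a post-processing pass with standard tools can remove them. The sketch below is a separate technique from gocr's own character filter, not a description of how -C works; the function name digits_only is hypothetical, and the sample string is the one used in the example above.

```shell
#!/bin/sh
# Hypothetical helper: keep only digits, spaces, and newlines from stdin,
# then squeeze runs of spaces into a single space.
digits_only() {
    tr -cd '0-9 \n' | tr -s ' '
}

# Demonstration on the sample text from the example above.
printf '%s\n' "09/30/2022 Invoice Total: 1830" | digits_only
# prints: 09302022 1830
```

On a real run, the same function could be applied to the output file, for example with digits_only < path/to/output_file.txt.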
Use case 3: Recognize Characters with 100% Certainty
Code:
gocr -m 130 -p path/to/db_directory -i path/to/input_image.png -o path/to/output_file.txt -a 100
Motivation:
Using this command is crucial when accuracy is paramount, and you’re either dealing with highly legible text or are willing to manually confirm and add unknown characters to your database. This ensures precise data extraction, although it may increase instances where the tool asks for confirmation due to the high certainty threshold.
Explanation:
gocr: Starts the GOCR OCR tool.
-m 130: Directs the command to create, use, and extend the database.
-p path/to/db_directory: Indicates the directory containing the pattern database.
-i path/to/input_image.png: Specifies the input image location.
-o path/to/output_file.txt: Determines the output text file path.
-a 100: Sets the certainty level to 100%, so only characters recognized with full certainty are treated as known; all others are flagged as unknown.
Example Output:
If the input image shows the text “Data Analysis”, but a letter such as the lowercase ‘a’ in “Analysis” is imperfectly scanned, the output file may contain something like:
Data An lysis
This indicates potential spots needing manual verification and input.
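Spots like this can be located automatically when the unrecognized characters are written as a visible marker. The sketch below assumes the marker is an underscore, which the gocr man page gives as the default for unrecognized characters; if your output uses a different marker, pass it as the second argument. The function name find_unknowns and the sample file contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the lines of an OCR output file that contain
# the unknown-character marker, prefixed with their line numbers.
find_unknowns() {
    # $1 = OCR output file, $2 = marker string (assumed "_" if omitted)
    grep -n -F -- "${2:-_}" "$1"
}

# Demonstration on made-up sample output, not on real gocr results.
sample=$(mktemp)
printf 'Data An_lysis\nQuarterly report\n' > "$sample"
find_unknowns "$sample"
# prints: 1:Data An_lysis
rm -f "$sample"
```

Each reported line number points to a place where a character fell below the certainty threshold and may need manual correction.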
Conclusion
The GOCR command offers a flexible and extensible approach to Optical Character Recognition across a range of applications. By adjusting parameters, users can set the certainty threshold, restrict recognition to numerical data, and incrementally expand the recognition database with new patterns, tailoring the tool to their needs and improving accuracy over time.