Key Concepts

Before you learn about the use cases and implementation of Optical Character Recognition, it’s important to understand its fundamental concepts in detail.

Text Recognition Process

OCR systems in general follow a top-down approach to the text detection and identification process.

When an image or a digital document is submitted to Zia OCR, the text detection and recognition process proceeds as follows:

  1. Zia analyzes the structure of the image and divides it into blocks of contiguous sets of textual lines, like paragraphs.
Note: A block could also contain pictorial content. However, any content that is not text, such as diagrams, symbols, or images, will not be identified by Zia OCR.

  2. Zia then breaks the blocks down further and identifies individual lines of text.

  3. The lines of text are then divided into words, and each word is broken down into individual characters.

  4. Zia compares the characters it has detected with its dataset and runs advanced algorithms and analysis to identify the characters and recognize words based on the character groupings.

  5. Zia also identifies the language the content is in by processing it through volumes of probabilities and hypotheses using Intelligent Character Recognition (ICR) technology.

  6. The processed and recognized text is finally returned to the user as either a JSON or a document response.
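The top-down breakdown described above (blocks, then lines, then words, then characters) can be pictured with a small Python sketch. This is only an illustration that works on plain text rather than on an image, and it is not how Zia is implemented internally. The names Block, Line, Word, and segment are hypothetical and are not part of any Catalyst SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    text: str

    @property
    def characters(self) -> List[str]:
        # The lowest level of the hierarchy: individual characters.
        return list(self.text)


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)


def segment(text: str) -> List[Block]:
    """Split plain text top-down: blocks -> lines -> words."""
    blocks = []
    # Assumption: a blank line separates two blocks (paragraphs).
    for raw_block in text.split("\n\n"):
        lines = [
            Line(words=[Word(w) for w in raw_line.split()])
            for raw_line in raw_block.splitlines()
            if raw_line.strip()
        ]
        if lines:
            blocks.append(Block(lines=lines))
    return blocks


sample = "Invoice 42\nDue today\n\nThank you"
blocks = segment(sample)
for block in blocks:
    for line in block.lines:
        print([word.text for word in line.words])
```

A real OCR engine performs each of these splits on pixel regions, and only recognizes characters at the final step. The sketch shows only the nesting of the results.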

Model Types

A model type is a key attribute that describes the type of OCR feature supported by Catalyst. All general image and document files that you process for the common optical character recognition feature fall under the OCR model type. You will need to specify this as the model type whenever you process an image or a document through the Catalyst OCR API or SDK.

Catalyst also enables you to process ID proofs and official documents, and perform secure identity checks through an independent feature called Identity Scanner. These will fall under their respective model types of AADHAAR, PAN, CHEQUE and PASSBOOK.
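If your application accepts the model type as user input, it can help to validate the value before sending a request. The following is a minimal Python sketch that uses only the five model type names listed above. The ModelType enum and both helper functions are hypothetical names for illustration and are not part of the Catalyst SDK.

```python
from enum import Enum


class ModelType(Enum):
    OCR = "OCR"
    AADHAAR = "AADHAAR"
    PAN = "PAN"
    CHEQUE = "CHEQUE"
    PASSBOOK = "PASSBOOK"


# Model types that belong to the Identity Scanner feature.
IDENTITY_SCANNER_TYPES = {
    ModelType.AADHAAR,
    ModelType.PAN,
    ModelType.CHEQUE,
    ModelType.PASSBOOK,
}


def parse_model_type(value: str) -> ModelType:
    """Convert user input such as ' pan ' into a ModelType member."""
    try:
        return ModelType(value.strip().upper())
    except ValueError:
        raise ValueError(f"Unsupported model type: {value!r}") from None


def is_identity_scanner(model_type: ModelType) -> bool:
    """Return True for ID-proof model types, False for general OCR."""
    return model_type in IDENTITY_SCANNER_TYPES
```

For example, parse_model_type("ocr") returns ModelType.OCR, and passing that member to is_identity_scanner returns False.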

Supported Languages

The OCR models can detect and recognize textual content in 9 international languages (English and the eight additional languages listed below) and 10 Indian languages.

Indian Languages

  1. English
  2. Hindi
  3. Bengali
  4. Marathi
  5. Telugu
  6. Tamil
  7. Gujarati
  8. Urdu
  9. Kannada
  10. Malayalam
  11. Sanskrit

Additional International Languages

  1. Arabic
  2. Chinese
  3. French
  4. Italian
  5. Japanese
  6. Portuguese
  7. Romanian
  8. Spanish

If the user doesn’t specify the language, Zia can detect the language automatically. Zia can recognize handwritten content as long as the text is legible, clear, and uses a standard font structure. However, it cannot recognize any non-textual content such as images or diagrams.
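The language handling described above can be sketched in Python as a simple lookup: a supported language name is passed through, and a missing value means the language is left for Zia to detect. This sketch uses the language names from the lists above, not the language codes that the API expects (check the API documentation for those). The function name resolve_language is hypothetical.

```python
from typing import Optional

INDIAN_LANGUAGES = frozenset({
    "Hindi", "Bengali", "Marathi", "Telugu", "Tamil",
    "Gujarati", "Urdu", "Kannada", "Malayalam", "Sanskrit",
})

INTERNATIONAL_LANGUAGES = frozenset({
    "English", "Arabic", "Chinese", "French", "Italian",
    "Japanese", "Portuguese", "Romanian", "Spanish",
})

SUPPORTED_LANGUAGES = INDIAN_LANGUAGES | INTERNATIONAL_LANGUAGES


def resolve_language(name: Optional[str]) -> Optional[str]:
    """Return the normalized language name, or None for auto-detection."""
    if name is None or not name.strip():
        # No language given: leave detection to Zia.
        return None
    normalized = name.strip().title()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {name!r}")
    return normalized
```

Rejecting an unsupported language in your own code gives the end user an immediate error message instead of a failed request.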

Input Format

Zia OCR supports input files in the following formats for processing:

  1. .jpg/.jpeg
  2. .png
  3. .tiff
  4. .bmp
  5. .pdf

You can provide an option in your Catalyst application for the user to upload an image or document file from the device’s storage. You can also code the Catalyst application to use the end user device’s camera to capture a photo with textual content, and process the image as the input file.

The input provided using the API contains the source file, the language of the text to be recognized (optional), and the model type (optional).

You can check the request format in the API documentation.
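Before sending a file, your application can check its extension against the supported formats and assemble the three input values described above. The following Python sketch shows one way to do this. The function build_ocr_input and the field names "file", "language", and "model_type" are hypothetical placeholders. The actual request fields are defined in the API documentation.

```python
from pathlib import Path
from typing import Dict, Optional

# File extensions taken from the list of supported input formats.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tiff", ".bmp", ".pdf"}


def build_ocr_input(
    file_path: str,
    language: Optional[str] = None,
    model_type: Optional[str] = None,
) -> Dict[str, str]:
    """Validate the file extension and collect the input values."""
    extension = Path(file_path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file format: {extension or 'none'}")

    # The source file is always included.
    fields = {"file": file_path}
    # The language and model type are optional, so add them only if given.
    if language is not None:
        fields["language"] = language
    if model_type is not None:
        fields["model_type"] = model_type
    return fields
```

The check is case-insensitive, so a file named scan.PNG is accepted, while a file such as photo.gif raises a ValueError before any request is made.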

The user must follow these guidelines while providing the input, for better results:

  • Avoid providing blurred or unrecognizable text in images.
  • Ensure that the text in an image file is clear, visible, and legible.
  • If handwritten text is present in an image file, ensure that it uses a standard font.
  • The image size must not be too small.

Response Format

Zia returns the response of OCR processing in the following ways:

  • In the Console
    When you upload a sample image or a document file to be processed in the console, it will return the response in two formats:
    • Document response: This returns formatted, readable text that is visually segregated into lines and paragraphs based on the original content, along with a confidence score for the OCR model type, expressed as a percentage.
    • JSON response: This returns the recognized text in JSON format along with the confidence score for the OCR model type.
  • Using the SDKs
    When you send an image or document file using an API request, you will receive a JSON response containing the recognized text in the same format mentioned above. You can customize the formatting of the JSON response in your code using SDKs. For example, you can return separate paragraphs or individual words from a line as the response. For more information, refer to the Java, Node.js and Python SDK documentation.
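As an illustration of customizing the response in your code, the Python sketch below splits the recognized text into paragraphs, lines, and words. It assumes that the JSON response contains a "text" key with the recognized text and a "confidence" key with the score. These key names and the sample payload are assumptions made for this example, so verify the actual response structure in the SDK documentation. The function name split_recognized_text is hypothetical.

```python
import json
from typing import Dict


def split_recognized_text(response_json: str) -> Dict[str, object]:
    """Break the recognized text into paragraphs, lines, and words."""
    payload = json.loads(response_json)
    text = payload["text"]  # assumed key name

    # Assumption: a blank line separates two paragraphs.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    words = text.split()

    return {
        "confidence": payload.get("confidence"),  # assumed key name
        "paragraphs": paragraphs,
        "lines": lines,
        "words": words,
    }


# A made-up response used only to demonstrate the function.
sample = json.dumps({
    "confidence": 97.5,
    "text": "Hello world\nfrom Zia\n\nSecond paragraph",
})
result = split_recognized_text(sample)
print(result["paragraphs"])
print(result["words"])
```

The same approach works in the Java and Node.js SDKs: parse the JSON response once, then return whichever level of detail your application needs.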

Last Updated 2023-09-05 15:16:18 +0530