Optical Character Recognition

Zia Optical Character Recognition (OCR) electronically detects textual characters in images or digital documents and converts them into machine-encoded text. Zia OCR can recognize text in 9 international languages and 10 Indian languages. You can find the list of supported languages and their language codes in the API documentation.

Note: Catalyst does not store any of the files you upload in its systems. The files you upload are used for one-time processing only. They are not used for ML model training purposes either. Catalyst components are fully compliant with all applicable data protection and privacy laws.

You must specify the path to the image or document file that needs to be processed for OCR, as shown in the code below. You can also format the response you receive, as shown in the sample code. In addition to the recognized text, the response includes a confidence score that indicates the accuracy of the processing.

Allowed file formats: .jpg, .jpeg, .png, .tiff, .bmp, .pdf

File size limit: 20 MB
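The format and size limits above can be checked locally before a file is sent for processing. The following is a minimal sketch, not part of the Catalyst SDK: the class OcrInputCheck and its method names are hypothetical, the extension match is assumed to be case-insensitive, and "20 MB" is assumed to mean 20 * 1024 * 1024 bytes.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Catalyst SDK.
final class OcrInputCheck {
    // File extensions listed under "Allowed file formats".
    static final List<String> ALLOWED_EXTENSIONS =
            Arrays.asList(".jpg", ".jpeg", ".png", ".tiff", ".bmp", ".pdf");

    // Assumption: the 20 MB limit is interpreted as 20 * 1024 * 1024 bytes.
    static final long MAX_SIZE_BYTES = 20L * 1024 * 1024;

    // Returns true if the file name ends with one of the allowed extensions.
    // Assumption: the comparison ignores case, so "scan.PDF" is accepted.
    static boolean hasAllowedExtension(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String extension : ALLOWED_EXTENSIONS) {
            if (lower.endsWith(extension)) {
                return true;
            }
        }
        return false;
    }

    // Returns true if the size does not exceed the assumed limit.
    static boolean isWithinSizeLimit(long sizeInBytes) {
        return sizeInBytes <= MAX_SIZE_BYTES;
    }

    // Combines both checks for a file on disk.
    static boolean isAcceptable(File file) {
        return file.isFile()
                && hasAllowedExtension(file.getName())
                && isWithinSizeLimit(file.length());
    }
}
```

Calling OcrInputCheck.isAcceptable(file) before the OCR request lets a function reject an unsupported file early instead of waiting for the service to refuse it.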

You can specify the model type as OCR using setModelType(), and the language codes using setLanguageCode(). Both values are optional for the OCR model type: if you omit them, the file is processed with the OCR model type by default, and the languages are detected automatically. Multiple language codes are passed as a single comma-separated string, such as "eng,tam".
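When the language codes come from user input or configuration, they need to be combined into the single comma-separated string that the sample code passes to setLanguageCode(). The following is a minimal sketch under that assumption; OcrLanguageCodes and join are hypothetical names, not SDK members, and the helper does not check whether a code is actually supported.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of the Catalyst SDK.
final class OcrLanguageCodes {
    // Joins language codes into a comma-separated string such as "eng,tam".
    // Surrounding whitespace is trimmed and blank entries are skipped.
    // The codes themselves are not validated against the supported list.
    static String join(List<String> codes) {
        StringJoiner joiner = new StringJoiner(",");
        for (String code : codes) {
            String trimmed = code.trim();
            if (!trimmed.isEmpty()) {
                joiner.add(trimmed);
            }
        }
        return joiner.toString();
    }
}
```

If the joined string is empty, one option is to skip the setLanguageCode() call entirely so that the automatic language detection described above applies.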

Ensure the following classes are imported:

import com.zc.component.ml.ZCContent;
import com.zc.component.ml.ZCLine;
import com.zc.component.ml.ZCML;
import com.zc.component.ml.ZCOCRModelType;
import com.zc.component.ml.ZCOCROptions;
import com.zc.component.ml.ZCParagraph;
import java.io.File;
import java.util.List;

// Specify the file path
File file = new File("/Users/amelia-421/Desktop/MyImage.jpg");

// Set the model type and languages
ZCOCROptions options = ZCOCROptions.getInstance()
        .setModelType(ZCOCRModelType.OCR)
        .setLanguageCode("eng,tam");

// Call getContent() with the file object to get the detected text in a ZCContent object
ZCContent ocrContent = ZCML.getInstance().getContent(file, options);

// To get individual paragraphs
List<ZCParagraph> paragraphs = ocrContent.getParagraphs();
for (ZCParagraph paragraph : paragraphs) {
    // To get individual lines in the paragraph
    List<ZCLine> paraLines = paragraph.lines;
    for (ZCLine line : paraLines) {
        // To get individual words in the line
        String words = line.words;
        // Raw line text
        String text = line.text;
    }
    // Returns the raw paragraph text
    String text = paragraph.text;
}
// Returns the raw image text
String text = ocrContent.text;

Last Updated 2023-09-03 01:06:41 +0530