Overview
Tesseract OCR extracts text from images using tesseract.js, an open-source library. It supports multiple languages and is most accurate on clean, clearly printed input.
Documentation
Prerequisites:
Community Commons
Implementation:
Use 'SUB_GetBase64Image' to convert the image to a Base64 string; it relies on a Base64 Java action.
Select the Load Language and Initialize Language values in the 'JS_TesseractOCR' JavaScript action.
Result:
The text extracted from the image is returned as a string.
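The flow above (Base64 image in, language loaded and initialized, text out) can be sketched in plain JavaScript. This is only an illustration, not the module's actual 'JS_TesseractOCR' code. It assumes the worker API of tesseract.js versions 2 to 4, where loadLanguage and initialize are separate calls; newer releases take the language when the worker is created. The names toDataUri and extractText are hypothetical, and createWorker is passed in as a parameter so the sketch does not depend on how the library is bundled.

```javascript
// Wrap a raw Base64 string in a data URI, a form tesseract.js accepts as image input.
function toDataUri(base64, mimeType) {
  return "data:" + mimeType + ";base64," + base64;
}

// Hypothetical helper: run OCR on a Base64 image and return the text.
// `createWorker` is expected to be tesseract.js's createWorker (v2 to v4 style).
async function extractText(createWorker, base64, mimeType, lang) {
  const worker = await createWorker();
  try {
    // Older versions need an explicit load() before anything else.
    if (typeof worker.load === "function") {
      await worker.load();
    }
    await worker.loadLanguage(lang); // "Load Language" step
    await worker.initialize(lang);   // "Initialize Language" step
    const { data } = await worker.recognize(toDataUri(base64, mimeType));
    return data.text;                // result as a plain string
  } finally {
    // Always release the worker, even if recognition fails.
    await worker.terminate();
  }
}
```

With tesseract.js available, a call might look like extractText(Tesseract.createWorker, base64String, "image/png", "eng"). In those versions several languages can be combined in one string such as "eng+fra".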
Supported file formats:
JPG
PNG
GIF
PNM
TIFF
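A file can be checked against the list above before it is sent to OCR. The following is a minimal sketch, not part of the module: the function name isSupportedImage is hypothetical, and treating "jpeg" and "tif" as aliases for JPG and TIFF is an assumption.

```javascript
// Extensions for the formats listed above.
// "jpeg" and "tif" are assumed aliases, not taken from the module itself.
const SUPPORTED_EXTENSIONS = new Set(["jpg", "jpeg", "png", "gif", "pnm", "tif", "tiff"]);

// Hypothetical helper: true when the file name ends in a supported extension.
function isSupportedImage(fileName) {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) {
    return false; // no extension at all
  }
  return SUPPORTED_EXTENSIONS.has(fileName.slice(dot + 1).toLowerCase());
}
```

The check looks only at the file name, so it is a quick guard rather than a validation of the file contents.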
Features:
It supports multiple languages. See the tesseract.js documentation for the list of supported languages.
Accuracy is high for standard fonts on a clean background.
Limitation:
Accuracy drops on noisy backgrounds and with custom or script-style fonts.
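One way to handle this limitation is to inspect the confidence value that tesseract.js reports alongside the recognized text (a number from 0 to 100 on the recognition result's data object). The sketch below is an assumption about how a caller might use it, not something the module does: the helper name assessResult is hypothetical and the default threshold of 60 is arbitrary.

```javascript
// Hypothetical helper: flag OCR results whose overall confidence is low.
// `data` is the `data` object of a tesseract.js recognition result.
// The default threshold of 60 is an arbitrary example value.
function assessResult(data, minConfidence = 60) {
  return {
    text: data.text.trim(),
    confidence: data.confidence,
    reliable: data.confidence >= minConfidence,
  };
}
```

Results flagged as unreliable could be routed to manual review or retried with a cleaner scan of the image.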