Mendix Tesseract OCR | Extract Text from Images

Overview

The Tesseract OCR module extracts text from images using tesseract.js, an open-source OCR library. It supports multiple languages and is most accurate on images with standard fonts and clear backgrounds.

Modules Link
GitHub

Documentation

Prerequisites:

Community Commons

Implementation:

Use 'SUB_GetBase64Image' to convert the image to a Base64 string; it relies on a Base64 Java action.
Select the Load Language and Initialize Language in the 'JS_TesseractOCR' JavaScript action.
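
The two steps above could be sketched as follows. This is a minimal illustration, not the module's actual code: `toDataUrl` and `extractText` are hypothetical names, and the worker is assumed to expose `loadLanguage`, `initialize`, and `recognize` in the style of the tesseract.js v2 to v4 worker API. Newer tesseract.js releases handle language loading differently, so treat the call sequence as an assumption.

```javascript
// Sketch only: the internals of 'SUB_GetBase64Image' and 'JS_TesseractOCR'
// are not shown in this document, so every name here is hypothetical.

// Step 1: encode raw image bytes as a Base64 data URL.
function toDataUrl(bytes, mimeType) {
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  // btoa is available in browsers and in Node.js 16 or later.
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Step 2: load and initialize the language, then recognize the image.
// The worker is passed in by the caller; it is assumed to follow the
// tesseract.js v2-v4 worker shape (loadLanguage, initialize, recognize).
async function extractText(worker, dataUrl, language) {
  await worker.loadLanguage(language);
  await worker.initialize(language);
  const result = await worker.recognize(dataUrl);
  // The recognized text is returned as a plain string.
  return result.data.text;
}
```

Passing the worker in as a parameter keeps the sketch independent of how a particular tesseract.js version creates it.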

Result:

The text extracted from the image is returned as a String.

Supported File Formats:

JPG
PNG
GIF
PNM
TIFF
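
A caller might want to reject unsupported files before running OCR. The check below is a minimal sketch under stated assumptions: `isSupportedImage` is a hypothetical helper, not part of the module, and the `jpeg` and `tif` aliases are added for convenience rather than taken from the list above.

```javascript
// Extensions from the supported list, plus the common aliases
// "jpeg" and "tif" (an assumption, not stated by the module).
const SUPPORTED_EXTENSIONS = new Set([
  "jpg", "jpeg", "png", "gif", "pnm", "tiff", "tif",
]);

// Return true when the file name ends in a supported image extension.
// The comparison is case-insensitive; names without a dot are rejected.
function isSupportedImage(fileName) {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) {
    return false;
  }
  return SUPPORTED_EXTENSIONS.has(fileName.slice(dot + 1).toLowerCase());
}
```

This only inspects the file name; it does not verify that the file contents match the extension.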

Features:

It supports multiple languages; see the Tesseract language list for the languages available.
Accuracy is high for standard fonts on a clear background.

Limitation:

Accuracy drops with noisy backgrounds and with custom or script-style fonts.
