OCR (Optical Character Recognition)
OCR is the process of recognizing, parsing and extracting text from images.
Dataiku leverages two open source OCR engines:
The Tesseract library, which performs OCR in 100 languages
The EasyOCR library
OCR is an offline capability: it runs locally and does not call any third-party API.
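To give a sense of what running one of these engines locally involves, here is a minimal sketch that calls Tesseract directly from Python. It is not the plugin's implementation. It assumes the Tesseract binary, its language data, and the `pytesseract` and `Pillow` packages are installed. The helper names `tesseract_lang` and `extract_text` are hypothetical and used only for illustration.

```python
from typing import Iterable


def tesseract_lang(codes: Iterable[str]) -> str:
    """Join Tesseract language codes into one string, e.g. 'eng+fra'."""
    return "+".join(codes)


def extract_text(image_path: str, codes: Iterable[str] = ("eng",)) -> str:
    """Run Tesseract on a single image and return the recognized text.

    Hypothetical helper: assumes pytesseract, Pillow and the Tesseract
    binary (with the requested language data) are available locally.
    """
    # Imported lazily so this module loads even without the OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=tesseract_lang(codes))
```

Calling `extract_text("scan.png", ["eng", "fra"])` would ask Tesseract to recognize English and French text in a local file named `scan.png` (a placeholder path), with no network call involved.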
Note
This capability is provided by the “Text extraction and OCR” plugin, which you need to install. Please see Installing plugins.
This plugin is Not supported
Please see our OCR plugin page for detailed instructions.