OCR (Optical Character Recognition)

OCR is the process of recognizing, parsing and extracting text from images.

Dataiku uses the Tesseract library to perform OCR in 100 languages.

This is an offline capability: it does not rely on a third-party API.
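For illustration only, here is a minimal sketch of calling Tesseract directly from Python. It shows that OCR runs entirely on the local machine and that several languages can be combined in one pass. This is not the plugin's API. The sketch assumes that the `tesseract` command-line tool and the language data files for each requested language are installed locally. The helper names `build_tesseract_command` and `ocr_image` and the input file `scan.png` are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, languages=("eng",)):
    """Build a Tesseract CLI call that prints the recognized text to stdout.

    Tesseract's CLI form is: tesseract <image> <outputbase> [-l lang[+lang...]].
    Passing "stdout" as the output base writes text to standard output
    instead of a file; multiple language codes are joined with "+".
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return ["tesseract", str(image_path), "stdout", "-l", "+".join(languages)]


def ocr_image(image_path, languages=("eng",)):
    """Run Tesseract locally on one image and return the extracted text.

    No network call or third-party API is involved. Assumes the tesseract
    binary and the language data for each requested language are installed.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    # "scan.png" is a hypothetical input; this only builds the command.
    print(build_tesseract_command("scan.png", ("eng", "fra")))
    # prints ['tesseract', 'scan.png', 'stdout', '-l', 'eng+fra']
```

Splitting command construction from execution keeps the language handling testable without an installed binary; `ocr_image` performs the actual local recognition.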

Note

This capability is provided by the “Tesseract OCR” plugin, which you need to install. Please see Installing plugins.

This plugin is Not supported.

Please see our Tesseract plugin page for detailed instructions.