OCR (Optical Character Recognition)

OCR is the process of recognizing, parsing and extracting text from images.

Dataiku leverages two open source OCR engines:

OCR in Dataiku is an offline capability, meaning that it does not rely on a third-party API.

Note

This capability is provided by the “Text extraction and OCR” plugin, which you need to install. Please see Installing plugins.

This plugin is not supported.

Please see our OCR plugin page for detailed instructions.