Text cleaning

Text cleaning is the process of cleaning up and simplifying text to prepare it for further analysis.

Dataiku provides offline text cleaning.

Offline text cleaning

Dataiku's native text cleaning capability supports 59 languages.

It provides:

  • Tokenization

  • Filtering of punctuation, stop words, and multiple other categories

  • Lemmatization

It is an offline capability, meaning that it does not rely on a third-party API.
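
To illustrate what tokenization, filtering, and lemmatization mean in practice, here is a minimal sketch in plain Python. It is not the plugin's implementation and does not use any Dataiku API. The stop word list, the lemma table, and the function names are hypothetical toy examples; a real tool relies on full per-language resources.

```python
import re

# Hypothetical toy resources, for illustration only. Real tools ship
# complete per-language stop word lists and lemma dictionaries.
STOP_WORDS = {"the", "a", "an", "is", "are", "were", "on", "in", "and"}
LEMMAS = {"cats": "cat", "sitting": "sit", "mats": "mat"}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def filter_tokens(tokens):
    """Drop punctuation-only tokens and stop words (case-insensitive)."""
    return [
        t for t in tokens
        if any(c.isalnum() for c in t) and t.lower() not in STOP_WORDS
    ]


def lemmatize(tokens):
    """Lowercase each token and map it to its base form when one is known."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]


def clean_text(text):
    """Run the three steps in order: tokenize, filter, lemmatize."""
    return lemmatize(filter_tokens(tokenize(text)))


print(clean_text("The cats were sitting on the mats."))
```

Everything in this sketch runs locally with the standard library only, which mirrors the "offline" idea: no text is sent to an external service.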

Note

This capability is provided by the “Text Preparation” plugin, which you need to install. Please see Installing plugins.

This plugin is Not supported.

Please see our Text preparation plugin page for detailed documentation.