Text cleaning
Text cleaning is the process of cleaning up and simplifying text to prepare it for further analysis.
Dataiku provides offline text cleaning.
Offline text cleaning
Dataiku's native text cleaning capability supports 59 languages.
It provides:
Tokenization
Filtering of punctuation, stop words, and multiple other categories
Lemmatization
It is an offline capability, meaning that it does not rely on a third-party API.
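To make tokenization, filtering, and lemmatization concrete, here is a minimal pure-Python sketch of what each step does to a sentence. It is an illustration only, not how the Text Preparation plugin is implemented: the helpers `tokenize` and `clean`, the tiny `STOP_WORDS` set, and the `LEMMAS` lookup table are all hypothetical, and a real lemmatizer relies on language-specific morphological data rather than a hand-written table.

```python
import re

# Hypothetical, deliberately tiny English stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "were"}

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"cats": "cat", "chasing": "chase", "mice": "mouse"}

WORD = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    # Lowercase, then split into runs of word characters and
    # single non-word, non-space characters (punctuation, symbols).
    return re.findall(r"\w+|[^\w\s]", text.lower())


def clean(text: str) -> list[str]:
    tokens = tokenize(text)
    # Filtering: keep only word tokens that are not stop words.
    kept = [t for t in tokens if WORD.fullmatch(t) and t not in STOP_WORDS]
    # Lemmatization: map each token to its lemma when the table knows it.
    return [LEMMAS.get(t, t) for t in kept]


print(clean("The cats were chasing the mice!"))  # ['cat', 'chase', 'mouse']
```

The order matters: filtering runs before lemmatization, so stop words and punctuation never reach the lemma lookup.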
Note
This capability is provided by the “Text Preparation” plugin, which you need to install. Please see Installing plugins.
This plugin is Not supported.
Please see our Text preparation plugin page for detailed documentation.