Text

A large amount of information is available as text: tweets, emails, survey responses, product reviews, and similar sources all contain information written in natural language.

The goal of working with text is to convert it into data that can be used for analysis. Applications of text analysis include sentiment analysis, named entity recognition, and summarization.
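As a rough illustration of what "converting text into data" can mean, the sketch below turns free-text documents into small records of numeric features using only the Python standard library. The function name `text_to_features`, the tokenization rule, and the chosen features are illustrative assumptions; they are not taken from any of the plugins listed on this page.

```python
import re
from collections import Counter


def text_to_features(text: str) -> dict:
    """Turn one free-text document into a record of simple features.

    Hypothetical helper for illustration only: it lowercases the text,
    splits it into alphanumeric tokens, and counts them.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    return {
        "n_chars": len(text),              # raw length of the document
        "n_tokens": len(tokens),           # total number of tokens
        "n_unique_tokens": len(counts),    # vocabulary size of the document
        "top_tokens": counts.most_common(3),  # most frequent tokens with counts
    }


# Made-up sample documents, standing in for product reviews.
reviews = [
    "Great product, great price!",
    "Arrived late and the box was damaged.",
]

# One structured row per document, ready for tabular analysis.
rows = [text_to_features(review) for review in reviews]
```

Each entry in `rows` is a plain dictionary, so the result can be loaded into a table alongside other columns. Real text analysis tasks such as sentiment analysis or named entity recognition need far richer processing than token counts; this only shows the general shape of the text-to-data step.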

The following table lists the plugins currently available for working with text data.

Note

Support level: These plugins are either not supported or Tier 2 supported features.

| Plugin | Description | Language coverage |
| --- | --- | --- |
| Text preparation | Detect languages, correct misspellings and clean text data using open source libraries | Language detection: 114; Spell checking: 37; Text cleaning: 59 |
| Text Analysis | Analyze text data with ontology tagging | 59 languages |
| Sentiment analysis | Estimate sentiment polarity (positive/negative) of text data using open source models | English |
| Text summarization | Automatically summarize text data using open source algorithms to extract sentences | Language-agnostic |
| Named entity recognition | Extract information on named entities (people, dates, places, etc.) from text data using open source models | 7 languages |
| Speech to Text | Convert speech to text offline using open-source components | English |
| Amazon Transcribe | Use the Amazon Transcribe API to convert speech to text | 40 languages |
| Sentence embedding | Compute numerical sentence representations for use as feature vectors in a Machine Learning model or for similarity search, using open source models | English |
| Amazon Comprehend | Use the Amazon Comprehend API for language detection, sentiment analysis, named entity recognition and key phrase extraction | Language detection: 100; Other tasks: 12 |
| Amazon Comprehend Medical | Use the Amazon Comprehend Medical API for Protected Health Information extraction and medical entity recognition | English |
| Azure Cognitive Services – Text Analytics | Use the Azure Cognitive Services – Text Analytics API for language detection, sentiment analysis, named entity recognition and key phrase extraction | Language detection: 108; Sentiment analysis: 13; Named entity recognition: 23; Key phrase extraction: 16 |
| Crowlingo Multilingual NLP | Use the Crowlingo Multilingual NLP API for language detection, sentiment analysis, summarization and multiple other tasks | 102 languages |
| Google Cloud NLP | Use the Google Cloud NLP API for sentiment analysis, named entity recognition and text classification | Sentiment analysis: 16; Named entity recognition: 11; Text classification: English |
| Google Cloud Translation | Use the Google Cloud Translation API to translate text | 109 languages |
| Amazon Translation | Use the Amazon Translation API to translate text | 71 languages |
| Azure Translation | Use the Azure Translation API to translate text | 90 languages |
| DeepL Translation | Use the DeepL Translation API to translate text | 28 languages |
| Offline Translation | Translate text offline using open-source components | 100 languages |
| MeaningCloud | Use the MeaningCloud API for language detection, sentiment analysis, topic extraction, summarization and text classification | Language detection: 180; Sentiment analysis: 10; Topic extraction: 13; Summarization: language-agnostic; Text classification: 2 |
| NLG Tasks | Use the OpenAI API to perform tasks expressed in natural language, such as Zero-shot Classification or Q&A | English |
| Tesseract OCR | Perform Optical Character Recognition (OCR) offline using the Tesseract engine | 100 languages |