Speech-to-Text

Speech-to-text is the process of transcribing audio files into text.

This capability is provided by the “Speech to Text” plugin, which you need to install. Please see Installing plugins.

Dataiku provides several speech-to-text capabilities.

Native speech-to-text

The native speech-to-text capability of Dataiku transcribes English speech. It runs offline, meaning that it does not rely on a third-party API.

Warning

The underlying DeepSpeech library requires the following system libraries:

  • libstdc++6 >= 4.8.5

  • glibc >= 2.19

libstdc++6 >= 4.8 is not installed by default on several Linux distributions. If that is the case, you will need sudo access to the server hosting your Dataiku instance in order to upgrade libstdc++6.
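
As a quick sanity check before installing the plugin, the glibc requirement listed above can be verified from Python. The following is a minimal sketch, not part of the plugin: the helper names (version_tuple, meets_minimum, check_glibc) are hypothetical, and it assumes the Python interpreter you run it with is linked against the same glibc as the one used by your Dataiku instance. It does not check libstdc++6, which typically has to be inspected with your distribution's package manager.

```python
import platform

# Minimum glibc version taken from the requirement list above.
MIN_GLIBC = "2.19"


def version_tuple(version):
    """Turn a dotted version string such as '2.19' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def meets_minimum(found, required):
    """Return True if the 'found' version is at least the 'required' one."""
    return version_tuple(found) >= version_tuple(required)


def check_glibc():
    """Return True/False for the glibc requirement, or None if undetermined."""
    # platform.libc_ver() inspects the running interpreter's C library.
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        # Not a glibc-based system, or the version could not be detected.
        return None
    return meets_minimum(version, MIN_GLIBC)


if __name__ == "__main__":
    result = check_glibc()
    if result is None:
        print("Could not determine the glibc version on this system.")
    elif result:
        print("glibc satisfies the >= " + MIN_GLIBC + " requirement.")
    else:
        print("glibc is older than " + MIN_GLIBC + ".")
```

A None result means only that the version could not be detected this way, not that the requirement fails.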

Download DeepSpeech model macro

This macro downloads the weights of the DeepSpeech pre-trained model into a folder in your project. Note that this model has been trained on American English speech data.

Speech to Text recipe

This recipe takes two inputs: the folder containing the DeepSpeech weights downloaded by the macro, and a folder of audio files in .WAV format. The output is a dataset with two columns: the audio file path and the associated transcription.
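
Before running the recipe, it can help to inspect the audio folder. The sketch below uses only the Python standard library and is not part of the plugin. It assumes that the pre-trained English DeepSpeech model expects 16 kHz, mono, 16-bit PCM audio; whether the recipe converts other formats itself is not stated here, so treat the flagged files as candidates for review rather than as errors. The function names (describe_wav, find_unexpected) are hypothetical.

```python
import wave
from pathlib import Path

# Assumption: the pre-trained English DeepSpeech model expects
# 16 kHz, mono, 16-bit PCM audio.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def describe_wav(path):
    """Read the header of a WAV file and return its basic properties."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return {
            "path": str(path),
            "rate": rate,
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "seconds": wav.getnframes() / rate,
        }


def find_unexpected(folder):
    """List WAV files in 'folder' whose format differs from the assumption."""
    expected = (EXPECTED_RATE, EXPECTED_CHANNELS, EXPECTED_SAMPLE_WIDTH)
    problems = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() != ".wav":
            continue  # skip anything that is not a WAV file
        info = describe_wav(path)
        if (info["rate"], info["channels"], info["sample_width"]) != expected:
            problems.append(info)
    return problems
```

For example, calling find_unexpected on a local copy of the audio folder returns one dictionary per file that does not match the assumed format, including its path, sample rate, channel count, and duration.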

AWS Transcribe

The AWS Transcribe integration provides speech-to-text extraction in 40 languages.

Please see NLP using AWS APIs for more details.