Data Science Studio can both read and write datasets on Elasticsearch versions 5.0 to 7.4.
Please note that support for Elasticsearch 1.x and 2.x is now deprecated and will be removed in a future release.
Append mode (appending to an Elasticsearch dataset instead of replacing its contents) is not supported.
Define an Elasticsearch connection
Go to Administration > Connections
Click the “New connection” button and pick Elasticsearch
Enter a name for the new connection, and the required connection parameters, then test and save the new connection
The port parameter should be Elasticsearch’s HTTP API port (9200 by default), not the Java API port.
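To double-check that a port really is the HTTP API port before entering it in DSS, you can request the root endpoint, which returns cluster information as JSON. The following is a minimal sketch using only the Python standard library; the helper names are illustrative and are not part of DSS, and it assumes an unsecured server reachable from where the script runs.

```python
import json
import urllib.request


def root_url(host, port=9200, use_ssl=False):
    """Build the URL of the Elasticsearch HTTP API root endpoint."""
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}/"


def major_version(root_response):
    """Extract the major version from the JSON returned by GET /."""
    return int(root_response["version"]["number"].split(".")[0])


def fetch_root(host, port=9200, timeout=5):
    """Query the root endpoint over HTTP and return the parsed JSON.

    This is expected to fail (connection error or unparsable reply)
    if the port is not the HTTP API port.
    """
    with urllib.request.urlopen(root_url(host, port), timeout=timeout) as resp:
        return json.load(resp)
```

For a local server, `major_version(fetch_root("localhost"))` should return the major version of the server that answered on port 9200.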
Managed Elasticsearch datasets
If you allow DSS to write managed datasets into the Elasticsearch connection, you can use this connection to create output datasets for recipes.
Creating such a dataset creates a new index on your Elasticsearch server with the name of the dataset by default. For Elasticsearch 6 and below, a mapping type is also created with the name of the dataset by default. For example, if your Elasticsearch server is hosted on localhost:9200, a managed dataset named Articles stores its data into
For Elasticsearch 7, it will be stored into
These names do not change when you rename the dataset, in case you are relying on their presence. If you rename the dataset and want the names to stay aligned, edit the index and type names after renaming the dataset, then rebuild it and manually delete the previous index.
For Elasticsearch 6 and below, do not create other types in an index that is managed by DSS: they might be deleted or altered.
By default, fields get the default Elasticsearch mapping, e.g. strings are analyzed and indexed (mapped to text in Elasticsearch 5+). If you want access to a non-analyzed version (mapped to keyword in Elasticsearch 5+) of some or all of your columns, you can list those columns (comma-separated, or * for all string columns) in the dataset settings. You can also specify your own complete type mapping.
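To illustrate the difference between the two mappings, here is a minimal sketch that builds the `properties` part of an Elasticsearch 5+ mapping in which selected string columns keep a non-analyzed keyword sub-field next to the analyzed text field. The function names and the sub-field name `raw` are assumptions made for the example; they do not describe how DSS names or generates its own mappings.

```python
def string_field_mapping(keep_raw=False):
    """Return the mapping for one string column.

    By default the column is an analyzed `text` field. With keep_raw=True,
    a non-analyzed `keyword` sub-field is added under the hypothetical
    name "raw".
    """
    mapping = {"type": "text"}
    if keep_raw:
        mapping["fields"] = {"raw": {"type": "keyword"}}
    return mapping


def build_properties(string_columns, raw_columns):
    """Build the `properties` section for a list of string columns.

    raw_columns lists the columns that need a non-analyzed version;
    "*" selects every string column, mirroring the convention above.
    """
    keep_all = "*" in raw_columns
    return {
        col: string_field_mapping(keep_all or col in raw_columns)
        for col in string_columns
    }
```

With this layout, an exact-match filter or an aggregation would target `title.raw`, while full-text search would target `title`.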
If your dataset is partitioned, one index per partition is created (prefixed by the index name), and the index name is actually an Elasticsearch alias that points to all of the partitions' indices. You can still search or delete from the alias normally.
If you want the index to have non-default settings, you can use an index template before building the managed dataset for the first time.
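As an illustration of the index template approach, the sketch below builds the body of a legacy index template (`PUT _template/<name>`) whose pattern covers the dataset's index and, for partitioned datasets, any index prefixed by the index name. The function name, the default shard and replica counts, and the choice of pattern are assumptions for the example. Note that Elasticsearch 5.x expects the pattern under `template`, while 6.0 and later expect a list under `index_patterns`.

```python
def index_template_body(index_name, es_major, shards=1, replicas=0):
    """Build a legacy index template body for indices named after a dataset.

    The pattern "<index_name>*" is an assumption: it matches the index
    itself and any index sharing that prefix, so pick a prefix that no
    unrelated index uses.
    """
    pattern = f"{index_name}*"
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    if es_major >= 6:
        body["index_patterns"] = [pattern]  # Elasticsearch 6.0 and later
    else:
        body["template"] = pattern  # Elasticsearch 5.x
    return body
```

The resulting dictionary can be serialized with `json.dumps` and sent to the server before the first build, so that the settings apply when the index is created.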
External Elasticsearch datasets
You can also import existing data from Elasticsearch into DSS. Simply create an Elasticsearch dataset and specify the index of the data (and the type name for Elasticsearch 6 and below). If the connection is writable, DSS can also overwrite that data, but the type mapping will not be modified by DSS and the index/type will not be created if they don’t already exist.
The index you specify may be an alias. An alias can always be used for reading, but it can be used for writing only if it points to a single index (otherwise Elasticsearch refuses the write operation).
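Before pointing a writable dataset at an alias, you may want to check how many indices it resolves to. The sketch below works on the parsed JSON returned by `GET /_alias/<alias>`, which is keyed by concrete index name. The function names are illustrative, and the check only follows the single-index rule stated above; it does not look at the `is_write_index` alias option.

```python
def indices_behind(alias_response):
    """Return the sorted index names found in a GET /_alias/<alias> reply.

    The reply is a JSON object with one key per concrete index.
    """
    return sorted(alias_response)


def alias_is_writable(alias_response):
    """True if the alias points to exactly one index."""
    return len(alias_response) == 1
```

For a hypothetical reply such as `{"articles_v1": {"aliases": {"articles": {}}}}`, `alias_is_writable` returns `True`; with a second index in the reply it returns `False`.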
You can partition your external dataset in DSS: simply specify the partitioning column and the type of partitioning (value or time-based). You can only partition on one column for external datasets.
The partitioning column must have fielddata enabled, which is the case by default for keyword fields in Elasticsearch 5+
but not for