Amazon Redshift

DSS supports the full range of features on Redshift:

  • Reading and writing datasets
  • Executing SQL recipes
  • Performing visual recipes in-database
  • Using live engine for charts

Note

We have a detailed how-to for your first steps with SQL databases in DSS.

You might want to start with that how-to. The rest of this page is reference information for Redshift.

Installing the JDBC driver

The Redshift driver is pre-installed in DSS. You don’t need any further installation.

Writing data into Redshift

Loading data into a Redshift database using the regular SQL “INSERT” or “COPY” statements is extremely inefficient (a few dozen records per second) and should only be used for extremely small datasets.

The recommended way to load data into a Redshift table is through a bulk COPY from files stored in Amazon S3. DSS will automatically use this optimal S3-to-Redshift copy mechanism when using a Sync recipe. For more information, see other_recipes/sync.
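To make the bulk mechanism concrete, here is a minimal sketch of what a Redshift COPY from S3 looks like when written by hand. It is not the statement DSS generates internally; the helper `build_copy_statement`, the table name, the bucket path and the IAM role ARN are all hypothetical placeholders, and the files are assumed to be gzipped CSV.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, gzip=True):
    """Return a Redshift COPY statement that bulk-loads CSV files from S3.

    Illustration only: this helper is hypothetical and is not part of DSS.
    """
    def quote(value):
        # Escape single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    parts = [
        f"COPY {table}",
        f"FROM {quote(s3_uri)}",
        f"IAM_ROLE {quote(iam_role_arn)}",
        "FORMAT AS CSV",
    ]
    if gzip:
        # Tell Redshift that the source files are gzip-compressed.
        parts.append("GZIP")
    return "\n".join(parts) + ";"


# Hypothetical table, bucket and role, used only for the example.
statement = build_copy_statement(
    "analytics.orders",
    "s3://example-bucket/dss-export/orders/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(statement)
```

A single statement of this kind loads every file under the S3 prefix in parallel, which is why it scales far better than row-by-row inserts.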

In other words:

  • You should never have a Flow with a recipe that writes from a non-Redshift, non-S3 source to a Redshift dataset.
  • S3-to-Redshift recipes should only use the “Sync” recipe.
  • Redshift-to-Redshift recipes will be fast if and only if the “In-database (SQL)” engine is selected.

For example, if you have a table in Redshift and want to use a Prepare recipe, which has no “In-database (SQL)” engine, you should instead use two steps:

  • A first Redshift-to-S3 Prepare recipe
  • An S3-to-Redshift Sync recipe
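The three rules above can be summarized as a small decision function. This is only a sketch for reasoning about a Flow: the function `is_efficient_redshift_write` and the string labels used for sources, recipes and engines are hypothetical and do not correspond to any DSS API.

```python
def is_efficient_redshift_write(source, recipe, engine=None):
    """Apply the rules above to a recipe whose output is a Redshift dataset.

    source: "S3", "Redshift", or anything else (hypothetical labels)
    recipe: recipe type, e.g. "sync" or "prepare" (hypothetical labels)
    engine: execution engine, e.g. "In-database (SQL)"
    """
    if source == "S3":
        # S3-to-Redshift: only the Sync recipe uses the bulk copy path.
        return recipe == "sync"
    if source == "Redshift":
        # Redshift-to-Redshift: fast only with the in-database engine.
        return engine == "In-database (SQL)"
    # Any other source would fall back to slow row-by-row writes.
    return False


# The two-step workaround: the Prepare recipe writes to S3 (not checked
# here, since its output is not Redshift), then a Sync recipe loads
# the S3 dataset into Redshift.
print(is_efficient_redshift_write("S3", "sync"))                # True
print(is_efficient_redshift_write("Redshift", "prepare", "DSS"))  # False
```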

Setting distribute and sort clauses

DSS does not have built-in support for setting Redshift distribution and sort clauses (DISTSTYLE, DISTKEY and SORTKEY in Redshift's CREATE TABLE syntax). If you want or need to set them on a managed dataset written by DSS, go to the settings of the dataset, open the “Advanced” tab, and override the “Table creation SQL statement”.
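In Redshift's CREATE TABLE syntax, distribution and sorting are expressed through the DISTSTYLE, DISTKEY and SORTKEY table attributes, placed after the column list. The sketch below shows one way to derive an overridden statement from an existing one. The helper `with_dist_and_sort`, the sample table and its columns are hypothetical; the starting statement is only an assumed example and may not match what DSS generates for your dataset.

```python
def with_dist_and_sort(create_table_sql, dist_key, sort_keys):
    """Append Redshift distribution and sort attributes to a CREATE TABLE.

    Illustration only: this helper is hypothetical and is not part of DSS.
    """
    # Drop the trailing semicolon so the attributes can follow the column list.
    base = create_table_sql.strip().rstrip(";").rstrip()
    clauses = [
        "DISTSTYLE KEY",
        f"DISTKEY ({dist_key})",
        f"SORTKEY ({', '.join(sort_keys)})",
    ]
    return base + "\n" + "\n".join(clauses) + ";"


# Assumed example of a generated statement (hypothetical table and columns).
generated = (
    'CREATE TABLE "orders" (\n'
    '  "order_id" BIGINT,\n'
    '  "customer_id" BIGINT,\n'
    '  "order_date" DATE\n'
    ");"
)

overridden = with_dist_and_sort(generated, "customer_id", ["order_date", "customer_id"])
print(overridden)
```

The resulting text is what you would paste into the “Table creation SQL statement” field, after adapting the column names to your own table.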

Limitations

  • DSS uses the PostgreSQL driver for connecting to Redshift. This driver limits the size of result sets to 2 billion records. You cannot read more than 2 billion records from, or write more than 2 billion records to, a Redshift dataset (except when using the “In-database (SQL)” engine).
  • SSL support is not tested by Dataiku.