Snowflake (preview)

Note

preview: Subject to future enhancements

Note

You might want to start with our detailed tutorial for your first steps with SQL databases in DSS.

The rest of this page is reference information for Snowflake connections.

DSS supports the full range of features on Snowflake:

  • Reading and writing datasets
  • Executing SQL recipes
  • Performing visual recipes in-database
  • Using the live engine for charts

Installing the JDBC driver

The Snowflake JDBC driver can be downloaded from the Snowflake website (https://docs.snowflake.net/manuals/user-guide/jdbc.html).

The driver consists of a single JAR file, snowflake-jdbc-VERSION.jar

To install:

  • Copy this JAR file to the lib/jdbc directory of the DSS data directory
  • Restart DSS
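These two steps can also be scripted. The snippet below is a minimal sketch of the copy step only, assuming a hypothetical data directory location; the JAR file name keeps the VERSION placeholder from above, and DSS still needs to be restarted afterwards:

    import shutil
    from pathlib import Path

    # Assumed locations: adjust both paths to your own installation.
    DSS_DATA_DIR = Path("/home/dataiku/dss_data")     # DSS data directory (assumption)
    driver_jar = Path("snowflake-jdbc-VERSION.jar")   # keep VERSION as downloaded

    # DSS picks up additional JDBC drivers from <data directory>/lib/jdbc
    target = DSS_DATA_DIR / "lib" / "jdbc"
    target.mkdir(parents=True, exist_ok=True)
    shutil.copy(driver_jar, target)

    print("Driver copied; now restart DSS so the new driver is loaded.")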

Writing data into Snowflake

Loading data into a Snowflake database using the regular SQL INSERT or COPY statements is very inefficient and should only be used for extremely small datasets.

The recommended way to load data into a Snowflake table is through a bulk COPY from files stored in Amazon S3. DSS will automatically use this optimal S3-to-Snowflake copy mechanism when using a Sync recipe. For more information, see below.

The exception to this rule is when executing a recipe from a Snowflake dataset to a Snowflake dataset using the in-database (SQL) engine.

In other words:

  • You should never have a Flow with a recipe that writes from a non-Snowflake, non-S3 source to a Snowflake dataset.
  • The only S3-to-Snowflake recipes should be “Sync” recipes.
  • Snowflake-to-Snowflake recipes will be fast if and only if the “In-database (SQL)” engine is selected.

For example, if you have a table in Snowflake and want to use a recipe that does not have an “In-database (SQL)” engine, you should instead use two steps:

  • A first Snowflake-to-S3 recipe
  • An S3-to-Snowflake Sync recipe
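As an illustration, here is what the first step could look like as a Python code recipe (Python recipes cannot run in-database, so they fall under this rule). The dataset and column names are hypothetical; this is only a sketch of the pattern:

    # Python code recipe: Snowflake -> S3.
    # The fast S3 -> Snowflake load is then done by a separate Sync recipe
    # placed right after this one in the Flow.
    import dataiku

    # Input dataset stored on the Snowflake connection (hypothetical name)
    customers = dataiku.Dataset("customers_snowflake")
    df = customers.get_dataframe()

    # Arbitrary pandas transformation with no in-database equivalent (hypothetical columns)
    df["full_name"] = df["first_name"] + " " + df["last_name"]

    # Output dataset stored on S3 (CSV, UTF-8), not directly on Snowflake:
    # writing to S3 avoids slow row-by-row INSERTs into Snowflake.
    staging = dataiku.Dataset("customers_staging_s3")
    staging.write_with_schema(df)

The Flow then ends with a Sync recipe from the S3 staging dataset to the final Snowflake dataset, which DSS executes as a direct bulk COPY when the conditions described in the next section are met.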

Syncing from S3

Loading a Snowflake database using SQL INSERT statements is inefficient and should be avoided except for small datasets. The recommended way to load data into a Snowflake table is through a bulk COPY from files stored in Amazon S3.

Due to Snowflake restrictions, the following set of conditions must be met for a sync recipe to be executed as a direct copy:

  • The source dataset must be stored on Amazon S3
  • The destination dataset must be stored on Snowflake
  • The S3 bucket and the Snowflake cluster must be in the same Amazon AWS region
  • The S3 dataset must be stored in CSV format, using the UTF-8 charset
  • The CSV separator must be a simple ASCII character
  • For the S3 dataset, the “Unix” CSV quoting style is not supported
  • For the S3 dataset, the “Escaping only” CSV quoting style is only supported if the quoting character is \

Additionally, the schema of the input dataset must match the schema of the output dataset, and values stored in fields must be valid with respect to the declared Snowflake column type.

Note that an intermediate preparation recipe can be used to normalize an input S3 dataset to the above conditions if the dataset you want to load into Snowflake does not natively fulfill them.
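If you prefer to check these conditions programmatically, the format settings of the S3 dataset can be read through the DSS public API. The sketch below assumes the dataikuapi client; the host, API key, project key and dataset name are placeholders:

    import dataikuapi

    # Placeholders: point this at your own DSS instance, project and dataset.
    client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
    project = client.get_project("MYPROJECT")

    s3_input = project.get_dataset("my_s3_input")
    definition = s3_input.get_definition()

    # The format must be CSV with the UTF-8 charset, a simple ASCII separator,
    # and a quoting style compatible with the conditions listed above.
    print(definition.get("formatType"))
    print(definition.get("formatParams"))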

Limitations and known issues

Datasets

  • When creating the connection, you need to specify one database. Access to tables from other databases is not possible

Importing

  • When importing tables from a Snowflake connection, you’ll see all schemas from all databases. However, you can only import tables from the schemas of the same database as the one defined in the connection

Visual recipes

  • Filtering: the “contains” operator is not supported
  • Grouping: aggregating on booleans is not supported
  • Window: cumulative average is not supported (Snowflake limitation)

Coding recipes

  • Execution plans are not supported

Charts

  • “Breakdown by day of week” is not supported

Fast-Paths

  • Fast-path from Azure WASB is not supported
  • Fast-path from the Snowflake Spark driver is not supported