Note (preview): Snowflake support is subject to future enhancements.
You might want to start with our detailed tutorial for your first steps with SQL databases in DSS.
The rest of this page is reference information for Snowflake connections.
DSS supports the full range of features on Snowflake:
- Reading and writing datasets
- Executing SQL recipes
- Performing visual recipes in-database
- Using the live engine for charts
Installing the JDBC driver
The Snowflake JDBC driver can be downloaded from the Snowflake website (https://docs.snowflake.net/manuals/user-guide/jdbc.html).
The driver consists of a single JAR file.
- Copy this JAR file to the lib/jdbc directory of DSS
- Restart DSS
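Once DSS has restarted, you can check that the driver and connection work by running any simple query from a SQL notebook; a minimal sketch:

```sql
-- Connectivity check: returns the Snowflake version if the JDBC
-- driver is correctly installed and the connection is valid
SELECT CURRENT_VERSION();
```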
Writing data into Snowflake
Loading data into a Snowflake database using regular SQL INSERT statements is very inefficient and should only be used for extremely small datasets.
The recommended way to load data into a Snowflake table is through a bulk
COPY from files stored in Amazon S3. DSS will automatically use this optimal S3-to-Snowflake copy mechanism when using a Sync recipe. For more information, see below.
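Under the hood, this bulk load corresponds to a Snowflake COPY statement. A minimal sketch, with a hypothetical table name, S3 path, and placeholder credentials (the exact options DSS generates may differ):

```sql
-- Bulk-load CSV files from S3 into a Snowflake table.
-- Table name, bucket path and credentials are placeholders.
COPY INTO mytable
FROM 's3://my-bucket/path/to/data/'
CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1);
```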
The exception to this rule is when executing a recipe from a Snowflake dataset to a Snowflake dataset using the in-database (SQL) engine.
In other words:
- You should never have a Flow with a recipe that writes from a non-Snowflake, non-S3 source to a Snowflake dataset.
- Recipes from S3 to Snowflake should only be “Sync” recipes.
- Snowflake-to-Snowflake recipes will be fast if and only if the “In-database (SQL)” engine is selected.
For example, if you have a table in Snowflake and want to use a recipe that does not have an “In-database (SQL)” engine, you should instead use two steps:
- A first Snowflake-to-S3 recipe
- An S3-to-Snowflake Sync recipe
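For reference, Snowflake's COPY statement supports both directions, and the fast paths correspond to an unload followed by a load; a hedged sketch with placeholder names and credentials:

```sql
-- Unload direction (Snowflake-to-S3): export a table to CSV files on S3
COPY INTO 's3://my-bucket/tmp/mytable/'
FROM mytable
CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
FILE_FORMAT = (TYPE = CSV);

-- Load direction (S3-to-Snowflake Sync): import CSV files into a table
COPY INTO mytable_out
FROM 's3://my-bucket/tmp/mytable_out/'
CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
FILE_FORMAT = (TYPE = CSV);
```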
Syncing from S3
Loading a Snowflake database using SQL
INSERT statements is inefficient, and should be avoided except for small datasets. The recommended way to load data into a Snowflake table is through a bulk COPY from files stored in Amazon S3.
Due to Snowflake restrictions, the following set of conditions must be met for a sync recipe to be executed as direct copy:
- The source dataset must be stored on Amazon S3
- The destination dataset must be stored on Snowflake
- The S3 bucket and the Snowflake cluster must be in the same Amazon AWS region
- The S3 dataset must be stored in CSV format, using the UTF-8 charset
- The CSV separator must be a simple ASCII character
- For the S3 side, the “Unix” CSV quoting style is not supported
- For the S3 side, the “Escaping only” CSV quoting style is only supported if the quoting character is
Additionally, the schema of the input dataset must match the schema of the output dataset, and values stored in fields must be valid with respect to the declared Snowflake column type.
Note that an intermediate preparation recipe can be used to normalize an input S3 dataset to the above conditions, if the dataset you want to load into Snowflake does not natively fulfill them.
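The conditions above map onto Snowflake's CSV file-format options; a sketch of a compatible format definition (the name and delimiter below are illustrative):

```sql
-- A CSV file format meeting the direct-copy conditions:
-- UTF-8 encoding and a single-ASCII-character separator
CREATE OR REPLACE FILE FORMAT dss_sync_csv
  TYPE = CSV
  ENCODING = 'UTF8'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';
```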
Limitations and known issues
- When creating the connection, you need to specify one database. Access to tables from other databases is not possible
- When importing tables from a Snowflake connection, you’ll see all schemas from all databases. However, you can only import tables from the schemas of the same database as the one defined in the connection
- Filtering: “contains” operator is not supported
- Grouping: aggregating on booleans is not supported
- Window: cumulative average is not supported (Snowflake limitation)
- Execution plans are not supported
- “Breakdown by day of week” is not supported
- Fast-path from Azure WASB is not supported
- Fast-path from Snowflake Spark driver is not supported