Dremio
Warning
Experimental: Connection to Dremio is Experimental
Dataiku has experimental support for interacting with Dremio:
Reading data from Dremio
SQL notebook
Performing in-database computation for Dremio->Dremio recipes
Performing in-database charts
Limited support for write-back to Dremio from non-Dremio data
Support has been tested on Dremio Cloud.
Setting up
Dataiku support for Dremio leverages the “Flight SQL” driver. You can download the driver from Dremio’s site: https://docs.dremio.com/current/client-applications/drivers/arrow-flight-sql-jdbc-driver
Put the driver in a subfolder of your lib/jdbc directory, such as lib/jdbc/dremio
In Dataiku, create an “Other databases (JDBC)” connection
JDBC driver class:
org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver
JDBC URL:
jdbc:arrow-flight-sql://HOST:443/?token=ENCODED_TOKEN&catalog=CATALOG_GUID
where:
For Dremio Cloud, HOST is usually data.dremio.com or data.eu.dremio.com
CATALOG_GUID is the ID of one of your Dremio projects. It can be found in the Sonar “Settings” page for that project
ENCODED_TOKEN is a Personal Access Token, URL-encoded
Drivers jars directory:
lib/jdbc/dremio
(or another folder that you may have selected)
SQL Dialect:
Dremio
User/Password: Leave empty
Can browse catalogs: Disable
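Because the Personal Access Token must be URL-encoded before it goes into the JDBC URL, it can help to assemble the URL programmatically. The following is a minimal sketch, not part of Dataiku or Dremio: the function name build_dremio_jdbc_url and the sample token and GUID are hypothetical placeholders, and the URL shape is simply the one given above.

```python
from urllib.parse import quote


def build_dremio_jdbc_url(host: str, token: str, catalog_guid: str, port: int = 443) -> str:
    """Assemble a Flight SQL JDBC URL in the shape described above."""
    # safe="" makes quote() percent-encode every reserved character,
    # including "/", "+" and "=", which may appear in a token.
    encoded_token = quote(token, safe="")
    return (
        f"jdbc:arrow-flight-sql://{host}:{port}/"
        f"?token={encoded_token}&catalog={catalog_guid}"
    )


# Placeholder values for illustration only; these are not real credentials.
url = build_dremio_jdbc_url(
    "data.dremio.com",
    "abc+def/ghi==",
    "00000000-0000-0000-0000-000000000000",
)
print(url)
```

The resulting string is what would be pasted into the JDBC URL field of the connection.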
Importing tables
You can import tables using the usual Connection Explorer.
One Dremio-specific point is that, within dataset settings, table and schema names should be quoted. Unlike in other databases, a Dremio schema can be made of several quoted chunks separated by dots, reflecting Dremio's hierarchical namespace.
For example, the TPCDS sample datasets can be accessed with:
Table:
"call_center"
Schema:
"Samples"."samples.dremio.com"."tpcds_sf1000"
Limitations
Writing data into Dremio “from the outside” (which is done through JDBC inserts) is extremely slow (about one record every 2-5 seconds)
Dremio does not have a “timestamp with time zone” data type, so the “datetime with tz” type of DSS gets converted to a timezone-unaware timestamp field in Dremio. As a result, various timezone shift issues may be encountered.
In charts, support for dates / timestamps is limited
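To illustrate the kind of shift the timezone limitation can cause, the sketch below compares two ways of turning a timezone-aware value into a naive one. It uses only the Python standard library and does not describe how DSS or Dremio perform the conversion internally; the helper to_naive_utc and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_naive_utc(dt: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC, then drop the tzinfo."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


# A fixed +02:00 offset, chosen only for illustration.
plus_two = timezone(timedelta(hours=2))
aware = datetime(2024, 6, 1, 9, 30, tzinfo=plus_two)

# Dropping the offset keeps the local wall-clock time: 09:30.
dropped = aware.replace(tzinfo=None)

# Normalizing to UTC first yields a different naive value: 07:30.
normalized = to_naive_utc(aware)

print(dropped, normalized)
```

The two naive values differ by the original offset, so a consumer that assumes one convention while the data follows the other will see shifted times. Agreeing on a single convention, such as always UTC, before data reaches a timezone-unaware column is one way to limit the problem.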