Connecting to data¶
The first task when using Data Science Studio (DSS) is to define datasets that connect to your data sources.
A dataset is a series of records that share the same schema, analogous to a table in the SQL world.
For a broader explanation of the different kinds of datasets, see the DSS concepts page.
See also
For more information, see the Concept | Data connections article in the Knowledge Base.
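The core idea above — a dataset as a series of records sharing one schema — can be sketched in a few lines of Python. Note that the `Schema` and `Dataset` classes below are illustrative stand-ins, not the actual DSS API; they only model the concept that every record must conform to the same column definitions, just as rows must fit a SQL table's columns.

```python
from dataclasses import dataclass, field

@dataclass
class Schema:
    # Maps column name -> expected Python type (illustrative, not the DSS schema format)
    columns: dict

    def validates(self, record: dict) -> bool:
        # A record matches the schema iff it has exactly the schema's
        # columns and each value has the expected type.
        return (record.keys() == self.columns.keys()
                and all(isinstance(record[c], t) for c, t in self.columns.items()))

@dataclass
class Dataset:
    schema: Schema
    records: list = field(default_factory=list)

    def append(self, record: dict) -> None:
        # Reject records that do not conform to the shared schema,
        # the way a SQL table rejects rows that do not fit its columns.
        if not self.schema.validates(record):
            raise ValueError(f"record does not match schema: {record!r}")
        self.records.append(record)

orders = Dataset(Schema({"order_id": int, "customer": str, "amount": float}))
orders.append({"order_id": 1, "customer": "acme", "amount": 99.5})
```

In DSS itself you do not build datasets this way, of course — you define a connection and let DSS infer or manage the schema — but the invariant is the same: every record in a dataset conforms to one schema.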
- Supported connections
- SQL databases
- Introduction
- Snowflake
- Connection setup (Dataiku Custom or Dataiku Cloud Stacks)
- Connection setup (Dataiku Cloud)
- Authenticate using OAuth2
- Writing data into Snowflake
- Unloading data from Snowflake to Cloud
- Extended push-down
- Spark native integration
- Snowpark integration
- Switching Role and Warehouse
- Limitations and known issues
- Advanced install of the JDBC driver
- Databricks
- Azure Synapse
- Microsoft OneLake
- Google BigQuery
- Amazon Redshift
- PostgreSQL
- MySQL
- Microsoft SQL Server
- Oracle
- Teradata
- Pivotal Greenplum
- Google AlloyDB
- Google Cloud SQL
- AWS Athena
- Trino/Starburst
- Vertica
- SAP HANA
- IBM Netezza
- Exasol
- IBM DB2
- kdb+
- Amazon S3
- Azure Blob Storage
- Google Cloud Storage
- Upload your files
- HDFS
- Cassandra
- MongoDB
- Elasticsearch
- File formats
- Managed folders
- “Files in folder” dataset
- Metrics dataset
- Internal stats dataset
- “Editable” dataset
- FTP
- SCP / SFTP (aka SSH)
- HTTP
- HTTP (with cache)
- Server filesystem
- Dataset plugins
- Making relocatable managed datasets
- Clearing non-managed Datasets
- Data ordering
- Dynamic dataset repeat
- PI System / PIWebAPI server
- Google Sheets
- Google Analytics
- Data transfer on Dataiku Cloud
- SharePoint Online
- Excel