Connecting to data
The first task when using Data Science Studio is to define datasets to connect to your data sources.
A dataset is a series of records that share the same schema. It is analogous to a table in the SQL world.
For a broader explanation of the different kinds of datasets, see the DSS concepts page.
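The analogy between a dataset and a SQL table can be illustrated with a minimal sketch. It does not use any DSS API: the schema, the records, and the `customers` table name are all hypothetical, and the standard-library `sqlite3` module merely stands in for "the SQL world".

```python
import sqlite3

# Hypothetical schema and records, invented purely for illustration.
SCHEMA = ("customer_id", "country", "total_spent")
RECORDS = [
    (1, "FR", 120.5),
    (2, "US", 80.0),
    (3, "DE", 42.25),
]


def shares_schema(records, schema):
    """Return True if every record has exactly one value per schema column."""
    return all(len(record) == len(schema) for record in records)


def load_as_table(records):
    """Load the records into an in-memory SQLite table to show the analogy."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(customer_id INTEGER, country TEXT, total_spent REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    return conn


conn = load_as_table(RECORDS)
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
countries = [
    row[0]
    for row in conn.execute("SELECT country FROM customers ORDER BY customer_id")
]
```

Each tuple plays the role of one record, and the column list plays the role of the schema; the check in `shares_schema` only compares column counts, which is a simplification of what a real schema enforces.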
- Supported connections
- Upload your files
- Server filesystem
- HDFS
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
- FTP
- SCP / SFTP (aka SSH)
- HTTP
- SQL databases
  - Supported databases
    - Full support
      - MySQL
      - PostgreSQL
      - Vertica
      - Amazon Redshift
      - Pivotal Greenplum
      - Teradata
      - Oracle
      - Microsoft SQL Server
      - Google BigQuery
      - Snowflake
    - Tier 2 support
    - Other databases
  - Defining a connection
  - Advanced connection settings
- Cassandra
- MongoDB
- Elasticsearch
- Managed folders
- “Files in folder” dataset
- Metrics dataset
- Internal stats dataset
- HTTP (with cache)
- Dataset plugins
- Data connectivity macros
- Making relocatable managed datasets
- Data ordering