Supported connections

Data Science Studio (DSS) can read and write data from a wide variety of sources.

Connectors

Here is a list of the available connectors in DSS.

Type                               Read                                                          Write
Filesystem                         yes (see supported formats)                                   yes (see supported formats)
Hadoop HDFS                        yes (see supported formats)                                   yes (see supported formats)
Amazon S3                          yes (see supported formats)                                   yes (see supported formats)
HTTP                               yes (see supported formats; data copied locally)              no
FTP                                yes (see supported formats; data optionally copied locally)   yes (see supported formats)
SSH (SCP and SFTP)                 yes (see supported formats; data copied locally)              no
MySQL                              yes                                                           yes
PostgreSQL                         yes                                                           yes
Vertica                            yes                                                           yes
Amazon Redshift                    yes                                                           yes
Greenplum                          yes                                                           yes
Teradata                           yes                                                           yes
IBM Netezza                        yes                                                           yes
SAP HANA                           yes                                                           yes
Oracle                             yes                                                           yes
Exadata                            yes                                                           yes
Microsoft SQL Server               yes                                                           yes
Google BigQuery                    yes (private beta from Google)                                no
Other SQL databases (JDBC driver)  best effort, not guaranteed                                   generally no
MongoDB                            yes                                                           yes
Cassandra                          yes                                                           yes
ElasticSearch                      yes                                                           yes
Twitter (Streaming API)            yes                                                           no
Generic APIs                       custom Python or R code, plugins                              custom Python or R code, plugins
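For sources without a dedicated connector, reading usually comes down to custom Python or R code that turns API responses into rows. A minimal sketch of that pattern, assuming a hypothetical API that returns a JSON array of records (the HTTP call is simulated with an inline payload, and the `id`/`name` schema is invented for illustration):

```python
import json

def iter_rows(payload):
    """Yield one dict per record from a JSON API response.

    In a real connector the payload would come from an HTTP call
    (e.g. urllib.request.urlopen); here it is passed in directly.
    """
    for record in json.loads(payload):
        yield {"id": record["id"], "name": record["name"]}

# Simulated API response (hypothetical schema).
payload = '[{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]'
rows = list(iter_rows(payload))
```

Yielding rows one at a time, rather than building the full list in memory, is the usual shape for this kind of code, since API result sets can be large.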

File formats

Here is a list of the file formats that DSS can read and write for file-based connections (filesystem, HDFS, Amazon S3, HTTP, FTP, SSH).

Standard formats

Format                                  Read  Write
Delimited values (CSV, TSV, ...)        yes   yes
Fixed width                             yes   no
Excel (Excel 97 and later)              yes   only via export
Avro                                    yes   yes
Custom format (regular expression)      yes   no
XML                                     yes   no
JSON                                    yes   no
ESRI Shapefiles                         yes   no
MySQL dump                              yes   no
Apache Combined log format              yes   no

Note that file-based formats can be read compressed: ZIP, GZIP, BZ2.
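DSS handles this decompression transparently when reading. Outside DSS, the same idea can be sketched with the Python standard library, reading a GZIP-compressed CSV without first decompressing it to a separate file (the data here is built in memory as a stand-in for a real file):

```python
import csv
import gzip
import io

# Build a small GZIP-compressed CSV in memory (stand-in for a real file).
raw = "id,name\n1,alice\n2,bob\n".encode("utf-8")
compressed = gzip.compress(raw)

# Read it back: gzip handles the decompression, csv the parsing.
with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
```

For an on-disk file, `io.BytesIO(compressed)` would simply be replaced by the file path.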

Hadoop-specific formats

The following formats can be read and written on HDFS only. If the data is on S3 or Azure Blob Storage, access needs to be set up through Hadoop, using HDFS connections.
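As an illustration of what "access through Hadoop" means for S3, credentials for Hadoop's s3a filesystem are typically declared in core-site.xml. The property names below are standard Hadoop s3a settings; the values are placeholders, and the exact setup depends on your cluster:

```xml
<!-- core-site.xml: credentials for the s3a filesystem (placeholder values) -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```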

Format             Read  Write
Parquet            yes   yes
Hive SequenceFile  yes   yes
Hive RCFile        yes   yes
Hive ORCFile       yes   yes