Supported connections
DSS can read data from, and write data to, a variety of sources.
Connectors
Here is a list of the available connectors in DSS.
Type | Read | Write |
---|---|---|
Upload your files | yes (see supported formats) | yes (see supported formats) |
Server filesystem | yes (see supported formats) | yes (see supported formats) |
HDFS | yes (see supported formats) | yes (see supported formats) |
Amazon S3 | yes (see supported formats) | yes (see supported formats) |
Google Cloud Storage | yes (see supported formats) | yes (see supported formats) |
Azure Blob Storage | yes (see supported formats) | yes (see supported formats) |
FTP | yes (see supported formats) | yes (see supported formats) |
SSH (SCP and SFTP) | yes (see supported formats) | yes (see supported formats) |
HTTP | yes (see supported formats) | no |
MySQL | yes | yes |
PostgreSQL | yes | yes |
Vertica | yes | yes |
Amazon Redshift | yes | yes |
Greenplum | yes | yes |
Teradata | yes | yes |
IBM Netezza | yes | yes |
SAP HANA | yes | yes |
Oracle | yes | yes |
Exadata | yes | yes |
Microsoft SQL Server | yes | yes |
Google BigQuery | yes | yes |
IBM DB2 | yes (Tier 2) | yes (Tier 2) |
Snowflake | yes | yes |
Exasol | yes (Tier 2) | yes (Tier 2) |
MemSQL | yes (Tier 2) | yes (Tier 2) |
Other SQL databases (JDBC driver) | best effort, not guaranteed | generally no |
MongoDB | yes | yes |
Cassandra | yes | yes |
Elasticsearch | yes | yes |
Twitter (Streaming API) | yes | no |
Generic APIs | Custom Python or R code, plugins | Custom Python or R code, plugins |
File formats
Here is a list of the file formats that DSS can read and write for file-based connections (filesystem, HDFS, Amazon S3, HTTP, FTP, SSH).
Standard formats
Format | Read | Write |
---|---|---|
Delimited values (CSV, TSV, …) | yes | yes |
Fixed width | yes | no |
Excel (from Excel 97) | yes | only via export |
Avro | yes | yes |
Custom format using regular expression | yes | no |
XML | yes | no |
JSON | yes | no |
ESRI Shapefiles | yes | no |
MySQL Dump | yes | no |
Apache Combined log format | yes | no |
Note that files in these formats can also be read when compressed with ZIP, GZIP, or BZ2.
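To illustrate what reading a compressed delimited file involves, here is a minimal Python sketch that uses only the standard library. It is not how DSS implements this internally; DSS handles decompression for you. The helper names `open_text` and `read_rows` are hypothetical, and the sketch assumes UTF-8 content, detection by file extension, and a ZIP archive holding a single data file.

```python
import bz2
import csv
import gzip
import io
import zipfile
from pathlib import Path


def open_text(path):
    """Return a text-mode file object, decompressing based on the file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", newline="")
    if suffix == ".bz2":
        return bz2.open(path, "rt", encoding="utf-8", newline="")
    if suffix == ".zip":
        # Assumption: the archive holds a single data file, so read its first member.
        with zipfile.ZipFile(path) as archive:
            data = archive.read(archive.namelist()[0])
        return io.StringIO(data.decode("utf-8"), newline="")
    # No recognized compression extension: treat as a plain text file.
    return open(path, "r", encoding="utf-8", newline="")


def read_rows(path, delimiter=","):
    """Read every row of a (possibly compressed) delimited file into a list."""
    with open_text(path) as handle:
        return list(csv.reader(handle, delimiter=delimiter))
```

For example, `read_rows("events.csv.gz")` and `read_rows("events.tsv.bz2", delimiter="\t")` would return the rows as lists of strings; both file names are placeholders.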
Hadoop-specific formats
The following formats can only be read and written on HDFS. If the data is stored on S3 or Azure Blob Storage, access must be set up through Hadoop using HDFS connections.
Format | Read | Write |
---|---|---|
Parquet | yes | yes |
Hive SequenceFile | yes | yes |
Hive RCFile | yes | yes |
Hive ORCFile | yes | yes |