Dataiku DSS
Welcome to the reference documentation for Dataiku Data Science Studio (DSS).
More learning resources are available at Dataiku Learn.
- Installing DSS
- Requirements
- Installing a new DSS instance
- Upgrading a DSS instance
- Updating a DSS license
- Other installation options
- Setting up Hadoop and Spark integration
- Setting up Dashboards and Flow export to PDF or images
- R integration
- Customizing DSS installation
- Installing database drivers
- Java runtime environment
- Python integration
- Installing a DSS plugin
- Configuring LDAP authentication
- Working with proxies
- Migration operations
- DSS concepts
- Connecting to data
- Supported connections
- Upload your files
- Server filesystem
- HDFS
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
- FTP
- SCP / SFTP (aka SSH)
- HTTP
- SQL databases
- Cassandra
- ElasticSearch
- Managed folders
- “Files in folder” dataset
- Metrics dataset
- Internal stats dataset
- HTTP (with cache)
- Dataset plugins
- Data connectivity macros
- Making relocatable managed datasets
- Data ordering
- Exploring your data
- Schemas, storage types and meanings
- Data preparation
- Charts
- Machine learning
- The Flow
- Visual recipes
- Recipes based on code
- Code notebooks
- Webapps
- Code reports
- Dashboards
- Working with partitions
- DSS and Hadoop
- Setting up Hadoop integration
- Connecting to secure clusters
- Hadoop filesystems connections (HDFS, S3, EMRFS, WASB, ADLS, GS)
- DSS and Hive
- DSS and Impala
- Hive datasets
- Multiple Hadoop clusters
- Dynamic AWS EMR clusters
- Hadoop multi-user security
- Distribution-specific notes
- Teradata Connector For Hadoop
- DSS and Spark
- DSS and Python
- DSS and R
- Code environments
- Running in containers
- Collaboration
- Automation scenarios, metrics, and checks
- Automation node and bundles
- API Node & API Deployer: Real-time APIs
- Plugins
- Python APIs
- Using the APIs inside of DSS
- Using the APIs outside of DSS
- API for interacting with datasets
- API for interacting with PySpark
- API for managed folders
- API for interacting with saved models
- API for scenarios
- API for performing SQL, Hive and Impala queries
- API for performing SQL, Hive and Impala queries like the recipes
- API for metrics and checks
- API for creating static insights
- Reference API documentation of dataiku
- API for plugin components
- dataikuapi: The REST API client
- R API
- Public REST API
- Additional APIs
- File formats
- Security
- Operating DSS
- Advanced topics
- Troubleshooting
- Release notes
- DSS 5.1 Release notes
- DSS 5.0 Release notes
- DSS 4.3 Release notes
- DSS 4.2 Release notes
- DSS 4.1 Release notes
- DSS 4.0 Release notes
- DSS 3.1 Release notes
- DSS 3.0 Release notes
- DSS 2.3 Release notes
- DSS 2.2 Release notes
- DSS 2.1 Release notes
- DSS 2.0 Release notes
- DSS 1.4 Release notes
- DSS 1.3 Release notes
- DSS 1.2 Release notes
- DSS 1.1 Release notes
- DSS 1.0 Release notes
- Pre versions
- Other Documentation
- Third-party acknowledgements