Dataiku DSS
Welcome to the Product Documentation for Dataiku Data Science Studio (DSS). This site covers installing and configuring Dataiku DSS in your environment, using the tool through the browser interface, and driving it through the API.
Is This the Help You’re Looking For?
You might also find these other resources useful:
- The Knowledge Base covers a variety of topics that can help you learn more about Dataiku DSS, or find solutions to problems without having to ask for help.
- Dataiku Academy provides guided learning paths for you to follow, upskill, and gain certification on Dataiku DSS.
- Dataiku Community is a place where you can join the discussion, get support, share best practices and engage with other Dataiku users.
Reference Doc Contents
- Installing DSS
- Requirements
- Installing a new DSS instance
- Upgrading a DSS instance
- Updating a DSS license
- Other installation options
- Setting up Hadoop and Spark integration
- Setting up Dashboards and Flow export to PDF or images
- R integration
- Customizing DSS installation
- Installing database drivers
- Java runtime environment
- Python integration
- Installing a DSS plugin
- Configuring LDAP authentication
- Working with proxies
- Migration operations
- DSS concepts
- Homepage
- Projects
- Connecting to data
- Supported connections
- Upload your files
- Server filesystem
- HDFS
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
- FTP
- SCP / SFTP (aka SSH)
- HTTP
- SQL databases
- Cassandra
- MongoDB
- Elasticsearch
- Managed folders
- “Files in folder” dataset
- Metrics dataset
- Internal stats dataset
- HTTP (with cache)
- Dataset plugins
- Data connectivity macros
- Making relocatable managed datasets
- Data ordering
- Exploring your data
- Schemas, storage types and meanings
- Data preparation
- Charts
- Interactive statistics
- Machine learning
- Prediction (Supervised ML)
- Clustering (Unsupervised ML)
- Automated machine learning
- Model Settings Reusability
- Features handling
- Algorithms reference
- Advanced models optimization
- Models ensembling
- Model Document Generator
- Deep Learning
- Models lifecycle
- Scoring engines
- Writing custom models
- Exporting models
- Partitioned Models
- The Flow
- Visual recipes
- Recipes based on code
- Code notebooks
- Webapps
- Code reports
- Dashboards
- Dataiku Applications
- DSS in the cloud
- Working with partitions
- DSS and Hadoop
- Setting up Hadoop integration
- Connecting to secure clusters
- Hadoop filesystems connections (HDFS, S3, EMRFS, WASB, ADLS, GS)
- DSS and Hive
- DSS and Impala
- Hive datasets
- Multiple Hadoop clusters
- Dynamic AWS EMR clusters
- Hadoop user isolation
- Distribution-specific notes
- Teradata Connector For Hadoop
- Dynamic Google Dataproc clusters
- DSS and Spark
- DSS and SQL
- DSS and Python
- DSS and R
- Metastore catalog
- Code environments
- Running in containers
- Concepts
- Setting up (Kubernetes)
- Unmanaged Kubernetes clusters
- Managed Kubernetes clusters
- Using Amazon Elastic Kubernetes Service (EKS)
- Using Microsoft Azure Kubernetes Service (AKS)
- Using Google Kubernetes Engine (GKE)
- Using Openshift
- Using code envs with containerized execution
- Dynamic namespace management
- Customization of base images
- Troubleshooting
- Using Docker instead of Kubernetes
- Collaboration
- Automation scenarios, metrics, and checks
- Automation node and bundles
- API Node & API Deployer: Real-time APIs
- Time Series
- Unstructured data
- Plugins
- Python APIs
- Using the APIs inside of DSS
- Using the APIs outside of DSS
- Datasets (introduction)
- Datasets (reading and writing data)
- Datasets (other operations)
- Datasets (reference)
- Managed folders
- Interaction with Pyspark
- The main DSSClient class
- Projects
- Project folders
- Recipes
- Interaction with saved models
- Scenarios
- Scenarios (in a scenario)
- Flow creation and management
- Machine learning
- Statistics worksheets
- API Designer & Deployer
- Static insights
- Jobs
- Authentication information and impersonation
- Importing tables as datasets
- Wikis
- Discussions
- Performing SQL, Hive and Impala queries
- SQL Query
- Meanings
- Users and groups
- Connections
- Code envs
- Plugins
- Dataiku applications
- Metrics and checks
- Other administration tasks
- Reference API documentation of dataiku
- Reference API documentation of dataikuapi
- API for plugin components
- Clusters
- R API
- Public REST API
- Additional APIs
- File formats
- Security
- User Isolation
- Operating DSS
- Advanced topics
- Accessibility
- Troubleshooting
- Release notes
- DSS 8.0 Release notes
- DSS 7.0 Release notes
- DSS 6.0 Release notes
- DSS 5.1 Release notes
- DSS 5.0 Release notes
- DSS 4.3 Release notes
- DSS 4.2 Release notes
- DSS 4.1 Release notes
- DSS 4.0 Release notes
- DSS 3.1 Release notes
- DSS 3.0 Release notes
- DSS 2.3 Release notes
- DSS 2.2 Release notes
- DSS 2.1 Release notes
- DSS 2.0 Release notes
- DSS 1.4 Release notes
- DSS 1.3 Release notes
- DSS 1.2 Release notes
- DSS 1.1 Release notes
- DSS 1.0 Release notes
- Pre versions
- Other Documentation
- Third-party acknowledgements