Dataiku DSS - Reference documentation
Welcome to the reference documentation for Dataiku DSS. This documentation contains:
The main documentation of the concepts, interfaces and features of Dataiku DSS
Information on how to install and configure Dataiku DSS
Information for administrators on how to operate Dataiku DSS
You might also find these other resources useful:
The Knowledge Base covers a variety of topics that can help you learn more about Dataiku DSS or find solutions to problems without having to ask for help.
The Developer Guide contains all information for developers using Dataiku: how to code in Dataiku, how to create applications, how to operate Dataiku through its APIs, numerous code samples and examples, and reference API documentation.
The Dataiku Academy provides guided learning paths for you to follow, upskill, and gain certifications on Dataiku DSS.
The Dataiku Community is a place where you can join the discussion, get support, share best practices and engage with other Dataiku users.
- DSS concepts
- Connecting to data
- Supported connections
- SQL databases
- Amazon S3
- Azure Blob Storage
- Google Cloud Storage
- Upload your files
- HDFS
- Cassandra
- MongoDB
- Elasticsearch
- File formats
- Managed folders
- “Files in folder” dataset
- Metrics dataset
- Internal stats dataset
- “Editable” dataset
- kdb+
- FTP
- SCP / SFTP (aka SSH)
- HTTP
- HTTP (with cache)
- Server filesystem
- Dataset plugins
- Making relocatable managed datasets
- Clearing non-managed Datasets
- Data ordering
- PI System / PIWebAPI server
- Google Sheets
- Data transfer on Dataiku Cloud
- Exploring your data
- Schemas, storage types and meanings
- Data preparation
- Charts
- Interactive statistics
- Machine learning
- Prediction (Supervised ML)
- Clustering (Unsupervised ML)
- Automated machine learning
- Model Settings Reusability
- Features handling
- Algorithms reference
- Advanced models optimization
- Models ensembling
- Model Document Generator
- Time Series Forecasting
- Causal Prediction
- Deep Learning
- Models lifecycle
- Scoring engines
- Writing custom models
- Exporting models
- Partitioned Models
- ML Diagnostics
- Computer vision
- Labeling
- The Flow
- Visual recipes
- Prepare: Cleanse, Normalize, and Enrich
- Sync: copying datasets
- Grouping: aggregating data
- Window: analytics functions
- Distinct: get unique rows
- Join: joining datasets
- Fuzzy join: joining two datasets
- Geo join: joining datasets based on geospatial features
- Splitting datasets
- Top N: retrieve first N rows
- Stacking datasets
- Sampling datasets
- Sort: order values
- Pivot recipe
- Generate features
- Push to editable recipe
- Download recipe
- List Folder Contents
- Recipes based on code
- Code notebooks
- MLOps
- Webapps
- Code Studios
- Code reports
- Dashboards
- Workspaces
- Data Catalog
- Dataiku Applications
- Working with partitions
- DSS and SQL
- DSS and Python
- DSS and R
- DSS and Spark
- Code environments
- Collaboration
- Time Series
- Geographic data
- Generative AI and LLM Mesh
- Text & Natural Language Processing
- Language Detection
- Named Entities Extraction
- Sentiment Analysis
- Translation
- Text summarization
- Key phrase extraction
- Ontology Tagging
- Spell checking
- OpenAI GPT
- Machine Learning with Text features
- Text extraction
- OCR (Optical Character Recognition)
- Speech-to-Text
- Text cleaning
- Text Embedding
- NLP using AWS APIs
- NLP using Azure APIs
- NLP with Crowlingo API
- NLP using DeepL API
- NLP using Google APIs
- NLP with MeaningCloud API
- Images
- Audio
- Video
- Metrics, checks and Data Quality
- Automation scenarios
- Production deployments and bundles
- API Node & API Deployer: Real-time APIs
- Introduction
- Concepts
- Installing API nodes
- Setting up the API Deployer and deployment infrastructures
- First API (with API Deployer)
- First API (without API Deployer)
- Deploying to an external platform
- Types of Endpoints
- Enriching prediction queries
- Documenting your API endpoints
- Security
- Managing versions of your endpoint
- Deploying on Kubernetes
- APINode APIs reference
- Operations reference
- Governance
- Python APIs
- R API
- Public REST API
- Additional APIs
- Installing and setting up
- Elastic AI computation
- Concepts
- Initial setup
- Managed Kubernetes clusters
- Using Amazon Elastic Kubernetes Service (EKS)
- Using Microsoft Azure Kubernetes Service (AKS)
- Using Google Kubernetes Engine (GKE)
- Using code envs with containerized execution
- Containerized DSS engine
- Dynamic namespace management
- Customization of base images
- Unmanaged Kubernetes clusters
- Using OpenShift
- Using NVIDIA DGX Systems
- Troubleshooting
- Using Docker instead of Kubernetes
- DSS in the cloud
- DSS and Hadoop
- Setting up Hadoop integration
- Connecting to secure clusters
- Hadoop filesystems connections (HDFS, S3, EMRFS, WASB, ADLS, GS)
- Hive
- Impala
- Spark
- Hive datasets
- Hadoop user isolation
- Distribution-specific notes
- Teradata Connector For Hadoop
- Multiple Hadoop clusters
- Dynamic AWS EMR clusters
- Dynamic Google Dataproc clusters
- Metastore catalog
- Operating DSS
- dsscli tool
- The data directory
- Backing up
- Audit trail
- The runtime databases
- Logging in DSS
- DSS Macros
- Managing DSS disk usage
- Understanding and tracking DSS processes
- Tuning and controlling memory usage
- Using cgroups for resource control
- Monitoring DSS
- HTTP proxies
- DSS license
- Compute resource usage reporting
- Security
- User Isolation
- Plugins
- Streaming data
- Formula language
- Basic usage
- Reading column values
- Variables typing and autotyping
- Boolean values
- Operators
- Array and object operations
- Object notations
- DSS variables
- Array functions
- Boolean functions
- Date functions
- Math functions
- Object functions
- String functions
- Geometry functions
- Value access functions
- Control structures
- Tests
- Custom variables expansion
- Sampling methods
- Accessibility
- Troubleshooting
- Release notes
- DSS 12 Release notes
- DSS 11 Release notes
- DSS 10.0 Release notes
- DSS 9.0 Release notes
- DSS 8.0 Release notes
- DSS 7.0 Release notes
- DSS 6.0 Release notes
- DSS 5.1 Release notes
- DSS 5.0 Release notes
- DSS 4.3 Release notes
- DSS 4.2 Release notes
- DSS 4.1 Release notes
- DSS 4.0 Release notes
- DSS 3.1 Release notes
- DSS 3.0 Release notes
- DSS 2.3 Release notes
- DSS 2.2 Release notes
- DSS 2.1 Release notes
- DSS 2.0 Release notes
- DSS 1.4 Release notes
- DSS 1.3 Release notes
- DSS 1.2 Release notes
- DSS 1.1 Release notes
- DSS 1.0 Release notes
- Pre versions
- Other Documentation
- Third-party acknowledgements