You are viewing the documentation for version 8.0 of DSS.

Using Microsoft Azure Kubernetes Service (AKS)

You can use containerized execution on AKS, Azure's fully managed Kubernetes solution.

For a complete Elastic AI setup in Azure, including elastic storage and elastic compute based on Kubernetes, we recommend reading our dedicated Azure documentation.
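
As a quick orientation, here is a minimal sketch, assuming a reachable DSS instance, that uses the dataikuapi Python client to enumerate the clusters DSS knows about; the host URL, API key, and cluster id are placeholders, not values from this page. Similar sketches of the registry and cluster creation steps follow each checklist below.

    import dataikuapi

    # Placeholder host and API key; substitute your instance's values.
    client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")

    # Managed AKS clusters created through the AKS plugin appear here
    # alongside any other clusters attached to the instance.
    for cluster_info in client.list_clusters():
        print(cluster_info.get("name"), cluster_info.get("type"))

    # Drill into one cluster's settings (the cluster id is hypothetical).
    cluster = client.get_cluster("my-aks-cluster")
    print(cluster.get_settings().get_raw())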

  • Using managed AKS clusters
    • Initial setup
      • Create your ACR registry
      • Install the AKS plugin
      • Prepare your local az, docker, and kubectl commands
      • Create base images
      • Create a new containerized execution configuration
    • Cluster configuration
    • Other
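
The checklist above begins with creating an ACR registry and logging the local Docker daemon into it, so that DSS base images can be pushed there. Below is a minimal sketch of that first step, assuming the az CLI is installed and authenticated, and using placeholder resource-group and registry names:

    import subprocess

    def run(cmd):
        # Echo each command before running it, and fail on any error.
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    RESOURCE_GROUP = "dss-rg"    # placeholder resource group
    REGISTRY = "dssregistry"     # placeholder ACR registry name

    # Create the ACR registry that will hold the DSS base images.
    run(["az", "acr", "create",
         "--resource-group", RESOURCE_GROUP,
         "--name", REGISTRY,
         "--sku", "Standard"])

    # Authenticate the local docker daemon against the registry.
    run(["az", "acr", "login", "--name", REGISTRY])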
  • Using unmanaged AKS clusters
    • Setup
      • Create your ACR registry
      • Create your AKS cluster
      • Prepare your local az, docker, and kubectl commands
      • Create base images
      • Create a new containerized execution configuration
    • Using GPUs
      • Build a CUDA-enabled base image
      • Create configuration and add a custom reservation
      • Create a cluster with GPUs
      • Deploy
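
For the unmanaged path listed above, you create the AKS cluster yourself and point your local kubectl at it before configuring DSS. Here is a minimal sketch under the same assumptions (placeholder names; pick a GPU-capable VM size such as Standard_NC6 if you intend to follow the "Using GPUs" steps):

    import subprocess

    def run(cmd):
        # Echo each command before running it, and fail on any error.
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    RESOURCE_GROUP = "dss-rg"    # placeholder resource group
    CLUSTER = "dss-aks"          # placeholder cluster name

    # Create the AKS cluster; add "--node-vm-size", "Standard_NC6"
    # to provision GPU nodes for the GPU setup described above.
    run(["az", "aks", "create",
         "--resource-group", RESOURCE_GROUP,
         "--name", CLUSTER,
         "--node-count", "3",
         "--generate-ssh-keys"])

    # Merge the cluster credentials into the local kubeconfig so the
    # kubectl command used by DSS can reach the cluster.
    run(["az", "aks", "get-credentials",
         "--resource-group", RESOURCE_GROUP,
         "--name", CLUSTER])

    # Sanity check: nodes should list as Ready.
    run(["kubectl", "get", "nodes"])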