Initial setup¶
Warning
When using Dataiku Cloud Stacks, all of this setup is already handled as part of the Cloud Stacks capabilities. You do not need to go through this setup.
Prerequisites¶
Note
Many Kubernetes setups will be based on managed Kubernetes clusters handled by your Cloud Provider. DSS provides deep integrations with these, and we recommend that you read our dedicated sections: Using Amazon Elastic Kubernetes Service (EKS), Using Microsoft Azure Kubernetes Service (AKS) and Using Google Kubernetes Engine (GKE)
Docker and kubectl setup¶
Warning
Dataiku DSS is not responsible for setting up your local Docker daemon
Warning
Dataiku DSS is not compatible with podman, the alternative container engine for Redhat 8 / CentOS 8 / AlmaLinux 8
The prerequisites for running workloads in Kubernetes are:
You must have an existing Docker daemon. The docker command on the DSS machine must be fully functional and usable by the user running DSS. This includes the permission to build images, and thus access to a Docker socket.
You must have an image registry that will be accessible by your Kubernetes cluster.
The local docker command must have permission to push images to your image registry.
The kubectl command must be installed on the DSS machine and be usable by the user running DSS.
The containers running on the cluster must be able to open TCP connections on the DSS host on any port.
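As a quick sanity check of the client-side prerequisites above, the following minimal sketch verifies that the required commands are found on the PATH. The function names have_command and check_client_tools are hypothetical helpers introduced for this example; they are not part of DSS. The check only covers the presence of the commands, not registry permissions or network connectivity.

```shell
# Hypothetical helper: return 0 if the given command is found on PATH.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one status line per tool and
# return non-zero if at least one tool is missing.
check_client_tools() {
  missing=0
  for tool in "$@"; do
    if have_command "$tool"; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Run check_client_tools docker kubectl as the user running DSS. If both commands are found, docker info and kubectl version --client can then be used to confirm that the Docker daemon is reachable and that the kubectl client works.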
Other prerequisites include¶
To install packages, your DSS machine must have direct outgoing Internet access.
To install packages, your containers must have direct outgoing Internet access.
DSS should be stopped prior to starting this procedure
(Optional) Setup Spark¶
Download the dataiku-dss-spark-standalone binary from your usual Dataiku DSS download site
Download the dataiku-dss-hadoop-standalone-libs-generic-hadoop3 binary from your usual Dataiku DSS download site
Run the setup of Hadoop and Spark (note that this only concerns client libraries; no Hadoop cluster will be set up)
./bin/dssadmin install-hadoop-integration -standaloneArchive /PATH/TO/dataiku-dss-hadoop3-standalone-libs-generic...tar.gz
./bin/dssadmin install-spark-integration -standaloneArchive /PATH/TO/dataiku-dss-spark-standalone....tar.gz -forK8S
Build the base image¶
Before you can deploy to Kubernetes, at least one “base image” must be constructed.
Warning
After each upgrade of DSS, you must rebuild all base images
To build the base image, run the following command from the DSS data directory:
./bin/dssadmin build-base-image --type container-exec
(Optional) Build the Spark base image¶
For Spark workloads, run:
./bin/dssadmin build-base-image --type spark
(Optional) Build the CDE base image¶
For CDE tasks, run:
./bin/dssadmin build-base-image --type cde
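Because all base images must be rebuilt after each DSS upgrade, it can be convenient to group the build commands above in one place. The following is a minimal sketch under stated assumptions: rebuild_base_images, DSS_DATADIR and DRY_RUN are hypothetical names introduced for this example, and only the dssadmin invocations themselves come from this page.

```shell
# Sketch: run "dssadmin build-base-image" for each image type given as argument.
# DSS_DATADIR (path to the DSS data directory) and DRY_RUN are hypothetical
# variables introduced for this example.
rebuild_base_images() {
  for image_type in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: only print the command that would be executed.
      echo "./bin/dssadmin build-base-image --type ${image_type}"
    else
      # Run from the DSS data directory; stop at the first failure.
      (cd "${DSS_DATADIR:?set DSS_DATADIR to the DSS data directory}" &&
        ./bin/dssadmin build-base-image --type "${image_type}") || return 1
    fi
  done
}
```

For example, DRY_RUN=1 rebuild_base_images container-exec spark cde prints the three commands without running them. Pass only the image types you actually use, since the Spark and CDE images are optional.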
Setting up image build configs¶
After building the base image, you need to create image build configurations.
In Administration > Settings > Containerized execution, click Add another config under the Image build configs section to create a new configuration.
- Configure one or several publishing destinations as needed
Enter the image registry URL (See Elastic AI computation for more details)
When deploying on AWS EKS, set “Image pre-push hook” to “Enable push to ECR”
Setting up containerized execution configs¶
You can now create containerized execution configurations.
In Administration > Settings > Containerized execution, click Add another config under the Containerized execution configs section to create a new configuration.
Select the image build configuration to use.
Dataiku recommends creating one namespace per user:
Set dssns-${dssUserLogin} as namespace
Enable “auto-create namespace”
Save
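To illustrate the per-user namespace setting, the sketch below shows what the dssns-${dssUserLogin} template would yield for a given login, and checks the result against the Kubernetes naming rule for namespaces (a DNS label: at most 63 characters, lowercase alphanumerics or "-", starting and ending with an alphanumeric). The helpers namespace_for and is_valid_namespace are hypothetical and only mimic the expansion for illustration; the actual expansion is performed by DSS, and whether DSS adjusts logins that do not fit this rule is not covered here.

```shell
# Hypothetical helper: mimic the expansion of dssns-${dssUserLogin}
# for a given DSS user login.
namespace_for() {
  printf 'dssns-%s\n' "$1"
}

# Hypothetical helper: return 0 if the name is a valid Kubernetes
# namespace name (DNS label, at most 63 characters).
is_valid_namespace() {
  name=$1
  [ "${#name}" -le 63 ] || return 1
  printf '%s\n' "$name" | grep -Eq '^[a-z0-9]([a-z0-9-]*[a-z0-9])?$'
}
```

For example, namespace_for alice prints dssns-alice, which passes is_valid_namespace, whereas a name containing uppercase letters or dots would not.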
Setting up Spark image build configs¶
In Administration > Settings > Spark, click Add another config under the Image build configs section to create a new configuration.
- Configure one or several publishing destinations as needed
Enter the image registry URL (See Elastic AI computation for more details)
When deploying on AWS EKS, set “Image pre-push hook” to “Enable push to ECR”
Setting up Spark configurations¶
You can now create Spark runtime configurations.
In Administration > Settings > Spark, click Add another config under the Runtime configurations section to create a new configuration.
Select the image build configuration to use.
Repeat the following operations for each named Spark configuration that you want to run on Kubernetes
Enable “Managed Spark on K8S”
Select the image build configuration to use
Dataiku recommends creating one namespace per user:
Set dssns-${dssUserLogin} as namespace
Enable “auto-create namespace”
Set “Authentication mode” to “Create service accounts dynamically”
Save
Push base images¶
In Administration > Settings > Containerized execution, click on the “Push base images” button
Use Kubernetes¶
The configurations for containerized execution can be chosen:
As a global default in Administration > Settings > Containerized execution
In the project settings — in which case the settings apply by default to all project activities that can run on containers
In a recipe’s advanced settings
In the “Execution environment” tab of the in-memory machine learning Design screen
Each Spark activity which is configured to use one of the K8S-enabled Spark configurations will automatically use Kubernetes.