Using managed GKE clusters¶
Initial setup¶
Install the GKE plugin¶
To use Google Kubernetes Engine (GKE), begin by installing the “GKE clusters” plugin from the Plugins store in Dataiku DSS. For more details, see the instructions for installing plugins.
Prepare your local commands¶
Follow the Google Cloud Platform (GCP) documentation to ensure the following on your local machine (where DSS is installed):
- The gcloud command is installed. See install documentation.
- The gcloud command has the appropriate permissions and scopes to:
  - push to the Google Container Registry (GCR) service.
  - have full control on the GKE service.
- The gke-gcloud-auth-plugin command is installed. See GCP documentation.
- The kubectl command is installed. See install documentation.
- The docker command is installed, can build images and push them to GCR. The latter can be enabled by running the gcloud auth configure-docker command. See install documentation.
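As a quick sanity check of the prerequisites above, the following sketch reports which of the listed commands are not on the PATH. The helper name missing_tools is hypothetical and not part of DSS or the GCP tooling; the check only confirms that each command exists, not that it has the right permissions or scopes.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# (missing_tools is an illustrative helper, not a DSS or gcloud command.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Check the tools listed in the prerequisites.
missing=$(missing_tools gcloud gke-gcloud-auth-plugin kubectl docker)
if [ -n "$missing" ]; then
    echo "Missing commands:" $missing
else
    echo "All required commands are on PATH."
fi
```

Run it as the user that runs DSS, since that is the account whose PATH matters.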
Note
Cluster management has been tested with the following versions of Kubernetes: 1.23, 1.24, 1.25, 1.26, 1.27, 1.28, 1.29.
There is no known issue with other Kubernetes versions.
Create base images¶
Build the base image by following these instructions.
Create a new execution configuration¶
Go to Administration > Settings > Containerized execution, and add a new execution configuration of type “Kubernetes.”
In GCP, there is only a single shared image repository URL, gcr.io. Access control is based on image names; therefore, the repository URL to use is gcr.io/your-gcp-project-name.
Finish by clicking Push base images.
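To illustrate how the repository URL is formed, here is a small sketch that prefixes a project name with gcr.io. The function name repo_url is hypothetical, and your-gcp-project-name is a placeholder to replace with your own project.

```shell
# Build the image repository URL from a GCP project name.
# (repo_url is an illustrative helper, not a DSS or gcloud command.)
repo_url() {
    printf 'gcr.io/%s\n' "$1"
}

# Example with a placeholder project name:
repo_url your-gcp-project-name

# To use the active gcloud project instead (requires a configured gcloud):
#   repo_url "$(gcloud config get-value project)"
```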
Cluster configuration¶
Connection¶
The connection is where you define how to connect to GCP. This can be done either inline in each cluster (not recommended), or as a preset in the “GKE connection” plugin settings (recommended).
Network settings¶
The “network” field refers to the Virtual Private Cloud (VPC) where the cluster will be deployed. The “sub-network” field defines the IP space within that VPC where the pod IPs will be allocated. If left blank, those fields will use default network settings, the details of which are explained in the GCP documentation.
Cluster nodes¶
This is where you define the number and type of nodes that you want in your cluster. You can define the properties of a node pool either inline (not recommended) or as a preset in the “Node pools” plugin settings (recommended). You can define multiple node pools, each with its own properties.
Using GPUs¶
GCP provides GPU-enabled instances with NVidia GPUs. Using GPUs for containerized execution requires the following steps.
Building an image with CUDA support¶
The base image that is built by default does not have CUDA support and cannot use NVidia GPUs.
You need to build a CUDA-enabled base image. To enable CUDA, add the --with-cuda option to the command line:
./bin/dssadmin build-base-image --type container-exec --with-cuda
We recommend that you give this image a specific tag using the --tag option and keep the default base image “pristine”. We also recommend that you add the DSS version number in the image tag.
./bin/dssadmin build-base-image --type container-exec --with-cuda --tag dataiku-container-exec-base-cuda:X.Y.Z
Here, X.Y.Z is your DSS version number.
Note
This image contains CUDA 10.0 and cuDNN 7.6. You can use --cuda-version X.Y to specify another DSS-provided version (9.0, 10.0, 10.1, 10.2, 11.0 and 11.2 are available). If you require another CUDA version, you will have to create a custom image.
The TensorFlow version you use must match the CUDA version of the base image (10.0 by default).
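The options described above can be combined in a single command. The sketch below only prints the resulting command line so that it can be reviewed before being run; the helper name cuda_build_command is hypothetical, while the dssadmin options are the ones named in this section.

```shell
# Print the build command for a CUDA-enabled base image.
#   $1: DSS version, used in the image tag
#   $2: CUDA version (optional; one of the DSS-provided versions)
# (cuda_build_command is an illustrative helper, not a DSS command.)
cuda_build_command() {
    dss_version=$1
    cuda_version=${2:-}
    cmd="./bin/dssadmin build-base-image --type container-exec --with-cuda"
    if [ -n "$cuda_version" ]; then
        cmd="$cmd --cuda-version $cuda_version"
    fi
    printf '%s --tag dataiku-container-exec-base-cuda:%s\n' "$cmd" "$dss_version"
}

# Example: CUDA 11.2, with X.Y.Z standing in for your DSS version number.
cuda_build_command X.Y.Z 11.2
```

Run the printed command from the same directory as the build commands shown earlier.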
Warning
After each upgrade of DSS, you must rebuild all base images and update code envs.
Next, create a new container configuration dedicated to running GPU workloads. If you specified a tag for the base image, enter it in the “Base image tag” field.
Enable GPU support on the cluster¶
When you create your cluster using the GKE plugin, be sure to enable the “With GPU” option in the node pool settings. Follow the GCP documentation on GPUs to select the GPU type.
At cluster creation, the plugin will run the NVidia driver “DaemonSet” installation procedure, which needs several minutes to complete.
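Once the driver installation has finished, the GPU nodes should advertise an allocatable nvidia.com/gpu resource. One possible way to check this from the command line, assuming kubectl is already pointed at the new cluster, is sketched below. The helper name count_gpu_nodes is hypothetical; it simply counts the rows of the kubectl table whose GPU column is a positive number.

```shell
# Count the nodes that advertise at least one allocatable NVIDIA GPU.
# Reads a two-column table (NAME, GPU) with a header row on stdin.
# (count_gpu_nodes is an illustrative helper, not a kubectl command.)
count_gpu_nodes() {
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ && $2 > 0 { n++ } END { print n + 0 }'
}

# Usage, once kubectl points at the new cluster:
#   kubectl get nodes \
#     -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
#     | count_gpu_nodes
```

A result of 0 a few minutes after cluster creation may simply mean that the driver installation is still in progress.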
Add a custom reservation¶
For your containerized execution task to run on nodes with GPUs, and for GKE to configure the CUDA driver on your containers, the corresponding pods must be created with a custom limit (in Kubernetes parlance). This indicates that you need a specific type of resource (standard resource types are CPU and memory).
You must configure this limit in the containerized execution configuration. To do this:
- In the “Custom limits” section, add a new entry with key nvidia.com/gpu and value 1 (to request 1 GPU).
- Add the new entry and save your settings.
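For readers less familiar with Kubernetes limits, the sketch below prints a minimal Pod manifest carrying the same nvidia.com/gpu limit. It is only an illustration of what such a limit looks like in plain Kubernetes, not the pod specification that DSS generates. The helper name gpu_pod_manifest, the pod name gpu-limit-demo, and the image placeholder are all hypothetical.

```shell
# Print a minimal Pod manifest that requests GPUs through a resource limit.
#   $1: container image (placeholder; use any CUDA-capable image you have)
#   $2: number of GPUs (defaults to 1)
# (gpu_pod_manifest is an illustrative helper, not part of DSS.)
gpu_pod_manifest() {
    image=$1
    gpus=${2:-1}
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-limit-demo
spec:
  restartPolicy: Never
  containers:
  - name: main
    image: $image
    resources:
      limits:
        nvidia.com/gpu: $gpus
EOF
}

# Example with a placeholder image name:
gpu_pod_manifest your-cuda-image
```

The key and value under limits correspond to the entry added in the “Custom limits” section.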
Deploy¶
You can now deploy your GPU-based recipes and models.