Deployment on Google Kubernetes Engine¶

You can use the API Deployer Kubernetes integration to deploy your API Services on Google Kubernetes Engine.

Setup¶

Create your GKE cluster¶

Follow Google documentation on how to create your cluster. We recommend that you allocate at least 7 GB of memory for each cluster node.

Prepare your local `docker` and `kubectl` commands¶

Follow Google documentation to make sure that:

Your local (on the DSS machine) kubectl command can interact with the cluster. As of July 2018, this implies running gcloud container clusters get-credentials <cluster_id>
Your local (on the DSS machine) docker command can successfully push images to your GAR repository. As of March 2025, this implies running gcloud auth configure-docker

Note

Cluster management has been tested with the following versions of Kubernetes:

1.23
1.24
1.25
1.26
1.27
1.28
1.29
1.30
1.31
1.32

There is no known issue with other Kubernetes versions.

Setup the infrastructure¶

Follow the usual setup steps as indicated in Setting up.

Make sure you have Google Artifact Registry (GAR) set up with a repository in your project. We recommend that it be specific to API Deployer. It will be used to prefix your image paths.

For example, if your GCP project is called my-gke-project and you have a GAR repository called my-repository, all images must be prefixed by my-gke-project/my-repository/.

Go to the infrastructure settings > Kubernetes cluster
In the Registry host field, enter the region’s artifact registry hostname <region>-docker.pkg.dev
In the images prefix field, enter my-gke-project/my-repository

Deploy¶

You’re now ready to deploy your API Services to GKE

Using GPUs¶

Google Cloud Platform provides GPU-enabled instances with NVidia GPUs. Several steps are required in order to use them for API Deployer deployments

Building an image with CUDA support¶

The base image that is built by default does not have CUDA support and cannot use NVidia GPUs.

CUDA support can be added to an image by:

installing CUDA system-wide (in /usr/local/cuda/) in the base image (see below)
installing CUDA system-wide in the code env image using container runtime additions
installing CUDA in the code env (in /opt/dataiku/code-env/) by requiring CUDA libraries (including nvidia-cuda-runtime)

To enable CUDA system-wide in the base image add the --with-cuda option to the command line:

./bin/dssadmin build-base-image --type apideployer --with-cuda

We recommend that you give this image a specific tag using the --tag option and keep the default base image “pristine”. We also recommend that you add the DSS version number in the image tag.

./bin/dssadmin build-base-image --type apideployer --with-cuda --tag dataiku-apideployer-base-cuda:X.Y.Z

where X.Y.Z is your DSS version number

Note

This image contains CUDA 11.8 and CuDNN 8.7 by default on AlmaLinux 9. You can use --cuda-version X.Y to specify another DSS-provided version (9.0, 10.0, 10.1, 10.2, 11.0, 11.2 and 11.8 are available on AlmaLinux 8, 11.8 only on AlmaLinux 9). If you require other CUDA versions, you have to create a custom image.
Depending on which CUDA version is installed in the base image you will need to use the corresponding tensorflow version.

Warning

After each upgrade of DSS, you must rebuild all base images and update code envs.

If you used a specific tag, go to the infrastructure settings, and in the “Base image tag” field, enter dataiku-apideployer-base-cuda:X.Y.Z

Create a cluster with GPUs¶

Follow GCP’s documentation for how to create a cluster with GPU accelerators (Note: you can also create a GPU-enabled node group in an existing cluster)

Don’t forget to run the “daemonset” installation procedure. This procedure needs several minutes to complete.

Add a custom reservation¶

In order for your API Deployer deployments to be located on nodes with GPU accelerators, and for GKE to configure the CUDA driver on your containers, the corresponding GKE pods must be created with a custom “limit” (in Kubernetes parlance) to indicate that you need a specific type of resource (standard resource types are CPU and memory)

You can configure this limit either at the infrastructure level (all deployments on this infrastructure will use GPUs) or at the deployment level.

At the infrastructure level¶

Go to Infrastructure > Settings
Go to “Sizing and Scaling”
In the “Custom limits” section, add a new entry with key: nvidia.com/gpu and value: 1 (to request 1 GPU)
Don’t forget to add the new entry, save settings

At the deployment level¶

Go to Deployment > Settings
Go to “Sizing and Scaling”
Enable override of infrastructure settings in the “Container limits” section
In the “Custom limits” section, add a new entry with key: nvidia.com/gpu and value: 1 (to request 1 GPU)
Don’t forget to add the new entry, and save settings

Deploy¶

You can now deploy your GPU-requiring deployments

This applies to:

Python functions (your endpoint needs to use a code environment that includes a CUDA-using package like tensorflow-gpu)
Python predictions (ditto)

Deployment on Google Kubernetes Engine¶

Setup¶

Create your GKE cluster¶

Prepare your local docker and kubectl commands¶

Setup the infrastructure¶

Deploy¶

Using GPUs¶

Building an image with CUDA support¶

Create a cluster with GPUs¶

Add a custom reservation¶

At the infrastructure level¶

At the deployment level¶

Deploy¶

Prepare your local `docker` and `kubectl` commands¶