Using unmanaged AKS clusters

Setup

Create your ACR registry

If you already have an Azure Container Registry (ACR) up and ready, you can skip this section and go to Create your AKS cluster.

Otherwise, follow the Azure documentation on how to create your ACR registry.

Warning

We recommend that you pay extra attention to the Azure container registry pricing plan, as it is directly related to the registry storage capacity.
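
As an illustration, a registry can also be created from the command line with az acr create, where the --sku option selects the pricing plan mentioned above. This is only a sketch: the resource group, registry name and Standard plan are placeholder assumptions, and run is a hypothetical helper that prints the command unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Placeholder values (assumptions): replace them with your own.
RESOURCE_GROUP="${RESOURCE_GROUP:-your-rg}"
ACR_NAME="${ACR_NAME:-your-registry-name}"
ACR_SKU="${ACR_SKU:-Standard}"   # pricing plan: Basic, Standard or Premium

create_acr() {
  run az acr create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$ACR_NAME" \
    --sku "$ACR_SKU"
}

create_acr
```

Review the printed command first, then run the script again with DRY_RUN=0 to actually create the registry.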

Create your AKS cluster

To create your Azure Kubernetes Service (AKS) cluster, follow the Azure documentation on how to create your AKS cluster. We recommend that you allocate at least 16 GB of memory to each cluster node.

Once the cluster is created, you must modify its IAM credentials to grant it access to ACR (Kubernetes secret mode is not supported). This is required for the worker nodes to pull images from the registry.
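
One possible way to do both steps from the Azure CLI is sketched below. It assumes that az aks create is used for the cluster and that the --attach-acr option of az aks update is an acceptable way to grant the cluster pull access to the registry. All names, the node count and the Standard_D4s_v3 size (16 GiB of memory) are placeholder assumptions, and run is a hypothetical dry-run helper.

```shell
# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Placeholder values (assumptions): replace them with your own.
RESOURCE_GROUP="${RESOURCE_GROUP:-your-rg}"
CLUSTER_NAME="${CLUSTER_NAME:-your-cluster-name}"
ACR_NAME="${ACR_NAME:-your-registry-name}"
NODE_VM_SIZE="${NODE_VM_SIZE:-Standard_D4s_v3}"   # 4 vCPUs, 16 GiB of memory

# Create the cluster with nodes that meet the 16 GB recommendation.
create_cluster() {
  run az aks create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --node-count 2 \
    --node-vm-size "$NODE_VM_SIZE" \
    --generate-ssh-keys
}

# Grant the cluster's identity permission to pull images from the registry.
attach_acr() {
  run az aks update \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --attach-acr "$ACR_NAME"
}

create_cluster && attach_acr
```

With the default DRY_RUN=1 the two commands are only printed, which makes it easy to check them before running them for real.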

Prepare your local az, docker, and kubectl commands

Follow the Azure documentation to ensure the following on your local machine (where Dataiku DSS is installed):

  • The az command is properly logged in. As of October 2019, this implies running the az login --service-principal --username client_id --password client_secret --tenant tenant_id command. You must use a service principal that has sufficient IAM permissions to write to ACR and full control over AKS.

  • The docker command can successfully push images to the ACR repository. As of October 2019, this implies running the az acr login --name your-registry-name command.

  • The kubectl command can interact with the cluster. As of October 2019, this implies running the az aks get-credentials --resource-group your-rg --name your-cluster-name command.
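
The three commands above can be chained in a small script, sketched here with a final kubectl get nodes as a connectivity check. The variable names and their placeholder values are assumptions, and run is a hypothetical dry-run helper.

```shell
# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Placeholder values (assumptions): replace them with your own.
CLIENT_ID="${CLIENT_ID:-client_id}"
CLIENT_SECRET="${CLIENT_SECRET:-client_secret}"
TENANT_ID="${TENANT_ID:-tenant_id}"
ACR_NAME="${ACR_NAME:-your-registry-name}"
RESOURCE_GROUP="${RESOURCE_GROUP:-your-rg}"
CLUSTER_NAME="${CLUSTER_NAME:-your-cluster-name}"

# Log in az, docker and kubectl in turn, stopping at the first failure.
# Note: in dry-run mode the secret is printed with the rest of the command.
prepare_local_tools() {
  run az login --service-principal --username "$CLIENT_ID" \
    --password "$CLIENT_SECRET" --tenant "$TENANT_ID" &&
  run az acr login --name "$ACR_NAME" &&
  run az aks get-credentials --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" &&
  run kubectl get nodes
}

prepare_local_tools
```

Run it as the user that runs DSS, since the docker and kubectl credentials are stored per user.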

Note

Cluster management has been tested with the following versions of Kubernetes:

  • 1.23

  • 1.24

  • 1.25

  • 1.26

  • 1.27

  • 1.28

  • 1.29

  • 1.30

  • 1.31

  • 1.32

There are no known issues with other Kubernetes versions.
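
To check a cluster against this list, a small helper along these lines can be used. It is only a sketch: the function name is hypothetical, and the list is copied from the note above.

```shell
# Kubernetes versions listed as tested in the note above.
TESTED_VERSIONS="1.23 1.24 1.25 1.26 1.27 1.28 1.29 1.30 1.31 1.32"

# Succeed if the given version (for example "v1.29.4" or "1.29") is in the list.
is_tested_version() {
  # Keep only major.minor, for example "v1.29.4" becomes "1.29".
  minor=$(printf '%s\n' "$1" | sed -E 's/^v?([0-9]+\.[0-9]+).*$/\1/')
  for v in $TESTED_VERSIONS; do
    if [ "$v" = "$minor" ]; then
      return 0
    fi
  done
  return 1
}

if is_tested_version "v1.29.4"; then
  echo "Kubernetes v1.29.4 is in the tested list"
fi
```

The version of a running cluster is reported by kubectl version, in the Server Version line, and can be passed to the function as is.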

Create base images

Build the base image by following these instructions.

Create a new containerized execution configuration

Go to Administration > Settings > Containerized execution, and add a new execution configuration of type “Kubernetes”.

  • In particular, to set up the image registry, the URL must be of the form your-registry-name.azurecr.io.

  • Finish by clicking Push base images.
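
A common mistake is to enter the registry URL with a scheme or a path. The following sketch checks only the general form given above, not whether the registry exists; the function name is hypothetical and the pattern is deliberately loose.

```shell
# Succeed if the argument looks like <registry-name>.azurecr.io,
# with no scheme (https://) and no trailing path.
is_valid_acr_url() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9-]+\.azurecr\.io$'
}

for url in "your-registry-name.azurecr.io" "https://your-registry-name.azurecr.io"; do
  if is_valid_acr_url "$url"; then
    echo "ok:      $url"
  else
    echo "invalid: $url"
  fi
done
```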

You’re now ready to run recipes, notebooks and ML models in AKS.

Using GPUs

Azure provides GPU-enabled instances with NVIDIA GPUs. Several steps are required to use them for containerized execution.

Building an image with CUDA support

The base image that is built by default does not have CUDA support and cannot use NVIDIA GPUs.

CUDA support can be added to an image by:

  • installing CUDA system-wide (in /usr/local/cuda/) in the base image (see below)

  • installing CUDA system-wide in the code env image using container runtime additions

  • installing CUDA in the code env (in /opt/dataiku/code-env/) by requiring CUDA libraries (including nvidia-cuda-runtime)

To enable CUDA system-wide in the base image, add the --with-cuda option to the command line:

./bin/dssadmin build-base-image --type container-exec --with-cuda

We recommend that you give this image a specific tag using the --tag option and keep the default base image “pristine”. We also recommend that you include the DSS version number in the image tag:

./bin/dssadmin build-base-image --type container-exec --with-cuda --tag dataiku-container-exec-base-cuda:X.Y.Z

Here, X.Y.Z is your DSS version number.

Note

  • This image contains CUDA 11.8 and cuDNN 8.7 by default on AlmaLinux 9. You can use --cuda-version X.Y to specify another DSS-provided version (9.0, 10.0, 10.1, 10.2, 11.0, 11.2 and 11.8 are available on AlmaLinux 8; only 11.8 is available on AlmaLinux 9). If you require another CUDA version, you must create a custom image.

  • The TensorFlow version that you use must match the CUDA version installed in the base image.
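
The version constraints in this note can be checked before launching a build. The sketch below prints the build command only when the requested CUDA version is listed for the base OS. The function names are hypothetical, and combining --cuda-version with --tag in one command is an assumption based on the two options described above.

```shell
# CUDA versions listed in the note above, per AlmaLinux major version.
supported_cuda_versions() {
  case "$1" in
    8) echo "9.0 10.0 10.1 10.2 11.0 11.2 11.8" ;;
    9) echo "11.8" ;;
    *) return 1 ;;
  esac
}

# Print the build command for an OS major version, a CUDA version and a DSS version.
cuda_build_command() {
  os_major="$1"
  cuda_version="$2"
  dss_version="$3"
  for v in $(supported_cuda_versions "$os_major"); do
    if [ "$v" = "$cuda_version" ]; then
      echo "./bin/dssadmin build-base-image --type container-exec --with-cuda --cuda-version $cuda_version --tag dataiku-container-exec-base-cuda:$dss_version"
      return 0
    fi
  done
  echo "CUDA $cuda_version is not listed for AlmaLinux $os_major: a custom image is required" >&2
  return 1
}

# X.Y.Z stands for your DSS version number.
cuda_build_command 8 11.2 X.Y.Z
```

The printed command is meant to be run from the DSS data directory, like the other dssadmin commands in this section.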

Warning

After each upgrade of DSS, you must rebuild all base images and update code envs.

Create configuration and add a custom reservation

Create a new containerized execution configuration dedicated to running GPU workloads. If you specified a tag for the base image, report it in the “Base image tag” field.

For your containers to be scheduled on nodes with GPU accelerators, and for AKS to configure the CUDA driver in them, the corresponding AKS pods must be created with a custom “limit” (in Kubernetes parlance) indicating that they need a specific type of resource (the standard resource types are CPU and memory). The NVIDIA drivers must also be mounted in the containers.

To do so:

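  • In the “Custom limits” section, add a new entry with key: alpha.kubernetes.io/nvidia-gpu and value: 1 (to request 1 GPU). Make sure that you confirm the new entry so that it is actually added. On clusters where GPUs are exposed through the NVIDIA device plugin, the resource key is nvidia.com/gpu instead.

  • In “HostPath volume configuration”, mount /usr/local/nvidia as /usr/local/nvidia. Make sure that you confirm the new entry so that it is actually added, then save the settings.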
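
In Kubernetes terms, these two settings correspond roughly to a resource limit and a hostPath volume in the pod specification. The sketch below prints a standalone test pod with both, which can help check a cluster independently of DSS. It is not the pod that DSS generates: the function name, pod name and image argument are hypothetical, and the resource key defaults to the one given above.

```shell
# Print a test pod manifest that requests one GPU and mounts the NVIDIA drivers.
# $1: container image (placeholder), $2: GPU resource key (optional).
gpu_pod_manifest() {
  image="$1"
  gpu_resource="${2:-alpha.kubernetes.io/nvidia-gpu}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-check
spec:
  restartPolicy: Never
  containers:
    - name: gpu-check
      image: ${image}
      command: ["sh", "-c", "ls /usr/local/nvidia"]
      resources:
        limits:
          ${gpu_resource}: 1
      volumeMounts:
        - name: nvidia
          mountPath: /usr/local/nvidia
  volumes:
    - name: nvidia
      hostPath:
        path: /usr/local/nvidia
EOF
}

gpu_pod_manifest "your-registry-name.azurecr.io/your-image:your-tag"
```

The output can be piped to kubectl apply -f - to create the pod. If the pod stays in the Pending state, no node currently offers the requested GPU resource.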

Create a cluster with GPUs

Follow the Azure documentation on how to create a cluster with GPU accelerators.
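
For an existing cluster, one option is to add a GPU node pool with az aks nodepool add, as sketched below. The pool name, node count and Standard_NC6s_v3 size are placeholder assumptions (GPU sizes and quotas vary by region), and run is a hypothetical dry-run helper.

```shell
# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Placeholder values (assumptions): replace them with your own.
RESOURCE_GROUP="${RESOURCE_GROUP:-your-rg}"
CLUSTER_NAME="${CLUSTER_NAME:-your-cluster-name}"
GPU_POOL_NAME="${GPU_POOL_NAME:-gpunp}"
GPU_VM_SIZE="${GPU_VM_SIZE:-Standard_NC6s_v3}"   # an NVIDIA GPU-enabled VM size

# Add a node pool made of GPU-enabled nodes to the existing cluster.
add_gpu_nodepool() {
  run az aks nodepool add \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name "$GPU_POOL_NAME" \
    --node-count 1 \
    --node-vm-size "$GPU_VM_SIZE"
}

add_gpu_nodepool
```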

Deploy

You can now deploy your recipes and models that require GPUs.