Running on Azure Kubernetes Service

You can run container execution on Azure Kubernetes Service (AKS), a fully managed Kubernetes solution.

Setup

Create your ACR registry

Follow the Azure documentation on how to create your Azure Container Registry (ACR) registry. Pay particular attention to the pricing plan, since it determines the storage capacity of the registry.

Create your AKS cluster

Follow the Azure documentation on how to create your AKS cluster. We recommend that you allocate at least 15 GB of memory for each cluster node. More memory may be required if you plan to run very large in-memory recipes. Once the cluster is created, you must modify the registry IAM credentials to grant AKS access to ACR (Kubernetes secret mode is not supported). This is required so that the worker nodes can pull the images from the registry.

You’ll be able to configure the memory allocation for each container and per-namespace using multiple container execution configurations.

Prepare your local docker and kubectl commands

Follow Azure documentation to make sure that:

  • Your local (on the DSS machine) kubectl command can interact with the cluster. As of July 2018, this means adding the configuration file obtained with the az aks get-credentials --resource-group resource_group --name cluster_name command to the KUBECONFIG path.
  • Your local (on the DSS machine) docker command can successfully push images to the ACR repository. As of July 2018, this means logging in to ACR with az login --service-principal -p client_secret -u service_principal --tenant tenant_id, then az acr login --name registry_name. If you use the same principal as the cluster principal, it must also have write access to the registry.
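The two checks above can be sketched as a small script that assembles the commands quoted in the list and extends a KUBECONFIG-style path. The function names are hypothetical, and the arguments (resource_group, cluster_name, and so on) are the same placeholders used above; this is only a sketch under those assumptions.

```python
import os
import shlex


def aks_credentials_command(resource_group, cluster_name):
    """Arguments for fetching the cluster credentials used by kubectl."""
    return ["az", "aks", "get-credentials",
            "--resource-group", resource_group,
            "--name", cluster_name]


def acr_login_commands(client_secret, service_principal, tenant_id, registry_name):
    """Arguments for logging in as a service principal, then into the registry."""
    return [
        ["az", "login", "--service-principal",
         "-p", client_secret, "-u", service_principal, "--tenant", tenant_id],
        ["az", "acr", "login", "--name", registry_name],
    ]


def extend_kubeconfig(current, new_file):
    """Append new_file to a KUBECONFIG path list unless it is already listed."""
    entries = [entry for entry in current.split(os.pathsep) if entry]
    if new_file not in entries:
        entries.append(new_file)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    print(shlex.join(aks_credentials_command("resource_group", "cluster_name")))
    for command in acr_login_commands("client_secret", "service_principal",
                                      "tenant_id", "registry_name"):
        print(shlex.join(command))
```

Once the commands have run, kubectl get nodes and a test docker push to the registry are simple ways to confirm that both tools work from the DSS machine.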

Create the execution configuration

Build the base image as indicated in Setting up.

In Administration > Settings > Container exec, add a new execution config of type “Kubernetes”.

The image registry URL is registry_name.azurecr.io/PREFIX, without the image name, where PREFIX is an optional prefix used to organize your repositories.
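To make the expected format concrete, here is a minimal sketch that composes the registry URL from a registry name and an optional prefix. The function name is hypothetical, and the behavior when no prefix is given (returning the bare host name) is an assumption.

```python
def image_registry_url(registry_name, prefix=""):
    """Compose registry_name.azurecr.io/PREFIX, without any image name."""
    base = f"{registry_name}.azurecr.io"
    prefix = prefix.strip("/")  # tolerate stray leading or trailing slashes
    return f"{base}/{prefix}" if prefix else base


if __name__ == "__main__":
    print(image_registry_url("registry_name", "PREFIX"))
```

For example, a registry named myregistry with the prefix dss gives myregistry.azurecr.io/dss.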

You’re now ready to run recipes and models on AKS.

Using GPUs

Azure provides GPU-enabled instances with NVIDIA GPUs. Several steps are required in order to use them for container execution.

Build a CUDA-enabled base image

The base image that is built by default (see Setting up) does not have CUDA support and cannot use NVIDIA GPUs.

You need to build a CUDA-enabled base image.

Then create a new container configuration dedicated to running GPU workloads. If you specified a tag for the base image, enter it in the “Base image tag” field.

Create a cluster with GPUs

Follow the Azure documentation on how to create a cluster with GPU accelerators.

Add a custom reservation

For your container execution to be scheduled on nodes with GPU accelerators, and for AKS to configure the CUDA driver on your containers, the corresponding AKS pods must be created with a custom “limit” (in Kubernetes parlance) indicating that you need a specific type of resource (the standard resource types are CPU and memory). The NVIDIA drivers must also be mounted in the containers.

You must configure these in the container execution configuration:

  • In the “Custom limits” section, add a new entry with key: alpha.kubernetes.io/nvidia-gpu and value: 1 (to request 1 GPU)
  • Don’t forget to add the new entry
  • In HostPath volume configuration, mount /usr/local/nvidia as /usr/local/nvidia
  • Don’t forget to add the new entry, then save the settings
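To show what these two settings mean in Kubernetes terms, the sketch below builds the relevant fragment of a pod specification as a Python dictionary. It only illustrates the generic Kubernetes fields (resources.limits, volumeMounts, and hostPath volumes). The container and volume names are hypothetical, and the pod specification that DSS actually generates may differ.

```python
import json


def gpu_pod_fragment(gpu_count=1, nvidia_path="/usr/local/nvidia"):
    """Return a pod spec fragment with a GPU limit and a driver hostPath mount."""
    return {
        "containers": [{
            "name": "example-container",  # hypothetical name
            "resources": {
                # Custom limit: request `gpu_count` NVIDIA GPUs
                "limits": {"alpha.kubernetes.io/nvidia-gpu": gpu_count},
            },
            # Mount point of the driver directory inside the container
            "volumeMounts": [{"name": "nvidia-driver", "mountPath": nvidia_path}],
        }],
        # Directory on the node that holds the NVIDIA drivers
        "volumes": [{"name": "nvidia-driver", "hostPath": {"path": nvidia_path}}],
    }


if __name__ == "__main__":
    print(json.dumps(gpu_pod_fragment(), indent=2))
```

The limit entry is what places the pod on a GPU node, while the hostPath volume exposes the node's driver directory at the same path inside the container.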

Deploy

You can now deploy your GPU-requiring recipes and models.