Running on Azure Kubernetes Service¶
You can use Azure Kubernetes Service (AKS), a fully managed Kubernetes solution, to run container execution workloads.
Create your ACR registry¶
Follow the Azure documentation on how to create your ACR registry. We recommend that you pay extra attention to the pricing plan since it is directly related to the registry storage capacity.
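For reference, creating a registry from the Azure CLI looks roughly like the following; the resource group and registry names are placeholders to adapt to your environment:

```shell
# Hypothetical resource names; adjust to your environment.
# The --sku flag selects the pricing plan (Basic / Standard / Premium),
# which determines the registry's included storage capacity.
az acr create \
    --resource-group my-resource-group \
    --name mydssregistry \
    --sku Standard
```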
Create your AKS cluster¶
Follow Azure documentation on how to create your AKS cluster. We recommend that you allocate at least 15 GB of memory for each cluster node. More memory may be required if you plan on running very large in-memory recipes. Once the cluster is created, you must modify the registry IAM credentials to grant AKS access to ACR (Kubernetes secret mode is not supported). This is required for the worker nodes to pull the images from the registry.
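As a sketch, cluster creation and the ACR permission grant might look like this from the Azure CLI (all names and sizes below are placeholders; check the Azure documentation for current options):

```shell
# Hypothetical names and sizes; adjust to your environment.
# Standard_DS5_v2 nodes have 56 GB of RAM, comfortably above the
# recommended 15 GB per node.
az aks create \
    --resource-group my-resource-group \
    --name my-dss-cluster \
    --node-count 3 \
    --node-vm-size Standard_DS5_v2

# Grant the cluster's service principal pull rights on the registry
# (Kubernetes secret mode is not supported).
ACR_ID=$(az acr show --name mydssregistry --query id --output tsv)
CLIENT_ID=$(az aks show --resource-group my-resource-group \
    --name my-dss-cluster \
    --query servicePrincipalProfile.clientId --output tsv)
az role assignment create --assignee "$CLIENT_ID" --role AcrPull --scope "$ACR_ID"
```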
You’ll be able to configure the memory allocation for each container and per-namespace using multiple container execution configurations.
Prepare your local commands¶
Follow Azure documentation to make sure that:
- Your local (on the DSS machine) `kubectl` command can interact with the cluster. As of July 2018, this implies adding to the `KUBECONFIG` path the kubeconfig file obtained with the `az aks get-credentials --resource-group resource_group --name cluster_name` command
- Your local (on the DSS machine) `docker` command can successfully push images to the ACR registry. As of July 2018, this implies logging in to Azure with `az login --service-principal -p client_secret -u service_principal --tenant tenant_id`, then into the registry with `az acr login --name registry_name`. If you use the same service principal as the cluster's, it must also have write permissions on the registry.
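Putting the two checks above together, a verification session might look like this (placeholder names throughout):

```shell
# Hypothetical names; adjust to your environment.
# 1. Fetch cluster credentials and check kubectl connectivity.
az aks get-credentials --resource-group my-resource-group --name my-dss-cluster
kubectl get nodes

# 2. Log in to Azure and ACR, then verify that docker can push.
az login --service-principal -p "$CLIENT_SECRET" -u "$SERVICE_PRINCIPAL" \
    --tenant "$TENANT_ID"
az acr login --name mydssregistry
docker pull hello-world
docker tag hello-world mydssregistry.azurecr.io/test/hello-world
docker push mydssregistry.azurecr.io/test/hello-world
```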
Create the execution configuration¶
Build the base image as indicated in Setting up.
In Administration > Settings > Container exec, add a new execution config of type “Kubernetes”.
The image registry URL is `registry_name.azurecr.io/PREFIX`, without the image name, where `PREFIX` is an optional prefix used to organize your repositories.
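For example, with a hypothetical registry named `mydssregistry` and the prefix `dss`, the field would contain `mydssregistry.azurecr.io/dss`, and images would be pushed as `mydssregistry.azurecr.io/dss/<image-name>`. You can list what was pushed with:

```shell
# Hypothetical registry name; adjust to your environment.
az acr repository list --name mydssregistry --output table
```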
You’re now ready to run recipes and models on AKS.
Using GPUs¶
Azure provides GPU-enabled instances with NVIDIA GPUs. Several steps are required in order to use them for container execution.
Build a CUDA-enabled base image¶
The base image that is built by default (see Setting up) does not have CUDA support and cannot use NVIDIA GPUs.
You need to build a CUDA-enabled base image.
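The actual build procedure is described in Setting up. Purely as an illustration of what “CUDA-enabled” means, such an image is typically derived from an official NVIDIA CUDA base image so that the CUDA runtime libraries are present (the tag names below are assumptions, not the DSS build procedure):

```shell
# Illustration only -- follow the "Setting up" documentation for the
# actual base-image build. Image and tag names here are assumptions.
cat <<'EOF' > Dockerfile.cuda-example
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04
# ... base image content goes here ...
EOF
docker build -f Dockerfile.cuda-example -t dss-container-exec-base:cuda .
```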
Then create a new container configuration dedicated to running GPU workloads. If you specified a tag for the base image, enter it in the “Base image tag” field.
Create a cluster with GPUs¶
Follow Azure documentation for how to create a cluster with GPU accelerators.
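As an illustration (names are placeholders), a GPU cluster uses one of Azure's NC-series VM sizes:

```shell
# Hypothetical names; adjust to your environment.
# Standard_NC6 is one of Azure's NVIDIA GPU (Tesla K80) instance sizes.
az aks create \
    --resource-group my-resource-group \
    --name my-dss-gpu-cluster \
    --node-count 1 \
    --node-vm-size Standard_NC6
```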
Add a custom reservation¶
In order for your container execution to be scheduled on nodes with GPU accelerators, and for AKS to configure the CUDA driver in your containers, the corresponding AKS pods must be created with a custom “limit” (in Kubernetes parlance) indicating that you need a specific type of resource (the standard resource types are CPU and memory). In addition, the NVIDIA drivers must be mounted into the containers.
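To illustrate the mechanism, here is roughly what the resulting pod spec contains once the custom limit and the driver mount are configured. DSS generates this for you; the image name and host path below are assumptions, not values to copy:

```shell
# Illustration only: the pod spec DSS generates looks roughly like this.
# Image name and hostPath are assumptions; do not apply this by hand.
cat <<'EOF' > gpu-pod-example.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  containers:
  - name: worker
    image: mydssregistry.azurecr.io/dss/dss-container-exec-base:cuda
    resources:
      limits:
        nvidia.com/gpu: 1        # custom limit requesting one GPU
    volumeMounts:
    - name: nvidia-drivers
      mountPath: /usr/local/nvidia
  volumes:
  - name: nvidia-drivers
    hostPath:
      path: /usr/local/nvidia    # host location of the drivers (assumption)
EOF
```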
You must configure these in the container execution configuration:
- In the “Custom limits” section, add a new entry with key `nvidia.com/gpu` (the standard Kubernetes name for NVIDIA GPU resources) and value `1` (to request 1 GPU). Don’t forget to add the new entry.
- In the HostPath volume configuration, mount the host directories containing the NVIDIA drivers. Don’t forget to add the new entry, then save the settings.
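Once the cluster is up, you can check that Kubernetes actually advertises GPU resources on the nodes before running a workload:

```shell
# Nodes with working GPU support expose an allocatable
# "nvidia.com/gpu" resource in their capacity listing.
kubectl describe nodes | grep -i "nvidia.com/gpu"
```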
You can now deploy your GPU-requiring recipes and models.