Using EKS clusters as unmanaged clusters

Setup

Create your EKS cluster

Follow AWS documentation on how to create your EKS cluster. We recommend that you allocate at least 15 GB of memory for each cluster node. More memory may be required if you plan on running very large in-memory recipes.
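For example, assuming you use the eksctl command-line tool (the AWS console or CloudFormation work equally well), a cluster whose nodes meet this recommendation could be created as follows (cluster name, region and node count are placeholders):

    # m5.xlarge instances provide 16 GB of memory each,
    # which meets the 15 GB per-node recommendation
    eksctl create cluster \
      --name my-dss-cluster \
      --region us-east-1 \
      --node-type m5.xlarge \
      --nodes 3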

You’ll be able to configure the memory allocation for each container and per-namespace using multiple containerized execution configurations.

Prepare your local aws, docker and kubectl commands

Follow AWS documentation to make sure that:

  • Your local (on the DSS machine) aws ecr command can list and create docker image repositories, and authenticate docker for image push.
  • Your local (on the DSS machine) kubectl command can interact with the cluster.
  • Your local (on the DSS machine) docker command can successfully push images to the ECR repository (example sanity-check commands are shown after this list).
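As a quick sanity check, you can for example run the following from the DSS machine (account ID, region and image names are placeholders; older AWS CLI versions use aws ecr get-login instead of get-login-password):

    # aws: list the existing ECR repositories
    aws ecr describe-repositories --region us-east-1

    # docker: authenticate against ECR (AWS CLI v2 syntax), then push an image
    aws ecr get-login-password --region us-east-1 | \
      docker login --username AWS --password-stdin XXXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com
    docker push XXXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/PREFIX/my-image:latest

    # kubectl: check that the cluster worker nodes are reachable
    kubectl get nodes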

Create the execution configuration

Build the base image as indicated in Setting up (Kubernetes).
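This is done with the dssadmin tool; for example, assuming a standard installation, from the DSS data directory:

    # Build the default base image for containerized execution
    ./bin/dssadmin build-base-image --type container-exec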

In Administration > Settings > Containerized execution, add a new execution config of type “Kubernetes”.

  • The image registry URL is the one given by aws ecr describe-repositories, without the image name. It typically looks like XXXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/PREFIX, where XXXXXXXXXXXX is your AWS account ID, us-east-1 is the AWS region for the repository and PREFIX is an optional prefix used to organize your repositories (see the example command after this list).
  • Set “Image pre-push hook” to “Enable push to ECR”.
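For example, you can list the repository URIs (and derive the registry URL by dropping the image name) with:

    # List the URIs of all ECR repositories in the region
    aws ecr describe-repositories --region us-east-1 \
      --query "repositories[].repositoryUri" --output text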

You’re now ready to run recipes and models on EKS.

Using GPUs

AWS provides GPU-enabled instances with NVIDIA GPUs. Several steps are required in order to use them for containerized execution.

Build a CUDA-enabled base image

The base image that is built by default (see Setting up (Kubernetes)) does not have CUDA support and cannot use NVIDIA GPUs.

You need to build a CUDA-enabled base image.
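Depending on your DSS version, this can typically be done with dssadmin; the following sketch assumes that your version supports the --with-cuda and --tag options (check the DSS reference documentation for your release):

    # Hypothetical invocation: the --with-cuda and --tag options
    # must be supported by your DSS version
    ./bin/dssadmin build-base-image --type container-exec --with-cuda --tag dss-cuda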

Then create a new containerized execution configuration dedicated to running GPU workloads. If you specified a tag for the base image, enter it in the “Base image tag” field.

Enable GPU support on the cluster

To execute containers leveraging GPUs, your worker nodes as well as the control plane need to support them. The following instructions describe a simplified way to achieve this; the exact procedure may vary depending on the underlying hardware and the software version requirements of your projects.

To make a worker node able to leverage its GPUs, the NVIDIA drivers must be installed on it. AWS provides EKS-optimized AMIs with GPU support for this purpose.

Finally, enable GPU support on the cluster with the NVIDIA device plugin. Be careful to select the plugin version that matches your Kubernetes version (v1.10 as of July 2018).
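For example, with the v1.10 plugin (adjust both the version and the manifest URL to match your Kubernetes version):

    # Deploy the NVIDIA device plugin as a DaemonSet
    kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.10/nvidia-device-plugin.yml

    # Check that GPU nodes now expose the nvidia.com/gpu resource
    kubectl get nodes -o custom-columns=NAME:.metadata.name,GPU:'.status.allocatable.nvidia\.com/gpu'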

Add a custom reservation

In order for your container execution to be scheduled on nodes with GPU accelerators, and for EKS to configure the CUDA driver on your containers, the corresponding Kubernetes pods must be created with a custom “limit” (in Kubernetes parlance) indicating that you need a specific type of resource (the standard resource types being CPU and memory).

You must configure this limit in the containerized execution configuration:

  • In the “Custom limits” section, add a new entry with key nvidia.com/gpu and value 1 (to request 1 GPU).
  • Don’t forget to actually add the new entry, then save the settings.
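Once a GPU workload is running, you can verify that the limit was actually applied to its pod, for example (the pod name is a placeholder; use kubectl get pods to find the actual name):

    # Inspect the resource limits of a running job pod
    kubectl get pod my-dss-job-pod \
      -o jsonpath='{.spec.containers[0].resources.limits}'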

Deploy

You can now deploy your GPU-requiring recipes and models.