Running on Amazon Elastic Kubernetes Service

You can use container execution on Amazon Elastic Kubernetes Service (EKS), a fully managed Kubernetes solution.

Setup

Create your EKS cluster

Follow AWS documentation on how to create your EKS cluster. We recommend that you allocate at least 15 GB of memory for each cluster node. More memory may be required if you plan on running very large in-memory recipes.

You’ll be able to configure the memory allocation for each container and per-namespace using multiple container execution configurations.

Prepare your local aws, docker and kubectl commands

Follow AWS documentation to make sure that:

  • Your local (on the DSS machine) aws ecr command can list and create docker image repositories, and authenticate docker for image push.
  • Your local (on the DSS machine) kubectl command can interact with the cluster.
  • Your local (on the DSS machine) docker command can successfully push images to the ECR repository.
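
As a first sanity check before working through the list above, the sketch below only verifies that the three commands are on the PATH of the DSS machine. The function name find_missing_tools is hypothetical and not part of DSS.

```python
import shutil

# Command-line tools the DSS machine needs, per the checklist above.
REQUIRED_TOOLS = ("aws", "docker", "kubectl")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the function can be exercised without
    the real tools installed.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("aws, docker and kubectl are all on PATH")
```

Being on the PATH is necessary but not sufficient: you still need to confirm credentials and connectivity, for example by running aws ecr describe-repositories and kubectl get nodes as the user that runs DSS.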

Create the execution configuration

Build the base image as indicated in Setting up.

In Administration > Settings > Container exec, add a new execution config of type “Kubernetes”.

  • The image registry URL is the one given by aws ecr describe-repositories, without the image name. It typically looks like XXXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/PREFIX, where XXXXXXXXXXXX is your AWS account ID, us-east-1 is the AWS region of the repository, and PREFIX is an optional prefix used to organize your repositories.

  • The image pre-push hook should be a script that takes care of the Amazon ECR requirements when pushing images. ECR mandates that:

    • an aws ecr create-repository call is performed before pushing to a new repository (in Docker parlance, when using an image example.com/prefix/image-name:image-tag, the repository is example.com/prefix/image-name)
    • the docker client is authenticated to push to this repository, which can be done with the aws ecr get-login command (removed in AWS CLI version 2, where aws ecr get-login-password replaces it)

    DSS comes with a sample script for simple EKS/ECR deployments. You can set the pre-push hook to INSTALL_DIR/resources/container-exec/kubernetes/aws-ecr-prepush.sh, where INSTALL_DIR is the full path of the DSS installation directory (the one containing the installer.sh script).
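
To make the naming rules above concrete, here is a minimal sketch of how a registry URL is assembled and how an image reference maps to the repository that must exist before a push. It is an illustration only and is not the content of the bundled aws-ecr-prepush.sh script. All function names are hypothetical, and the account ID in the usage example is made up. It assumes tag-style references (no @sha256 digests).

```python
def ecr_registry_url(account_id, region, prefix=""):
    """Build the registry URL: ACCOUNT.dkr.ecr.REGION.amazonaws.com[/PREFIX]."""
    host = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    prefix = prefix.strip("/")
    return f"{host}/{prefix}" if prefix else host


def split_image_reference(image):
    """Split 'host/prefix/name:tag' into (registry_host, repository_name, tag).

    The tag defaults to 'latest' when the reference carries none.
    """
    host, _, remainder = image.partition("/")
    repository, sep, tag = remainder.rpartition(":")
    if not sep:  # no ':' found, so the whole remainder is the repository
        repository, tag = remainder, "latest"
    return host, repository, tag


def create_repository_command(image):
    """Return the argument list for the aws CLI call a pre-push hook would issue.

    ECR's --repository-name takes the path after the registry host,
    e.g. 'prefix/image-name'. The command is only built here, not executed.
    """
    _, repository, _ = split_image_reference(image)
    return ["aws", "ecr", "create-repository", "--repository-name", repository]


if __name__ == "__main__":
    # Made-up account ID and prefix, for illustration only.
    registry = ecr_registry_url("123456789012", "us-east-1", "dss")
    image = f"{registry}/image-name:image-tag"
    print(registry)
    print(" ".join(create_repository_command(image)))
```

Building the command as a list rather than a single string keeps it safe to hand to subprocess.run without shell quoting concerns, should you adapt the sketch into a real hook.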

You’re now ready to run recipes and models on EKS.

Using GPUs

AWS provides GPU-enabled instances with NVIDIA GPUs. Several steps are required to use them for container execution.

Build a CUDA-enabled base image

The base image that is built by default (see Setting up) does not have CUDA support and cannot use NVIDIA GPUs.

You need to build a CUDA-enabled base image.

Then create a new container configuration dedicated to running GPU workloads. If you specified a tag for the base image, enter it in the “Base image tag” field.

Enable GPU support on the cluster

To execute containers leveraging GPUs, your worker nodes as well as the control plane need to support them. The following instructions describe a simplified way to achieve this. They are subject to variation depending on the underlying hardware and the software version requirements of your projects.

To make a worker node able to leverage its GPUs:

Finally, enable GPU support on the cluster with the NVIDIA device plugin. Be careful to select the version that matches your Kubernetes version (v1.10 as of July 2018).
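
Once the device plugin is running, GPU nodes normally advertise the nvidia.com/gpu resource in their status. The sketch below shows one way to check this from the parsed output of kubectl get nodes -o json. The function name gpu_nodes and the sample node data are hypothetical and only illustrate the relevant fields.

```python
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_nodes(node_list):
    """Map node name -> allocatable GPU count, for nodes that advertise GPUs.

    `node_list` is the dict obtained by parsing `kubectl get nodes -o json`.
    """
    result = {}
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports resource quantities as strings, e.g. "1".
        count = int(allocatable.get(GPU_RESOURCE, 0))
        if count > 0:
            result[node["metadata"]["name"]] = count
    return result


if __name__ == "__main__":
    # Made-up sample shaped like the kubectl output, for illustration only.
    sample = {
        "items": [
            {"metadata": {"name": "gpu-node-1"},
             "status": {"allocatable": {"cpu": "8", GPU_RESOURCE: "1"}}},
            {"metadata": {"name": "cpu-node-1"},
             "status": {"allocatable": {"cpu": "4"}}},
        ]
    }
    print(gpu_nodes(sample))
```

An empty result on a cluster with GPU instances suggests that the device plugin is not running yet or that the node drivers are not set up.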

Add a custom reservation

For your container execution to be scheduled on nodes with GPU accelerators, and for EKS to configure the CUDA driver in your containers, the corresponding pods must be created with a custom “limit” (in Kubernetes parlance). This limit indicates that you need a specific type of resource beyond the standard ones (CPU and memory).

You must configure this limit in the container execution configuration:

  • In the “Custom limits” section, add a new entry with key: nvidia.com/gpu and value: 1 (to request 1 GPU)
  • Don’t forget to add the new entry and save the settings
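
For reference, a custom limit of this kind corresponds to the standard resources.limits field of a Kubernetes container spec. The sketch below builds that fragment as a Python dict. It illustrates the Kubernetes field only and is not necessarily the exact pod definition DSS generates. The function name gpu_limits is hypothetical.

```python
import json


def gpu_limits(gpu_count=1):
    """Return the container-spec fragment that requests `gpu_count` NVIDIA GPUs."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return {"resources": {"limits": {"nvidia.com/gpu": gpu_count}}}


if __name__ == "__main__":
    # Key nvidia.com/gpu with value 1, as in the "Custom limits" entry above.
    print(json.dumps(gpu_limits(1), indent=2))
```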

Deploy

You can now deploy your GPU-requiring recipes and models.