Using GKE clusters as unmanaged clusters

Setup

Create your GKE cluster

Follow the GCP documentation on how to create your GKE cluster <https://cloud.google.com/kubernetes-engine/docs/quickstart>. We recommend that you allocate at least 16GB of memory for each cluster node. More memory may be required if you plan on running very large in-memory recipes.

In DSS, you can then configure the memory allocation per container and per namespace by using multiple containerized execution configurations.

Prepare your local gcloud, docker and kubectl commands

Follow the GCP documentation to make sure that:

  • Your local (on the DSS machine) gcloud command has the appropriate permissions and scopes to:

    • push to the Google Container Registry (GCR) service
  • Your local (on the DSS machine) kubectl command is installed and can interact with the cluster. This can be achieved by running the gcloud container clusters get-credentials your-gke-cluster-name command.

  • Your local (on the DSS machine) docker command is installed and can build images and push them to GCR. Pushing to GCR can be enabled by running the gcloud auth configure-docker command.
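As an illustration of the checklist above, here is a minimal Python sketch that checks whether the three commands are on the PATH and builds the two gcloud invocations as argument lists. The function names, the zone value and the use of the --zone and --project flags are assumptions made for the example; adapt them to your cluster. The sketch only prints the commands and does not run them.

```python
import shutil

# The three command-line tools that must be available on the DSS machine.
REQUIRED_TOOLS = ("gcloud", "kubectl", "docker")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def setup_commands(cluster_name, zone, project):
    """Build the two gcloud invocations as argument lists (nothing is run)."""
    return [
        # Writes credentials for the cluster into the local kubeconfig.
        ["gcloud", "container", "clusters", "get-credentials", cluster_name,
         "--zone", zone, "--project", project],
        # Registers gcloud as a Docker credential helper for GCR.
        ["gcloud", "auth", "configure-docker"],
    ]


if __name__ == "__main__":
    print("Missing tools:", ", ".join(missing_tools()) or "none")
    # Placeholder values: replace them with your own cluster, zone and project.
    for cmd in setup_commands("your-gke-cluster-name", "europe-west1-b",
                              "your-gcp-project-name"):
        print(" ".join(cmd))
```

The printed lines can be copied into a shell on the DSS machine, or passed to subprocess.run if you prefer to execute them from Python.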

Create the execution configuration

Build the base image as indicated in Setting up (Kubernetes).

In Administration > Settings > Containerized execution, add a new execution config of type “Kubernetes”.

  • In GCP, there is only a single shared image repository URL, gcr.io. Access control is based on image names. Therefore the repository URL to use is gcr.io/your-gcp-project-name.
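The naming rule above can be sketched in a few lines of Python. The helper names and the image name used in the example are hypothetical and only illustrate how the project name becomes part of the image path.

```python
def gcr_repository_url(project_name, host="gcr.io"):
    """Return the repository URL for a GCP project, e.g. gcr.io/my-project."""
    project_name = project_name.strip().strip("/")
    if not project_name:
        raise ValueError("a GCP project name is required")
    return f"{host}/{project_name}"


def full_image_name(project_name, image, tag="latest"):
    """Return the full name under which an image would be pushed to GCR."""
    return f"{gcr_repository_url(project_name)}/{image}:{tag}"


if __name__ == "__main__":
    # Value to enter as the repository URL in the execution configuration.
    print(gcr_repository_url("your-gcp-project-name"))
    # "my-base-image" is a made-up image name used only for illustration.
    print(full_image_name("your-gcp-project-name", "my-base-image", "1.0"))
```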

Finish by clicking on “Push base images”.

You’re now ready to run recipes and ML models in GKE.

Using GPUs

GCP provides GPU-enabled instances with NVIDIA GPUs. Several steps are required in order to use them for containerized execution.

Build a CUDA-enabled base image

The base image that is built by default (see Setting up (Kubernetes)) does not have CUDA support and cannot use NVIDIA GPUs.

You need to build a CUDA-enabled base image.

Then create a new containerized execution configuration dedicated to running GPU workloads. If you specified a tag for the base image, enter it in the “Base image tag” field.

Enable GPU support on the cluster

Follow GCP’s documentation on how to create a GKE cluster with GPU accelerators <https://cloud.google.com/kubernetes-engine/docs/how-to/gpus>. You can also create a GPU-enabled node pool in an existing cluster.
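For the second option, a GPU-enabled node pool in an existing cluster, the gcloud call could look like the one built by this Python sketch. The pool name, GPU type, machine type and zone are illustrative assumptions; GPU types and compatible machine types vary by zone, so check the GCP documentation for the values that apply to you. The sketch only prints the command.

```python
def gpu_node_pool_command(pool_name, cluster_name, zone,
                          gpu_type="nvidia-tesla-t4", gpu_count=1,
                          machine_type="n1-standard-8"):
    """Build a `gcloud container node-pools create` call for a GPU node pool."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return [
        "gcloud", "container", "node-pools", "create", pool_name,
        "--cluster", cluster_name,
        "--zone", zone,
        # The machine type must be one that supports the chosen GPU type.
        "--machine-type", machine_type,
        # Attaches `gpu_count` GPUs of type `gpu_type` to each node of the pool.
        "--accelerator", f"type={gpu_type},count={gpu_count}",
    ]


if __name__ == "__main__":
    # Placeholder names: replace them with your own pool, cluster and zone.
    print(" ".join(gpu_node_pool_command(
        "gpu-pool", "your-gke-cluster-name", "europe-west1-b")))
```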

Don’t forget to run the NVIDIA driver “DaemonSet” installation procedure described in that documentation. It needs several minutes to complete.
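One way to check that the installation has completed is to look for nodes that advertise the nvidia.com/gpu resource. The following Python sketch parses the output of kubectl get nodes -o json and lists those nodes; the function name and the sample data are made up for the example.

```python
import json


def gpu_nodes(nodes_json):
    """Map node name to allocatable GPU count, for nodes exposing nvidia.com/gpu."""
    result = {}
    for node in json.loads(nodes_json).get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports resource quantities as strings, e.g. "1".
        count = int(allocatable.get("nvidia.com/gpu", 0))
        if count > 0:
            result[node["metadata"]["name"]] = count
    return result


# Made-up sample mimicking the shape of `kubectl get nodes -o json`.
SAMPLE = json.dumps({"items": [
    {"metadata": {"name": "gpu-node-1"},
     "status": {"allocatable": {"cpu": "8", "nvidia.com/gpu": "1"}}},
    {"metadata": {"name": "cpu-node-1"},
     "status": {"allocatable": {"cpu": "4"}}},
]})

if __name__ == "__main__":
    print(gpu_nodes(SAMPLE))
```

In practice you would replace SAMPLE with the real output, for example by saving kubectl get nodes -o json to a file and reading it. An empty result suggests that the drivers are not installed yet or that the cluster has no GPU node.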

Add a custom reservation

For a containerized execution task to run on nodes with GPUs, and for GKE to configure the CUDA driver on its containers, the corresponding pods must be created with a custom limit (in Kubernetes parlance). This limit indicates that the pods need a specific type of resource in addition to the standard resource types, CPU and memory.

You must configure this limit in the containerized execution configuration:

  • In the “Custom limits” section, add a new entry with key nvidia.com/gpu and value 1 (to request 1 GPU).
  • Don’t forget to add the new entry and save your settings.
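At the Kubernetes level, such a limit appears in the resources section of the container specification. The Python sketch below builds that fragment for illustration; how DSS generates the pod specification internally is not described here, so treat the exact structure as an assumption about the end result rather than a description of DSS itself.

```python
import json


def gpu_resource_limits(gpu_count=1):
    """Return the container `resources` fragment requesting `gpu_count` GPUs."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    # Key and value match the "Custom limits" entry: nvidia.com/gpu -> 1.
    return {"resources": {"limits": {"nvidia.com/gpu": gpu_count}}}


if __name__ == "__main__":
    print(json.dumps(gpu_resource_limits(1), indent=2))
```

For extended resources such as nvidia.com/gpu, Kubernetes uses the limit as the request when no request is given, which is why a single limit entry is enough.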

Deploy

You can now deploy your GPU-based recipes and models.