Using managed GKE clusters

Initial setup

Install the GKE plugin

Install the GKE plugin from the store.

Prepare your local gcloud, docker and kubectl commands

Follow the GCP documentation to make sure that:

  • Your local (on the DSS machine) gcloud command has the appropriate permissions and scopes to:

    • push to the Google Container Registry (GCR) service
    • have full control over the GKE service.
  • Your local (on the DSS machine) kubectl command is installed

  • Your local (on the DSS machine) docker command is installed, can build images, and can push them to GCR. The latter can be enabled by running the gcloud auth configure-docker command (see the sketch below).
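As a quick sanity check, the following commands (run as the user running DSS) cover the setup above. The project name is a placeholder:

    # Authenticate gcloud and point it at your project (placeholder name)
    gcloud auth login
    gcloud config set project your-gcp-project-name

    # Let docker push to GCR using gcloud credentials
    gcloud auth configure-docker

    # Verify that all three tools are installed and on the PATH
    gcloud version
    kubectl version --client
    docker version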

Create base images

Build the base image as indicated in Setting up (Kubernetes).
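As a minimal sketch, assuming a standard DSS installation (the authoritative procedure and available options are in Setting up (Kubernetes)):

    # Run from the DSS data directory; options may vary with your DSS version
    ./bin/dssadmin build-base-image --type container-exec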

Create a new execution config

In Administration > Settings > Containerized execution, add a new execution config of type “Kubernetes.”

  • In GCP, there is only a single shared image repository URL, gcr.io. Access control is based on image names. Therefore the repository URL to use is gcr.io/your-gcp-project-name (see the illustration below).
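For illustration, images pushed under this configuration are named under your project. A manual push of a hypothetical dku-exec-base image would look like the following (DSS performs the equivalent when you click “Push base images”):

    # "dku-exec-base" is a placeholder image name, not a DSS-defined one
    docker tag dku-exec-base gcr.io/your-gcp-project-name/dku-exec-base
    docker push gcr.io/your-gcp-project-name/dku-exec-base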

Click on “Push base images.”

Cluster configuration

Connection

The connection is where you define how to connect to GCP. This can be done either inline in each cluster (not recommended), or as a preset in the “GKE connection” plugin settings (recommended).

Network settings

The network field refers to the Virtual Private Cloud (VPC) where the cluster will be deployed. The sub-network field defines the IP space within that VPC from which pod IPs will be allocated. If left blank, these fields fall back to the default network settings, the specifics of which are explained in the GCP documentation.
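To see which values are available in your project, you can list the VPCs and sub-networks with gcloud (the network name below is a placeholder):

    gcloud compute networks list
    gcloud compute networks subnets list --network your-vpc-name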

Cluster nodes

This is where you define the number and type of nodes you want in your cluster. You can define the properties of a node pool either inline (not recommended) or as a preset in the “Node pools” plugin settings (recommended). You can also define multiple node pools, each with its own properties.
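The plugin provisions node pools for you; for reference, the equivalent GKE operation looks roughly like this (all names and sizes are placeholders, and a zone or region may also be required):

    gcloud container node-pools create my-pool \
        --cluster my-gke-cluster \
        --machine-type n1-standard-4 \
        --num-nodes 3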

Using GPUs

GCP provides GPU-enabled instances with NVIDIA GPUs. Several steps are required to use them for containerized execution.

Build a CUDA-enabled base image

The base image that is built by default (see Setting up (Kubernetes)) does not have CUDA support and cannot use NVIDIA GPUs.

You need to build a CUDA-enabled base image.

Then create a new container configuration dedicated to running GPU workloads. If you specified a tag for the base image, enter it in the “Base image tag” field.
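A minimal sketch, assuming your DSS version supports CUDA options on the base image builder (the option names below are assumptions; check Setting up (Kubernetes) for the exact ones):

    # --with-cuda and --tag are assumed option names; verify them
    # against your DSS version before running
    ./bin/dssadmin build-base-image --type container-exec --with-cuda --tag dataiku-exec-base:cuda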

Enable GPU support on the cluster

When you create your cluster using the GKE plugin, in the node pool settings make sure that you have the “With GPU” option enabled. Follow the GCP documentation to select the GPU type.

At cluster creation, the plugin runs the NVIDIA driver “DaemonSet” installation procedure, which takes several minutes to complete.
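You can watch the installer’s progress with kubectl. The namespace and label below match GKE’s standard NVIDIA driver installer, but may differ across GKE versions:

    kubectl get pods -n kube-system -l k8s-app=nvidia-driver-installer -w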

Add a custom reservation

In order for your containerized execution tasks to run on nodes with GPUs, and for GKE to configure the CUDA driver on your containers, the corresponding pods must be created with a custom limit (in Kubernetes parlance) indicating that they need a specific type of resource (the standard resource types being CPU and memory).

You must configure this limit in the containerized execution configuration:

  • In the “Custom limits” section, add a new entry with key nvidia.com/gpu and value 1 (to request 1 GPU).
  • Don’t forget to add the new entry and save your settings (a verification sketch follows this list).
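To confirm that GKE exposes the GPU resource on the cluster nodes, and to see what the setting above corresponds to:

    # Each GPU node should report nvidia.com/gpu under Capacity/Allocatable
    kubectl describe nodes | grep -i 'nvidia.com/gpu'

    # The DSS custom limit translates to a pod spec fragment equivalent to:
    #   resources:
    #     limits:
    #       nvidia.com/gpu: 1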

Deploy

You can now deploy your GPU-based recipes and models.