Managed Kubernetes clusters

In addition to using existing (unmanaged) clusters, DSS can automatically start, stop, and manage multiple clusters for you on the major cloud providers.

DSS provides managed Kubernetes capabilities on:

  • Amazon Web Services through EKS
  • Azure through AKS
  • Google Cloud Platform through GKE

Creating clusters

Before you can create managed clusters, you must install the DSS plugin corresponding to your cloud provider. Refer to the cloud-specific documentation for EKS, AKS, or GKE for details.

  • Go to Administration > Clusters
  • Click Create Cluster, and in “Type”, select the appropriate cluster type
  • Fill in the required parameters
  • Click “Start/Attach”

Using managed clusters

Each project defines whether the recipes/notebooks/… of this project run against an unmanaged cluster or against one of the managed clusters.

Importantly, you do not need per-cluster container runtime configurations or per-cluster Spark configurations (for Managed Spark on K8S). DSS automatically uses the requested cluster and the limits defined in the container runtime configuration.

Use a specific or dynamic cluster for scenarios

A common use case is to run one or more scenarios on:

  • either a specific named cluster (i.e. a cluster already defined in the DSS settings, but not the default cluster of the project)
  • or a dynamic cluster, created for the scenario and shut down after the end of the scenario, for fully elastic approaches

Use a specific static cluster

For this, you’ll use the variables expansion mechanism of DSS.

Instead of writing a cluster identifier as the contextual cluster to use at the project level, you can use the syntax ${variable_name}. At runtime, DSS will use the cluster denoted by the variable_name variable.
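To make the expansion concrete, here is a minimal Python model of how a ${variable_name} reference in the project cluster setting could be resolved against a set of variables. It is not DSS code: the function name expand_cluster_setting and the assumption that variable names contain only letters, digits, and underscores are illustrative. An undefined variable yields None, leaving the fallback decision to the caller.

```python
import re
from typing import Mapping, Optional

# Matches the ${variable_name} syntax; the allowed characters are an assumption.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z0-9_]+)\}")


def expand_cluster_setting(setting: str, variables: Mapping[str, str]) -> Optional[str]:
    """Hypothetical model: expand ${...} references in a cluster setting.

    Returns the expanded cluster identifier, or None if any referenced
    variable is undefined.
    """
    missing = False

    def substitute(match: re.Match) -> str:
        nonlocal missing
        name = match.group(1)
        if name not in variables:
            missing = True
            return ""
        return variables[name]

    expanded = _VAR_PATTERN.sub(substitute, setting)
    return None if missing else expanded


print(expand_cluster_setting("${clusterForScenario}", {"clusterForScenario": "fast2"}))  # fast2
print(expand_cluster_setting("${clusterForScenario}", {}))  # None
print(expand_cluster_setting("regular1", {}))  # regular1 (no variable to expand)
```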

Your scenario will then use a scenario-scoped variable to define the cluster to use for the scenario.

For example, suppose you want to use the regular1 cluster for the “design” of the project and all other non-scenario activities, and the fast2 cluster for a scenario.

Set up your project as follows:

  • Cluster: ${clusterForScenario}
  • Default cluster: regular1

With this setup, when the clusterForScenario variable is not defined (which is the case outside of the scenario), DSS falls back to regular1.

In your scenario, add an initial step “Define scenario variables”, and use the following JSON definition:

{
        "clusterForScenario" : "fast2"
}

The steps of the scenario then execute on the fast2 cluster.
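The resolution rule of this example (use the variable when it is defined, otherwise the default cluster) can be sketched in Python as follows. This is an illustrative model, not how DSS is implemented: resolve_cluster is a hypothetical name, only the exact "${name}" form of the setting is handled, and scenario variables are assumed to be layered over project variables.

```python
import json
from typing import Mapping


def resolve_cluster(cluster_setting: str, default_cluster: str,
                    variables: Mapping[str, str]) -> str:
    """Hypothetical model: pick the cluster for a job, falling back to the default."""
    # Only a setting that is exactly "${name}" is treated as a variable reference here.
    if cluster_setting.startswith("${") and cluster_setting.endswith("}"):
        name = cluster_setting[2:-1]
        return variables.get(name, default_cluster)
    return cluster_setting


project_variables = {}
# Same JSON as the "Define scenario variables" step above.
scenario_variables = json.loads('{"clusterForScenario": "fast2"}')

# Outside the scenario: the variable is undefined, so the default applies.
print(resolve_cluster("${clusterForScenario}", "regular1", project_variables))  # regular1
# Inside the scenario: scenario variables are assumed to override project ones.
in_scenario = {**project_variables, **scenario_variables}
print(resolve_cluster("${clusterForScenario}", "regular1", in_scenario))  # fast2
```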

Use a dynamic cluster

The idea here is to:

  • Create a dynamic cluster
  • Put the identifier of the dynamically-created cluster into a variable
  • Then use the variables expansion mechanism described above

For example, suppose you want to use the regular1 cluster for the “design” of the project and all other non-scenario activities, and a dynamically-created cluster for a scenario.

Set up your project as follows:

  • Cluster: ${clusterForScenario}
  • Default cluster: regular1

With this setup, when the clusterForScenario variable is not defined (which is the case outside of the scenario), DSS falls back to regular1.

In your scenario, add an initial step “Setup cluster”:

  • Select the cluster type you want to create (depending on the plugin you are using)
  • Fill in the configuration form (depending on the plugin you are using)
  • Set clusterForScenario as the “Target variable”

When the step runs, DSS creates the cluster and stores the identifier of the newly created cluster in the clusterForScenario variable. Given the project configuration, the subsequent steps of the scenario automatically execute on the dynamically-created cluster.

At the end of the scenario, whether it succeeded or failed, DSS automatically stops the dynamic cluster. You can override this behavior in the scenario settings.
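The lifecycle described above (create, store the identifier in the target variable, run the steps, stop no matter what) maps naturally onto a try/finally block. The Python sketch below models it with a hypothetical FakeCloud class standing in for a cloud provider; neither FakeCloud nor run_scenario is a DSS or cloud API.

```python
from typing import Callable, List, Set


class FakeCloud:
    """Hypothetical stand-in for a cloud provider that tracks running clusters."""

    def __init__(self) -> None:
        self.running: Set[str] = set()
        self._counter = 0

    def create_cluster(self) -> str:
        self._counter += 1
        cluster_id = f"dynamic-{self._counter}"
        self.running.add(cluster_id)
        return cluster_id

    def stop_cluster(self, cluster_id: str) -> None:
        self.running.discard(cluster_id)


def run_scenario(cloud: FakeCloud, steps: List[Callable[[str], None]],
                 target_variable: str = "clusterForScenario",
                 stop_cluster_at_end: bool = True) -> dict:
    variables = {}
    # "Setup cluster" step: create the cluster and store its id in the target variable.
    variables[target_variable] = cloud.create_cluster()
    try:
        # Later steps resolve ${clusterForScenario} to the new cluster.
        for step in steps:
            step(variables[target_variable])
    finally:
        # Runs whether the steps succeeded or raised, like the automatic stop.
        if stop_cluster_at_end:
            cloud.stop_cluster(variables[target_variable])
    return variables


cloud = FakeCloud()
used = []
run_scenario(cloud, [used.append, used.append])
print(used)           # ['dynamic-1', 'dynamic-1']
print(cloud.running)  # set()


def failing_step(cluster_id: str) -> None:
    raise RuntimeError("step failed on " + cluster_id)


try:
    run_scenario(cloud, [failing_step])
except RuntimeError as exc:
    print(exc)        # step failed on dynamic-2
print(cloud.running)  # set()
```

A finally clause only runs while the controlling process is alive. If that process dies mid-scenario, nothing stops the cluster, which is the situation the following warning describes.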

Warning

If DSS stops unexpectedly while the scenario is running, the cluster resources keep running on your cloud provider. We recommend that you set up monitoring of the cloud resources created by DSS.

Automate start and stop of clusters

Scenario steps are available to start and stop clusters. For example, you can automatically start a cluster in the morning for your users to work on during the day, and automatically shut it down at night to save on cloud consumption.
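The morning/night pattern boils down to comparing the current time with a working-hours window and deciding whether a start or stop step is needed. The Python sketch below is only a model of that decision: the 08:00 and 20:00 boundaries and the scheduled_action function are hypothetical, and in DSS the timing would come from how you trigger the start and stop scenarios rather than from code like this.

```python
from datetime import datetime, time
from typing import Optional

# Hypothetical working hours; these are not DSS settings.
START_AT = time(8, 0)
STOP_AT = time(20, 0)


def desired_state(now: datetime) -> str:
    """Return "running" during working hours and "stopped" otherwise."""
    return "running" if START_AT <= now.time() < STOP_AT else "stopped"


def scheduled_action(now: datetime, current_state: str) -> Optional[str]:
    """Return "start", "stop", or None if the cluster is already in the desired state."""
    target = desired_state(now)
    if target == current_state:
        return None
    return "start" if target == "running" else "stop"


print(scheduled_action(datetime(2024, 1, 8, 8, 0), "stopped"))    # start
print(scheduled_action(datetime(2024, 1, 8, 20, 0), "running"))   # stop
print(scheduled_action(datetime(2024, 1, 8, 13, 30), "running"))  # None
```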

Permissions

Each cluster has an owner, and groups can be granted the following access levels on the cluster:

  • Use cluster: select the cluster and use it in a project
  • Operate cluster: modify the cluster settings
  • Manage cluster users: manage the permissions of the cluster

In addition, each group can be granted global permissions to:

  • Create clusters and manage the clusters they created
  • Manage all clusters, including the ones they are not explicitly granted access to
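As a rough model of how these permissions combine, the Python sketch below checks whether a user holds a given access level on a cluster. Several points are assumptions, not documented DSS behavior: the owner is treated as holding every access level, the global "Manage all clusters" permission is treated as granting every access level, and the three per-cluster levels are checked independently rather than implying one another. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class ClusterPermission(Enum):
    USE = "Use cluster"
    OPERATE = "Operate cluster"
    MANAGE_USERS = "Manage cluster users"


@dataclass
class Cluster:
    name: str
    owner: str
    # Access levels granted to each group on this cluster.
    group_permissions: Dict[str, Set[ClusterPermission]] = field(default_factory=dict)


@dataclass
class User:
    login: str
    groups: Set[str]
    # Global "Manage all clusters" permission, inherited from one of the user's groups.
    can_manage_all_clusters: bool = False


def has_permission(user: User, cluster: Cluster, permission: ClusterPermission) -> bool:
    # Assumption: the owner and global cluster managers hold every access level.
    if user.can_manage_all_clusters or user.login == cluster.owner:
        return True
    return any(permission in cluster.group_permissions.get(group, set())
               for group in user.groups)


fast2 = Cluster("fast2", owner="alice",
                group_permissions={"data_scientists": {ClusterPermission.USE}})
bob = User("bob", {"data_scientists"})
carol = User("carol", {"admins"}, can_manage_all_clusters=True)

print(has_permission(bob, fast2, ClusterPermission.USE))             # True
print(has_permission(bob, fast2, ClusterPermission.OPERATE))         # False
print(has_permission(carol, fast2, ClusterPermission.MANAGE_USERS))  # True
```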