Clusters

The API offers methods to:

  • Start, stop or delete clusters

  • Read and write settings of clusters

  • Get the status of clusters

Clusters may be listed, created and obtained using methods of the DSSClient.

DSSClusterSettings is an opaque type and its content is specific to each cluster provider. It is therefore strongly advised to use scenario steps to create/start/delete clusters, as this will greatly help define a consistent configuration.

Starting a managed cluster

import logging

import dataiku

logger = logging.getLogger("my.package")

client = dataiku.api_client()

cluster_id = "my_cluster_id"

# Obtain a handle on the cluster
my_cluster = client.get_cluster(cluster_id)

# Start the cluster. This operation is synchronous. An exception is thrown in case of error
try:
    my_cluster.start()
    logger.info("Cluster {} started".format(cluster_id))
except Exception:
    # logger.exception records the active exception and its traceback
    logger.exception("Could not start cluster: {}".format(cluster_id))

Getting the status of a cluster

import logging

import dataiku

logger = logging.getLogger("my.package")

client = dataiku.api_client()

cluster_id = "my_cluster_id"

# Obtain a handle on the cluster
my_cluster = client.get_cluster(cluster_id)

# Get status
status = my_cluster.get_status()

logger.info("Cluster status is {}".format(status))
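When several clusters need to be checked at once, the same calls can be wrapped in a small helper. The following is a minimal sketch: the function name collect_cluster_statuses and the cluster ids are hypothetical, and it relies only on get_cluster(), get_status() and get_raw() as documented on this page.

```python
def collect_cluster_statuses(client, cluster_ids):
    """Return a dict mapping each cluster id to its raw status dictionary.

    `client` is expected to be a DSSClient, e.g. from dataiku.api_client().
    The helper name and signature are illustrative, not part of the API.
    """
    statuses = {}
    for cluster_id in cluster_ids:
        cluster = client.get_cluster(cluster_id)
        # get_status() returns a DSSClusterStatus; get_raw() exposes it as a dict
        statuses[cluster_id] = cluster.get_status().get_raw()
    return statuses
```

It could be called as collect_cluster_statuses(dataiku.api_client(), ["my_cluster_id"]). The structure of the raw status dictionary is not described here, so inspect it before relying on specific keys.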

Reference documentation

class dataikuapi.dss.admin.DSSCluster(client, cluster_id)

A handle to interact with a cluster on the DSS instance. Do not create this object directly, use dataikuapi.DSSClient.get_cluster() instead.

delete()

Deletes the cluster. This does not stop the cluster beforehand.

get_settings()

Get the cluster’s settings. This includes opaque data for the cluster if this is a started managed cluster.

The returned object can be used to save settings.

Returns

a DSSClusterSettings object to interact with cluster settings

Return type

DSSClusterSettings

set_definition(cluster)

Set the cluster’s definition. The definition should come from a call to the get_definition() method.

Parameters

cluster – a cluster definition

Returns

the updated cluster definition, as a JSON object

get_status()

Get the cluster’s status and usage.

Returns

The cluster status, as a DSSClusterStatus object

Return type

DSSClusterStatus

start()

Starts or attaches the cluster.

This operation is only valid for a managed cluster.

stop(terminate=True, force_stop=False)

Stops or detaches the cluster.

This operation is only valid for a managed cluster.

Parameters
  • terminate (bool) – whether to delete the cluster after stopping it

  • force_stop (bool) – whether to try to force stop the cluster, useful if DSS expects the cluster to already be stopped
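As an illustration of the terminate parameter, the sketch below stops a managed cluster while asking DSS not to delete it afterwards. The helper name stop_without_terminating is hypothetical, and the error handling assumes that stop() raises an exception on failure, as start() is documented to do above.

```python
import logging

logger = logging.getLogger("my.package")


def stop_without_terminating(cluster, force_stop=False):
    """Stop (or detach) a managed cluster with terminate=False.

    `cluster` is expected to be a DSSCluster handle.
    Returns True on success, False if stop() raised (assumed failure mode).
    """
    try:
        cluster.stop(terminate=False, force_stop=force_stop)
    except Exception:
        # Log the traceback and report the failure to the caller
        logger.exception("Could not stop cluster")
        return False
    return True
```

With a real handle this would be called as stop_without_terminating(client.get_cluster("my_cluster_id")).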

run_kubectl(args)

Runs an arbitrary kubectl command on the cluster.

This operation is only valid for a Kubernetes cluster.

Note: this call requires an API key with DSS instance admin rights

Parameters

args (str) – the arguments to pass to kubectl (without the “kubectl”)

Returns

a dict containing the return value, standard output, and standard error of the command

Return type

dict
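A possible use of run_kubectl() is listing the pods of a namespace. This is a sketch only: the helper name kubectl_get_pods is hypothetical, and since the keys of the returned dict are not listed here, the result is returned unchanged rather than unpacked.

```python
def kubectl_get_pods(cluster, namespace=None):
    """Run `kubectl get pods` through a DSSCluster handle and return the result dict.

    The arguments are passed as a single string, without the leading "kubectl".
    """
    args = "get pods"
    if namespace is not None:
        # -n is the standard kubectl flag for selecting a namespace
        args += " -n {}".format(namespace)
    return cluster.run_kubectl(args)
```

The call requires an API key with DSS instance admin rights, as noted above.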

delete_finished_jobs(delete_failed=False, namespace=None, label_filter=None, dry_run=False)

Runs a kubectl command to delete finished jobs.

This operation is only valid for a Kubernetes cluster.

Parameters
  • delete_failed (bool) – if True, delete both completed and failed jobs, otherwise only delete completed jobs

  • namespace (str) – the namespace in which to delete the jobs, if None, uses the namespace set in kubectl’s current context

  • label_filter (str) – delete only jobs matching a label filter

  • dry_run (bool) – if True, execute the command as a “dry run”

Returns

a dict containing whether the deletion succeeded, a list of deleted job names, and debug info for the underlying kubectl command

Return type

dict
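Because deletion cannot be undone, it may be worth previewing it with dry_run=True before running it for real. The following sketch shows one way to do that; the helper name delete_finished_jobs_safely and its confirm flag are hypothetical.

```python
def delete_finished_jobs_safely(cluster, namespace=None, delete_failed=False, confirm=False):
    """Preview the deletion as a dry run, then optionally perform it.

    Returns a (preview, result) tuple; result is None unless confirm is True.
    """
    preview = cluster.delete_finished_jobs(
        delete_failed=delete_failed, namespace=namespace, dry_run=True
    )
    result = None
    if confirm:
        # Same parameters, this time without the dry run
        result = cluster.delete_finished_jobs(
            delete_failed=delete_failed, namespace=namespace, dry_run=False
        )
    return preview, result
```

A caller can inspect the preview dict first and call the helper again with confirm=True once the list of jobs looks right.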

delete_finished_pods(namespace=None, label_filter=None, dry_run=False)

Runs a kubectl command to delete finished (succeeded and failed) pods.

This operation is only valid for a Kubernetes cluster.

Parameters
  • namespace (str) – the namespace in which to delete the pods, if None, uses the namespace set in kubectl’s current context

  • label_filter (str) – delete only pods matching a label filter

  • dry_run (bool) – if True, execute the command as a “dry run”

Returns

a dict containing whether the deletion succeeded, a list of deleted pod names, and debug info for the underlying kubectl command

Return type

dict
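The label_filter parameter can narrow the cleanup to a subset of pods. The sketch below assumes that label_filter accepts the usual kubectl label selector syntax (key=value); this format is not stated on this page, so treat it as an assumption. The helper name and the label used in the usage note are hypothetical.

```python
def delete_finished_pods_with_label(cluster, label_key, label_value, namespace=None, dry_run=True):
    """Delete finished pods carrying a given label, defaulting to a dry run.

    ASSUMPTION: label_filter uses kubectl label selector syntax ("key=value").
    """
    label_filter = "{}={}".format(label_key, label_value)
    return cluster.delete_finished_pods(
        namespace=namespace, label_filter=label_filter, dry_run=dry_run
    )
```

For example, delete_finished_pods_with_label(my_cluster, "team", "data") would preview the deletion of finished pods labelled team=data in the namespace of kubectl's current context.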

delete_all_pods(namespace=None, label_filter=None, dry_run=False)

Runs a kubectl command to delete all pods.

This operation is only valid for a Kubernetes cluster.

Parameters
  • namespace (str) – the namespace in which to delete the pods, if None, uses the namespace set in kubectl’s current context

  • label_filter (str) – delete only pods matching a label filter

  • dry_run (bool) – if True, execute the command as a “dry run”

Returns

a dict containing whether the deletion succeeded, a list of deleted pod names, and debug info for the underlying kubectl command

Return type

dict

class dataikuapi.dss.admin.DSSClusterSettings(client, cluster_id, settings)

The settings of a cluster. Do not create this object directly, use DSSCluster.get_settings() instead.

get_raw()

Gets all settings as a raw dictionary. This returns a reference to the raw settings, not a copy, so changes made to the returned object will be reflected when saving.

Fields that can be updated:
  • permissions, usableByAll, owner

  • params

get_plugin_data()

If this is a managed attached cluster, returns the opaque data returned by the cluster’s start operation. Else, returns None.

You should generally not modify this data.
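Since get_plugin_data() returns None unless the cluster is a managed attached cluster, it can serve as a simple check. This is a minimal sketch with a hypothetical helper name; it reads the data without modifying it.

```python
def has_plugin_data(cluster):
    """Return True if the cluster's settings carry opaque data from a start operation.

    `cluster` is expected to be a DSSCluster handle.
    """
    settings = cluster.get_settings()
    return settings.get_plugin_data() is not None
```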

save()

Saves the settings back to the cluster.
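Putting get_raw() and save() together, a settings update might look like the sketch below. It changes the usableByAll field listed above; treating that field as a boolean is an assumption, and the helper name is hypothetical.

```python
def set_usable_by_all(cluster, usable):
    """Update the usableByAll field of a cluster's settings and save them.

    ASSUMPTION: usableByAll holds a boolean value.
    Returns the value that was written.
    """
    settings = cluster.get_settings()
    raw = settings.get_raw()
    # get_raw() returns a reference, so this change is picked up by save()
    raw["usableByAll"] = bool(usable)
    settings.save()
    return raw["usableByAll"]
```

Because get_raw() returns a reference rather than a copy, no explicit write-back of the dictionary is needed before calling save().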

class dataikuapi.dss.admin.DSSClusterStatus(client, cluster_id, status)

The status of a cluster. Do not create this object directly, use DSSCluster.get_status() instead.

get_raw()

Gets the whole status as a raw dictionary.
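For logging or debugging, the raw status can be rendered as readable JSON. This is a sketch under the assumption that the raw dictionary is mostly JSON-serializable; values that are not are converted with str(). The helper name format_status is hypothetical.

```python
import json


def format_status(status):
    """Render a DSSClusterStatus as indented JSON text.

    default=str is a fallback for values json cannot serialize directly.
    """
    return json.dumps(status.get_raw(), indent=2, sort_keys=True, default=str)
```

With a real handle this could be used as logger.info(format_status(my_cluster.get_status())).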