Reference API documentation

This is the reference API documentation of the Python API client.

Classes Summary

Core classes

dataikuapi.dssclient.DSSClient(host[, …]) Entry point for the DSS API client
dataikuapi.dss.project.DSSProject(client, …) A handle to interact with a project on the DSS instance.
dataikuapi.dss.dataset.DSSDataset(client, …) A dataset on the DSS instance
dataikuapi.dss.managedfolder.DSSManagedFolder(…) A managed folder on the DSS instance
dataikuapi.dss.job.DSSJob(client, …) A job on the DSS instance
dataikuapi.dss.meaning.DSSMeaning(client, id) A user-defined meaning on the DSS instance
dataikuapi.dss.metrics.ComputedMetrics(raw)

Administration

dataikuapi.dss.admin.DSSUser(client, login) A handle for a user on the DSS instance
dataikuapi.dss.admin.DSSGroup(client, name) A group on the DSS instance
dataikuapi.dss.admin.DSSConnection(client, name) A connection on the DSS instance
dataikuapi.dss.admin.DSSGeneralSettings(client) The general settings of the DSS instance
dataikuapi.dss.admin.DSSUserImpersonationRule([raw]) Helper to build user-level rule items for the impersonation settings
dataikuapi.dss.admin.DSSGroupImpersonationRule([raw]) Helper to build group-level rule items for the impersonation settings

Scenarios

dataikuapi.dss.scenario.DSSScenario(client, …) A handle to interact with a scenario on the DSS instance
dataikuapi.dss.scenario.DSSScenarioRun(…) A handle containing basic info about a past run of a scenario.
dataikuapi.dss.scenario.DSSTriggerFire(…) The activation of a trigger on the DSS instance

API node services

dataikuapi.dss.apiservice.DSSAPIService(…) An API Service on the DSS instance

Core classes

class dataikuapi.dssclient.DSSClient(host, api_key=None, internal_ticket=None)

Entry point for the DSS API client
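
For example, a minimal sketch of creating a client and listing the projects it can see (the host URL and API key below are placeholders):

    from dataikuapi.dssclient import DSSClient

    # connect to the DSS instance (URL and key are illustrative)
    client = DSSClient("http://localhost:11200", api_key="YOUR_API_KEY_SECRET")
    print(client.list_project_keys())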

list_futures(as_objects=False, all_users=False)

List the currently-running long tasks (a.k.a. futures)

Returns:
list of futures. Each object contains at least a ‘jobId’ field
list_running_scenarios(all_users=False)

List the running scenarios

Returns:
the list of scenarios, each one as a JSON object containing a jobId field for the future hosting the scenario run, and a payload field with scenario identifiers
get_future(job_id)

Get a handle to interact with a specific long task (a.k.a. future).

Args:
job_id: the job_id key of the desired future
Returns:
A dataikuapi.dss.future.DSSFuture
list_running_notebooks(as_objects=True)

List the currently-running notebooks

Returns:
list of notebooks. Each object contains at least a ‘name’ field
list_project_keys()

List the project keys (=project identifiers).

Returns:
list of identifiers (=strings)
list_projects()

List the projects

Returns:
list of objects. Each object contains at least a ‘projectKey’ field
get_project(project_key)

Get a handle to interact with a specific project.

Args:
project_key: the project key of the desired project
Returns:
A dataikuapi.dss.project.DSSProject
create_project(project_key, name, owner, description=None, settings=None)

Create a project, and return a project handle to interact with it.

Note: this call requires an API key with admin rights

Args:
project_key: the identifier to use for the project
name: the name for the project
owner: the owner of the project
description: a short description for the project
Returns:
A dataikuapi.dss.project.DSSProject project handle
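
For example, a minimal sketch (the project key, name and owner are placeholders; as noted above, the API key must have admin rights):

    project = client.create_project("MYPROJECT", "My project", "admin",
                                    description="A project created through the API")
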
list_plugins()

List the installed plugins

Returns:
list of objects. Each object contains at least an ‘id’ field
get_plugin(plugin_id)

Get a handle to interact with a specific dev plugin.

Args:
plugin_id: the identifier of the desired plugin
Returns:
A dataikuapi.dss.plugin.DSSPlugin plugin handle
sql_query(query, connection=None, database=None, dataset_full_name=None, pre_queries=None, post_queries=None, type='sql', extra_conf={})

Initiate a SQL, Hive or Impala query and get a handle to retrieve the results of the query. Internally, the query is run by DSS. The database to run the query on is specified either by passing a connection name, or by passing a database name, or by passing a dataset full name (whose connection is then used to retrieve the database)

Args:
query: the query to run
connection: the connection on which the query should be run (exclusive of database and dataset_full_name)
database: the database on which the query should be run (exclusive of connection and dataset_full_name)
dataset_full_name: the dataset on the connection of which the query should be run (exclusive of connection and database)
pre_queries: (optional) array of queries to run before the query
post_queries: (optional) array of queries to run after the query
type: the type of query: either ‘sql’, ‘hive’ or ‘impala’
Returns:
A dataikuapi.dss.sqlquery.DSSSQLQuery query handle
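
For example, a minimal sketch of running a query against a named connection (the connection name and table are placeholders, and iterating the result through iter_rows() on the returned DSSSQLQuery handle is an assumption):

    query = client.sql_query("SELECT COUNT(*) FROM mytable",
                             connection="my_sql_connection", type="sql")
    for row in query.iter_rows():   # assumed iteration method of DSSSQLQuery
        print(row)
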
list_users()

List all users setup on the DSS instance

Note: this call requires an API key with admin rights

Returns: A list of users, as an array of dicts.
get_user(login)

Get a handle to interact with a specific user

Parameters: login (str) – the login of the desired user
Returns: A dataikuapi.dss.admin.DSSUser user handle
create_user(login, password, display_name='', source_type='LOCAL', groups=[], profile='DATA_SCIENTIST')

Create a user, and return a handle to interact with it

Note: this call requires an API key with admin rights

Parameters:
  • login (str) – the login of the new user
  • password (str) – the password of the new user
  • display_name (str) – the displayed name for the new user
  • source_type (str) – the type of new user. Admissible values are ‘LOCAL’ or ‘LDAP’
  • groups (list) – the names of the groups the new user belongs to
  • profile (str) – The profile for the new user, can be one of READER, DATA_ANALYST or DATA_SCIENTIST
Returns:

A dataikuapi.dss.admin.DSSUser user handle
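
For example, a minimal sketch (the login, password and group name are placeholders):

    new_user = client.create_user("jdoe", "a-password", display_name="John Doe",
                                  groups=["data_team"], profile="DATA_ANALYST")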

list_groups()

List all groups setup on the DSS instance

Note: this call requires an API key with admin rights

Returns:
A list of groups, as an array of JSON objects
get_group(name)

Get a handle to interact with a specific group

Args:
name: the name of the desired group
Returns:
A dataikuapi.dss.admin.DSSGroup group handle
create_group(name, description=None, source_type='LOCAL')

Create a group, and return a handle to interact with it

Note: this call requires an API key with admin rights

Args:
name: the name of the new group
description: a description of the new group
source_type: the type of the new group. Admissible values are ‘LOCAL’, ‘LDAP’, ‘SAAS’
Returns:
A dataikuapi.dss.admin.DSSGroup group handle
list_connections()

List all connections setup on the DSS instance

Note: this call requires an API key with admin rights

Returns:
All connections, as a map of connection name to connection definition
get_connection(name)

Get a handle to interact with a specific connection

Args:
name: the name of the desired connection
Returns:
A dataikuapi.dss.admin.DSSConnection connection handle
create_connection(name, type, params={}, usable_by='ALL', allowed_groups=[])

Create a connection, and return a handle to interact with it

Note: this call requires an API key with admin rights

Parameters:
  • name – the name of the new connection
  • type – the type of the new connection
  • params – the parameters of the new connection, as a JSON object
  • usable_by – the type of access control for the connection. Either ‘ALL’ (=no access control) or ‘ALLOWED’ (=access restricted to users of a list of groups)
  • allowed_groups – when using access control (that is, setting usable_by=’ALLOWED’), the list of names of the groups whose users are allowed to use the new connection
Returns:

A dataikuapi.dss.admin.DSSConnection connection handle
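
For example, a minimal sketch of creating an access-controlled connection (the connection type and params are illustrative; the exact params structure depends on the connection type):

    conn = client.create_connection("my_postgres", "PostgreSQL",
                                    params={"host": "dbhost", "port": "5432", "db": "mydb",
                                            "user": "dss_user", "password": "secret"},
                                    usable_by="ALLOWED", allowed_groups=["data_team"])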

list_code_envs()

List all code envs setup on the DSS instance

Note: this call requires an API key with admin rights

Returns:
List code envs (name, type, language)
get_code_env(env_lang, env_name)

Get a handle to interact with a specific code env

Args:
env_lang: the language (Python, R) of the desired code env
env_name: the name of the desired code env
Returns:
A dataikuapi.dss.admin.DSSCodeEnv code env handle
create_code_env(env_lang, env_name, deployment_mode, params=None)

Create a code env, and return a handle to interact with it

Note: this call requires an API key with admin rights

Parameters:
  • env_lang – the language (Python, R) of the new code env
  • env_name – the name of the new code env
  • deployment_mode – the type of the new code env
  • params – the parameters of the new code env, as a JSON object
Returns:

A dataikuapi.dss.admin.DSSCodeEnv code env handle

list_global_api_keys()

List all global API keys set up on the DSS instance

Note: this call requires an API key with admin rights

Returns:
All global API keys, as a list
get_global_api_key(key)

Get a handle to interact with a specific Global API key

Args:
key: the secret key of the desired API key
Returns:
A dataikuapi.dss.admin.DSSGlobalApiKey API key handle
create_global_api_key(label=None, description=None, admin=False)

Create a Global API key, and return a handle to interact with it

Note: this call requires an API key with admin rights

Args:
label: the label of the new API key
description: the description of the new API key
admin: has the new API key admin rights (True or False)
Returns:
A dataikuapi.dss.admin.DSSGlobalApiKey API key handle
list_meanings()

List all user-defined meanings on the DSS instance

Note: this call requires an API key with admin rights

Returns:
A list of meanings, as an array of JSON objects
get_meaning(id)

Get a handle to interact with a specific user-defined meaning

Note: this call requires an API key with admin rights

Args:
id: the ID of the desired meaning
Returns:
A dataikuapi.dss.meaning.DSSMeaning meaning handle
create_meaning(id, label, type, description=None, values=None, mappings=None, pattern=None, normalizationMode=None, detectable=False)

Create a meaning, and return a handle to interact with it

Note: this call requires an API key with admin rights

Parameters:
  • id – the ID of the new meaning
  • label – the label of the new meaning
  • type – the type of the new meaning. Admissible values are ‘DECLARATIVE’, ‘VALUES_LIST’, ‘VALUES_MAPPING’ and ‘PATTERN’
  • description (optional) – the description of the new meaning
  • values (optional) – when type is ‘VALUES_LIST’, the list of values
  • mappings (optional) – when type is ‘VALUES_MAPPING’, the mapping, as a list of objects with this structure: {‘from’: ‘value_1’, ‘to’: ‘value_a’}
  • pattern (optional) – when type is ‘PATTERN’, the pattern
  • normalizationMode (optional) – when type is ‘VALUES_LIST’, ‘VALUES_MAPPING’ or ‘PATTERN’, the normalization mode to use for value matching. One of ‘EXACT’, ‘LOWERCASE’, or ‘NORMALIZED’ (not available for ‘PATTERN’ type). Defaults to ‘EXACT’.
  • detectable (optional) – whether DSS should consider assigning the meaning to columns set to ‘Auto-detect’. Defaults to False.
Returns:

A dataikuapi.dss.meaning.DSSMeaning meaning handle

list_logs()

List all available log files on the DSS instance. This call requires an API key with admin rights

Returns: A list of log names
get_log(name)

Get the contents of a specific log file. This call requires an API key with admin rights

Parameters: name (str) – the name of the desired log file (obtained with list_logs())
Returns: The full content of the log file, as a string
get_variables()

Get the DSS instance’s variables, as a Python dictionary

This call requires an API key with admin rights

Returns: a Python dictionary of the instance-level variables
set_variables(variables)

Updates the DSS instance’s variables

This call requires an API key with admin rights

It is not possible to update a single variable: you must set all of them at once. Thus, you should only use a variables parameter that has been obtained using get_variables().

Parameters: variables (dict) – the new dictionary of all variables of the instance
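
For example, the read-modify-write pattern described above:

    variables = client.get_variables()
    variables["my_instance_variable"] = "some value"   # illustrative variable name
    client.set_variables(variables)
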
get_general_settings()

Gets a handle to interact with the general settings.

This call requires an API key with admin rights

Returns: a dataikuapi.dss.admin.DSSGeneralSettings handle
create_project_from_bundle_local_archive(archive_path)
create_project_from_bundle_archive(fp)
prepare_project_import(f)

Prepares import of a project archive

Parameters: f (file-like) – the input stream, as a file-like object
Returns: a TemporaryImportHandle to interact with the prepared import
catalog_index_connections(connection_names=[], all_connections=False, indexing_mode='FULL')
class dataikuapi.dssclient.TemporaryImportHandle(client, import_id)
execute(settings={})

Executes the import with the provided settings. Warning: you must check the ‘success’ flag
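
For example, a minimal sketch of importing a project archive (the archive path is a placeholder, and it is assumed here that execute() returns the result object carrying the ‘success’ flag):

    with open("/path/to/exported-project.zip", "rb") as f:
        import_handle = client.prepare_project_import(f)
        result = import_handle.execute()
    # remember to check the 'success' flag of the result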

class dataikuapi.dss.project.DSSProject(client, project_key)

A handle to interact with a project on the DSS instance. Do not create this class directly; instead, use DSSClient.get_project()

delete()

Delete the project

This call requires an API key with admin rights

get_export_stream(options={})

Return a stream of the exported project. You need to close the stream after download; failure to do so will result in the DSSClient becoming unusable.

export_to_file(path, options={})

Export the project to a file

Parameters: path – the path of the file in which the exported project should be saved
get_metadata()

Get the metadata attached to this project. The metadata contains the label, description, checklists, tags and custom metadata of the project

Returns:
a dict object. For more information on available metadata, please see https://doc.dataiku.com/dss/api/latest
set_metadata(metadata)

Set the metadata on this project.

Args:
metadata: the new state of the metadata for the project. You should only set a metadata object that has been retrieved using the get_metadata call.
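
For example, the read-modify-write pattern for project metadata (the "label" key and new value are illustrative):

    metadata = project.get_metadata()
    metadata["label"] = "My renamed project"
    project.set_metadata(metadata)
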
get_permissions()

Get the permissions attached to this project

Returns:
a JSON object, containing the owner and the permissions, as a list of pairs of group name and permission type
set_permissions(permissions)

Set the permissions on this project

Args:
permissions: a JSON object of the same structure as the one returned by get_permissions call
list_datasets()

List the datasets in this project

Returns:
the list of the datasets, each one as a JSON object
get_dataset(dataset_name)

Get a handle to interact with a specific dataset

Args:
dataset_name: the name of the desired dataset
Returns:
A dataikuapi.dss.dataset.DSSDataset dataset handle
create_dataset(dataset_name, type, params={}, formatType=None, formatParams={})

Create a new dataset in the project, and return a handle to interact with it

Args:
dataset_name: the name for the new dataset
type: the type of the dataset
params: the parameters for the type, as a JSON object
formatType: an optional format to create the dataset with
formatParams: the parameters to the format, as a JSON object
Returns:
A dataikuapi.dss.dataset.DSSDataset dataset handle
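
For example, a minimal sketch of creating a filesystem dataset (the dataset type, params and format settings below are illustrative; the exact structures depend on the dataset type and format):

    dataset = project.create_dataset("mydataset", "Filesystem",
                                     params={"connection": "filesystem_managed", "path": "mydataset"},
                                     formatType="csv",
                                     formatParams={"separator": "\t", "style": "unix"})
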
list_saved_models()

List the saved models in this project

Returns:
the list of the saved models, each one as a JSON object
get_saved_model(sm_id)

Get a handle to interact with a specific saved model

Args:
sm_id: the identifier of the desired saved model
Returns:
A dataikuapi.dss.savedmodel.DSSSavedModel saved model handle
list_managed_folders()

List the managed folders in this project

Returns:
the list of the managed folders, each one as a JSON object
get_managed_folder(odb_id)

Get a handle to interact with a specific managed folder

Args:
odb_id: the identifier of the desired managed folder
Returns:
A dataikuapi.dss.managedfolder.DSSManagedFolder managed folder handle
create_managed_folder(name)

Create a new managed folder in the project, and return a handle to interact with it

Args:
name: the name of the managed folder
Returns:
A dataikuapi.dss.managedfolder.DSSManagedFolder managed folder handle
list_jobs()

List the jobs in this project

Returns:
a list of the jobs, each one as a JSON object, containing both the definition and the state
get_job(id)

Get a handle to interact with a specific job

Returns:
A dataikuapi.dss.job.DSSJob job handle
start_job(definition)

Create a new job, and return a handle to interact with it

Args:
definition: the definition for the job to create. The definition must contain the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD) and a list of outputs to build. Optionally, a refreshHiveMetastore field can specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.
Returns:
A dataikuapi.dss.job.DSSJob job handle
start_job_and_wait(definition, no_fail=False)

Create a new job and wait for its completion.

Args:
definition: the definition for the job to create. The definition must contain the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD) and a list of outputs to build. Optionally, a refreshHiveMetastore field can specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.
new_job_definition_builder(job_type='NON_RECURSIVE_FORCED_BUILD')
get_variables()

Gets the variables of this project.

Returns:
a dictionary containing two dictionaries: “standard” and “local”. “standard” variables are regular variables, exported with bundles. “local” variables are not part of the bundles for this project
set_variables(obj)

Sets the variables of this project.

Parameters: obj – must be a modified version of the object returned by get_variables()

list_api_services()

List the API services in this project

Returns:
the list of API services, each one as a JSON object
get_api_service(service_id)

Get a handle to interact with a specific API service

Args:
service_id: the ID of the desired API service
Returns:
A dataikuapi.dss.apiservice.DSSAPIService API Service handle
list_exported_bundles()
export_bundle(bundle_id)
get_exported_bundle_archive_stream(bundle_id)

Download a bundle archive that can be deployed in a DSS automation Node, as a binary stream. Warning: this stream will monopolize the DSSClient until closed.

download_exported_bundle_archive_to_file(bundle_id, path)

Download a bundle archive that can be deployed in a DSS automation Node into the given output file.

Parameters: path – if “-“, will write to /dev/stdout

list_imported_bundles()
import_bundle_from_archive(archive_path)
import_bundle_from_stream(fp)
activate_bundle(bundle_id)
preload_bundle(bundle_id)
list_scenarios()

List the scenarios in this project.

This method returns a list of Python dictionaries. Each dictionary represents a scenario. Each dictionary contains at least an “id” field, which you can then pass to get_scenario().

Returns: the list of scenarios, each one as a Python dictionary
get_scenario(scenario_id)

Get a handle to interact with a specific scenario

Parameters: scenario_id (str) – the ID of the desired scenario
Returns: a dataikuapi.dss.scenario.DSSScenario scenario handle
create_scenario(scenario_name, type, definition={})

Create a new scenario in the project, and return a handle to interact with it

Parameters:
  • scenario_name (str) – The name for the new scenario. This does not need to be unique (although this is strongly recommended)
  • type (str) – The type of the scenario. Must be one of ‘step_based’ or ‘custom_python’
  • definition (object) – the JSON definition of the scenario. Use get_definition on an existing DSSScenario object in order to get a sample definition object
Returns:

a dataikuapi.dss.scenario.DSSScenario handle to interact with the newly-created scenario
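
For example, a minimal sketch that clones the definition of an existing scenario, as recommended above (the scenario ID and name are placeholders, and passing with_status=False to obtain the actions definition is an assumption):

    template = project.get_scenario("EXISTING_SCENARIO_ID").get_definition(with_status=False)
    scenario = project.create_scenario("My new scenario", "step_based", definition=template)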

list_recipes()

List the recipes in this project

Returns:
the list of the recipes, each one as a JSON object
get_recipe(recipe_name)

Get a handle to interact with a specific recipe

Args:
recipe_name: the name of the desired recipe
Returns:
A dataikuapi.dss.recipe.DSSRecipe recipe handle
create_recipe(recipe_proto, creation_settings)

Create a new recipe in the project, and return a handle to interact with it. We strongly recommend that you use the creator helpers instead of calling this directly.

Some recipe types require additional parameters in creation_settings:

  • ‘grouping’ : a ‘groupKey’ column name
  • ‘python’, ‘sql_query’, ‘hive’, ‘impala’ : the code of the recipe as a ‘payload’ string
Args:
recipe_proto: a prototype for the recipe object. Must contain at least ‘type’ and ‘name’
creation_settings: recipe-specific creation settings
Returns:
A dataikuapi.dss.recipe.DSSRecipe recipe handle
sync_datasets_acls()

Resync permissions on HDFS datasets in this project

Returns:
a DSSFuture handle to the task of resynchronizing the permissions

Note: this call requires an API key with admin rights

list_running_notebooks(as_objects=True)

List the currently-running notebooks

Returns:
list of notebooks. Each object contains at least a ‘name’ field
get_tags()

List the tags of this project.

Returns:
a dictionary containing the tags with a color
set_tags(tags={})

Set the tags of this project.

Parameters: tags – must be a modified version of the object returned by get_tags()

list_macros(as_objects=False)

List the macros accessible in this project

Parameters: as_objects – if True, return the macros as dataikuapi.dss.macro.DSSMacro macro handles instead of raw JSON
Returns: the list of the macros
get_macro(runnable_type)

Get a handle to interact with a specific macro

Parameters: runnable_type – the identifier of a macro
Returns: A dataikuapi.dss.macro.DSSMacro macro handle
class dataikuapi.dss.project.JobDefinitionBuilder(project_key, job_type='NON_RECURSIVE_FORCED_BUILD')
with_type(job_type)

Sets the build type

Parameters: job_type – the build type for the job: one of RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD
with_refresh_metastore(refresh_metastore)

Sets whether the Hive tables built by the job should have their definitions refreshed after the corresponding dataset is built

with_output(name, object_type=None, object_project_key=None, partition=None)

Adds an item to build in the definition

get_definition()
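
For example, a minimal sketch of building a dataset with the builder (the project key and dataset name are placeholders, and it is assumed here that the dict returned by get_definition() can be passed directly to start_job_and_wait()):

    project = client.get_project("MYPROJECT")
    builder = project.new_job_definition_builder()   # defaults to NON_RECURSIVE_FORCED_BUILD
    builder.with_output("mydataset")                 # the dataset to build
    project.start_job_and_wait(builder.get_definition())
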
class dataikuapi.dss.dataset.DSSDataset(client, project_key, dataset_name)

A dataset on the DSS instance

delete(drop_data=False)

Delete the dataset

Parameters: drop_data – whether the data of the dataset should be dropped
get_definition()

Get the definition of the dataset

Returns:
the definition, as a JSON object
set_definition(definition)

Set the definition of the dataset

Args:
definition: the definition, as a JSON object. You should only set a definition object that has been retrieved using the get_definition call.
get_schema()

Get the schema of the dataset

Returns:
a JSON object of the schema, with the list of columns
set_schema(schema)

Set the schema of the dataset

Args:
schema: the desired schema for the dataset, as a JSON object. All columns have to provide their name and type
get_metadata()

Get the metadata attached to this dataset. The metadata contains the label, description, checklists, tags and custom metadata of the dataset

Returns:
a dict object. For more information on available metadata, please see https://doc.dataiku.com/dss/api/latest
set_metadata(metadata)

Set the metadata on this dataset.

Args:
metadata: the new state of the metadata for the dataset. You should only set a metadata object that has been retrieved using the get_metadata call.
iter_rows(partitions=None)

Get the dataset’s data

Returns:
an iterator over the rows, each row being a tuple of values. The order of values in the tuples is the same as the order of columns in the schema returned by get_schema
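
For example, a minimal sketch of reading the data with column names (it is assumed here that the schema object exposes its columns under a "columns" key):

    dataset = project.get_dataset("mydataset")
    columns = [c["name"] for c in dataset.get_schema()["columns"]]
    for row in dataset.iter_rows():
        print(dict(zip(columns, row)))   # one dict per row: column name -> value
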
list_partitions()

Get the list of all partitions of this dataset

Returns:
the list of partitions, as a list of strings
clear(partitions=None)

Clear all data in this dataset

Args:
partitions: (optional) a list of partitions to clear. When not provided, the entire dataset is cleared
synchronize_hive_metastore()

Synchronize this dataset with the Hive metastore

compute_metrics(partition='', metric_ids=None, probes=None)

Compute metrics on a partition of this dataset. If neither metric ids nor custom probes set are specified, the metrics setup on the dataset are used.

run_checks(partition='', checks=None)

Run checks on a partition of this dataset. If the checks are not specified, the checks setup on the dataset are used.

get_last_metric_values(partition='')

Get the last values of the metrics on this dataset

Returns:
a list of metric objects and their value
get_metric_history(metric, partition='')

Get the history of the values of the metric on this dataset

Returns:
an object containing the values of the metric, cast to the appropriate type (double, boolean,…)
get_usages()

Get the recipes or analyses referencing this dataset

Returns:
a list of usages
class dataikuapi.dss.managedfolder.DSSManagedFolder(client, project_key, odb_id)

A managed folder on the DSS instance

delete()

Delete the managed folder

get_definition()

Get the definition of the managed folder

Returns:
the definition, as a JSON object
set_definition(definition)

Set the definition of the managed folder

Args:
definition: the definition, as a JSON object. You should only set a definition object that has been retrieved using the get_definition call.
list_contents()

Get the list of files in the managed folder

Returns:
the list of files, as a JSON object
get_file(path)

Get a file from the managed folder

Returns:
the file’s content, as a stream
delete_file(path)

Delete a file from the managed folder

put_file(name, f)

Upload the file to the managed folder

Args:
name: the name of the file
f: the file contents, as a stream
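
For example, a minimal sketch of uploading a file to a managed folder and reading it back (the folder id and file paths are placeholders):

    folder = project.get_managed_folder("FOLDER_ID")
    with open("/path/to/model.bin", "rb") as f:
        folder.put_file("model.bin", f)
    print(folder.list_contents())
    data_stream = folder.get_file("model.bin")   # a stream of the file's content
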
compute_metrics(metric_ids=None, probes=None)

Compute metrics on this managed folder. If the metrics are not specified, the metrics setup on the managed folder are used.

get_last_metric_values()

Get the last values of the metrics on this managed folder

Returns:
a list of metric objects and their value
get_metric_history(metric)

Get the history of the values of the metric on this managed folder

Returns:
an object containing the values of the metric, cast to the appropriate type (double, boolean,…)
get_usages()

Get the recipes referencing this folder

Returns:
a list of usages
class dataikuapi.dss.job.DSSJob(client, project_key, id)

A job on the DSS instance

abort()

Aborts the job

get_status()

Get the current status of the job

Returns:
the state of the job, as a JSON object
get_log(activity=None)

Get the logs of the job

Args:
activity: (optional) the name of the activity in the job whose log is requested
Returns:
the log, as a string
class dataikuapi.dss.job.DSSJobWaiter(job)

Helper to wait for a job’s completion

wait(no_fail=False)
class dataikuapi.dss.meaning.DSSMeaning(client, id)

A user-defined meaning on the DSS instance

get_definition()

Get the meaning’s definition.

Note: this call requires an API key with admin rights

Returns:
the meaning definition, as a JSON object
set_definition(definition)

Set the meaning’s definition.

Note: this call requires an API key with admin rights

Args:
definition: the definition for the meaning, as a JSON object
class dataikuapi.dss.metrics.ComputedMetrics(raw)
get_metric_by_id(id)
get_global_data(metric_id)
get_global_value(metric_id)
get_partition_data(metric_id, partition)
get_partition_value(metric_id, partition)
get_first_partition_data(metric_id)
get_all_ids()
static get_value_from_data(data)

Recipes

class dataikuapi.dss.recipe.DSSRecipe(client, project_key, recipe_name)

A handle to an existing recipe on the DSS instance

delete()

Delete the recipe

get_definition_and_payload()

Get the definition of the recipe

Returns:
the definition, as a DSSRecipeDefinitionAndPayload object, containing the recipe definition itself and its payload
set_definition_and_payload(definition)

Set the definition of the recipe

Args:
definition: the definition, as a DSSRecipeDefinitionAndPayload object. You should only set a definition object that has been retrieved using the get_definition_and_payload call.
get_metadata()

Get the metadata attached to this recipe. The metadata contains the label, description, checklists, tags and custom metadata of the recipe

Returns:
a dict object. For more information on available metadata, please see https://doc.dataiku.com/dss/api/latest
set_metadata(metadata)

Set the metadata on this recipe.

Args:
metadata: the new state of the metadata for the recipe. You should only set a metadata object that has been retrieved using the get_metadata call.
class dataikuapi.dss.recipe.DSSRecipeDefinitionAndPayload(data)

Definition for a recipe, that is, the recipe definition itself and its payload

get_recipe_raw_definition()
get_recipe_inputs()
get_recipe_outputs()
get_recipe_params()
get_payload()
get_json_payload()
set_payload(payload)
set_json_payload(payload)
class dataikuapi.dss.recipe.DSSRecipeCreator(type, name, project)

Helper to create new recipes

with_input(dataset_name, project_key=None, role='main')
with_output(dataset_name, append=False, role='main')

The output dataset must already exist. If you are creating a visual recipe with a single output, use with_existing_output

build()

Create a new recipe in the project, and return a handle to interact with it.

Returns:
A dataikuapi.dss.recipe.DSSRecipe recipe handle
class dataikuapi.dss.recipe.SingleOutputRecipeCreator(type, name, project)
with_existing_output(dataset_name, append=False)
with_new_output(name, connection_id, typeOptionId=None, format_option_id=None, override_sql_schema=None, partitioning_option_id=None, append=False, object_type='DATASET')
with_output(dataset_name, append=False)
class dataikuapi.dss.recipe.VirtualInputsSingleOutputRecipeCreator(type, name, project)
with_input(dataset_name, project_key=None)
class dataikuapi.dss.recipe.WindowRecipeCreator(name, project)
class dataikuapi.dss.recipe.SyncRecipeCreator(name, project)
class dataikuapi.dss.recipe.SortRecipeCreator(name, project)
class dataikuapi.dss.recipe.TopNRecipeCreator(name, project)
class dataikuapi.dss.recipe.DistinctRecipeCreator(name, project)
class dataikuapi.dss.recipe.GroupingRecipeCreator(name, project)
with_group_key(group_key)
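
For example, a minimal sketch of creating a grouping recipe with the creator helpers (the dataset names and group key are placeholders, and it is assumed that GroupingRecipeCreator inherits the with_input and with_existing_output helpers shown above):

    from dataikuapi.dss.recipe import GroupingRecipeCreator

    builder = GroupingRecipeCreator("group_transactions", project)
    builder.with_input("transactions")
    builder.with_existing_output("transactions_by_customer")   # the output dataset must already exist
    builder.with_group_key("customer_id")
    recipe = builder.build()
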
class dataikuapi.dss.recipe.JoinRecipeCreator(name, project)
class dataikuapi.dss.recipe.StackRecipeCreator(name, project)
class dataikuapi.dss.recipe.SamplingRecipeCreator(name, project)
class dataikuapi.dss.recipe.CodeRecipeCreator(name, type, project)
with_script(script)
class dataikuapi.dss.recipe.SQLQueryRecipeCreator(name, project)
class dataikuapi.dss.recipe.SplitRecipeCreator(name, project)
class dataikuapi.dss.recipe.DownloadRecipeCreator(name, project)

Administration

class dataikuapi.dss.admin.DSSConnection(client, name)

A connection on the DSS instance

delete()

Delete the connection

Note: this call requires an API key with admin rights

get_definition()

Get the connection’s definition (type, name, params, usage restrictions)

Note: this call requires an API key with admin rights

Returns:
the connection definition, as a JSON object
set_definition(description)

Set the connection’s definition.

Note: this call requires an API key with admin rights

Args:
definition: the definition for the connection, as a JSON object.
sync_root_acls()

Resync root permissions on this connection path

Returns:
a DSSFuture handle to the task of resynchronizing the permissions

Note: this call requires an API key with admin rights

sync_datasets_acls()

Resync permissions on datasets in this connection path

Returns:
a DSSFuture handle to the task of resynchronizing the permissions

Note: this call requires an API key with admin rights

class dataikuapi.dss.admin.DSSUser(client, login)

A handle for a user on the DSS instance

delete()

Deletes the user

Note: this call requires an API key with admin rights

get_definition()

Get the user’s definition (login, type, display name, permissions, …)

Note: this call requires an API key with admin rights

Returns: the user’s definition, as a dict
set_definition(definition)

Set the user’s definition.

Note: this call requires an API key with admin rights

Parameters: definition (dict) – the definition for the user, as a dict. You should obtain the definition using get_definition, not create one. The fields that can be changed are:

  • email
  • displayName
  • groups
  • userProfile
  • password
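
For example, the read-modify-write pattern for a user definition (the login and new display name are placeholders):

    user = client.get_user("jdoe")
    definition = user.get_definition()
    definition["displayName"] = "John D."
    user.set_definition(definition)
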
class dataikuapi.dss.admin.DSSGroup(client, name)

A group on the DSS instance

delete()

Delete the group

Note: this call requires an API key with admin rights

get_definition()

Get the group’s definition (name, description, admin abilities, type, ldap name mapping)

Note: this call requires an API key with admin rights

Returns:
the group definition, as a JSON object
set_definition(definition)

Set the group’s definition.

Note: this call requires an API key with admin rights

Args:
definition: the definition for the group, as a JSON object.
class dataikuapi.dss.admin.DSSGeneralSettings(client)

The general settings of the DSS instance

save()

Save the changes that were made to the settings on the DSS instance

Note: this call requires an API key with admin rights

get_raw()

Get the settings as a dictionary

add_impersonation_rule(rule, is_user_rule=True)

Add a rule to the impersonation settings

Parameters:
  • rule – the rule to add, for instance built with DSSUserImpersonationRule or DSSGroupImpersonationRule
  • is_user_rule – True for a user-level rule, False for a group-level rule
get_impersonation_rules(dss_user=None, dss_group=None, unix_user=None, hadoop_user=None, project_key=None, scope=None, rule_type=None, is_user=None)

Retrieve the user or group impersonation rules that match the parameters

Parameters:
  • dss_user – a DSS user or regular expression to match DSS user names
  • dss_group – a DSS group or regular expression to match DSS groups
  • unix_user – a name to match the target UNIX user
  • hadoop_user – a name to match the target Hadoop user
  • project_key – a project_key
  • scope – project-scoped (‘PROJECT’) or global (‘GLOBAL’)
  • rule_type – the rule user or group matching method (‘IDENTITY’, ‘SINGLE_MAPPING’, ‘REGEXP_RULE’)
  • is_user – True if only user-level rules should be considered, False for only group-level rules, None to consider both
remove_impersonation_rules(dss_user=None, dss_group=None, unix_user=None, hadoop_user=None, project_key=None, scope=None, rule_type=None, is_user=None)

Remove the user or group impersonation rules that match the parameters from the settings

Parameters:
  • dss_user – a DSS user or regular expression to match DSS user names
  • dss_group – a DSS group or regular expression to match DSS groups
  • unix_user – a name to match the target UNIX user
  • hadoop_user – a name to match the target Hadoop user
  • project_key – a project_key
  • scope – project-scoped (‘PROJECT’) or global (‘GLOBAL’)
  • rule_type – the rule user or group matching method (‘IDENTITY’, ‘SINGLE_MAPPING’, ‘REGEXP_RULE’)
  • is_user – True if only user-level rules should be considered, False for only group-level rules, None to consider both
class dataikuapi.dss.admin.DSSUserImpersonationRule(raw=None)

Helper to build user-level rule items for the impersonation settings

scope_global()

Make the rule apply to all projects

scope_project(project_key)

Make the rule apply to a given project

Args:
project_key : the project this rule applies to
user_identity()

Make the rule map each DSS user to a UNIX user of the same name

user_single(dss_user, unix_user, hadoop_user=None)

Make the rule map a given DSS user to a given UNIX user

Args:
dss_user : a DSS user unix_user : a UNIX user hadoop_user : a Hadoop user (optional, defaults to unix_user)
user_regexp(regexp, unix_user, hadoop_user=None)

Make the rule map DSS users matching a given regular expression to a given UNIX user

Args:
regexp : a regular expression to match DSS user names unix_user : a UNIX user hadoop_user : a Hadoop user (optional, defaults to unix_user)
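
For example, a minimal sketch of adding a user-level impersonation rule through the general settings (the user names are placeholders):

    from dataikuapi.dss.admin import DSSUserImpersonationRule

    settings = client.get_general_settings()
    rule = DSSUserImpersonationRule()
    rule.scope_global()
    rule.user_single("jdoe", "unix_jdoe")   # map DSS user 'jdoe' to UNIX user 'unix_jdoe'
    settings.add_impersonation_rule(rule)
    settings.save()
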
class dataikuapi.dss.admin.DSSGroupImpersonationRule(raw=None)

Helper to build group-level rule items for the impersonation settings

group_identity()

Make the rule map each DSS user to a UNIX user of the same name

group_single(dss_group, unix_user, hadoop_user=None)

Make the rule map a given DSS group to a given UNIX user

Args:
dss_group : a DSS group unix_user : a UNIX user hadoop_user : a Hadoop user (optional, defaults to unix_user)
group_regexp(regexp, unix_user, hadoop_user=None)

Make the rule map DSS users matching a given regular expression to a given UNIX user

Args:
regexp : a regular expression to match DSS groups unix_user : a UNIX user hadoop_user : a Hadoop user (optional, defaults to unix_user)
class dataikuapi.dss.admin.DSSCodeEnv(client, env_lang, env_name)

A code env on the DSS instance

delete()

Delete the code env

Note: this call requires an API key with admin rights

get_definition()

Get the code env’s definition

Note: this call requires an API key with admin rights

Returns:
the code env definition, as a JSON object
set_definition(env)

Set the code env’s definition. The definition should come from a call to the get_definition() method.

Fields that can be updated in design node:

  • env.permissions, env.usableByAll, env.desc.owner
  • env.specCondaEnvironment, env.specPackageList, env.externalCondaEnvName, env.desc.installCorePackages, env.desc.installJupyterSupport, env.desc.yarnPythonBin

Fields that can be updated in automation node (where {version} is the updated version):

  • env.permissions, env.usableByAll, env.owner
  • env.{version}.specCondaEnvironment, env.{version}.specPackageList, env.{version}.externalCondaEnvName, env.{version}.desc.installCorePackages, env.{version}.desc.installJupyterSupport, env.{version}.desc.yarnPythonBin

Note: this call requires an API key with admin rights

Parameters: env – a code env definition
Returns:
the updated code env definition, as a JSON object
set_jupyter_support(active)

Update the code env jupyter support

Note: this call requires an API key with admin rights

Parameters: active – True to activate Jupyter support, False to deactivate
update_packages()

Update the code env packages so that it matches its spec

Note: this call requires an API key with admin rights
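
For example, a minimal sketch of refreshing a code env (the language value and env name are placeholders):

    code_env = client.get_code_env("python", "my_code_env")
    code_env.set_jupyter_support(True)
    code_env.update_packages()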

class dataikuapi.dss.admin.DSSGlobalApiKey(client, key)

A global API key on the DSS instance

delete()

Delete the API key

Note: this call requires an API key with admin rights

get_definition()

Get the API key’s definition

Note: this call requires an API key with admin rights

Returns:
the API key definition, as a JSON object
set_definition(definition)

Set the API key’s definition.

Note: this call requires an API key with admin rights

Args:
definition: the definition for the API key, as a JSON object.

Scenarios

class dataikuapi.dss.scenario.DSSScenario(client, project_key, id)

A handle to interact with a scenario on the DSS instance

abort()

Aborts the scenario if it is currently running

run(params={})

Requests a run of the scenario, which will start after a few seconds.

Parameters: params (dict) – additional parameters that will be passed to the scenario through trigger params
get_trigger_fire(trigger_id, trigger_run_id)

Get the trigger fire (i.e. the activation of a trigger) corresponding to a given run of a trigger of this scenario

Args:
trigger_id: the ID of the trigger
trigger_run_id: the ID of the run of the trigger
Returns:
A dataikuapi.dss.scenario.DSSTriggerFire trigger fire handle
run_and_wait(params={}, no_fail=False)

Requests a run of the scenario, which will start after a few seconds, and waits for the run to complete.

Args:
params: additional parameters that will be passed to the scenario through trigger params
Returns:
A dataikuapi.dss.scenario.DSSScenarioRun run handle
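
For example, a minimal sketch of triggering a scenario and inspecting the finished run (the scenario ID is a placeholder):

    scenario = project.get_scenario("MY_SCENARIO_ID")
    run = scenario.run_and_wait()
    print(run.get_start_time(), run.get_duration())
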
get_last_runs(limit=10, only_finished_runs=False)

Get the list of the last runs of the scenario.

Returns: A list of dataikuapi.dss.scenario.DSSScenarioRun
get_current_run()

Get the current run of the scenario, or None if it is not running at the moment

Returns: A dataikuapi.dss.scenario.DSSScenarioRun
get_run(run_id)

Get a handle to a run of the scenario

Returns: A dataikuapi.dss.scenario.DSSScenarioRun
get_definition(with_status=True)

Returns the definition of the scenario

Args:
with_status: if True, the definition contains the run status of the scenario but not its actions’ definition. If False, the definition doesn’t contain the run status but has the scenario’s actions definition
set_definition(definition, with_status=True)

Updates the definition of this scenario

Args:
with_status: should be the same as the value passed to get_definition(). If True, the params, triggers and reporters fields of the scenario are ignored.
get_payload(extension='py')

Returns the payload of the scenario

Parameters: extension (str) – the type of script. Default is ‘py’ for python
set_payload(script, with_status=True)

Updates the payload of this scenario

Parameters: script (str) – the new payload (script) of the scenario
get_average_duration(limit=3)

Get the average duration (in fractional seconds) of the last runs of this scenario that finished, where finished means it ended with SUCCESS or WARNING. If there are not enough runs to perform the average, returns None

Args:
limit: number of last runs to average on
class dataikuapi.dss.scenario.DSSScenarioRun(client, run)

A handle containing basic info about a past run of a scenario.

This handle can also be used to fetch additional information about the run

get_info()

Get the basic information of the scenario run

get_details()

Get the full details of the scenario run, including its step runs.

Note: this performs another API call

get_start_time()

Get the start time of the scenario run

get_duration()

Get the duration of this run (in fractional seconds).

If the run is still running, get the duration since it started

class dataikuapi.dss.scenario.DSSScenarioRunWaiter(scenario_run, trigger_fire)

Helper to wait for a scenario run’s completion

wait(no_fail=False)
class dataikuapi.dss.scenario.DSSTriggerFire(scenario, trigger_fire)

The activation of a trigger on the DSS instance

get_scenario_run()

Get the run of the scenario that this trigger activation launched

is_cancelled(refresh=False)

Whether the trigger has been cancelled

Parameters: refresh – get the state of the trigger from the backend
wait_for_scenario_run(no_fail=False)

API Node services

class dataikuapi.dss.apiservice.DSSAPIService(client, project_key, service_id)

An API Service on the DSS instance

list_packages()

List the packages of this API service

Returns:
the list of API service packages, each one as a JSON object
create_package(package_id)

Prepare a package of this API service

delete_package(package_id)

Delete a package of this API service

download_package_stream(package_id)

Download a package archive that can be deployed in a DSS API Node, as a binary stream.

Warning: this stream will monopolize the DSSClient until closed.

download_package_to_file(package_id, path)

Download a package archive that can be deployed in a DSS API Node, into the given output file.
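
For example, a minimal sketch of packaging an API service and downloading the package (the service ID, package ID and output path are placeholders):

    service = project.get_api_service("my_service")
    service.create_package("v1")
    service.download_package_to_file("v1", "/path/to/my_service_v1.zip")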