Managing projects

Basic operations

The list of projects in the DSS instance can be retrieved with the list_project_keys method.

client = DSSClient(host, apiKey)
dss_projects = client.list_project_keys()
print(dss_projects)

outputs

['IMPALA', 'MYSQL', 'PARTITIONED', 'PLUGINS']

Projects can be created:

new_project = client.create_project('TEST_PROJECT', 'test project', 'tester', description='a simple description')
print(client.list_project_keys())

outputs

['IMPALA', 'MYSQL', 'PARTITIONED', 'PLUGINS', 'TEST_PROJECT']

Or an existing project can be used for later manipulation:

project = client.get_project('TEST_PROJECT')

Creating, listing and getting handles to project items

Through various methods on the DSSProject class, you can (a short example follows this list):

  • Create most types of project items (datasets, recipes, managed folders, …)
  • List project items
  • Get structured handles to interact with each type of project item
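
For instance, a project handle can be used to list the datasets of a project and to get a handle to one of them (a minimal sketch based on the list_datasets and get_dataset methods documented in the reference below; the dataset name is hypothetical):

project = client.get_project('TEST_PROJECT')

# each entry returned by list_datasets is a dict with at least a 'name' field
for dataset in project.list_datasets():
    print(dataset['name'])

# get a structured handle to a single dataset
dataset = project.get_dataset('my_dataset')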

Modifying project settings

Two parts of the project’s settings can be modified directly: the metadata and the permissions. In both cases, it is advised to first retrieve the current settings state with the get_metadata and get_permissions calls, modify the returned object, and then set it back on the DSS instance.

project = client.get_project('TEST_PROJECT')

project_metadata = project.get_metadata()
project_metadata['tags'] = ['tag1','tag2']
project.set_metadata(project_metadata)

project_permissions = project.get_permissions()
project_permissions['permissions'].append({'group':'data_scientists','readProjectContent': True, 'readDashboards': True})
project.set_permissions(project_permissions)

Available permissions to be set:

{
        'group': u'data_team',
        'admin': False,
        'exportDatasetsData': True,
        'manageAdditionalDashboardUsers': False,
        'manageDashboardAuthorizations': False,
        'manageExposedElements': False,
        'moderateDashboards': False,
        'readDashboards': True,
        'readProjectContent': True,
        'runScenarios': False,
        'writeDashboards': False,
        'writeProjectContent': False
}

Deleting

Projects can also be deleted:

project = client.get_project('TEST_PROJECT')
project.delete()

Exporting

Project export is available through the Python API in two forms: either as a stream, or exported directly to a file. The data is sent zipped.

project = client.get_project('TEST_PROJECT')

project.export_to_file('exported_project.zip')

with project.get_export_stream() as s:
        ...
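
For example, the export stream can be copied to a local file (a minimal sketch; the stream must be closed after download, which the with statement takes care of):

import shutil

with project.get_export_stream() as s:
    with open('exported_project.zip', 'wb') as f:
        shutil.copyfileobj(s, f)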

Reference documentation

class dataikuapi.dss.project.DSSProject(client, project_key)

A handle to interact with a project on the DSS instance.

Do not create this class directly; instead, use dataikuapi.DSSClient.get_project()

delete(drop_data=False)

Delete the project

This call requires an API key with admin rights

Parameters:drop_data (bool) – Should the data of managed datasets be dropped
get_export_stream(options={})

Return a stream of the exported project. You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

Returns:a file-like object that is a stream of the export archive
Return type:file-like
export_to_file(path, options={})

Export the project to a file

Parameters:path (str) – the path of the file in which the exported project should be saved
get_metadata()

Get the metadata attached to this project. The metadata contains the label, description, checklists, tags and custom metadata of the project.

For more information on available metadata, please see https://doc.dataiku.com/dss/api/latest

Returns:a dict object containing the project metadata.
Return type:dict
set_metadata(metadata)

Set the metadata on this project.

Parameters:metadata (dict) – the new state of the metadata for the project. You should only set a metadata object that has been retrieved using the get_metadata() call.
get_permissions()

Get the permissions attached to this project

Returns:a dict containing the owner and the permissions, as a list of pairs of group name and permission type
set_permissions(permissions)

Sets the permissions on this project

Parameters:permissions (dict) – a permissions object with the same structure as the one returned by the get_permissions() call
list_datasets()

List the datasets in this project

Returns:The list of the datasets, each one as a dictionary. Each dataset dict contains at least a name field which is the name of the dataset
Return type:list of dicts
get_dataset(dataset_name)

Get a handle to interact with a specific dataset

Parameters:dataset_name (string) – the name of the desired dataset
Returns:A dataikuapi.dss.dataset.DSSDataset dataset handle
create_dataset(dataset_name, type, params={}, formatType=None, formatParams={})

Create a new dataset in the project, and return a handle to interact with it.

The precise structure of params and formatParams depends on the specific dataset type and dataset format type. To know which fields exist for a given dataset type and format type, create a dataset from the UI, and use get_dataset() to retrieve the configuration of the dataset and inspect it. Then reproduce a similar structure in the create_dataset() call.

Not all settings of a dataset can be set at creation time (for example partitioning). After creation, you can modify the dataset to change these settings.

Parameters:
  • dataset_name (string) – the name for the new dataset
  • type (string) – the type of the dataset
  • params (dict) – the parameters for the type, as a JSON object
  • formatType (string) – an optional format to create the dataset with (only for file-oriented datasets)
  • formatParams (dict) – the parameters of the format, as a JSON object (only for file-oriented datasets)
Returns:
A dataikuapi.dss.dataset.DSSDataset dataset handle
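
As an illustration, a CSV dataset on a filesystem connection might be created along these lines (a sketch only; the connection name and the exact params and formatParams fields are assumptions, to be checked against an existing dataset of the same type as described above):

dataset = project.create_dataset('my_new_dataset', 'Filesystem',
    params={'connection': 'filesystem_managed', 'path': 'my_new_dataset'},
    formatType='csv',
    formatParams={'separator': ',', 'parseHeaderRow': True})
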
create_prediction_ml_task(input_dataset, target_variable, ml_backend_type='PY_MEMORY', guess_policy='DEFAULT', wait_guess_complete=True)

Creates a new prediction task in a new visual analysis lab for a dataset.

Parameters:
  • ml_backend_type (string) – ML backend to use, one of PY_MEMORY, MLLIB or H2O
  • guess_policy (string) – Policy to use for setting the default parameters. Valid values are: DEFAULT, SIMPLE_FORMULA, DECISION_TREE, EXPLANATORY and PERFORMANCE
  • wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms. You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
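
A possible usage, assuming hypothetical dataset and target column names (train and get_settings are the calls on the returned ML task mentioned above):

# with wait_guess_complete=True (the default), DSS finishes analyzing the
# input dataset before returning
mltask = project.create_prediction_ml_task('my_dataset', 'target_column')

# inspect or adjust the guessed settings, then train
settings = mltask.get_settings()
mltask.train()
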
create_clustering_ml_task(input_dataset, ml_backend_type='PY_MEMORY', guess_policy='KMEANS')

Creates a new clustering task in a new visual analysis lab for a dataset.

The returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms.

You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)

Parameters:
  • ml_backend_type (string) – ML backend to use, one of PY_MEMORY, MLLIB or H2O
  • guess_policy (string) – Policy to use for setting the default parameters. Valid values are: KMEANS and ANOMALY_DETECTION
list_ml_tasks()

List the ML tasks in this project

Returns:
the list of the ML tasks summaries, each one as a JSON object
get_ml_task(analysis_id, mltask_id)

Get a handle to interact with a specific ML task

Args:
analysis_id: the identifier of the visual analysis containing the desired ML task
mltask_id: the identifier of the desired ML task
Returns:
A dataikuapi.dss.ml.DSSMLTask ML task handle
create_analysis(input_dataset)

Creates a new visual analysis lab for a dataset.

list_analyses()

List the visual analyses in this project

Returns:
the list of the visual analyses summaries, each one as a JSON object
get_analysis(analysis_id)

Get a handle to interact with a specific visual analysis

Args:
analysis_id: the identifier of the desired visual analysis
Returns:
A dataikuapi.dss.analysis.DSSAnalysis visual analysis handle
list_saved_models()

List the saved models in this project

Returns:
the list of the saved models, each one as a JSON object
get_saved_model(sm_id)

Get a handle to interact with a specific saved model

Args:
sm_id: the identifier of the desired saved model
Returns:
A dataikuapi.dss.savedmodel.DSSSavedModel saved model handle
list_managed_folders()

List the managed folders in this project

Returns:
the list of the managed folders, each one as a JSON object
get_managed_folder(odb_id)

Get a handle to interact with a specific managed folder

Args:
odb_id: the identifier of the desired managed folder
Returns:
A dataikuapi.dss.managedfolder.DSSManagedFolder managed folder handle
create_managed_folder(name, folder_type=None, connection_name='filesystem_folders')

Create a new managed folder in the project, and return a handle to interact with it

Args:
name: the name of the managed folder
Returns:
A dataikuapi.dss.managedfolder.DSSManagedFolder managed folder handle
list_jobs()

List the jobs in this project

Returns:
a list of the jobs, each one as a JSON object, containing both the definition and the state
get_job(id)

Get a handle to interact with a specific job

Returns:
A dataikuapi.dss.job.DSSJob job handle
start_job(definition)

Create a new job, and return a handle to interact with it

Args:
definition: the definition for the job to create. The definition must contain the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD) and a list of outputs to build. Optionally, a refreshHiveMetastore field can specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.
Returns:
A dataikuapi.dss.job.DSSJob job handle
start_job_and_wait(definition, no_fail=False)

Create a new job and wait for it to complete.

Args:
definition: the definition for the job to create. The definition must contain the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD) and a list of outputs to build. Optionally, a refreshHiveMetastore field can specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.
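
For example, a forced, non-recursive build of a single dataset might be launched as follows (a sketch; each entry of outputs is assumed to carry an 'id' naming the dataset to build):

job_definition = {
    'type': 'NON_RECURSIVE_FORCED_BUILD',
    'outputs': [{'id': 'my_output_dataset'}]
}
job = project.start_job_and_wait(job_definition)
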
new_job_definition_builder(job_type='NON_RECURSIVE_FORCED_BUILD')
get_variables()

Gets the variables of this project.

Returns:
a dictionary containing two dictionaries: “standard” and “local”. “standard” variables are regular variables, exported with bundles; “local” variables are not part of the bundles for this project
set_variables(obj)

Sets the variables of this project.

Parameters:obj (dict) – must be a modified version of the object returned by get_variables()
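
As with metadata and permissions, the recommended pattern is to read the current variables, modify them, and write them back (a minimal sketch; the variable name is arbitrary):

variables = project.get_variables()
variables['standard']['my_variable'] = 'some_value'
project.set_variables(variables)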

list_api_services()

List the API services in this project

Returns:
the list of API services, each one as a JSON object
create_api_service(service_id)

Create a new API service, and return a handle to interact with it. The newly-created service does not have any endpoint.

Parameters:service_id (str) – the ID of the API service to create
Returns:A DSSAPIService API Service handle
get_api_service(service_id)

Get a handle to interact with a specific API Service from the API Designer

Parameters:service_id (str) – The identifier of the API Designer API Service to retrieve
Returns:A handle to interact with this API Service
Return type:DSSAPIService API Service handle
list_exported_bundles()
export_bundle(bundle_id)
get_exported_bundle_archive_stream(bundle_id)

Download a bundle archive that can be deployed in a DSS automation Node, as a binary stream. Warning: this stream will monopolize the DSSClient until closed.

download_exported_bundle_archive_to_file(bundle_id, path)

Download a bundle archive that can be deployed in a DSS automation Node into the given output file.

Parameters:path (str) – the path of the output file. If “-”, the archive is written to /dev/stdout

list_imported_bundles()
import_bundle_from_archive(archive_path)
import_bundle_from_stream(fp)
activate_bundle(bundle_id)
preload_bundle(bundle_id)
list_scenarios()

List the scenarios in this project.

This method returns a list of Python dictionaries. Each dictionary represents a scenario and contains at least an “id” field, which you can then pass to get_scenario()

Returns:the list of scenarios, each one as a Python dictionary
get_scenario(scenario_id)

Get a handle to interact with a specific scenario

Parameters:scenario_id (str) – the ID of the desired scenario
Returns:a dataikuapi.dss.scenario.DSSScenario scenario handle
create_scenario(scenario_name, type, definition={})

Create a new scenario in the project, and return a handle to interact with it

Parameters:
  • scenario_name (str) – The name for the new scenario. This does not need to be unique, although unique names are strongly recommended
  • type (str) – The type of the scenario. Must be one of ‘step_based’ or ‘custom_python’
  • definition (object) – the JSON definition of the scenario. Use get_definition on an existing DSSScenario object in order to get a sample definition object
Returns:

a dataikuapi.dss.scenario.DSSScenario handle to interact with the newly-created scenario
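
For instance (a sketch; the existing scenario id is hypothetical, and get_definition is the call mentioned above):

# a minimal step-based scenario with the default (empty) definition
scenario = project.create_scenario('My scenario', 'step_based')

# alternatively, reuse the definition of an existing scenario as a template
template = project.get_scenario('EXISTINGSCENARIO').get_definition()
scenario2 = project.create_scenario('Copy of scenario', 'step_based', definition=template)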

list_recipes()

List the recipes in this project

Returns:
the list of the recipes, each one as a JSON object
get_recipe(recipe_name)

Get a handle to interact with a specific recipe

Args:
recipe_name: the name of the desired recipe
Returns:
A dataikuapi.dss.recipe.DSSRecipe recipe handle
create_recipe(recipe_proto, creation_settings)

Create a new recipe in the project, and return a handle to interact with it. We strongly recommend that you use the creator helpers instead of calling this directly.

Some recipe types require additional parameters in creation_settings:

  • ‘grouping’ : a ‘groupKey’ column name
  • ‘python’, ‘sql_query’, ‘hive’, ‘impala’ : the code of the recipe as a ‘payload’ string
Args:
recipe_proto: a prototype for the recipe object. Must contain at least ‘type’ and ‘name’
creation_settings: recipe-specific creation settings
Returns:
A dataikuapi.dss.recipe.DSSRecipe recipe handle
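
As an illustration only (the creator helpers remain the recommended route), a Python recipe could be created along these lines; in practice the prototype also needs inputs and outputs configured, which are omitted from this sketch:

recipe_proto = {
    'type': 'python',
    'name': 'compute_something'  # hypothetical recipe name
}
creation_settings = {
    'payload': 'print("hello from the recipe")'  # the code of the recipe, passed as described above
}
recipe = project.create_recipe(recipe_proto, creation_settings)
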
sync_datasets_acls()

Resync permissions on HDFS datasets in this project

Returns:
a DSSFuture handle to the task of resynchronizing the permissions

Note: this call requires an API key with admin rights

list_running_notebooks(as_objects=True)

List the currently-running notebooks

Returns:
list of notebooks. Each object contains at least a ‘name’ field
get_tags()

List the tags of this project.

Returns:
a dictionary containing the tags with a color
set_tags(tags={})

Set the tags of this project.

Parameters:tags (dict) – must be a modified version of the object returned by get_tags()
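
The same read-modify-write pattern applies here (a minimal sketch; the exact structure of each tag entry is the one returned by get_tags):

tags = project.get_tags()
# adjust the returned structure as needed, then write it back
project.set_tags(tags)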

list_macros(as_objects=False)

List the macros accessible in this project

Parameters:as_objects – if True, return the macros as dataikuapi.dss.macro.DSSMacro macro handles instead of raw JSON
Returns:the list of the macros
get_macro(runnable_type)

Get a handle to interact with a specific macro

Parameters:runnable_type – the identifier of a macro
Returns:A dataikuapi.dss.macro.DSSMacro macro handle