Projects

Basic operations

The list of projects in the DSS instance can be retrieved with the list_project_keys method.

client = DSSClient(host, apiKey)
dss_projects = client.list_project_keys()
print(dss_projects)

outputs

['IMPALA', 'MYSQL', 'PARTITIONED', 'PLUGINS']

Projects can be created:

new_project = client.create_project('TEST_PROJECT', 'test project', 'tester', description='a simple description')
print(client.list_project_keys())

outputs

['IMPALA', 'MYSQL', 'PARTITIONED', 'PLUGINS', 'TEST_PROJECT']

Or an existing project can be used for later manipulation:

project = client.get_project('MY_PROJECT')

Creating, listing and getting handles to project items

Through various methods on the DSSProject class, you can:

  • Create most types of project items (datasets, recipes, managed folders, …)

  • List project items

  • Get structured handles to interact with each type of project item
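
For example, the following sketch (assuming a project named 'TEST_PROJECT'; the dataset and recipe names below are hypothetical) illustrates the pattern:

project = client.get_project('TEST_PROJECT')

# List project items
datasets = project.list_datasets()
recipes = project.list_recipes()

# Get structured handles to individual items
dataset = project.get_dataset('my_dataset')
recipe = project.get_recipe('compute_my_dataset')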

Modifying project settings

Two parts of the project’s settings can be modified directly: the metadata and the permissions. In both cases, it is advised to first retrieve the current settings state with the get_metadata and get_permissions calls, modify the returned object, and then set it back on the DSS instance.

project = client.get_project('MY_PROJECT')

project_metadata = project.get_metadata()
project_metadata['tags'] = ['tag1','tag2']
project.set_metadata(project_metadata)

project_permissions = project.get_permissions()
project_permissions['permissions'].append({'group':'data_scientists','readProjectContent': True, 'readDashboards': True})
project.set_permissions(project_permissions)

Available permissions to be set:

{
        'group': 'data_team',
        'admin': False,
        'exportDatasetsData': True,
        'manageAdditionalDashboardUsers': False,
        'manageDashboardAuthorizations': False,
        'manageExposedElements': False,
        'moderateDashboards': False,
        'readDashboards': True,
        'readProjectContent': True,
        'runScenarios': False,
        'writeDashboards': False,
        'writeProjectContent': False,
        'shareToWorkspaces': False
}

Deleting

Projects can also be deleted:

project = client.get_project('TEST_PROJECT')
project.delete()

Exporting

Project export is available through the python API in two forms: either as a stream, or exported directly to a file. In both cases, the export is a zip archive.

project = client.get_project('TEST_PROJECT')

project.export_to_file('exported_project.zip')

with project.get_export_stream() as s:
        ...
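
For example, a minimal sketch that saves the export stream to a local file (shutil is used here purely for illustration):

import shutil

with project.get_export_stream() as s:
        with open('exported_project.zip', 'wb') as f:
                shutil.copyfileobj(s, f)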

Importing

Projects can be imported directly from a zip file:

with open("myproject.zip", "rb") as f:
        client.prepare_project_import(f).execute()

Duplicating

Projects can be duplicated:

project = client.get_project('TEST_PROJECT')
project.duplicate('COPY_TEST_PROJECT', 'Copy of the Test Project')

Reference documentation

dataikuapi package API

class dataikuapi.dss.project.DSSProject(client, project_key)

A handle to interact with a project on the DSS instance.

Important

Do not create this class directly, instead use dataikuapi.DSSClient.get_project()

get_summary()

Returns a summary of the project. The summary is a read-only view of some of the state of the project. You cannot edit the resulting dict and use it to update the project state on DSS; instead use the other, more specific methods of this dataikuapi.dss.project.DSSProject object

Returns

a dict containing a summary of the project, with at least a projectKey field

Return type

dict

get_project_folder()

Get the folder containing this project

Return type

dataikuapi.dss.projectfolder.DSSProjectFolder

move_to_folder(folder)

Moves this project to a project folder

Parameters

folder (dataikuapi.dss.projectfolder.DSSProjectFolder) – destination folder

delete(clear_managed_datasets=False, clear_output_managed_folders=False, clear_job_and_scenario_logs=True, **kwargs)

Delete the project

Attention

This call requires an API key with admin rights

Parameters
  • clear_managed_datasets (bool) – Should the data of managed datasets be cleared (defaults to False)

  • clear_output_managed_folders (bool) – Should the data of managed folders used as outputs of recipes be cleared (defaults to False)

  • clear_job_and_scenario_logs (bool) – Should the job and scenario logs be cleared (defaults to True)

get_export_stream(options=None)

Return a stream of the exported project

Warning

You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

Parameters

options (dict) –

Dictionary of export options (defaults to {}). The following options are available:

  • exportUploads (boolean): Exports the data of Uploaded datasets (defaults to False)

  • exportManagedFS (boolean): Exports the data of managed Filesystem datasets (defaults to False)

  • exportAnalysisModels (boolean): Exports the models trained in analysis (defaults to False)

  • exportSavedModels (boolean): Exports the models trained in saved models (defaults to False)

  • exportManagedFolders (boolean): Exports the data of managed folders (defaults to False)

  • exportAllInputDatasets (boolean): Exports the data of all input datasets (defaults to False)

  • exportAllDatasets (boolean): Exports the data of all datasets (defaults to False)

  • exportAllInputManagedFolders (boolean): Exports the data of all input managed folders (defaults to False)

  • exportGitRepository (boolean): Exports the Git repository history (defaults to False)

  • exportInsightsData (boolean): Exports the data of static insights (defaults to False)

Returns

a stream of the export archive

Return type

file-like object

export_to_file(path, options=None)

Export the project to a file

Parameters
  • path (str) – the path of the file in which the exported project should be saved

  • options (dict) –

    Dictionary of export options (defaults to {}). The following options are available:

    • exportUploads (boolean): Exports the data of Uploaded datasets (defaults to False)

    • exportManagedFS (boolean): Exports the data of managed Filesystem datasets (defaults to False)

    • exportAnalysisModels (boolean): Exports the models trained in analysis (defaults to False)

    • exportSavedModels (boolean): Exports the models trained in saved models (defaults to False)

    • exportModelEvaluationStores (boolean): Exports the evaluation stores (defaults to False)

    • exportManagedFolders (boolean): Exports the data of managed folders (defaults to False)

    • exportAllInputDatasets (boolean): Exports the data of all input datasets (defaults to False)

    • exportAllDatasets (boolean): Exports the data of all datasets (defaults to False)

    • exportAllInputManagedFolders (boolean): Exports the data of all input managed folders (defaults to False)

    • exportGitRepository (boolean): Exports the Git repository history (defaults to False)

    • exportInsightsData (boolean): Exports the data of static insights (defaults to False)

duplicate(target_project_key, target_project_name, duplication_mode='MINIMAL', export_analysis_models=True, export_saved_models=True, export_git_repository=True, export_insights_data=True, remapping=None, target_project_folder=None)

Duplicate the project

Parameters
  • target_project_key (str) – The key of the new project

  • target_project_name (str) – The name of the new project

  • duplication_mode (str) – can be one of the following values: MINIMAL, SHARING, FULL, NONE (defaults to MINIMAL)

  • export_analysis_models (bool) – (defaults to True)

  • export_saved_models (bool) – (defaults to True)

  • export_git_repository (bool) – (defaults to True)

  • export_insights_data (bool) – (defaults to True)

  • remapping (dict) – dict of connections to be remapped for the new project (defaults to {})

  • target_project_folder (A dataikuapi.dss.projectfolder.DSSProjectFolder) – the project folder where to put the duplicated project (defaults to None)

Returns

A dict containing the original and duplicated project’s keys

Return type

dict

get_metadata()

Get the metadata attached to this project. The metadata contains the label, description, checklists, tags and custom metadata of the project.

Note

For more information on available metadata, please see https://doc.dataiku.com/dss/api/6.0/rest/

Returns

the project metadata.

Return type

dict

set_metadata(metadata)

Set the metadata on this project.

Usage example:

project_metadata = project.get_metadata()
project_metadata['tags'] = ['tag1','tag2']
project.set_metadata(project_metadata)
Parameters

metadata (dict) – the new state of the metadata for the project. You should only set a metadata object that has been retrieved using the get_metadata() call.

get_settings()

Gets the settings of this project. This does not contain permissions. See get_permissions()

Returns

a handle to read, modify and save the settings

Return type

dataikuapi.dss.project.DSSProjectSettings

get_permissions()

Get the permissions attached to this project

Returns

A dict containing the owner and the permissions, as a list of pairs of group name and permission type

Return type

dict

set_permissions(permissions)

Sets the permissions on this project

Usage example:

project_permissions = project.get_permissions()
project_permissions['permissions'].append({'group':'data_scientists',
                                            'readProjectContent': True,
                                            'readDashboards': True})
project.set_permissions(project_permissions)
Parameters

permissions (dict) – a permissions object with the same structure as the one returned by get_permissions() call

get_interest()

Get the interest of this project. The interest means the number of watchers and the number of stars.

Returns

a dict object containing the interest of the project with two fields:

  • starCount: number of stars for this project

  • watchCount: number of users watching this project

Return type

dict

get_timeline(item_count=100)

Get the timeline of this project. The timeline consists of information about the creation of this project (by whom, and when), the last modification of this project (by whom and when), a list of contributors, and a list of modifications. This list of modifications contains a maximum of item_count elements (defaults to 100). If item_count is greater than the real number of modifications, item_count is adjusted.

Parameters

item_count (int) – maximum number of modifications to retrieve in the items list

Returns

a timeline where the top-level fields are :

  • allContributors: all contributors who have been involved in this project

  • items: a history of the modifications of the project

  • createdBy: who created this project

  • createdOn: when the project was created

  • lastModifiedBy: who modified this project for the last time

  • lastModifiedOn: when this modification took place

Return type

dict
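
Usage example (a minimal sketch; the fields are simply printed as returned):

timeline = project.get_timeline(item_count=10)
print(timeline['createdBy'], timeline['createdOn'])
for item in timeline['items']:
    print(item)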

list_datasets(as_type='listitems')

List the datasets in this project.

Parameters

as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).

Returns

The list of the datasets. If “as_type” is “listitems”, each one as a dataikuapi.dss.dataset.DSSDatasetListItem. If “as_type” is “objects”, each one as a dataikuapi.dss.dataset.DSSDataset

Return type

list
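
Usage example (a minimal sketch; what is done with each dataset is illustrative):

# Lightweight list items
for item in project.list_datasets():
    print(item.name)

# Full dataset handles
for dataset in project.list_datasets(as_type="objects"):
    settings = dataset.get_settings()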

get_dataset(dataset_name)

Get a handle to interact with a specific dataset

Parameters

dataset_name (str) – the name of the desired dataset

Returns

A dataset handle

Return type

dataikuapi.dss.dataset.DSSDataset

create_dataset(dataset_name, type, params=None, formatType=None, formatParams=None)

Create a new dataset in the project, and return a handle to interact with it.

The precise structure of params and formatParams depends on the specific dataset type and dataset format type. To know which fields exist for a given dataset type and format type, create a dataset from the UI, and use get_dataset() to retrieve the configuration of the dataset and inspect it. Then reproduce a similar structure in the create_dataset() call.

Not all settings of a dataset can be set at creation time (for example partitioning). After creation, you’ll have the ability to modify the dataset

Parameters
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • type (str) – the type of the dataset

  • params (dict) – the parameters for the type, as a python dict (defaults to {})

  • formatType (str) – an optional format to create the dataset with (only for file-oriented datasets)

  • formatParams (dict) – the parameters to the format, as a python dict (only for file-oriented datasets, defaults to {})

Returns

A dataset handle

Return type

dataikuapi.dss.dataset.DSSDataset
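
Usage example (a minimal sketch of the approach described above; the dataset names, connection and parameters are illustrative and depend on your instance):

# Inspect a dataset created from the UI to learn the expected structure
reference = project.get_dataset('dataset_created_in_ui')
print(reference.get_settings().get_raw())

# Reproduce a similar structure
dataset = project.create_dataset('my_new_dataset', 'Filesystem',
                                 params={'connection': 'filesystem_managed', 'path': 'subfolder/'},
                                 formatType='csv',
                                 formatParams={'separator': ',', 'style': 'excel', 'parseHeaderRow': True})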

create_upload_dataset(dataset_name, connection=None)

Create a new dataset of type ‘UploadedFiles’ in the project, and return a handle to interact with it.

Parameters
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • connection (str) – the name of the upload connection (defaults to None)

Returns

A dataset handle

Return type

dataikuapi.dss.dataset.DSSDataset

create_filesystem_dataset(dataset_name, connection, path_in_connection)

Create a new filesystem dataset in the project, and return a handle to interact with it.

Parameters
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • connection (str) – the name of the connection

  • path_in_connection (str) – the path of the dataset in the connection

Returns

A dataset handle

Return type

dataikuapi.dss.dataset.DSSDataset

create_s3_dataset(dataset_name, connection, path_in_connection, bucket=None)

Creates a new external S3 dataset in the project and returns a dataikuapi.dss.dataset.DSSDataset to interact with it.

The created dataset does not have its format and schema initialized; it is recommended to use autodetect_settings() on the returned object

Parameters
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • connection (str) – the name of the connection

  • path_in_connection (str) – the path of the dataset in the connection

  • bucket (str) – the name of the s3 bucket (defaults to None)

Returns

A dataset handle

Return type

dataikuapi.dss.dataset.DSSDataset

create_fslike_dataset(dataset_name, dataset_type, connection, path_in_connection, extra_params=None)

Create a new file-based dataset in the project, and return a handle to interact with it.

Parameters
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • dataset_type (str) – the type of the dataset

  • connection (str) – the name of the connection

  • path_in_connection (str) – the path of the dataset in the connection

  • extra_params (dict) – a python dict of extra parameters (defaults to None)

Returns

A dataset handle

Return type

dataikuapi.dss.dataset.DSSDataset

create_sql_table_dataset(dataset_name, type, connection, table, schema)

Create a new SQL table dataset in the project, and return a handle to interact with it.

Parameters
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • type (str) – the type of the dataset

  • connection (str) – the name of the connection

  • table (str) – the name of the table in the connection

  • schema (str) – the schema of the table

Returns

A dataset handle

Return type

dataikuapi.dss.dataset.DSSDataset
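
Usage example (a minimal sketch, assuming a PostgreSQL connection named "my_postgresql" with an "orders" table in the "public" schema):

dataset = project.create_sql_table_dataset('orders', 'PostgreSQL', 'my_postgresql', 'orders', 'public')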

new_managed_dataset_creation_helper(dataset_name)

Caution

Deprecated. Please use new_managed_dataset()

new_managed_dataset(dataset_name)

Initializes the creation of a new managed dataset. Returns a dataikuapi.dss.dataset.DSSManagedDatasetCreationHelper or one of its subclasses to complete the creation of the managed dataset.

Usage example:

builder = project.new_managed_dataset("my_dataset")
builder.with_store_into("target_connection")
dataset = builder.create()
Parameters

dataset_name (str) – Name of the dataset to create

Returns

An object to create the managed dataset

Return type

dataikuapi.dss.dataset.DSSManagedDatasetCreationHelper

list_streaming_endpoints(as_type='listitems')

List the streaming endpoints in this project.

Parameters

as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).

Returns

The list of the streaming endpoints. If “as_type” is “listitems”, each one as a dataikuapi.dss.streaming_endpoint.DSSStreamingEndpointListItem. If “as_type” is “objects”, each one as a dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint

Return type

list

get_streaming_endpoint(streaming_endpoint_name)

Get a handle to interact with a specific streaming endpoint

Parameters

streaming_endpoint_name (str) – the name of the desired streaming endpoint

Returns

A streaming endpoint handle

Return type

dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint

create_streaming_endpoint(streaming_endpoint_name, type, params=None)

Create a new streaming endpoint in the project, and return a handle to interact with it.

The precise structure of params depends on the specific streaming endpoint type. To know which fields exist for a given streaming endpoint type, create a streaming endpoint from the UI, and use get_streaming_endpoint() to retrieve the configuration of the streaming endpoint and inspect it. Then reproduce a similar structure in the create_streaming_endpoint() call.

Not all settings of a streaming endpoint can be set at creation time (for example partitioning). After creation, you’ll have the ability to modify the streaming endpoint.

Parameters
  • streaming_endpoint_name (str) – the name for the new streaming endpoint

  • type (str) – the type of the streaming endpoint

  • params (dict) – the parameters for the type, as a python dict (defaults to {})

Returns

A streaming endpoint handle

Return type

dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint

create_kafka_streaming_endpoint(streaming_endpoint_name, connection=None, topic=None)

Create a new kafka streaming endpoint in the project, and return a handle to interact with it.

Parameters
  • streaming_endpoint_name (str) – the name for the new streaming endpoint

  • connection (str) – the name of the kafka connection (defaults to None)

  • topic (str) – the name of the kafka topic (defaults to None)

Returns

A streaming endpoint handle

Return type

dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint

create_httpsse_streaming_endpoint(streaming_endpoint_name, url=None)

Create a new https streaming endpoint in the project, and return a handle to interact with it.

Parameters
  • streaming_endpoint_name (str) – the name for the new streaming endpoint

  • url (str) – the url of the endpoint (defaults to None)

Returns

A streaming endpoint handle

Return type

dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint

new_managed_streaming_endpoint(streaming_endpoint_name, streaming_endpoint_type=None)

Initializes the creation of a new streaming endpoint. Returns a dataikuapi.dss.streaming_endpoint.DSSManagedStreamingEndpointCreationHelper to complete the creation of the streaming endpoint

Parameters
  • streaming_endpoint_name (str) – Name of the new streaming endpoint - must be unique in the project

  • streaming_endpoint_type (str) – Type of the new streaming endpoint (optional if it can be inferred from a connection type)

Returns

An object to create the streaming endpoint

Return type

DSSManagedStreamingEndpointCreationHelper

create_prediction_ml_task(input_dataset, target_variable, ml_backend_type='PY_MEMORY', guess_policy='DEFAULT', prediction_type=None, wait_guess_complete=True)

Creates a new prediction task in a new visual analysis lab for a dataset.

Parameters
  • input_dataset (str) – the dataset to use for training/testing the model

  • target_variable (str) – the variable to predict

  • ml_backend_type (str) – ML backend to use, one of PY_MEMORY, MLLIB or H2O (defaults to PY_MEMORY)

  • guess_policy (str) – Policy to use for setting the default parameters. Valid values are: DEFAULT, SIMPLE_FORMULA, DECISION_TREE, EXPLANATORY and PERFORMANCE (defaults to DEFAULT)

  • prediction_type (str) – The type of prediction problem this is. If not provided the prediction type will be guessed. Valid values are: BINARY_CLASSIFICATION, REGRESSION, MULTICLASS (defaults to None)

  • wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms (defaults to True). You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)

Returns

A ML task handle of type ‘PREDICTION’

Return type

dataikuapi.dss.ml.DSSMLTask
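
Usage example (a minimal sketch, assuming a dataset named "customers" with a "churn" column to predict):

mltask = project.create_prediction_ml_task('customers', 'churn')
# Guessing has already completed (wait_guess_complete defaults to True),
# so the task can be trained directly
trained_model_ids = mltask.train()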

create_clustering_ml_task(input_dataset, ml_backend_type='PY_MEMORY', guess_policy='KMEANS', wait_guess_complete=True)

Creates a new clustering task in a new visual analysis lab for a dataset.

The returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms.

You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)

Parameters
  • ml_backend_type (str) – ML backend to use, one of PY_MEMORY, MLLIB or H2O (defaults to PY_MEMORY)

  • guess_policy (str) – Policy to use for setting the default parameters. Valid values are: KMEANS and ANOMALY_DETECTION (defaults to KMEANS)

  • wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms (defaults to True). You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)

Returns

A ML task handle of type ‘CLUSTERING’

Return type

dataikuapi.dss.ml.DSSMLTask

create_timeseries_forecasting_ml_task(input_dataset, target_variable, time_variable, timeseries_identifiers=None, guess_policy='TIMESERIES_DEFAULT', wait_guess_complete=True)

Creates a new time series forecasting task in a new visual analysis lab for a dataset.

Parameters
  • input_dataset (string) – The dataset to use for training/testing the model

  • target_variable (string) – The variable to forecast

  • time_variable (string) – Column to be used as time variable. Should be a Date (parsed) column.

  • timeseries_identifiers (list) – List of columns to be used as time series identifiers (when the dataset has multiple series)

  • guess_policy (string) – Policy to use for setting the default parameters. Valid values are: TIMESERIES_DEFAULT, TIMESERIES_STATISTICAL, and TIMESERIES_DEEP_LEARNING

  • wait_guess_complete (boolean) – If False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms. You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)

Returns

A ML task handle

Return type

dataikuapi.dss.ml.DSSMLTask

list_ml_tasks()

List the ML tasks in this project

Returns

the list of the ML tasks summaries, each one as a python dict

Return type

list

get_ml_task(analysis_id, mltask_id)

Get a handle to interact with a specific ML task

Parameters
  • analysis_id (str) – the identifier of the visual analysis containing the desired ML task

  • mltask_id (str) – the identifier of the desired ML task

Returns

A ML task handle

Return type

dataikuapi.dss.ml.DSSMLTask

list_mltask_queues()

List non-empty ML task queues in this project

Returns

an iterable listing of MLTask queues (each a dict)

Return type

dataikuapi.dss.ml.DSSMLTaskQueues

create_analysis(input_dataset)

Creates a new visual analysis lab for a dataset.

Parameters

input_dataset (str) – the dataset to use for the analysis

Returns

A visual analysis handle

Return type

dataikuapi.dss.analysis.DSSAnalysis

list_analyses()

List the visual analyses in this project

Returns

the list of the visual analyses summaries, each one as a python dict

Return type

list

get_analysis(analysis_id)

Get a handle to interact with a specific visual analysis

Parameters

analysis_id (str) – the identifier of the desired visual analysis

Returns

A visual analysis handle

Return type

dataikuapi.dss.analysis.DSSAnalysis

list_saved_models()

List the saved models in this project

Returns

the list of the saved models, each one as a python dict

Return type

list

get_saved_model(sm_id)

Get a handle to interact with a specific saved model

Parameters

sm_id (str) – the identifier of the desired saved model

Returns

A saved model handle

Return type

dataikuapi.dss.savedmodel.DSSSavedModel

create_mlflow_pyfunc_model(name, prediction_type=None)

Creates a new external saved model for storing and managing MLFlow models

Parameters
  • name (str) – Human readable name for the new saved model in the flow

  • prediction_type (str) – Optional (but needed for most operations). One of BINARY_CLASSIFICATION, MULTICLASS or REGRESSION

Returns

The created saved model handle

Return type

dataikuapi.dss.savedmodel.DSSSavedModel

create_proxy_model(name, prediction_type=None)

EXPERIMENTAL. Creates a new external saved model that can contain proxy models as versions.

This is an experimental API, subject to change.

Parameters
  • name (str) – Human readable name for the new saved model in the flow

  • prediction_type (str) – Optional (but needed for most operations). One of BINARY_CLASSIFICATION, MULTICLASS or REGRESSION

list_managed_folders()

List the managed folders in this project

Returns

the list of the managed folders, each one as a python dict

Return type

list

get_managed_folder(odb_id)

Get a handle to interact with a specific managed folder

Parameters

odb_id (str) – the identifier of the desired managed folder

Returns

A managed folder handle

Return type

dataikuapi.dss.managedfolder.DSSManagedFolder

create_managed_folder(name, folder_type=None, connection_name='filesystem_folders')

Create a new managed folder in the project, and return a handle to interact with it

Parameters
  • name (str) – the name of the managed folder

  • folder_type (str) – type of storage (defaults to None)

  • connection_name (str) – the connection name (defaults to filesystem_folders)

Returns

A managed folder handle

Return type

dataikuapi.dss.managedfolder.DSSManagedFolder

list_model_evaluation_stores()

List the model evaluation stores in this project.

Returns

The list of the model evaluation stores

Return type

list of dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore

get_model_evaluation_store(mes_id)

Get a handle to interact with a specific model evaluation store

Parameters

mes_id (str) – the id of the desired model evaluation store

Returns

A model evaluation store handle

Return type

dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore

create_model_evaluation_store(name)

Create a new model evaluation store in the project, and return a handle to interact with it.

Parameters

name (str) – the name for the new model evaluation store

Returns

A model evaluation store handle

Return type

dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore

list_model_comparisons()

List the model comparisons in this project.

Returns

The list of the model comparisons

Return type

list

get_model_comparison(mec_id)

Get a handle to interact with a specific model comparison

Parameters

mec_id (str) – the id of the desired model comparison

Returns

A model comparison handle

Return type

dataikuapi.dss.modelcomparison.DSSModelComparison

create_model_comparison(name, prediction_type)

Create a new model comparison in the project, and return a handle to interact with it.

Parameters
  • name (str) – the name for the new model comparison

  • prediction_type (str) – one of BINARY_CLASSIFICATION, REGRESSION, MULTICLASS, and TIMESERIES_FORECAST

Returns

A new model comparison handle

Return type

dataikuapi.dss.modelcomparison.DSSModelComparison

list_jobs()

List the jobs in this project

Returns

a list of the jobs, each one as a python dict, containing both the definition and the state

Return type

list

get_job(id)

Get a handle to interact with a specific job

Parameters

id (str) – the id of the desired job

Returns

A job handle

Return type

dataikuapi.dss.job.DSSJob

start_job(definition)

Create a new job, and return a handle to interact with it

Parameters

definition (dict) –

The definition should contain:

  • the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD)

  • a list of outputs to build from the available types: (DATASET, MANAGED_FOLDER, SAVED_MODEL, STREAMING_ENDPOINT)

  • (Optional) a refreshHiveMetastore field (True or False) to specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.

Returns

A job handle

Return type

dataikuapi.dss.job.DSSJob
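
Usage example (a minimal sketch, assuming a dataset named "mydataset" in the project):

job = project.start_job({
    "type": "NON_RECURSIVE_FORCED_BUILD",
    "outputs": [{"id": "mydataset", "type": "DATASET"}]
})
status = job.get_status()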

start_job_and_wait(definition, no_fail=False)

Starts a new job and waits for it to complete.

Parameters
  • definition (dict) –

    The definition should contain:

    • the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD)

    • a list of outputs to build from the available types: (DATASET, MANAGED_FOLDER, SAVED_MODEL, STREAMING_ENDPOINT)

    • (Optional) a refreshHiveMetastore field (True or False) to specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.

  • no_fail (bool) – if true, the function won’t fail even if the job fails or aborts (defaults to False)

Returns

the final status of the job

Return type

str

new_job(job_type='NON_RECURSIVE_FORCED_BUILD')

Create a job to be run. You need to add outputs to the job (i.e. what you want to build) before running it.

job_builder = project.new_job()
job_builder.with_output("mydataset")
complete_job = job_builder.start_and_wait()
print("Job %s done" % complete_job.id)
Parameters

job_type (str) – the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD) (defaults to NON_RECURSIVE_FORCED_BUILD)

Returns

A job handle

Return type

dataikuapi.dss.project.JobDefinitionBuilder

new_job_definition_builder(job_type='NON_RECURSIVE_FORCED_BUILD')

Caution

Deprecated. Please use new_job()

list_jupyter_notebooks(active=False, as_type='object')

List the jupyter notebooks of a project.

Parameters
  • active (bool) – if True, only return currently running jupyter notebooks (defaults to False).

  • as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to object).

Returns

The list of the notebooks. If “as_type” is “listitems”, each one as a dataikuapi.dss.jupyternotebook.DSSJupyterNotebookListItem, if “as_type” is “objects”, each one as a dataikuapi.dss.jupyternotebook.DSSJupyterNotebook

Return type

list of dataikuapi.dss.jupyternotebook.DSSJupyterNotebook or list of dataikuapi.dss.jupyternotebook.DSSJupyterNotebookListItem

get_jupyter_notebook(notebook_name)

Get a handle to interact with a specific jupyter notebook

Parameters

notebook_name (str) – The name of the jupyter notebook to retrieve

Returns

A handle to interact with this jupyter notebook

Return type

dataikuapi.dss.jupyternotebook.DSSJupyterNotebook

create_jupyter_notebook(notebook_name, notebook_content)

Create a new jupyter notebook and get a handle to interact with it

Parameters
  • notebook_name (str) – the name of the notebook to create

  • notebook_content (dict) – the data of the notebook to create, as a dict. The data will be converted to a JSON string internally. Use DSSJupyterNotebook.get_content() on a similar existing DSSJupyterNotebook object in order to get a sample definition object.

Returns

A handle to interact with the newly created jupyter notebook

Return type

dataikuapi.dss.jupyternotebook.DSSJupyterNotebook

list_continuous_activities(as_objects=True)

List the continuous activities in this project

Parameters

as_objects (bool) – if True, returns a list of dataikuapi.dss.continuousactivity.DSSContinuousActivity objects, else returns a list of python dicts (defaults to True)

Returns

a list of the continuous activities, each one as a python dict, containing both the definition and the state

Return type

list

get_continuous_activity(recipe_id)

Get a handle to interact with a specific continuous activity

Parameters

recipe_id (str) – the identifier of the recipe controlled by the continuous activity

Returns

A continuous activity handle

Return type

dataikuapi.dss.continuousactivity.DSSContinuousActivity

get_variables()

Gets the variables of this project.

Returns

a dictionary containing two dictionaries: “standard” and “local”. “standard” are regular variables, exported with bundles. “local” variables are not part of the bundles for this project

Return type

dict

set_variables(obj)

Sets the variables of this project.

Warning

If executed from a python recipe, the changes made by set_variables will not be “seen” in that recipe. Use the internal API dataiku.get_custom_variables() instead if this behavior is needed

Parameters

obj (dict) – must be a modified version of the object returned by get_variables

update_variables(variables, type='standard')

Updates a set of variables for this project

Parameters
  • variables (dict) – a dict of variable name -> value to set. Keys of the dict must be strings. Values in the dict can be strings, numbers, booleans, lists or dicts

  • type (str) – Can be “standard” to update regular variables or “local” to update local-only variables that are not part of bundles for this project (defaults to standard)
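
Usage example (a minimal sketch; the variable name and value are illustrative):

# get/modify/set pattern
variables = project.get_variables()
variables['standard']['my_variable'] = 42
project.set_variables(variables)

# equivalent partial update
project.update_variables({'my_variable': 42})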

list_api_services()

List the API services in this project

Returns

the list of API services, each one as a python dict

Return type

list

create_api_service(service_id)

Create a new API service, and return a handle to interact with it. The newly-created service does not have any endpoint.

Parameters

service_id (str) – the ID of the API service to create

Returns

An API Service handle

Return type

dataikuapi.dss.apiservice.DSSAPIService

get_api_service(service_id)

Get a handle to interact with a specific API Service from the API Designer

Parameters

service_id (str) – The identifier of the API Designer API Service to retrieve

Returns

A handle to interact with this API Service

Return type

dataikuapi.dss.apiservice.DSSAPIService

list_exported_bundles()

List all the bundles created in this project on the Design Node.

Returns

A dictionary of all bundles for a project on the Design node.

Return type

dict

export_bundle(bundle_id)

Creates a new project bundle on the Design node

Parameters

bundle_id (str) – bundle id tag

get_exported_bundle_archive_stream(bundle_id)

Download a bundle archive that can be deployed in a DSS automation Node, as a binary stream.

Warning

The stream must be closed after use. Use a with statement, which closes the stream automatically at the end of the block. For example:

with project.get_exported_bundle_archive_stream('v1') as fp:
    # use fp

# or explicitly close the stream after use
fp = project.get_exported_bundle_archive_stream('v1')
# use fp, then close
fp.close()
Parameters

bundle_id (str) – the identifier of the bundle

download_exported_bundle_archive_to_file(bundle_id, path)

Download a bundle archive that can be deployed in a DSS automation Node into the given output file.

Parameters
  • bundle_id (str) – the identifier of the bundle

  • path (str) – if “-“, will write to /dev/stdout

publish_bundle(bundle_id, published_project_key=None)

Publish a bundle on the Project Deployer.

Parameters
  • bundle_id (str) – The identifier of the bundle

  • published_project_key (str) – The key of the project on the Project Deployer where the bundle will be published. A new published project will be created if none matches the key. If the parameter is not set, the key from the current DSSProject is used.

Returns

a dict with info on the bundle state once published. It contains the keys “publishedOn” for the publish date, “publishedBy” for the user who published the bundle, and “publishedProjectKey” for the key of the Project Deployer project used.

Return type

dict

list_imported_bundles()

List all the bundles imported for this project, on the Automation node.

Returns

a dict containing bundle imports for a project, on the Automation node.

Return type

dict

import_bundle_from_archive(archive_path)

Imports a bundle from a zip archive path on the Automation node.

Parameters

archive_path (str) – A full path to a zip archive, for example /home/dataiku/my-bundle-v1.zip

import_bundle_from_stream(fp)

Imports a bundle from a file stream, on the Automation node.

Usage example:

project = client.get_project('MY_PROJECT')
with open('/home/dataiku/my-bundle-v1.zip', 'rb') as f:
    project.import_bundle_from_stream(f)
Parameters

fp (file-like) – file handler.

activate_bundle(bundle_id, scenarios_to_enable=None)

Activates a bundle in this project.

Parameters
  • bundle_id (str) – The ID of the bundle to activate

  • scenarios_to_enable (dict) – An optional dict of scenarios to enable or disable upon bundle activation. The format of the dict should be scenario IDs as keys with values of True or False (defaults to {}).

Returns

A report containing any error or warning messages that occurred during bundle activation

Return type

dict
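
Usage example (a minimal sketch, on the Automation node, with an illustrative bundle ID and scenario ID):

report = project.activate_bundle('v1', scenarios_to_enable={'MY_SCENARIO_ID': False})
print(report)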

preload_bundle(bundle_id)

Preloads a bundle that has been imported on the Automation node

Parameters

bundle_id (str) – the bundle_id for an existing imported bundle

list_scenarios(as_type='listitems')

List the scenarios in this project.

Parameters

as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).

Returns

The list of the scenarios. If “as_type” is “listitems”, each one as a dataikuapi.dss.scenario.DSSScenarioListItem. If “as_type” is “objects”, each one as a dataikuapi.dss.scenario.DSSScenario

Return type

list

get_scenario(scenario_id)

Get a handle to interact with a specific scenario

Parameters

scenario_id (str) – the ID of the desired scenario

Returns

A scenario handle

Return type

dataikuapi.dss.scenario.DSSScenario

create_scenario(scenario_name, type, definition=None)

Create a new scenario in the project, and return a handle to interact with it

Parameters
  • scenario_name (str) – The name for the new scenario. This does not need to be unique (although this is strongly recommended)

  • type (str) – The type of the scenario. Must be one of ‘step_based’ or ‘custom_python’

  • definition (dict) – the JSON definition of the scenario. Use get_definition(with_status=False) on an existing DSSScenario object in order to get a sample definition object (defaults to {‘params’: {}})

Returns

a dataikuapi.dss.scenario.DSSScenario handle to interact with the newly-created scenario
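
Usage example (a minimal sketch of a step-based scenario; the name is illustrative and the definition is left to its default):

scenario = project.create_scenario('My scenario', 'step_based')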

list_recipes(as_type='listitems')

List the recipes in this project

Parameters

as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).

Returns

The list of the recipes. If “as_type” is “listitems”, each one as a dataikuapi.dss.recipe.DSSRecipeListItem. If “as_type” is “objects”, each one as a dataikuapi.dss.recipe.DSSRecipe

Return type

list

get_recipe(recipe_name)

Gets a dataikuapi.dss.recipe.DSSRecipe handle to interact with a recipe

Parameters

recipe_name (str) – The name of the recipe

Returns

A recipe handle

Return type

dataikuapi.dss.recipe.DSSRecipe

create_recipe(recipe_proto, creation_settings)

Create a new recipe in the project, and return a handle to interact with it. We strongly recommend that you use the creator helpers instead of calling this directly.

Some recipe types require additional parameters in creation_settings:

  • ‘grouping’ : a ‘groupKey’ column name

  • ‘python’, ‘sql_query’, ‘hive’, ‘impala’ : the code of the recipe as a ‘payload’ string

Parameters
  • recipe_proto (dict) – a prototype for the recipe object. Must contain at least ‘type’ and ‘name’

  • creation_settings (dict) – recipe-specific creation settings

Returns

A recipe handle

Return type

dataikuapi.dss.recipe.DSSRecipe

new_recipe(type, name=None)

Initializes the creation of a new recipe. Returns a dataikuapi.dss.recipe.DSSRecipeCreator or one of its subclasses to complete the creation of the recipe.

Usage example:

grouping_recipe_builder = project.new_recipe("grouping")
grouping_recipe_builder.with_input("dataset_to_group_on")
# Create a new managed dataset for the output in the "filesystem_managed" connection
grouping_recipe_builder.with_new_output("grouped_dataset", "filesystem_managed")
grouping_recipe_builder.with_group_key("column")
recipe = grouping_recipe_builder.build()

# After the recipe is created, you can edit its settings
recipe_settings = recipe.get_settings()
recipe_settings.set_column_aggregations("value", sum=True)
recipe_settings.save()

# And you may need to apply new schemas to the outputs
recipe.compute_schema_updates().apply()
Parameters
  • type (str) – Type of the recipe

  • name (str) – Optional, base name for the new recipe.

Returns

A new DSS Recipe Creator handle

Return type

dataikuapi.dss.recipe.DSSRecipeCreator

get_flow()
Returns

A Flow handle

Return type

A dataikuapi.dss.flow.DSSProjectFlow

sync_datasets_acls()

Resync permissions on HDFS datasets in this project

Attention

This call requires an API key with admin rights

Returns

a handle to the task of resynchronizing the permissions

Return type

dataikuapi.dss.future.DSSFuture

list_running_notebooks(as_objects=True)

Caution

Deprecated. Use DSSProject.list_jupyter_notebooks()

List the currently-running notebooks

Returns

list of notebooks. Each object contains at least a ‘name’ field

Return type

list

get_tags()

List the tags of this project.

Returns

a dictionary containing the tags with a color

Return type

dict

set_tags(tags=None)

Set the tags of this project.

Parameters

tags (dict) – must be a modified version of the object returned by get_tags() (defaults to {})
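
Usage example (a minimal sketch following the get/modify/set pattern; inspect the returned object to see the exact structure):

tags = project.get_tags()
# modify the returned object as needed, then set it back
project.set_tags(tags)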

list_macros(as_objects=False)

List the macros accessible in this project

Parameters

as_objects (bool) – if True, return the macros as dataikuapi.dss.macro.DSSMacro macro handles instead of a list of python dicts (defaults to False)

Returns

the list of the macros

Return type

list

get_macro(runnable_type)

Get a handle to interact with a specific macro

Parameters

runnable_type (str) – the identifier of a macro

Returns

A macro handle

Return type

dataikuapi.dss.macro.DSSMacro

get_wiki()

Get the wiki

Returns

the wiki associated to the project

Return type

dataikuapi.dss.wiki.DSSWiki

get_object_discussions()

Get a handle to manage discussions on the project

Returns

the handle to manage discussions

Return type

dataikuapi.dss.discussion.DSSObjectDiscussions

init_tables_import()

Start an operation to import Hive or SQL tables as datasets into this project

Returns

a dataikuapi.dss.project.TablesImportDefinition to add tables to import

Return type

dataikuapi.dss.project.TablesImportDefinition

list_sql_schemas(connection_name)

Lists schemas from which tables can be imported in a SQL connection

Parameters

connection_name (str) – name of the SQL connection

Returns

an array of schema names

Return type

list

list_hive_databases()

Lists Hive databases from which tables can be imported

Returns

an array of database names

Return type

list

list_sql_tables(connection_name, schema_name=None)

Lists tables to import in a SQL connection

Parameters
  • connection_name (str) – name of the SQL connection

  • schema_name (str) – Optional, name of the schema in the SQL connection in which to list tables.

Returns

an array of tables

Return type

list

list_hive_tables(hive_database)

Lists tables to import in a Hive database

Parameters

hive_database (str) – name of the Hive database

Returns

an array of tables

Return type

list

list_elasticsearch_indices_or_aliases(connection_name)

get_app_manifest()

Gets the manifest of the application if the project is an app template or an app instance, fails otherwise.

Returns

the manifest of the application associated to the project

Return type

dataikuapi.dss.app.DSSAppManifest

setup_mlflow(managed_folder, host=None)

Set up the dss-plugin for MLflow

Parameters
  • managed_folder – the managed folder where the MLflow artifacts should be stored

  • host (str) – setup a custom host if the backend used is not DSS (defaults to None)

get_mlflow_extension()

Get a handle to interact with the extension of MLflow provided by DSS

Returns

A Mlflow Extension handle

Return type

dataikuapi.dss.mlflow.DSSMLflowExtension

list_code_studios(as_type='listitems')

List the code studio objects in this project

Parameters

as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).

Returns

the list of the code studio objects, each one as a python dict

Return type

list

get_code_studio(code_studio_id)

Get a handle to interact with a specific code studio object

Parameters

code_studio_id (str) – the identifier of the desired code studio object

Returns

A code studio object handle

Return type

dataikuapi.dss.codestudio.DSSCodeStudioObject

create_code_studio(name, template_id)

Create a new code studio object in the project, and return a handle to interact with it

Parameters
  • name (str) – the name of the code studio object

  • template_id (str) – the identifier of a code studio template

Returns

A code studio object handle

Return type

dataikuapi.dss.codestudio.DSSCodeStudioObject

get_library()

Get a handle to manage the project library

Returns

A dataikuapi.dss.projectlibrary.DSSLibrary handle

Return type

dataikuapi.dss.projectlibrary.DSSLibrary

list_webapps(as_type='listitems')

List the webapp heads of this project

Parameters

as_type (str) – How to return the list. Supported values are “listitems” and “objects”.

Returns

The list of the webapps. If “as_type” is “listitems”, each one as a dataikuapi.dss.webapp.DSSWebAppListItem. If “as_type” is “objects”, each one as a dataikuapi.dss.webapp.DSSWebApp

Return type

list

get_webapp(webapp_id)

Get a handle to interact with a specific webapp

Parameters

webapp_id (str) – the identifier of a webapp

Returns

A webapp handle

Return type

dataikuapi.dss.webapp.DSSWebApp

dataiku package API

class dataiku.Project(project_key=None)

This is a handle to interact with the current project

Note: this class is also available as dataiku.Project

get_last_metric_values()

Get the set of last values of the metrics on this project, as a dataiku.ComputedMetrics object

get_metric_history(metric_lookup)

Get the set of all values a given metric took on this project

Parameters

metric_lookup – metric name or unique identifier

save_external_metric_values(values_dict)

Save metrics on this project. The metrics are saved with the type “external”

Parameters

values_dict – the values to save, as a dict. The keys of the dict are used as metric names

get_last_check_values()

Get the set of last values of the checks on this project, as a dataiku.ComputedChecks object

get_check_history(check_lookup)

Get the set of all values a given check took on this project

Parameters

check_lookup – check name or unique identifier

set_variables(variables)

Set all variables of the current project

Parameters

variables (dict) – must be a modified version of the object returned by get_variables

get_variables()

Get the variables of this project.

Parameters

typed (bool) – if True, try to cast each variable into its original type (e.g. int rather than str)

Returns

A dictionary containing two dictionaries: “standard” and “local”. “standard” are regular variables, exported with bundles. “local” variables are not part of the bundles for this project

save_external_check_values(values_dict)

Save checks on this project. The checks are saved with the type “external”

Parameters

values_dict – the values to save, as a dict. The keys of the dict are used as check names