Projects¶
Basic operations¶
The list of projects in the DSS instance can be retrieved with the list_project_keys method.
client = DSSClient(host, apiKey)
dss_projects = client.list_project_keys()
print(dss_projects)
outputs
['IMPALA', 'MYSQL', 'PARTITIONED', 'PLUGINS']
Projects can be created:
new_project = client.create_project('TEST_PROJECT', 'test project', 'tester', description='a simple description')
print(client.list_project_keys())
outputs
['IMPALA', 'MYSQL', 'PARTITIONED', 'PLUGINS', 'TEST_PROJECT']
Or an existing project can be used for later manipulation:
project = client.get_project("PROJECT_KEY")
Creating, listing and getting handles to project items¶
Through various methods on the DSSProject class, you can:
Create most types of project items (datasets, recipes, managed folders, …)
List project items
Get structured handles to interact with each type of project item
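For example, a minimal sketch (the project key and dataset name are hypothetical) that lists the datasets of a project and then gets a structured handle to one of them:
project = client.get_project("MYPROJECT")
# list datasets as lightweight list items
for item in project.list_datasets():
    print(item.name)
# get a structured handle to a dataset and read its settings
dataset = project.get_dataset("mydataset")
settings = dataset.get_settings()
print(settings.get_raw()["type"])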
Modifying project settings¶
Two parts of the project’s settings can be modified directly: the metadata and the permissions. In both cases, it is advised to first retrieve the current settings state with the get_metadata and get_permissions calls, modify the returned object, and then set it back on the DSS instance.
project = client.get_project("PROJECT_KEY")
project_metadata = project.get_metadata()
project_metadata['tags'] = ['tag1','tag2']
project.set_metadata(project_metadata)
project_permissions = project.get_permissions()
project_permissions['permissions'].append({'group':'data_scientists','readProjectContent': True, 'readDashboards': True})
project.set_permissions(project_permissions)
Available permissions to be set:
{
'group': 'data_team',
'admin': False,
'exportDatasetsData': True,
'manageAdditionalDashboardUsers': False,
'manageDashboardAuthorizations': False,
'manageExposedElements': False,
'moderateDashboards': False,
'readDashboards': True,
'readProjectContent': True,
'runScenarios': False,
'writeDashboards': False,
'writeProjectContent': False,
'shareToWorkspaces': False
}
Deleting¶
Projects can also be deleted:
project = client.get_project('TEST_PROJECT')
project.delete()
Exporting¶
Project export is available through the python API in two forms: either as a stream, or exported directly to a file. The data is sent zipped.
project = client.get_project('TEST_PROJECT')
project.export_to_file('exported_project.zip')
with project.get_export_stream() as s:
    ...
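For instance, a minimal sketch (the output file name is arbitrary) that copies the export stream to a local zip file:
import shutil

project = client.get_project('TEST_PROJECT')
with project.get_export_stream() as s, open('exported_project.zip', 'wb') as f:
    # the stream is a file-like object, so it can be copied directly
    shutil.copyfileobj(s, f)
This is essentially what export_to_file does in a single call.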
Importing¶
Projects can be imported directly from a zip file:
with open("myproject.zip", "rb") as f:
    client.prepare_project_import(f).execute()
Duplicating¶
Projects can be duplicated:
project = client.get_project('TEST_PROJECT')
project.duplicate('COPY_TEST_PROJECT', 'Copy of the Test Project')
Reference documentation¶
dataikuapi package API¶
-
class
dataikuapi.dss.project.
DSSProject
(client, project_key)¶ A handle to interact with a project on the DSS instance.
Important
Do not create this class directly, instead use
dataikuapi.DSSClient.get_project()
-
get_summary
()¶ Returns a summary of the project. The summary is a read-only view of some of the state of the project. You cannot edit the resulting dict and use it to update the project state on DSS, you must use the other more specific methods of this
dataikuapi.dss.project.DSSProject
object
- Returns
a dict containing a summary of the project. Each dict contains at least a projectKey field
- Return type
dict
-
get_project_folder
()¶ Get the folder containing this project
- Return type
-
move_to_folder
(folder)¶ Moves this project to a project folder
- Parameters
folder (
dataikuapi.dss.projectfolder.DSSProjectFolder
) – destination folder
-
delete
(clear_managed_datasets=False, clear_output_managed_folders=False, clear_job_and_scenario_logs=True, **kwargs)¶ Delete the project
Attention
This call requires an API key with admin rights
- Parameters
clear_managed_datasets (bool) – Should the data of managed datasets be cleared (defaults to False)
clear_output_managed_folders (bool) – Should the data of managed folders used as outputs of recipes be cleared (defaults to False)
clear_job_and_scenario_logs (bool) – Should the job and scenario logs be cleared (defaults to True)
-
get_export_stream
(options=None)¶ Return a stream of the exported project
Warning
You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.
- Parameters
options (dict) –
Dictionary of export options (defaults to {}). The following options are available:
exportUploads (boolean): Exports the data of Uploaded datasets (default to False)
exportManagedFS (boolean): Exports the data of managed Filesystem datasets (default to False)
exportAnalysisModels (boolean): Exports the models trained in analysis (default to False)
exportSavedModels (boolean): Exports the models trained in saved models (default to False)
exportManagedFolders (boolean): Exports the data of managed folders (default to False)
exportAllInputDatasets (boolean): Exports the data of all input datasets (default to False)
exportAllDatasets (boolean): Exports the data of all datasets (default to False)
exportAllInputManagedFolders (boolean): Exports the data of all input managed folders (default to False)
exportGitRepository (boolean): Exports the Git repository history (default to False)
exportInsightsData (boolean): Exports the data of static insights (default to False)
- Returns
a stream of the export archive
- Return type
file-like object
-
export_to_file
(path, options=None)¶ Export the project to a file
- Parameters
path (str) – the path of the file in which the exported project should be saved
options (dict) –
Dictionary of export options (defaults to {}). The following options are available:
exportUploads (boolean): Exports the data of Uploaded datasets (default to False)
exportManagedFS (boolean): Exports the data of managed Filesystem datasets (default to False)
exportAnalysisModels (boolean): Exports the models trained in analysis (default to False)
exportSavedModels (boolean): Exports the models trained in saved models (default to False)
exportModelEvaluationStores (boolean): Exports the evaluation stores (default to False)
exportManagedFolders (boolean): Exports the data of managed folders (default to False)
exportAllInputDatasets (boolean): Exports the data of all input datasets (default to False)
exportAllDatasets (boolean): Exports the data of all datasets (default to False)
exportAllInputManagedFolders (boolean): Exports the data of all input managed folders (default to False)
exportGitRepository (boolean): Exports the Git repository history (default to False)
exportInsightsData (boolean): Exports the data of static insights (default to False)
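Usage example, a sketch exporting a project together with the data of its uploaded and managed filesystem datasets (option names are taken from the list above):
project = client.get_project('TEST_PROJECT')
project.export_to_file('exported_project.zip', options={
    'exportUploads': True,     # include data of uploaded datasets
    'exportManagedFS': True,   # include data of managed filesystem datasets
})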
-
duplicate
(target_project_key, target_project_name, duplication_mode='MINIMAL', export_analysis_models=True, export_saved_models=True, export_git_repository=True, export_insights_data=True, remapping=None, target_project_folder=None)¶ Duplicate the project
- Parameters
target_project_key (str) – The key of the new project
target_project_name (str) – The name of the new project
duplication_mode (str) – can be one of the following values: MINIMAL, SHARING, FULL, NONE (defaults to MINIMAL)
export_analysis_models (bool) – (defaults to True)
export_saved_models (bool) – (defaults to True)
export_git_repository (bool) – (defaults to True)
export_insights_data (bool) – (defaults to True)
remapping (dict) – dict of connections to be remapped for the new project (defaults to {})
target_project_folder (A
dataikuapi.dss.projectfolder.DSSProjectFolder
) – the project folder where to put the duplicated project (defaults to None)
- Returns
A dict containing the original and duplicated project’s keys
- Return type
dict
-
get_metadata
()¶ Get the metadata attached to this project. The metadata contains the label, description, checklists, tags and custom metadata of the project.
Note
For more information on available metadata, please see https://doc.dataiku.com/dss/api/6.0/rest/
- Returns
the project metadata.
- Return type
dict
-
set_metadata
(metadata)¶ Set the metadata on this project.
Usage example:
project_metadata = project.get_metadata()
project_metadata['tags'] = ['tag1','tag2']
project.set_metadata(project_metadata)
- Parameters
metadata (dict) – the new state of the metadata for the project. You should only set a metadata object that has been retrieved using the
get_metadata()
call.
-
get_settings
()¶ Gets the settings of this project. This does not contain permissions. See
get_permissions()
- Returns
a handle to read, modify and save the settings
- Return type
dataikuapi.dss.project.DSSProjectSettings
-
get_permissions
()¶ Get the permissions attached to this project
- Returns
A dict containing the owner and the permissions, as a list of pairs of group name and permission type
- Return type
dict
-
set_permissions
(permissions)¶ Sets the permissions on this project
Usage example:
project_permissions = project.get_permissions()
project_permissions['permissions'].append({'group':'data_scientists', 'readProjectContent': True, 'readDashboards': True})
project.set_permissions(project_permissions)
- Parameters
permissions (dict) – a permissions object with the same structure as the one returned by
get_permissions()
call
-
get_interest
()¶ Get the interest of this project. The interest means the number of watchers and the number of stars.
- Returns
a dict object containing the interest of the project with two fields:
starCount: number of stars for this project
watchCount: number of users watching this project
- Return type
dict
-
get_timeline
(item_count=100)¶ Get the timeline of this project. The timeline consists of information about the creation of this project (by whom, and when), the last modification of this project (by whom and when), a list of contributors, and a list of modifications. This list of modifications contains a maximum of item_count elements (defaults to 100). If item_count is greater than the real number of modifications, item_count is adjusted.
- Parameters
item_count (int) – maximum number of modifications to retrieve in the items list
- Returns
a timeline where the top-level fields are :
allContributors: all contributors who have been involved in this project
items: a history of the modifications of the project
createdBy: who created this project
createdOn: when the project was created
lastModifiedBy: who modified this project for the last time
lastModifiedOn: when this modification took place
- Return type
dict
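Usage example, a short sketch reading a few of the documented timeline fields:
timeline = project.get_timeline(item_count=10)
print("Created by %s on %s" % (timeline["createdBy"], timeline["createdOn"]))
print("%d modifications retrieved" % len(timeline["items"]))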
-
list_datasets
(as_type='listitems')¶ List the datasets in this project.
- Parameters
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns
The list of the datasets. If “as_type” is “listitems”, each one as a
dataikuapi.dss.dataset.DSSDatasetListItem
. If “as_type” is “objects”, each one as a dataikuapi.dss.dataset.DSSDataset
- Return type
list
-
get_dataset
(dataset_name)¶ Get a handle to interact with a specific dataset
- Parameters
dataset_name (str) – the name of the desired dataset
- Returns
A dataset handle
- Return type
-
create_dataset
(dataset_name, type, params=None, formatType=None, formatParams=None)¶ Create a new dataset in the project, and return a handle to interact with it.
The precise structure of params and formatParams depends on the specific dataset type and dataset format type. To know which fields exist for a given dataset type and format type, create a dataset from the UI, and use
get_dataset()
to retrieve the configuration of the dataset and inspect it. Then reproduce a similar structure in the create_dataset()
call. Not all settings of a dataset can be set at creation time (for example partitioning). After creation, you’ll have the ability to modify the dataset
- Parameters
dataset_name (str) – the name of the dataset to create. Must not already exist
type (str) – the type of the dataset
params (dict) – the parameters for the type, as a python dict (defaults to {})
formatType (str) – an optional format to create the dataset with (only for file-oriented datasets)
formatParams (dict) – the parameters to the format, as a python dict (only for file-oriented datasets, default to {})
- Returns
A dataset handle
- Return type
-
create_upload_dataset
(dataset_name, connection=None)¶ Create a new dataset of type ‘UploadedFiles’ in the project, and return a handle to interact with it.
- Parameters
dataset_name (str) – the name of the dataset to create. Must not already exist
connection (str) – the name of the upload connection (defaults to None)
- Returns
A dataset handle
- Return type
-
create_filesystem_dataset
(dataset_name, connection, path_in_connection)¶ Create a new filesystem dataset in the project, and return a handle to interact with it.
- Parameters
dataset_name (str) – the name of the dataset to create. Must not already exist
connection (str) – the name of the connection
path_in_connection (str) – the path of the dataset in the connection
- Returns
A dataset handle
- Return type
-
create_s3_dataset
(dataset_name, connection, path_in_connection, bucket=None)¶ Creates a new external S3 dataset in the project and returns a
dataikuapi.dss.dataset.DSSDataset
to interact with it. The created dataset does not have its format and schema initialized; it is recommended to use
autodetect_settings()
on the returned object
- Parameters
dataset_name (str) – the name of the dataset to create. Must not already exist
connection (str) – the name of the connection
path_in_connection (str) – the path of the dataset in the connection
bucket (str) – the name of the s3 bucket (defaults to None)
- Returns
A dataset handle
- Return type
-
create_fslike_dataset
(dataset_name, dataset_type, connection, path_in_connection, extra_params=None)¶ Create a new file-based dataset in the project, and return a handle to interact with it.
- Parameters
dataset_name (str) – the name of the dataset to create. Must not already exist
dataset_type (str) – the type of the dataset
connection (str) – the name of the connection
path_in_connection (str) – the path of the dataset in the connection
extra_params (dict) – a python dict of extra parameters (defaults to None)
- Returns
A dataset handle
- Return type
-
create_sql_table_dataset
(dataset_name, type, connection, table, schema)¶ Create a new SQL table dataset in the project, and return a handle to interact with it.
- Parameters
dataset_name (str) – the name of the dataset to create. Must not already exist
type (str) – the type of the dataset
connection (str) – the name of the connection
table (str) – the name of the table in the connection
schema (str) – the schema of the table
- Returns
A dataset handle
- Return type
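Usage example, a minimal sketch assuming a PostgreSQL connection named "my_sql_connection" and an existing "customers" table in the "public" schema:
dataset = project.create_sql_table_dataset(
    "customers",           # name of the new DSS dataset
    "PostgreSQL",          # dataset type
    "my_sql_connection",   # connection name (hypothetical)
    "customers",           # table name in the connection
    "public",              # schema containing the table
)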
-
new_managed_dataset_creation_helper
(dataset_name)¶ Caution
Deprecated. Please use
new_managed_dataset()
-
new_managed_dataset
(dataset_name)¶ Initializes the creation of a new managed dataset. Returns a
dataikuapi.dss.dataset.DSSManagedDatasetCreationHelper
or one of its subclasses to complete the creation of the managed dataset. Usage example:
builder = project.new_managed_dataset("my_dataset")
builder.with_store_into("target_connection")
dataset = builder.create()
- Parameters
dataset_name (str) – Name of the dataset to create
- Returns
An object to create the managed dataset
- Return type
-
list_streaming_endpoints
(as_type='listitems')¶ List the streaming endpoints in this project.
- Parameters
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns
The list of the streaming endpoints. If “as_type” is “listitems”, each one as a
dataikuapi.dss.streaming_endpoint.DSSStreamingEndpointListItem
. If “as_type” is “objects”, each one as a dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint
- Return type
list
-
get_streaming_endpoint
(streaming_endpoint_name)¶ Get a handle to interact with a specific streaming endpoint
- Parameters
streaming_endpoint_name (str) – the name of the desired streaming endpoint
- Returns
A streaming endpoint handle
- Return type
-
create_streaming_endpoint
(streaming_endpoint_name, type, params=None)¶ Create a new streaming endpoint in the project, and return a handle to interact with it.
The precise structure of params depends on the specific streaming endpoint type. To know which fields exist for a given streaming endpoint type, create a streaming endpoint from the UI, and use
get_streaming_endpoint()
to retrieve the configuration of the streaming endpoint and inspect it. Then reproduce a similar structure in the create_streaming_endpoint()
call. Not all settings of a streaming endpoint can be set at creation time (for example partitioning). After creation, you’ll have the ability to modify the streaming endpoint.
- Parameters
streaming_endpoint_name (str) – the name for the new streaming endpoint
type (str) – the type of the streaming endpoint
params (dict) – the parameters for the type, as a python dict (defaults to {})
- Returns
A streaming endpoint handle
- Return type
-
create_kafka_streaming_endpoint
(streaming_endpoint_name, connection=None, topic=None)¶ Create a new kafka streaming endpoint in the project, and return a handle to interact with it.
- Parameters
streaming_endpoint_name (str) – the name for the new streaming endpoint
connection (str) – the name of the kafka connection (defaults to None)
topic (str) – the name of the kafka topic (defaults to None)
- Returns
A streaming endpoint handle
- Return type
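Usage example, a minimal sketch assuming a Kafka connection named "my_kafka" and a topic named "events":
endpoint = project.create_kafka_streaming_endpoint(
    "events_stream",         # name of the new streaming endpoint
    connection="my_kafka",   # Kafka connection (hypothetical)
    topic="events",          # Kafka topic (hypothetical)
)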
-
create_httpsse_streaming_endpoint
(streaming_endpoint_name, url=None)¶ Create a new https streaming endpoint in the project, and return a handle to interact with it.
- Parameters
streaming_endpoint_name (str) – the name for the new streaming endpoint
url (str) – the url of the endpoint (defaults to None)
- Returns
A streaming endpoint handle
- Return type
-
new_managed_streaming_endpoint
(streaming_endpoint_name, streaming_endpoint_type=None)¶ Initializes the creation of a new streaming endpoint. Returns a
dataikuapi.dss.streaming_endpoint.DSSManagedStreamingEndpointCreationHelper
to complete the creation of the streaming endpoint
- Parameters
streaming_endpoint_name (str) – Name of the new streaming endpoint - must be unique in the project
streaming_endpoint_type (str) – Type of the new streaming endpoint (optional if it can be inferred from a connection type)
- Returns
An object to create the streaming endpoint
- Return type
-
create_prediction_ml_task
(input_dataset, target_variable, ml_backend_type='PY_MEMORY', guess_policy='DEFAULT', prediction_type=None, wait_guess_complete=True)¶ Creates a new prediction task in a new visual analysis lab for a dataset.
- Parameters
input_dataset (str) – the dataset to use for training/testing the model
target_variable (str) – the variable to predict
ml_backend_type (str) – ML backend to use, one of PY_MEMORY, MLLIB or H2O (defaults to PY_MEMORY)
guess_policy (str) – Policy to use for setting the default parameters. Valid values are: DEFAULT, SIMPLE_FORMULA, DECISION_TREE, EXPLANATORY and PERFORMANCE (defaults to DEFAULT)
prediction_type (str) – The type of prediction problem this is. If not provided the prediction type will be guessed. Valid values are: BINARY_CLASSIFICATION, REGRESSION, MULTICLASS (defaults to None)
wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms (defaults to True). You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
- Returns
A ML task handle of type ‘PREDICTION’
- Return type
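Usage example, a sketch of a typical workflow assuming a "customers" dataset with a "churn" column to predict:
mltask = project.create_prediction_ml_task(
    input_dataset="customers",               # hypothetical dataset
    target_variable="churn",                 # hypothetical target column
    prediction_type="BINARY_CLASSIFICATION",
)
# wait_guess_complete defaults to True, so the task can be trained right away
mltask.train()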
-
create_clustering_ml_task
(input_dataset, ml_backend_type='PY_MEMORY', guess_policy='KMEANS', wait_guess_complete=True)¶ Creates a new clustering task in a new visual analysis lab for a dataset.
The returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms.
You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
- Parameters
ml_backend_type (str) – ML backend to use, one of PY_MEMORY, MLLIB or H2O (defaults to PY_MEMORY)
guess_policy (str) – Policy to use for setting the default parameters. Valid values are: KMEANS and ANOMALY_DETECTION (defaults to KMEANS)
wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms (defaults to True). You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
- Returns
A ML task handle of type ‘CLUSTERING’
- Return type
-
create_timeseries_forecasting_ml_task
(input_dataset, target_variable, time_variable, timeseries_identifiers=None, guess_policy='TIMESERIES_DEFAULT', wait_guess_complete=True)¶ Creates a new time series forecasting task in a new visual analysis lab for a dataset.
- Parameters
input_dataset (string) – The dataset to use for training/testing the model
target_variable (string) – The variable to forecast
time_variable (string) – Column to be used as time variable. Should be a Date (parsed) column.
timeseries_identifiers (list) – List of columns to be used as time series identifiers (when the dataset has multiple series)
guess_policy (string) – Policy to use for setting the default parameters. Valid values are: TIMESERIES_DEFAULT, TIMESERIES_STATISTICAL, and TIMESERIES_DEEP_LEARNING
wait_guess_complete (boolean) – If False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms. You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
- Returns
a dataiku.dss.ml.DSSMLTask
-
list_ml_tasks
()¶ List the ML tasks in this project
- Returns
the list of the ML tasks summaries, each one as a python dict
- Return type
list
-
get_ml_task
(analysis_id, mltask_id)¶ Get a handle to interact with a specific ML task
- Parameters
analysis_id (str) – the identifier of the visual analysis containing the desired ML task
mltask_id (str) – the identifier of the desired ML task
- Returns
A ML task handle
- Return type
-
list_mltask_queues
()¶ List non-empty ML task queues in this project
- Returns
an iterable listing of MLTask queues (each a dict)
- Return type
dataikuapi.dss.ml.DSSMLTaskQueues
-
create_analysis
(input_dataset)¶ Creates a new visual analysis lab for a dataset.
- Parameters
input_dataset (str) – the dataset to use for the analysis
- Returns
A visual analysis handle
- Return type
dataikuapi.dss.analysis.DSSAnalysis
-
list_analyses
()¶ List the visual analyses in this project
- Returns
the list of the visual analyses summaries, each one as a python dict
- Return type
list
-
get_analysis
(analysis_id)¶ Get a handle to interact with a specific visual analysis
- Parameters
analysis_id (str) – the identifier of the desired visual analysis
- Returns
A visual analysis handle
- Return type
dataikuapi.dss.analysis.DSSAnalysis
-
list_saved_models
()¶ List the saved models in this project
- Returns
the list of the saved models, each one as a python dict
- Return type
list
-
get_saved_model
(sm_id)¶ Get a handle to interact with a specific saved model
- Parameters
sm_id (str) – the identifier of the desired saved model
- Returns
A saved model handle
- Return type
-
create_mlflow_pyfunc_model
(name, prediction_type=None)¶ Creates a new external saved model for storing and managing MLFlow models
- Parameters
name (str) – Human readable name for the new saved model in the flow
prediction_type (str) – Optional (but needed for most operations). One of BINARY_CLASSIFICATION, MULTICLASS or REGRESSION
- Returns
The created saved model handle
- Return type
-
create_proxy_model
(name, prediction_type=None)¶ EXPERIMENTAL. Creates a new external saved model that can contain proxy models as versions.
This is an experimental API, subject to change.
- Parameters
name (str) – Human readable name for the new saved model in the flow
prediction_type (str) – Optional (but needed for most operations). One of BINARY_CLASSIFICATION, MULTICLASS or REGRESSION
-
list_managed_folders
()¶ List the managed folders in this project
- Returns
the list of the managed folders, each one as a python dict
- Return type
list
-
get_managed_folder
(odb_id)¶ Get a handle to interact with a specific managed folder
- Parameters
odb_id (str) – the identifier of the desired managed folder
- Returns
A managed folder handle
- Return type
-
create_managed_folder
(name, folder_type=None, connection_name='filesystem_folders')¶ Create a new managed folder in the project, and return a handle to interact with it
- Parameters
name (str) – the name of the managed folder
folder_type (str) – type of storage (defaults to None)
connection_name (str) – the connection name (defaults to filesystem_folders)
- Returns
A managed folder handle
- Return type
-
list_model_evaluation_stores
()¶ List the model evaluation stores in this project.
- Returns
The list of the model evaluation stores
- Return type
list of
dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore
-
get_model_evaluation_store
(mes_id)¶ Get a handle to interact with a specific model evaluation store
- Parameters
mes_id (str) – the id of the desired model evaluation store
- Returns
A model evaluation store handle
- Return type
-
create_model_evaluation_store
(name)¶ Create a new model evaluation store in the project, and return a handle to interact with it.
- Parameters
name (str) – the name for the new model evaluation store
- Returns
A model evaluation store handle
- Return type
-
list_model_comparisons
()¶ List the model comparisons in this project.
- Returns
The list of the model comparisons
- Return type
list
-
get_model_comparison
(mec_id)¶ Get a handle to interact with a specific model comparison
- Parameters
mec_id (str) – the id of the desired model comparison
- Returns
A model comparison handle
- Return type
dataikuapi.dss.modelcomparison.DSSModelComparison
-
create_model_comparison
(name, prediction_type)¶ Create a new model comparison in the project, and return a handle to interact with it.
- Parameters
name (str) – the name for the new model comparison
prediction_type (str) – one of BINARY_CLASSIFICATION, REGRESSION, MULTICLASS, and TIMESERIES_FORECAST
- Returns
A new model comparison handle
- Return type
dataikuapi.dss.modelcomparison.DSSModelComparison
-
list_jobs
()¶ List the jobs in this project
- Returns
a list of the jobs, each one as a python dict, containing both the definition and the state
- Return type
list
-
get_job
(id)¶ Get a handle to interact with a specific job
- Parameters
id (str) – the id of the desired job
- Returns
A job handle
- Return type
-
start_job
(definition)¶ Create a new job, and return a handle to interact with it
- Parameters
definition (dict) –
The definition should contain:
the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD)
a list of outputs to build from the available types: (DATASET, MANAGED_FOLDER, SAVED_MODEL, STREAMING_ENDPOINT)
(Optional) a refreshHiveMetastore field (True or False) to specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.
- Returns
A job handle
- Return type
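Usage example, a sketch of a definition dict building a single dataset; the "id"/"type" keys of the output items are assumed from typical usage and the dataset name is hypothetical:
job = project.start_job({
    "type": "NON_RECURSIVE_FORCED_BUILD",
    "outputs": [{"type": "DATASET", "id": "mydataset"}],
})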
-
start_job_and_wait
(definition, no_fail=False)¶ Starts a new job and waits for it to complete.
- Parameters
definition (dict) –
The definition should contain:
the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD)
a list of outputs to build from the available types: (DATASET, MANAGED_FOLDER, SAVED_MODEL, STREAMING_ENDPOINT)
(Optional) a refreshHiveMetastore field (True or False) to specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.
no_fail (bool) – if true, the function won’t fail even if the job fails or aborts (defaults to False)
- Returns
the final status of the job
- Return type
str
-
new_job
(job_type='NON_RECURSIVE_FORCED_BUILD')¶ Create a job to be run. You need to add outputs to the job (i.e. what you want to build) before running it.
job_builder = project.new_job()
job_builder.with_output("mydataset")
complete_job = job_builder.start_and_wait()
print("Job %s done" % complete_job.id)
- Parameters
job_type (str) – the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD) (defaults to NON_RECURSIVE_FORCED_BUILD)
- Returns
A job handle
- Return type
-
new_job_definition_builder
(job_type='NON_RECURSIVE_FORCED_BUILD')¶ Caution
Deprecated. Please use
new_job()
-
list_jupyter_notebooks
(active=False, as_type='object')¶ List the jupyter notebooks of a project.
- Parameters
active (bool) – if True, only return currently running jupyter notebooks (defaults to False).
as_type (str) – How to return the list. Supported values are “listitems” and “object” (defaults to object).
- Returns
The list of the notebooks. If “as_type” is “listitems”, each one as a
dataikuapi.dss.jupyternotebook.DSSJupyterNotebookListItem
, if “as_type” is “objects”, each one as a dataikuapi.dss.jupyternotebook.DSSJupyterNotebook
- Return type
list of
dataikuapi.dss.jupyternotebook.DSSJupyterNotebook
or list of dataikuapi.dss.jupyternotebook.DSSJupyterNotebookListItem
-
get_jupyter_notebook
(notebook_name)¶ Get a handle to interact with a specific jupyter notebook
- Parameters
notebook_name (str) – The name of the jupyter notebook to retrieve
- Returns
A handle to interact with this jupyter notebook
- Return type
dataikuapi.dss.jupyternotebook.DSSJupyterNotebook
jupyter notebook handle
-
create_jupyter_notebook
(notebook_name, notebook_content)¶ Create a new jupyter notebook and get a handle to interact with it
- Parameters
notebook_name (str) – the name of the notebook to create
notebook_content (dict) – the data of the notebook to create, as a dict. The data will be converted to a JSON string internally. Use
DSSJupyterNotebook.get_content()
on a similar existing DSSJupyterNotebook object in order to get a sample definition object.
- Returns
A handle to interact with the newly created jupyter notebook
- Return type
dataikuapi.dss.jupyternotebook.DSSJupyterNotebook
jupyter notebook handle
-
list_continuous_activities
(as_objects=True)¶ List the continuous activities in this project
- Parameters
as_objects (bool) – if True, returns a list of
dataikuapi.dss.continuousactivity.DSSContinuousActivity
objects, else returns a list of python dicts (defaults to True)
- Returns
a list of the continuous activities, each one as a python dict, containing both the definition and the state
- Return type
list
-
get_continuous_activity
(recipe_id)¶ Get a handle to interact with a specific continuous activity
- Parameters
recipe_id (str) – the identifier of the recipe controlled by the continuous activity
- Returns
A continuous activity handle
- Return type
-
get_variables
()¶ Gets the variables of this project.
- Returns
a dictionary containing two dictionaries : “standard” and “local”. “standard” are regular variables, exported with bundles. “local” variables are not part of the bundles for this project
- Return type
dict
-
set_variables
(obj)¶ Sets the variables of this project.
Warning
If executed from a python recipe, the changes made by set_variables will not be “seen” in that recipe. Use the internal API dataiku.get_custom_variables() instead if this behavior is needed
- Parameters
obj (dict) – must be a modified version of the object returned by get_variables
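Usage example, the usual read-modify-write pattern for project variables (the variable name is arbitrary):
variables = project.get_variables()
variables["standard"]["my_var"] = "some value"
project.set_variables(variables)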
-
update_variables
(variables, type='standard')¶ Updates a set of variables for this project
- Parameters
variables (dict) – a dict of variable name -> value to set. Keys of the dict must be strings. Values in the dict can be strings, numbers, booleans, lists or dicts
type (str) – Can be “standard” to update regular variables or “local” to update local-only variables that are not part of bundles for this project (defaults to standard)
-
list_api_services
()¶ List the API services in this project
- Returns
the list of API services, each one as a python dict
- Return type
list
-
create_api_service
(service_id)¶ Create a new API service, and returns a handle to interact with it. The newly-created service does not have any endpoint.
- Parameters
service_id (str) – the ID of the API service to create
- Returns
An API Service handle
- Return type
-
get_api_service
(service_id)¶ Get a handle to interact with a specific API Service from the API Designer
- Parameters
service_id (str) – The identifier of the API Designer API Service to retrieve
- Returns
A handle to interact with this API Service
- Return type
-
list_exported_bundles
()¶ List all the bundles created in this project on the Design Node.
- Returns
A dictionary of all bundles for a project on the Design node.
- Return type
dict
-
export_bundle
(bundle_id)¶ Creates a new project bundle on the Design node
- Parameters
bundle_id (str) – bundle id tag
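Usage example, a sketch of the Design-node side of a bundle workflow (the bundle id is arbitrary): create the bundle, then publish it to the Project Deployer.
project.export_bundle("v1")
project.publish_bundle("v1")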
-
get_exported_bundle_archive_stream
(bundle_id)¶ Download a bundle archive that can be deployed in a DSS automation Node, as a binary stream.
Warning
The stream must be closed after use. Use a with statement to handle closing the stream at the end of the block by default. For example:
with project.get_exported_bundle_archive_stream('v1') as fp:
    ...  # use fp

# or explicitly close the stream after use
fp = project.get_exported_bundle_archive_stream('v1')
# use fp, then close
fp.close()
- Parameters
bundle_id (str) – the identifier of the bundle
-
download_exported_bundle_archive_to_file
(bundle_id, path)¶ Download a bundle archive that can be deployed in a DSS automation Node into the given output file.
- Parameters
bundle_id (str) – the identifier of the bundle
path (str) – the path of the file in which the bundle archive should be saved; if “-“, will write to /dev/stdout
-
publish_bundle
(bundle_id, published_project_key=None)¶ Publish a bundle on the Project Deployer.
- Parameters
bundle_id (str) – The identifier of the bundle
published_project_key (str) – The key of the project on the Project Deployer where the bundle will be published. A new published project will be created if none matches the key. If the parameter is not set, the key from the current
DSSProject
is used.
- Returns
a dict with info on the bundle state once published. It contains the keys “publishedOn” for the publish date, “publishedBy” for the user who published the bundle, and “publishedProjectKey” for the key of the Project Deployer project used.
- Return type
dict
-
list_imported_bundles
()¶ List all the bundles imported for this project, on the Automation node.
- Returns
a dict containing bundle imports for a project, on the Automation node.
- Return type
dict
-
import_bundle_from_archive
(archive_path)¶ Imports a bundle from a zip archive path on the Automation node.
- Parameters
archive_path (str) – A full path to a zip archive, for example /home/dataiku/my-bundle-v1.zip
-
import_bundle_from_stream
(fp)¶ Imports a bundle from a file stream, on the Automation node.
Usage example:
project = client.get_project('MY_PROJECT')
with open('/home/dataiku/my-bundle-v1.zip', 'rb') as f:
    project.import_bundle_from_stream(f)
- Parameters
fp (file-like) – file handler.
-
activate_bundle
(bundle_id, scenarios_to_enable=None)¶ Activates a bundle in this project.
- Parameters
bundle_id (str) – The ID of the bundle to activate
scenarios_to_enable (dict) – An optional dict of scenarios to enable or disable upon bundle activation. The format of the dict should be scenario IDs as keys with values of True or False (defaults to {}).
- Returns
A report containing any error or warning messages that occurred during bundle activation
- Return type
dict
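Usage example, a sketch of the Automation-node side: import a bundle from an archive, then activate it (the archive path and bundle id are hypothetical).
project.import_bundle_from_archive("/home/dataiku/my-bundle-v1.zip")
report = project.activate_bundle("v1")
print(report)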
-
preload_bundle
(bundle_id)¶ Preloads a bundle that has been imported on the Automation node
- Parameters
bundle_id (str) – the bundle_id for an existing imported bundle
-
list_scenarios
(as_type='listitems')¶ List the scenarios in this project.
- Parameters
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns
The list of the scenarios. If “as_type” is “listitems”, each one as a
dataikuapi.dss.scenario.DSSScenarioListItem
. If “as_type” is “objects”, each one as a dataikuapi.dss.scenario.DSSScenario
- Return type
list
-
get_scenario
(scenario_id)¶ Get a handle to interact with a specific scenario
- Parameters
scenario_id (str) – the ID of the desired scenario
- Returns
A scenario handle
- Return type
-
create_scenario
(scenario_name, type, definition=None)¶ Create a new scenario in the project, and return a handle to interact with it
- Parameters
scenario_name (str) – The name for the new scenario. This does not need to be unique (although this is strongly recommended)
type (str) – The type of the scenario. Must be one of ‘step_based’ or ‘custom_python’
definition (dict) – the JSON definition of the scenario. Use get_definition(with_status=False) on an existing DSSScenario object in order to get a sample definition object (defaults to {‘params’: {}})
- Returns
a
dataikuapi.dss.scenario.DSSScenario
handle to interact with the newly-created scenario
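Usage example, a minimal sketch creating an empty step-based scenario (the name is arbitrary):
scenario = project.create_scenario("My rebuild scenario", "step_based")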
-
list_recipes
(as_type='listitems')¶ List the recipes in this project
- Parameters
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns
The list of the recipes. If “as_type” is “listitems”, each one as a
dataikuapi.dss.recipe.DSSRecipeListItem
. If “as_type” is “objects”, each one as a dataikuapi.dss.recipe.DSSRecipe
- Return type
list
-
get_recipe
(recipe_name)¶ Gets a
dataikuapi.dss.recipe.DSSRecipe
handle to interact with a recipe
- Parameters
recipe_name (str) – The name of the recipe
- Returns
A recipe handle
- Return type
-
create_recipe
(recipe_proto, creation_settings)¶ Create a new recipe in the project, and return a handle to interact with it. We strongly recommend that you use the creator helpers instead of calling this directly.
Some recipe types require additional parameters in creation_settings:
‘grouping’ : a ‘groupKey’ column name
‘python’, ‘sql_query’, ‘hive’, ‘impala’ : the code of the recipe as a ‘payload’ string
- Parameters
recipe_proto (dict) – a prototype for the recipe object. Must contain at least ‘type’ and ‘name’
creation_settings (dict) – recipe-specific creation settings
- Returns
A recipe handle
- Return type
-
new_recipe
(type, name=None)¶ Initializes the creation of a new recipe. Returns a
dataikuapi.dss.recipe.DSSRecipeCreator
or one of its subclasses to complete the creation of the recipe. Usage example:
grouping_recipe_builder = project.new_recipe("grouping")
grouping_recipe_builder.with_input("dataset_to_group_on")
# Create a new managed dataset for the output in the "filesystem_managed" connection
grouping_recipe_builder.with_new_output("grouped_dataset", "filesystem_managed")
grouping_recipe_builder.with_group_key("column")
recipe = grouping_recipe_builder.build()

# After the recipe is created, you can edit its settings
recipe_settings = recipe.get_settings()
recipe_settings.set_column_aggregations("value", sum=True)
recipe_settings.save()

# And you may need to apply new schemas to the outputs
recipe.compute_schema_updates().apply()
- Parameters
type (str) – Type of the recipe
name (str) – Optional, base name for the new recipe.
- Returns
A new DSS Recipe Creator handle
- Return type
-
get_flow
()¶ Get a handle to interact with the flow of the project
- Returns
A Flow handle
- Return type
-
sync_datasets_acls
()¶ Resync permissions on HDFS datasets in this project
Attention
This call requires an API key with admin rights
- Returns
a handle to the task of resynchronizing the permissions
- Return type
-
list_running_notebooks
(as_objects=True)¶ Caution
Deprecated. Use
DSSProject.list_jupyter_notebooks()
List the currently-running notebooks
- Returns
list of notebooks. Each object contains at least a ‘name’ field
- Return type
list
-
list_tags
()¶ List the tags of this project.
- Returns
a dictionary containing the tags with a color
- Return type
dict
-
set_tags
(tags)¶ Set the tags of this project.
- Parameters
tags (dict) – must be a modified version of the object returned by list_tags (defaults to {})
-
list_macros
(as_objects=False)¶ List the macros accessible in this project
- Parameters
as_objects – if True, return the macros as
dataikuapi.dss.macro.DSSMacro
macro handles instead of a list of python dicts (defaults to False)- Returns
the list of the macros
- Return type
list
-
get_macro
(runnable_type)¶ Get a handle to interact with a specific macro
- Parameters
runnable_type (str) – the identifier of a macro
- Returns
A macro handle
- Return type
-
get_wiki
()¶ Get the wiki
- Returns
the wiki associated to the project
- Return type
-
get_object_discussions
()¶ Get a handle to manage discussions on the project
- Returns
the handle to manage discussions
- Return type
-
init_tables_import
()¶ Start an operation to import Hive or SQL tables as datasets into this project
- Returns
a
dataikuapi.dss.project.TablesImportDefinition
to add tables to import- Return type
-
list_sql_schemas
(connection_name)¶ Lists schemas from which tables can be imported in a SQL connection
- Parameters
connection_name (str) – name of the SQL connection
- Returns
an array of schemas names
- Return type
list
-
list_hive_databases
()¶ Lists Hive databases from which tables can be imported
- Returns
an array of databases names
- Return type
list
-
list_sql_tables
(connection_name, schema_name=None)¶ Lists tables to import in a SQL connection
- Parameters
connection_name (str) – name of the SQL connection
schema_name (str) – Optional, name of the schema in the SQL connection in which to list tables.
- Returns
an array of tables
- Return type
list
-
list_hive_tables
(hive_database)¶ Lists tables to import in a Hive database
- Parameters
hive_database (str) – name of the Hive database
- Returns
an array of tables
- Return type
list
-
list_elasticsearch_indices_or_aliases
(connection_name)¶
-
get_app_manifest
()¶ Gets the manifest of the application if the project is an app template or an app instance, fails otherwise.
- Returns
the manifest of the application associated to the project
- Return type
-
setup_mlflow
(managed_folder, host=None)¶ Set up the dss-plugin for MLflow
- Parameters
managed_folder (object) – the managed folder where MLflow artifacts should be stored. Can be either a managed folder id as a string, a
dataikuapi.dss.managedfolder.DSSManagedFolder
, or a dataiku.Folder
host (str) – setup a custom host if the backend used is not DSS (defaults to None).
-
get_mlflow_extension
()¶ Get a handle to interact with the extension of MLflow provided by DSS
- Returns
A Mlflow Extension handle
- Return type
-
list_code_studios
(as_type='listitems')¶ List the code studio objects in this project
- Parameters
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns
the list of the code studio objects, each one as a python dict
- Return type
list
-
get_code_studio
(code_studio_id)¶ Get a handle to interact with a specific code studio object
- Parameters
code_studio_id (str) – the identifier of the desired code studio object
- Returns
A code studio object handle
- Return type
-
create_code_studio
(name, template_id)¶ Create a new code studio object in the project, and return a handle to interact with it
- Parameters
name (str) – the name of the code studio object
template_id (str) – the identifier of a code studio template
- Returns
A code studio object handle
- Return type
-
get_library
()¶ Get a handle to manage the project library
- Returns
- Return type
-
list_webapps
(as_type='listitems')¶ List the webapp heads of this project
- Parameters
as_type (str) – How to return the list. Supported values are “listitems” and “objects”.
- Returns
The list of the webapps. If “as_type” is “listitems”, each one as a
scenario.DSSWebAppListItem
. If “as_type” is “objects”, each one as a scenario.DSSWebApp
- Return type
list
-
get_webapp
(webapp_id)¶ Get a handle to interact with a specific webapp
- Parameters
webapp_id – the identifier of a webapp
- Returns
A dataikuapi.dss.webapp.DSSWebApp webapp handle
-
dataiku package API¶
-
class
dataiku.
Project
(project_key=None)¶ This is a handle to interact with the current project
Note: this class is also available as
dataiku.Project
-
get_last_metric_values
()¶ Get the set of last values of the metrics on this project, as a
dataiku.ComputedMetrics
object
-
get_metric_history
(metric_lookup)¶ Get the set of all values a given metric took on this project
- Parameters
metric_lookup – metric name or unique identifier
-
save_external_metric_values
(values_dict)¶ Save metrics on this project. The metrics are saved with the type “external”
- Parameters
values_dict – the values to save, as a dict. The keys of the dict are used as metric names
-
get_last_check_values
()¶ Get the set of last values of the checks on this project, as a
dataiku.ComputedChecks
object
-
get_check_history
(check_lookup)¶ Get the set of all values a given check took on this project
- Parameters
check_lookup – check name or unique identifier
-
set_variables
(variables)¶ Set all variables of the current project
- Parameters
variables (dict) – must be a modified version of the object returned by get_variables
-
get_variables
()¶ Get project variables
- Parameters
typed (bool) – true to try to cast the variable into its original type (eg. int rather than string)
- Returns
A dictionary containing two dictionaries : “standard” and “local”. “standard” are regular variables, exported with bundles. “local” variables are not part of the bundles for this project
-
save_external_check_values
(values_dict)¶ Save checks on this project. The checks are saved with the type “external”
- Parameters
values_dict – the values to save, as a dict. The keys of the dict are used as check names
-