Experiment Tracking

For an introduction to Experiment Tracking in DSS, please see Experiment Tracking.

Experiment Tracking in DSS uses the MLflow Tracking API.

This section focuses on the Dataiku-specific extensions to the MLflow API.

API Reference

class dataikuapi.dss.mlflow.DSSMLflowExtension(client, project_key)

A handle to interact with specific endpoints of the DSS MLflow integration.

Do not create this object directly; use dataikuapi.dss.project.DSSProject.get_mlflow_extension() instead.
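A minimal sketch of obtaining the handle through the project rather than through the constructor. The helper name `get_extension` is illustrative and not part of the API; `client` is assumed to be an already configured `dataikuapi.DSSClient`.

```python
def get_extension(client, project_key):
    """Return the MLflow extension handle for the given project.

    `client` is assumed to be a dataikuapi.DSSClient, or any object
    exposing get_project(). The handle comes from the project object,
    not from instantiating DSSMLflowExtension directly.
    """
    project = client.get_project(project_key)
    return project.get_mlflow_extension()


# Typical use against a real instance (host, key and project key are placeholders):
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.com", "my-api-key")
#   extension = get_extension(client, "MYPROJECT")
```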

list_models(run_id)

Returns the list of models of the given run.

Parameters: run_id (str) – run_id for which to return a list of models

list_experiments(view_type='ACTIVE_ONLY', max_results=1000)

Returns the list of experiments in the DSS project for which the MLflow integration is set up.

Parameters

view_type (str) – ACTIVE_ONLY, DELETED_ONLY or ALL
max_results (int) – maximum number of results to return

Return type

dict
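As a sketch, the call can be wrapped so that an unsupported view_type fails early on the client side. The wrapper name `list_experiments_checked` is hypothetical; only the three view types and the two keyword arguments come from the reference above. The structure of the returned dict is not described here, so the wrapper passes it through unchanged.

```python
VALID_VIEW_TYPES = ("ACTIVE_ONLY", "DELETED_ONLY", "ALL")


def list_experiments_checked(extension, view_type="ACTIVE_ONLY", max_results=1000):
    """Call list_experiments() after validating view_type locally.

    `extension` is assumed to be a DSSMLflowExtension handle.
    """
    if view_type not in VALID_VIEW_TYPES:
        raise ValueError(
            "view_type must be one of %s, got %r" % (", ".join(VALID_VIEW_TYPES), view_type)
        )
    # Returned as-is: the reference only states that the result is a dict.
    return extension.list_experiments(view_type=view_type, max_results=max_results)
```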

rename_experiment(experiment_id, new_name)

Renames an experiment.

Parameters

experiment_id (str) – id of the experiment to rename
new_name (str) – new name of the experiment

restore_experiment(experiment_id)

Restores a deleted experiment

Parameters: experiment_id (str) – experiment id

restore_run(run_id)

Restores a deleted run

Parameters: run_id (str) – run id

garbage_collect()

Permanently deletes the experiments and runs marked as “Deleted”.
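Because garbage collection is permanent, anything worth keeping has to be restored first. The following sketch shows that ordering; the helper name `restore_then_purge` is hypothetical, and `extension` is assumed to be a DSSMLflowExtension handle.

```python
def restore_then_purge(extension, experiment_ids=(), run_ids=()):
    """Restore selected deleted items, then purge everything still deleted.

    The restore calls come first so the listed experiments and runs are
    no longer marked as deleted when garbage_collect() runs.
    """
    for experiment_id in experiment_ids:
        extension.restore_experiment(experiment_id)
    for run_id in run_ids:
        extension.restore_run(run_id)
    # Irreversible: removes whatever is still marked as deleted.
    extension.garbage_collect()
```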

create_experiment_tracking_dataset(dataset_name, experiment_ids=[], view_type='ACTIVE_ONLY', filter_expr='', order_by=[], format='LONG')

Creates a virtual dataset exposing experiment tracking data.

Parameters

dataset_name (str) – name of the dataset
experiment_ids (list(str)) – list of ids of experiments to filter on. No filtering if empty
view_type (str) – one of ACTIVE_ONLY, DELETED_ONLY and ALL. Default is ACTIVE_ONLY
filter_expr (str) – MLflow search expression
order_by (list(str)) – list of order by clauses. Default is ordered by start_time, then runId
format (str) – LONG or JSON. Default is LONG
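A sketch of how these parameters might be combined, assuming the runs log a metric named `accuracy` (an assumption, not something this reference states). The filter and ordering strings follow MLflow search syntax, and the helper name `create_accuracy_dataset` is hypothetical.

```python
def create_accuracy_dataset(extension, dataset_name, experiment_ids, min_accuracy=0.9):
    """Expose active runs above an accuracy threshold as a virtual dataset.

    `extension` is assumed to be a DSSMLflowExtension handle; the metric
    name "accuracy" is illustrative and must match what the runs log.
    """
    filter_expr = "metrics.accuracy > {}".format(min_accuracy)
    extension.create_experiment_tracking_dataset(
        dataset_name,
        experiment_ids=list(experiment_ids),  # an empty list means no filtering
        view_type="ACTIVE_ONLY",
        filter_expr=filter_expr,
        order_by=["metrics.accuracy DESC"],
        format="LONG",
    )
```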

clean_experiment_tracking_db()

Cleans the experiments, runs, params, metrics, tags, etc. for this project

This call requires an API key with admin rights

set_run_inference_info(run_id, prediction_type, classes=None, code_env_name=None, target=None)

Sets the type of the model, and optionally other information useful to deploy or evaluate it.

prediction_type must be one of:

- REGRESSION
- BINARY_CLASSIFICATION
- MULTICLASS
- OTHER

classes must be specified if and only if the model is a BINARY_CLASSIFICATION or MULTICLASS model.

This information is used to filter saved models by prediction type and to prefill the classes when an MLflow model is deployed through the GUI as a version of a DSS Saved Model.

Parameters

run_id (str) – run_id for which to set the inference information
prediction_type (str) – prediction type (see doc)
classes (list) – ordered list of classes (not for all prediction types, see doc). Every class will be converted by calling str().
code_env_name (str) – name of an adequate DSS python code environment
target (str) – name of the target
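The rule that classes are given if and only if the model is a classifier can be checked before calling the API. The sketch below does that on the client side; both helper names are hypothetical, and the str() conversion only mirrors what the reference says the call does with each class.

```python
CLASSIFICATION_TYPES = {"BINARY_CLASSIFICATION", "MULTICLASS"}
PREDICTION_TYPES = CLASSIFICATION_TYPES | {"REGRESSION", "OTHER"}


def check_inference_info(prediction_type, classes=None):
    """Validate prediction_type and classes; return classes as strings or None."""
    if prediction_type not in PREDICTION_TYPES:
        raise ValueError("unknown prediction_type: %r" % (prediction_type,))
    needs_classes = prediction_type in CLASSIFICATION_TYPES
    if needs_classes and not classes:
        raise ValueError("classes are required for %s" % prediction_type)
    if not needs_classes and classes is not None:
        raise ValueError("classes must not be given for %s" % prediction_type)
    return [str(c) for c in classes] if needs_classes else None


def set_inference_info_checked(extension, run_id, prediction_type,
                               classes=None, code_env_name=None, target=None):
    """Validate locally, then forward to set_run_inference_info().

    `extension` is assumed to be a DSSMLflowExtension handle.
    """
    checked = check_inference_info(prediction_type, classes)
    extension.set_run_inference_info(
        run_id, prediction_type,
        classes=checked, code_env_name=code_env_name, target=target,
    )
```

Keeping the validation in a separate function makes the rule testable without a DSS instance.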