Experiment Tracking

For an introduction to Experiment Tracking in DSS, please see Experiment Tracking.

Experiment Tracking in DSS uses the MLflow Tracking API.

This section focuses on the Dataiku-specific extensions to the MLflow API.

API Reference

class dataikuapi.dss.mlflow.DSSMLflowExtension(client, project_key)

A handle to interact with specific endpoints of the DSS MLflow integration.

Do not create this class directly. Use dataikuapi.dss.project.DSSProject.get_mlflow_extension() instead.
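As a minimal sketch of how the handle might be obtained, the helper below goes through a project rather than calling the constructor. The function name get_mlflow_extension_for is hypothetical, and client is assumed to be a dataikuapi.DSSClient that is already connected to the DSS instance.

```python
def get_mlflow_extension_for(client, project_key):
    """Return the MLflow extension handle of one DSS project.

    Assumptions: `client` is a connected dataikuapi.DSSClient and
    `project_key` is the key of an existing project.
    """
    # Go through the project handle instead of instantiating
    # DSSMLflowExtension directly, as the reference recommends.
    project = client.get_project(project_key)
    return project.get_mlflow_extension()
```

The returned object is the handle on which the methods documented below are called.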

list_models(run_id)

Returns the list of models of the given run

Parameters

run_id (str) – run_id for which to return a list of models

list_experiments(view_type='ACTIVE_ONLY', max_results=1000)

Returns the list of experiments in the DSS project for which MLflow integration is set up

Parameters
  • view_type (str) – ACTIVE_ONLY, DELETED_ONLY or ALL

  • max_results (int) – max results count

Return type

dict
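The sketch below shows one way to call list_experiments and inspect the result. Since this reference only states that the return type is a dict, the helper makes no assumption about its keys and simply serializes it for reading. The name dump_experiments is hypothetical, and ext is assumed to be a DSSMLflowExtension handle.

```python
import json

# The three view types accepted by list_experiments, per the reference.
VIEW_TYPES = ("ACTIVE_ONLY", "DELETED_ONLY", "ALL")


def dump_experiments(ext, view_type="ACTIVE_ONLY", max_results=1000):
    """Return the experiments dict as an indented JSON string."""
    if view_type not in VIEW_TYPES:
        raise ValueError("view_type must be one of %s" % (VIEW_TYPES,))
    result = ext.list_experiments(view_type=view_type, max_results=max_results)
    # default=str keeps the dump from failing on non-JSON values.
    return json.dumps(result, indent=2, sort_keys=True, default=str)
```

Calling dump_experiments(ext, view_type="ALL") would include deleted experiments, which helps when deciding what to restore before a garbage collection.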

rename_experiment(experiment_id, new_name)

Renames an experiment

Parameters
  • experiment_id (str) – experiment id

  • new_name (str) – new name

restore_experiment(experiment_id)

Restores a deleted experiment

Parameters

experiment_id (str) – experiment id

restore_run(run_id)

Restores a deleted run

Parameters

run_id (str) – run id

garbage_collect()

Permanently deletes the experiments and runs marked as “Deleted”
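Because garbage_collect() is permanent, it can be useful to restore anything worth keeping first. The following is a sketch of that ordering. The function name restore_then_purge is hypothetical, ext is assumed to be a DSSMLflowExtension handle, and the ids are assumed to refer to experiments and runs currently marked as deleted.

```python
def restore_then_purge(ext, experiment_ids=(), run_ids=()):
    """Restore selected deleted items, then purge what is still deleted."""
    # Restore first: anything still marked as deleted afterwards is lost.
    for experiment_id in experiment_ids:
        ext.restore_experiment(experiment_id)
    for run_id in run_ids:
        ext.restore_run(run_id)
    # Permanently delete the remaining experiments and runs marked "Deleted".
    ext.garbage_collect()
```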

create_experiment_tracking_dataset(dataset_name, experiment_ids=[], view_type='ACTIVE_ONLY', filter_expr='', order_by=[], format='LONG')

Creates a virtual dataset exposing experiment tracking data.

Parameters
  • dataset_name (str) – name of the dataset

  • experiment_ids (list(str)) – list of ids of experiments to filter on. No filtering if empty

  • view_type (str) – one of ACTIVE_ONLY, DELETED_ONLY and ALL. Default is ACTIVE_ONLY

  • filter_expr (str) – MLflow search expression

  • order_by (list(str)) – list of order by clauses. Default is ordered by start_time, then runId

  • format (str) – LONG or JSON. Default is LONG
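A possible call to create_experiment_tracking_dataset is sketched below. The wrapper name create_tracking_dataset is hypothetical, as are the default filter and ordering: they follow the usual MLflow search syntax and assume that the runs logged a metric named accuracy. ext is assumed to be a DSSMLflowExtension handle.

```python
def create_tracking_dataset(ext, dataset_name, experiment_ids=None,
                            filter_expr="metrics.accuracy > 0.9",
                            order_by=("metrics.accuracy DESC",)):
    """Create a virtual dataset exposing the runs that match a filter.

    The default filter and ordering are illustrative only: they assume
    a metric called "accuracy" exists on the runs of interest.
    """
    return ext.create_experiment_tracking_dataset(
        dataset_name,
        # An empty list means no filtering on experiments.
        experiment_ids=list(experiment_ids or []),
        view_type="ACTIVE_ONLY",
        filter_expr=filter_expr,
        order_by=list(order_by),
        format="LONG",
    )
```

Passing filter_expr="" and order_by=() would fall back to the documented defaults, that is no filtering and ordering by start_time, then runId.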

clean_experiment_tracking_db()

Cleans the experiments, runs, params, metrics, tags, etc. for this project

This call requires an API key with admin rights

set_run_inference_info(run_id, prediction_type, classes=None, code_env_name=None, target=None)

Sets the type of the model, and optionally other information useful to deploy or evaluate it.

prediction_type must be one of:
  • REGRESSION

  • BINARY_CLASSIFICATION

  • MULTICLASS

  • OTHER

classes must be specified if and only if the model is a BINARY_CLASSIFICATION or MULTICLASS model.

This information is used to filter saved models by prediction type and to prefill the classes when an MLflow model is deployed from the GUI as a version of a DSS Saved Model.

Parameters
  • prediction_type (str) – prediction type (see doc)

  • run_id (str) – run_id for which to set the classes

  • classes (list(str)) – ordered list of classes (not for all prediction types, see doc)

  • code_env_name (str) – name of an adequate DSS python code environment

  • target (str) – name of the target
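The rules above (allowed prediction types, and classes required if and only if the model is a classifier) can be checked on the client side before calling DSS. The wrapper below is a sketch of such a check, not part of the Dataiku API: set_inference_info_checked is a hypothetical name and ext is assumed to be a DSSMLflowExtension handle.

```python
PREDICTION_TYPES = {"REGRESSION", "BINARY_CLASSIFICATION", "MULTICLASS", "OTHER"}
CLASSIFICATION_TYPES = {"BINARY_CLASSIFICATION", "MULTICLASS"}


def set_inference_info_checked(ext, run_id, prediction_type, classes=None,
                               code_env_name=None, target=None):
    """Validate the arguments locally, then call set_run_inference_info."""
    if prediction_type not in PREDICTION_TYPES:
        raise ValueError("unknown prediction_type: %r" % (prediction_type,))
    needs_classes = prediction_type in CLASSIFICATION_TYPES
    if needs_classes and not classes:
        raise ValueError("classes are required for %s" % prediction_type)
    if not needs_classes and classes is not None:
        raise ValueError("classes must not be set for %s" % prediction_type)
    # Forward to DSS only once the "if and only if" rule holds.
    ext.set_run_inference_info(
        run_id,
        prediction_type,
        classes=classes,
        code_env_name=code_env_name,
        target=target,
    )
```

For a binary classifier, a call might look like set_inference_info_checked(ext, run_id, "BINARY_CLASSIFICATION", classes=["no", "yes"], target="churn"), where the class labels and the target name are placeholders.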