Tracking experiments in code

Initial setup

Before you can track experiments, you need to create a Managed Folder in the project. The managed folder is used to store artifacts. Take note of the managed folder id (8 alphanumeric characters, visible in the URL).
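
If you prefer to copy the whole URL rather than pick the id out by hand, a small helper can extract it. This is only a sketch: the function name is hypothetical, and the URL layout it expects (a `/managedfolder/<id>/` path segment) is an assumption about how the folder page is addressed, so adjust the pattern if your instance differs.

```python
import re

# Assumed URL layout: .../projects/<PROJECT_KEY>/managedfolder/<FOLDER_ID>/...
_FOLDER_ID_PATTERN = re.compile(r"/managedfolder/([A-Za-z0-9]{8})(?:/|$)")


def extract_managed_folder_id(url):
    """Return the 8-character folder id found in `url`, or None if absent."""
    match = _FOLDER_ID_PATTERN.search(url)
    return match.group(1) if match else None
```

The returned string can then be passed to `project.get_managed_folder(...)` in the samples below.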

Quick start sample

import dataiku
import mlflow

project = dataiku.api_client().get_default_project()
managed_folder = project.get_managed_folder('A_MANAGED_FOLDER_ID')

with project.setup_mlflow(managed_folder=managed_folder) as mlflow:

    # Note: if you don't call this (i.e. when no experiment is specified), the default one is used
    mlflow.set_experiment("My first experiment")

    with mlflow.start_run(run_name="my_run"):
        # ...your MLflow code, using the regular MLflow APIs...
        mlflow.log_param("a", 1)
        mlflow.log_metric("b", 2)

Tracking API

DSS uses the MLflow Tracking API. Please refer to the MLflow Tracking documentation.
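
As a small illustration of the regular tracking calls, the sketch below logs a dictionary of parameters and then one metric value per training step. The helper name `log_training_history` and its arguments are hypothetical. It takes as its first argument the `mlflow` object obtained from `project.setup_mlflow(...)`, and relies only on the standard MLflow `log_params` and `log_metric` calls.

```python
def log_training_history(mlflow, params, losses):
    """Log hyperparameters once, then one "loss" value per training step.

    `mlflow` is the object obtained from `project.setup_mlflow(...)`.
    This helper is hypothetical: it is not part of DSS or MLflow.
    """
    mlflow.log_params(params)  # logs all key/value pairs in one call
    for step, loss in enumerate(losses):
        # `step` lets the run display the metric as a series over time
        mlflow.log_metric("loss", loss, step=step)
```

It is meant to be called inside a `with mlflow.start_run(...)` block, for example `log_training_history(mlflow, {"a": 1}, [0.9, 0.5, 0.3])`.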

Autologging

MLflow Tracking comes with a very useful feature: autologging, which automatically logs metrics, parameters, and models for common machine-learning packages without the need for explicit log statements.

Leveraging MLflow autologging requires no additional configuration of the DSS integration. Some machine learning packages, such as PyTorch, may however require additional packages.

In the following sample, we activate MLflow autologging for a scikit-learn model. Metrics and artifacts are automatically logged.

import dataiku
import mlflow
from sklearn.linear_model import ElasticNet

project = dataiku.api_client().get_default_project()
managed_folder = project.get_managed_folder('A_MANAGED_FOLDER_ID')

with project.setup_mlflow(managed_folder=managed_folder) as mlflow:
    mlflow.set_experiment("Let's autolog")

    # Activate MLflow autologging
    mlflow.sklearn.autolog()

    with mlflow.start_run(run_name="my_run"):
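        # alpha, l1_ratio, train_x and train_y are placeholders:
        # define them from your own hyperparameters and training data
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)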

Other topics

Logging into another project

You can log experiments into a project other than the current one by using:

project = dataiku.api_client().get_project("MYOTHERPROJECT")

Experiment tracking outside DSS

The MLflow Tracking integration is configured through the dataikuapi package. See Using the APIs outside of DSS for how to use it from outside DSS.

Usage without context manager

While the usage of the context manager (“with” statement) is recommended, it is not mandatory. You can use this instead:

import dataiku
import mlflow

project = dataiku.api_client().get_default_project()
managed_folder = project.get_managed_folder('A_MANAGED_FOLDER_ID')

mlflow_handle = project.setup_mlflow(managed_folder=managed_folder)

mlflow.set_experiment("My first experiment")

with mlflow.start_run(run_name="my_run"):
    # ...your MLflow code...
    mlflow.log_param("a", 1)
    mlflow.log_metric("b", 2)

mlflow_handle.clear()
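
In the sample above, `mlflow_handle.clear()` is skipped if the tracking code raises an exception. One way to guard against that is a try/finally wrapper. This is a minimal sketch: the function name `run_with_mlflow_cleanup` and the `body` callable are hypothetical, and only `setup_mlflow` and `clear` come from the DSS integration.

```python
def run_with_mlflow_cleanup(project, managed_folder, body):
    """Set up the integration, run `body()`, and always clear the integration."""
    mlflow_handle = project.setup_mlflow(managed_folder=managed_folder)
    try:
        # `body` is a hypothetical zero-argument callable holding your MLflow code
        return body()
    finally:
        # Runs on success and on error, so the integration is never left active
        mlflow_handle.clear()
```

Here `body` would contain the `mlflow.set_experiment(...)` and `mlflow.start_run(...)` calls from the sample above.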

Cautions

If you do not set up the integration before using the MLflow client, or use the client after clearing the integration, it may fall back to its default mode: writing experiment data as the current user, on the filesystem of the host of the DSS server.

Supported versions

See Limitations and supported versions for supported MLflow versions.