Importing MLflow models

You can import an already trained MLflow Model into DSS as a Saved Model.

Importing MLflow models is done:

  • through the API

  • or using the “Deploy” action available for models in Experiment Tracking’s runs (see Deploying MLflow models).

This section focuses on deployment through the API. It assumes that you already have an MLflow model in a model_directory, i.e. a folder on the local filesystem or a Managed Folder.
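
As a quick sanity check before importing, you can verify that a local model_directory looks like an MLflow model, i.e. that it contains the MLmodel descriptor file that MLflow writes when it saves a model. The following is a minimal sketch only: the helper name looks_like_mlflow_model is hypothetical (it is not part of the Dataiku or MLflow APIs), it checks that the file exists without validating its contents, and it applies to a folder on the local filesystem, not to a Managed Folder.

```python
from pathlib import Path


def looks_like_mlflow_model(model_directory):
    """Return True if the folder contains an MLflow 'MLmodel' descriptor file.

    Hypothetical helper for illustration: it only tests for the presence of
    the file and does not parse or validate it.
    """
    return (Path(model_directory) / "MLmodel").is_file()
```

A False result usually means that model_directory points at a parent folder or at a run folder rather than at the model folder itself.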

This section also assumes that you already have a code environment including core packages, MLflow, scikit-learn, statsmodels, as well as the Machine Learning package you used to train your model, in the version recommended by MLflow. The Python version of this code environment should be 3.7 or higher. Please refer to Limitations and supported versions for information on supported versions.
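
The prerequisites above can be checked from inside the code environment itself, for example in a notebook that uses it. The sketch below is an illustration under stated assumptions: check_code_env is a hypothetical helper, the import names passed to it are assumptions (scikit-learn is imported as sklearn), and it only tests that the packages can be found, not that their versions match the ones recommended by MLflow.

```python
import importlib.util
import sys


def check_code_env(import_names, min_python=(3, 7)):
    """Return (python_ok, missing) for the interpreter running this code.

    python_ok is True when the Python version is at least min_python.
    missing lists the top-level import names that cannot be found.
    """
    python_ok = sys.version_info >= min_python
    missing = [name for name in import_names
               if importlib.util.find_spec(name) is None]
    return python_ok, missing


# Import names are assumptions; add the ML package used to train your model.
python_ok, missing = check_code_env(["mlflow", "sklearn", "statsmodels"])
print("Python OK:", python_ok, "- missing packages:", missing)
```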

The steps are then:

  1. Create a DSS Saved Model using dataikuapi.dss.project.DSSProject.create_mlflow_pyfunc_model()

  2. Import the MLflow model into the Saved Model using dataikuapi.dss.savedmodel.DSSSavedModel.import_mlflow_version_from_path() or dataikuapi.dss.savedmodel.DSSSavedModel.import_mlflow_version_from_managed_folder()

  3. Use the returned MLflow handler to set metadata and evaluate the DSS Saved Model: dataikuapi.dss.savedmodel.MLFlowVersionHandler()

Note

You may specify a code environment when importing a model. If you do not, the code environment currently defined for the project is resolved and used.

import dataiku

# if using API from inside DSS
client = dataiku.api_client()

project = client.get_project("PROJECT_ID")

# 1. Create DSS Saved Model
saved_model = project.create_mlflow_pyfunc_model(name, prediction_type)

# 2. Load the MLflow Model as a new version of DSS Saved Model
## either from DSS host local filesystem:
mlflow_version = saved_model.import_mlflow_version_from_path("version_id", model_directory, 'code-environment-to-use')
## or from a DSS managed folder:
mlflow_version = saved_model.import_mlflow_version_from_managed_folder('version_id', 'managed_folder_id', path_of_model, 'code-environment-to-use')

# 3. Set metadata and evaluate the saved model version
# (Optional; only for regression or classification models with tabular input data.
#  Required to have access to the saved model performance tab.)
mlflow_version.set_core_metadata(target_column, classes, evaluation_dataset_name)
mlflow_version.evaluate(evaluation_dataset_name)
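
Since the two import calls in step 2 are alternatives, it can be convenient to wrap the choice in a small function. The sketch below is one possible way to do that, not part of the Dataiku API: import_mlflow_version and its parameter names are hypothetical, and the arguments are passed positionally in the same order as in the snippet above.

```python
def import_mlflow_version(saved_model, version_id, model_path,
                          code_env_name, managed_folder_id=None):
    """Import an MLflow model as a new version of a DSS Saved Model.

    Hypothetical convenience wrapper. If managed_folder_id is None,
    model_path is read from the DSS host local filesystem; otherwise it is
    treated as a path inside that managed folder.
    Returns whatever the underlying import call returns.
    """
    if managed_folder_id is None:
        return saved_model.import_mlflow_version_from_path(
            version_id, model_path, code_env_name)
    return saved_model.import_mlflow_version_from_managed_folder(
        version_id, managed_folder_id, model_path, code_env_name)
```

With this helper, step 2 becomes a single call such as import_mlflow_version(saved_model, "version_id", model_directory, 'code-environment-to-use'), with managed_folder_id added when the model lives in a managed folder.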

Note

You may also use the API to import models trained in experiment runs, in the same way as any other model stored in a managed folder.