Machine learning

Through the public API, the Python client allows you to automate all aspects of the lifecycle of machine learning models:

  • Creating a visual analysis and ML task

  • Tuning settings

  • Training models

  • Inspecting model details and results

  • Deploying saved models to Flow and retraining them

Concepts

In DSS, you train models as part of a visual analysis. A visual analysis is made of a preparation script and one or several ML Tasks.

An ML Task is an individual section in which you train models. An ML Task is either a prediction of a single target variable, or a clustering.

The ML API allows you to manipulate ML Tasks, and use them to train models, inspect their details, and deploy them to the Flow.

Once deployed to the Flow, the Saved model can be retrained by the usual build mechanism of DSS.

An ML Task has settings, which control:

  • Which features are active

  • The preprocessing settings for each feature

  • Which algorithms are active

  • The hyperparameter settings (including grid searched hyperparameters) for each algorithm

  • The settings of the grid search

  • Train/Test splitting settings

  • Feature selection and generation settings

Usage samples

The whole cycle

This example creates a prediction ML Task, enables an algorithm, trains models, inspects their details, and deploys one of them to the Flow.

# client is a DSS API client

p = client.get_project("MYPROJECT")

# Create a new ML Task to predict the variable "target" from "trainset"
mltask = p.create_prediction_ml_task(
    input_dataset="trainset",
    target_variable="target",
    ml_backend_type='PY_MEMORY', # ML backend to use
    guess_policy='DEFAULT' # Template to use for setting default parameters
)

# Wait for the ML task to be ready
mltask.wait_guess_complete()

# Obtain settings, enable GBT, save settings
settings = mltask.get_settings()
settings.set_algorithm_enabled("GBT_CLASSIFICATION", True)
settings.save()

# Start train and wait for it to be complete
mltask.start_train()
mltask.wait_train_complete()

# Get the identifiers of the trained models
# There will be 3 of them because Logistic regression and Random forest were enabled by default
ids = mltask.get_trained_models_ids()

for id in ids:
    details = mltask.get_trained_model_details(id)
    algorithm = details.get_modeling_settings()["algorithm"]
    auc = details.get_performance_metrics()["auc"]

    print("Algorithm=%s AUC=%s" % (algorithm, auc))

# Let's deploy the first model
model_to_deploy = ids[0]

ret = mltask.deploy_to_flow(model_to_deploy, "my_model", "trainset")

print("Deployed to saved model id = %s train recipe = %s" % (ret["savedModelId"], ret["trainRecipeName"]))

The methods for creating prediction and clustering ML tasks are defined at dataikuapi.dss.project.DSSProject.create_prediction_ml_task() and dataikuapi.dss.project.DSSProject.create_clustering_ml_task().

Obtaining a handle to an existing ML Task

When you create these ML Tasks, the returned dataikuapi.dss.ml.DSSMLTask object contains two fields, analysis_id and mltask_id, that can later be used to retrieve the same DSSMLTask object.

# client is a DSS API client

p = client.get_project("MYPROJECT")
mltask = p.get_ml_task(analysis_id, mltask_id)

Tuning feature preprocessing

Enabling and disabling features

# mltask is a DSSMLTask object

settings = mltask.get_settings()

settings.reject_feature("not_useful")
settings.use_feature("useful")

settings.save()

Changing advanced parameters for a feature

# mltask is a DSSMLTask object

settings = mltask.get_settings()

# Use impact coding rather than dummy-coding
fs = settings.get_feature_preprocessing("mycategory")
fs["category_handling"] = "IMPACT"

# Impute missing with most frequent value
fs["missing_handling"] = "IMPUTE"
fs["missing_impute_with"] = "MODE"

settings.save()
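
Applying the same change to all features of a given type

When the same change must be applied to many features, the foreach_feature() method documented in the API reference below can be used. A minimal sketch, assuming all categorical features should switch to impact coding:

# mltask is a DSSMLTask object

settings = mltask.get_settings()

def use_impact_coding(feature_name, feature_params):
    # Switch every categorical feature to impact coding
    feature_params["category_handling"] = "IMPACT"
    return feature_params

# Only apply the function to features of type CATEGORY
settings.foreach_feature(use_impact_coding, only_of_type="CATEGORY")

settings.save()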

Tuning algorithms
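
A minimal sketch for enabling an algorithm and inspecting its hyperparameter settings, assuming a classification ML task:

# mltask is a DSSMLTask object

settings = mltask.get_settings()

# Enable the Random forest algorithm
settings.set_algorithm_enabled("RANDOM_FOREST_CLASSIFICATION", True)

# Inspect its settings: printing the returned object shows the available hyperparameters
rf_settings = settings.get_algorithm_settings("RANDOM_FOREST_CLASSIFICATION")
print(rf_settings)

settings.save()

See Algorithm details below for the list of valid algorithm names.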

Exporting a model documentation

This sample shows how to generate and download a model documentation from a template.

See Model Document Generator for more information.

# mltask is a DSSMLTask object

# id is a trained model identifier, as returned by mltask.get_trained_models_ids()
details = mltask.get_trained_model_details(id)

# Launch the model document generation by either
# using the default template for this model by calling without argument
# or specifying a managed folder id and the path to the template to use in that folder
future = details.generate_documentation(FOLDER_ID, "path/my_template.docx")

# Alternatively, use a custom uploaded template file
with open("my_template.docx", "rb") as f:
    future = details.generate_documentation_from_custom_template(f)

# Wait for the generation to finish, retrieve the result and download the generated
# model documentation to the specified file
result = future.wait_for_result()
export_id = result["exportId"]

details.download_documentation_to_file(export_id, "path/my_model_documentation.docx")

API Reference

Interaction with a ML Task

class dataikuapi.dss.ml.DSSMLTask(client, project_key, analysis_id, mltask_id)
static from_full_model_id(client, fmi, project_key=None)
delete()

Delete the present ML task

wait_guess_complete()

Waits for guess to be complete. This should be called immediately after the creation of a new ML Task (if the ML Task was created with wait_guess_complete=False), before calling get_settings or train

get_status()

Gets the status of this ML Task

Returns

a dict

get_settings()

Gets the settings of this ML Task

Returns

a DSSMLTaskSettings object to interact with the settings

Return type

dataikuapi.dss.ml.DSSMLTaskSettings

train(session_name=None, session_description=None, run_queue=False)

Trains models for this ML Task

Parameters
  • session_name (str) – name for the session

  • session_description (str) – description for the session

This method waits for train to complete. If you want to train asynchronously, use start_train() and wait_train_complete()

This method returns the list of trained model identifiers. It returns models that have been trained for this train session, not all trained models for this ML task. To get all identifiers for all models trained across all training sessions, use get_trained_models_ids()

These identifiers can be used for get_trained_model_snippet(), get_trained_model_details() and deploy_to_flow()

Returns

A list of model identifiers

Return type

list of strings
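
For example, a minimal synchronous call (the session name is an arbitrary label):

# mltask is a DSSMLTask object
ids = mltask.train(session_name="Weekly retrain")
print("Trained model ids: %s" % ids)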

ensemble(model_ids=None, method=None)

Create an ensemble model of a set of models

Parameters
  • model_ids (list) – A list of model identifiers (defaults to [])

  • method (str) – the ensembling method. One of: AVERAGE, PROBA_AVERAGE, MEDIAN, VOTE, LINEAR_MODEL, LOGISTIC_MODEL

This method waits for the ensemble train to complete. If you want to train asynchronously, use start_ensembling() and wait_train_complete()

This method returns the identifier of the trained ensemble. To get all identifiers for all models trained across all training sessions, use get_trained_models_ids()

This identifier can be used for get_trained_model_snippet(), get_trained_model_details() and deploy_to_flow()

Returns

A model identifier

Return type

string
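
A minimal sketch, assuming a classification task for which at least two models have already been trained (PROBA_AVERAGE averages the predicted probabilities):

# mltask is a DSSMLTask object
ids = mltask.get_trained_models_ids()

# Ensemble two already-trained models by averaging their predicted probabilities
ensemble_id = mltask.ensemble(model_ids=ids[:2], method="PROBA_AVERAGE")
print("Ensemble model id: %s" % ensemble_id)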

start_train(session_name=None, session_description=None, run_queue=False)

Starts asynchronously a new train session for this ML Task.

Parameters
  • session_name (str) – name for the session

  • session_description (str) – description for the session

This returns immediately, before train is complete. To wait for train to complete, use wait_train_complete()

start_ensembling(model_ids=None, method=None)

Asynchronously creates a new ensemble model from a set of models.

Parameters
  • model_ids (list) – A list of model identifiers (defaults to [])

  • method (str) – the ensembling method (AVERAGE, PROBA_AVERAGE, MEDIAN, VOTE, LINEAR_MODEL, LOGISTIC_MODEL)

This returns immediately, before train is complete. To wait for train to complete, use wait_train_complete()

Returns

the model identifier of the ensemble

Return type

string

wait_train_complete()

Waits for train to be complete (if started with start_train())

get_trained_models_ids(session_id=None, algorithm=None)

Gets the list of trained model identifiers for this ML task.

These identifiers can be used for get_trained_model_snippet() and deploy_to_flow()

Returns

A list of model identifiers

Return type

list of strings

get_trained_model_snippet(id=None, ids=None)

Gets a quick summary of a trained model, as a dict. For complete information and a structured object, use get_trained_model_details()

Parameters
  • id (str) – a model id

  • ids (list) – a list of model ids

Return type

dict

get_trained_model_details(id)

Gets details for a trained model

Parameters

id (str) – Identifier of the trained model, as returned by get_trained_models_ids()

Returns

A DSSTrainedPredictionModelDetails or DSSTrainedClusteringModelDetails representing the details of this trained model id

Return type

DSSTrainedPredictionModelDetails or DSSTrainedClusteringModelDetails

delete_trained_model(model_id)

Deletes a trained model

Parameters

model_id (str) – Model identifier, as returned by get_trained_models_ids()

train_queue()

Trains this MLTask’s queue

Returns

A dict including the next sessionID to be trained in the queue

Return type

dict

deploy_to_flow(model_id, model_name, train_dataset, test_dataset=None, redo_optimization=True)

Deploys a trained model from this ML Task to a saved model + train recipe in the Flow.

Parameters
  • model_id (str) – Model identifier, as returned by get_trained_models_ids()

  • model_name (str) – Name of the saved model to deploy in the Flow

  • train_dataset (str) – Name of the dataset to use as train set. May either be a short name or a PROJECT.name long name (when using a shared dataset)

  • test_dataset (str) – Name of the dataset to use as test set. If null, split will be applied to the train set. May either be a short name or a PROJECT.name long name (when using a shared dataset). Only for PREDICTION tasks

  • redo_optimization (bool) – Should the hyperparameter optimization phase be redone? Defaults to True. Only for PREDICTION tasks

Returns

A dict containing: “savedModelId” and “trainRecipeName” - Both can be used to obtain further handles

Return type

dict

redeploy_to_flow(model_id, recipe_name=None, saved_model_id=None, activate=True)

Redeploys a trained model from this ML Task to a saved model + train recipe in the Flow. Either recipe_name or saved_model_id needs to be specified

Parameters
  • model_id (str) – Model identifier, as returned by get_trained_models_ids()

  • recipe_name (str) – Name of the training recipe to update

  • saved_model_id (str) – Name of the saved model to update

  • activate (bool) – Should the deployed model version become the active version

Returns

A dict containing: “impactsDownstream” - whether the active version changed and downstream recipes are impacted

Return type

dict
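
A minimal sketch of a retrain-and-redeploy cycle, assuming the model was initially deployed with deploy_to_flow() and that its saved model id was kept:

# mltask is a DSSMLTask object
# saved_model_id was returned by an earlier call to deploy_to_flow()

mltask.start_train()
mltask.wait_train_complete()

ids = mltask.get_trained_models_ids()

# Redeploy one of the newly trained models (here, simply the last identifier in the list)
mltask.redeploy_to_flow(ids[-1], saved_model_id=saved_model_id, activate=True)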

remove_unused_splits()

Deletes all stored split data that is no longer in use for this ML Task.

It is generally not needed to call this method.

remove_all_splits()

Deletes all stored split data for this ML Task. This operation saves disk space.

After performing this operation, it will not be possible anymore to:

  • Ensemble already trained models

  • View the “predicted data” or “charts” for already trained models

  • Resume training of models for which optimization had been previously interrupted

Training new models remains possible.

guess(prediction_type=None, reguess_level=None, target_variable=None, timeseries_identifiers=None, time_variable=None, full_reguess=None)

Reguesses all the settings of the ML task when no optional parameters are given. For prediction ML tasks only, sets a new value for a core parameter of the task (target variable or prediction type) and subsequently reguesses the impacted settings.

Parameters
  • prediction_type (string) – Only valid for prediction tasks of either BINARY_CLASSIFICATION, MULTICLASS or REGRESSION type, ignored otherwise. The prediction type to set. Cannot be set if target_variable, time_variable, or timeseries_identifiers is also specified.

  • target_variable (string) – Only valid for prediction tasks, ignored for clustering. The target variable to set. Cannot be set if prediction_type, time_variable, or timeseries_identifiers is also specified.

  • timeseries_identifiers (list) – Only valid for time series forecasting tasks. List of columns to be used as time series identifiers. Cannot be set if prediction_type, target_variable, or time_variable is also specified.

  • time_variable (string) – Only valid for time series forecasting tasks. Column to be used as time variable. Cannot be set if prediction_type, target_variable, or timeseries_identifiers is also specified.

  • full_reguess (bool) – Only valid for prediction tasks, ignored for clustering. Scope of the reguess process: whether it should reguess all the settings after changing a core parameter, or only reguess impacted settings (e.g. target remapping when changing the target, metrics when changing the prediction type…). Ignored if no core parameter is given. Defaults to true.

  • reguess_level (string) – Deprecated, use full_reguess instead. Only valid for prediction tasks. Can be one of the following values:

    • TARGET_CHANGE: Change the target if target_variable is specified, reguess the target remapping, and clear the model’s assertions if any. Equivalent to `full_reguess`=False (recommended usage)

    • FULL_REGUESS: All the settings of the ML task are reguessed. Equivalent to `full_reguess`=True (recommended usage)
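
For instance, a minimal sketch that switches the task to a new target column and reguesses all settings ("churn" is a hypothetical column of the input dataset):

# mltask is a DSSMLTask object

# "churn" is a hypothetical column name
mltask.guess(target_variable="churn", full_reguess=True)

# Wait for guessing to complete before reading settings again (assumed to behave as after task creation)
mltask.wait_guess_complete()

settings = mltask.get_settings()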

Manipulation of settings

class dataikuapi.dss.ml.DSSMLTaskSettings(client, project_key, analysis_id, mltask_id, mltask_settings)

Object to read and modify the settings of an ML task.

Do not create this object directly, use DSSMLTask.get_settings() instead

get_raw()

Gets the raw settings of this ML Task. This returns a reference to the raw settings, not a copy, so changes made to the returned object will be reflected when saving.

Return type

dict

get_feature_preprocessing(feature_name)

Gets the feature preprocessing params for a particular feature. This returns a reference to the feature’s settings, not a copy, so changes made to the returned object will be reflected when saving

Returns

A dict of the preprocessing settings for a feature

Return type

dict

foreach_feature(fn, only_of_type=None)

Applies a function to all features (except target)

Parameters
  • fn (function) – Function that takes 2 parameters: feature_name and feature_params and returns modified feature_params

  • only_of_type (str) – if not None, only applies to features of the given type. Can be one of CATEGORY, NUMERIC, TEXT or VECTOR

reject_feature(feature_name)

Marks a feature as rejected and not used for training

Parameters

feature_name (str) – Name of the feature to reject

use_feature(feature_name)

Marks a feature as input for training

Parameters

feature_name (str) – Name of the feature to use as a training input

get_algorithm_settings(algorithm_name)
get_diagnostics_settings()

Gets the diagnostics settings for an ML task. This returns a reference to the diagnostics’ settings, not a copy, so changes made to the returned object will be reflected when saving.

This method returns a dictionary of the settings with:

  • ‘enabled’: indicates if the diagnostics are enabled globally; if False, all diagnostics will be disabled

  • ‘settings’: a list of dicts, each comprised of:

  • ‘type’: the diagnostic type

  • ‘enabled’: indicates if the diagnostic type is enabled, if False, all diagnostics of that type will be disabled

Please refer to the documentation for details on available diagnostics.

Returns

A dict of diagnostics settings

Return type

dict

set_diagnostics_enabled(enabled)

Globally enables or disables all diagnostics.

Parameters

enabled (bool) – if the diagnostics should be enabled or not

set_diagnostic_type_enabled(diagnostic_type, enabled)

Enables or disables a diagnostic based on its type.

Please refer to the documentation for details on available diagnostics.

Parameters
  • diagnostic_type (str) – Name (in capitals) of the diagnostic type.

  • enabled (bool) – if the diagnostic should be enabled or not

set_algorithm_enabled(algorithm_name, enabled)

Enables or disables an algorithm based on its name.

Please refer to the documentation for details on available algorithms.

Parameters
  • algorithm_name (str) – Name (in capitals) of the algorithm.

  • enabled (bool) – Whether the algorithm should be enabled

disable_all_algorithms()

Disables all algorithms

get_all_possible_algorithm_names()

Returns the list of possible algorithm names, i.e. the list of valid identifiers for set_algorithm_enabled() and get_algorithm_settings()

This includes all possible algorithms, regardless of the prediction kind (regression/classification) or engine, so some algorithms may be irrelevant

Returns

the list of algorithm names as a list of strings

Return type

list of string
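
For example, a minimal sketch that keeps only one algorithm enabled:

# settings is a DSSMLTaskSettings object

print(settings.get_all_possible_algorithm_names())

settings.disable_all_algorithms()
settings.set_algorithm_enabled("LOGISTIC_REGRESSION", True)

settings.save()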

get_enabled_algorithm_names()
Returns

the list of enabled algorithm names as a list of strings

Return type

list of string

get_enabled_algorithm_settings()
Returns

the map of enabled algorithm names with their settings

Return type

dict

set_metric(metric=None, custom_metric=None, custom_metric_greater_is_better=True, custom_metric_use_probas=False)

Sets the score metric to optimize for a prediction ML Task

Parameters
  • metric (str) – metric to use. Leave empty to use a custom metric. You need to set the custom_metric value in that case

  • custom_metric (str) – code of the custom metric

  • custom_metric_greater_is_better (bool) – whether the custom metric is a score or a loss

  • custom_metric_use_probas (bool) – whether to use the classes’ probas or the predicted value (for classification)
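
A minimal sketch, assuming a binary classification task and assuming "ROC_AUC" is the identifier of the built-in AUC metric:

# settings is the settings object of a prediction ML task

# "ROC_AUC" is assumed to be the built-in identifier for the AUC metric
settings.set_metric(metric="ROC_AUC")
settings.save()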

add_custom_python_model(name='Custom Python Model', code='')

Adds a new custom python model

Parameters
  • name (str) – name of the custom model

  • code (str) – code of the custom model

add_custom_mllib_model(name='Custom MLlib Model', code='')

Adds a new custom MLlib model

Parameters
  • name (str) – name of the custom model

  • code (str) – code of the custom model

save()

Saves back these settings to the ML Task

class dataikuapi.dss.ml.DSSPredictionMLTaskSettings(client, project_key, analysis_id, mltask_id, mltask_settings)
class PredictionTypes
BINARY = 'BINARY_CLASSIFICATION'
REGRESSION = 'REGRESSION'
MULTICLASS = 'MULTICLASS'
get_all_possible_algorithm_names()

Returns the list of possible algorithm names, i.e. the list of valid identifiers for set_algorithm_enabled() and get_algorithm_settings()

This includes all possible algorithms, regardless of the prediction kind (regression/classification) or engine, so some algorithms may be irrelevant

Returns

the list of algorithm names as a list of strings

Return type

list of string

get_enabled_algorithm_names()
Returns

the list of enabled algorithm names as a list of strings

Return type

list of string

get_algorithm_settings(algorithm_name)

Gets the training settings for a particular algorithm. This returns a reference to the algorithm’s settings, not a copy, so changes made to the returned object will be reflected when saving.

This method returns the settings for this algorithm as a PredictionAlgorithmSettings (extended dict). All algorithm dicts have at least an “enabled” property/key in the settings. The “enabled” property/key indicates whether this algorithm will be trained.

Other settings are algorithm-dependent and are the various hyperparameters of the algorithm. The precise properties/keys for each algorithm are not all documented. You can print the returned AlgorithmSettings to learn more about the settings of each particular algorithm.

Please refer to the documentation for details on available algorithms.

Parameters

algorithm_name (str) – Name (in capitals) of the algorithm.

Returns

A PredictionAlgorithmSettings (extended dict) for one of the built-in prediction algorithms

Return type

PredictionAlgorithmSettings

split_ordered_by(feature_name, ascending=True)

Deprecated. Use split_params.set_time_ordering()

remove_ordered_split()

Deprecated. Use split_params.unset_time_ordering()

use_sample_weighting(feature_name)

Deprecated. Use set_weighting()

set_weighting(method, feature_name=None)

Sets the method to weight samples.

If there was a WEIGHT feature declared previously, it will be set back as an INPUT feature first.

Parameters
  • method (str) – Method to use. One of NO_WEIGHTING, SAMPLE_WEIGHT (must give a feature name), CLASS_WEIGHT or CLASS_AND_SAMPLE_WEIGHT (must give a feature name)

  • feature_name (str) – Name of the feature to use as sample weight

remove_sample_weighting()

Deprecated. Use set_weighting(method=”NO_WEIGHTING”) instead

get_assertions_params()

Retrieves the assertions parameters for this ml task

Return type

DSSMLAssertionsParams

class dataikuapi.dss.ml.DSSClusteringMLTaskSettings(client, project_key, analysis_id, mltask_id, mltask_settings)
get_algorithm_settings(algorithm_name)

Gets the training settings for a particular algorithm. This returns a reference to the algorithm’s settings, not a copy, so changes made to the returned object will be reflected when saving.

This method returns a dictionary of the settings for this algorithm. All algorithm dicts have at least an “enabled” key in the dictionary. The ‘enabled’ key indicates whether this algorithm will be trained

Other settings are algorithm-dependent and are the various hyperparameters of the algorithm. The precise keys for each algorithm are not all documented. You can print the returned dictionary to learn more about the settings of each particular algorithm

Please refer to the documentation for details on available algorithms.

Parameters

algorithm_name (str) – Name of the algorithm (uppercase).

Returns

A dict of the settings for an algorithm

Return type

dict

class dataikuapi.dss.ml.DSSTimeseriesForecastingMLTaskSettings(client, project_key, analysis_id, mltask_id, mltask_settings)
class PredictionTypes
TIMESERIES_FORECAST = 'TIMESERIES_FORECAST'
get_time_step_params()

Gets the time step parameters for the time series forecasting task. This returns a reference to the time step parameters, not a copy, so changes made to the returned object will be reflected when saving

Returns

A dict of the time step parameters

Return type

dict

set_time_step(time_unit=None, n_time_units=None, end_of_week_day=None, reguess=True, update_algorithm_settings=True)

Sets the time step parameters for the time series forecasting task.

Parameters
  • time_unit (str) – time unit for forecasting step. Valid values are: MILLISECOND, SECOND, MINUTE, HOUR, DAY, BUSINESS_DAY, WEEK, MONTH, QUARTER, HALF_YEAR, YEAR

  • n_time_units (int) – number of time units within a time step

  • end_of_week_day (int) – only useful for the WEEK time unit. Valid values are: 1 (Sunday), 2 (Monday), …, 7 (Saturday)

  • reguess (bool) – Defaults to true. Whether to reguess the ML task settings after changing the time step params

  • update_algorithm_settings (bool) – Defaults to true. Whether the algorithm settings should be reguessed after changing time step parameters.


get_resampling_params()

Gets the time series resampling parameters for the time series forecasting task. This returns a reference to the time series resampling parameters, not a copy, so changes made to the returned object will be reflected when saving

Returns

A dict of the resampling parameters

Return type

dict

set_numerical_interpolation(method=None, constant=None)

Sets the time series resampling numerical interpolation parameters

Parameters
  • method (str) – Interpolation method. Valid values are: NEAREST, PREVIOUS, NEXT, LINEAR, QUADRATIC, CUBIC, CONSTANT

  • constant (float) – Value for the CONSTANT interpolation method


set_numerical_extrapolation(method=None, constant=None)

Sets the time series resampling numerical extrapolation parameters

Parameters
  • method (str) – Extrapolation method. Valid values are: PREVIOUS_NEXT, NO_EXTRAPOLATION, CONSTANT, LINEAR, QUADRATIC, CUBIC

  • constant (float) – Value for the CONSTANT extrapolation method


set_categorical_imputation(method=None, constant=None)

Sets the time series resampling categorical imputation parameters

Parameters
  • method (str) – Imputation method. Valid values are: MOST_COMMON, NULL, CONSTANT, PREVIOUS_NEXT, PREVIOUS, NEXT

  • constant (str) – Value for the CONSTANT imputation method


set_duplicate_timestamp_handling(method)

Sets the time series duplicate timestamp handling

Parameters

method (str) – Duplicate timestamp handling method. Valid values are: FAIL_IF_CONFLICTING, DROP_IF_CONFLICTING, MEAN_MODE

property forecast_horizon
Returns

Number of time steps to be forecast

Return type

int

set_forecast_horizon(forecast_horizon, reguess=True, update_algorithm_settings=True)
Parameters
  • forecast_horizon (int) – Number of time steps to be forecast

  • reguess (bool) – Defaults to true. Whether to reguess the ML task settings after changing the forecast horizon

  • update_algorithm_settings (bool) – Defaults to true. Whether the algorithm settings should be reguessed after changing the forecast horizon.

property evaluation_gap
Returns

Number of skipped time steps for evaluation

Return type

int

property time_variable
Returns

Feature used as time variable (read-only)

Return type

str

property timeseries_identifiers
Returns

Features used as time series identifiers (read-only copy)

Return type

list

property quantiles_to_forecast
Returns

List of quantiles to forecast

Return type

list
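
A minimal sketch combining some of these setters, assuming a time series forecasting ML task:

# mltask is a DSSMLTask object on a time series forecasting task
settings = mltask.get_settings()

# Forecast at a daily time step, 7 steps (days) ahead
settings.set_time_step(time_unit="DAY", n_time_units=1)
settings.set_forecast_horizon(7)

settings.save()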

class dataikuapi.dss.ml.PredictionSplitParamsHandler(mltask_settings)

Object to modify the train/test splitting params.

SPLIT_PARAMS_KEY = 'splitParams'
get_raw()

Gets the raw settings of the prediction split configuration. This returns a reference to the raw settings, not a copy, so changes made to the returned object will be reflected when saving.

Return type

dict

set_split_random(train_ratio=0.8, selection=None, dataset_name=None)

Sets the train/test split to random splitting of an extract of a single dataset

Parameters
  • train_ratio (float) – Ratio of rows to use for train set. Must be between 0 and 1

  • selection (object) – A DSSDatasetSelectionBuilder to build the settings of the extract of the dataset. May be None (won’t be changed)

  • dataset_name (str) – Name of dataset to split. If None, the main dataset used to create the visual analysis will be used.

set_split_kfold(n_folds=5, selection=None, dataset_name=None)

Sets the train/test split to k-fold splitting of an extract of a single dataset

Parameters
  • n_folds (int) – number of folds. Must be greater than 0

  • selection (object) – A DSSDatasetSelectionBuilder to build the settings of the extract of the dataset. May be None (won’t be changed)

  • dataset_name (str) – Name of dataset to split. If None, the main dataset used to create the visual analysis will be used.

set_split_explicit(train_selection, test_selection, dataset_name=None, test_dataset_name=None, train_filter=None, test_filter=None)

Sets the train/test split to explicit extract of one or two dataset(s)

Parameters
  • train_selection (object) – A DSSDatasetSelectionBuilder to build the settings of the extract of the train dataset. May be None (won’t be changed)

  • test_selection (object) – A DSSDatasetSelectionBuilder to build the settings of the extract of the test dataset. May be None (won’t be changed)

  • dataset_name (str) – Name of dataset to use for the extracts. If None, the main dataset used to create the ML Task will be used.

  • test_dataset_name (str) – Name of a second dataset to use for the test data extract. If None, both extracts are done from dataset_name

  • train_filter (object) – A DSSFilterBuilder to build the settings of the filter of the train dataset. May be None (won’t be changed)

  • test_filter (object) – A DSSFilterBuilder to build the settings of the filter of the test dataset. May be None (won’t be changed)

set_time_ordering(feature_name, ascending=True)

Uses a variable to sort the data for train/test split and hyperparameter optimization by time

Parameters
  • feature_name (str) – Name of the variable to use

  • ascending (bool) – True iff the test set is expected to have larger time values than the train set

unset_time_ordering()

Remove time-based ordering for train/test split and hyperparameter optimization

has_time_ordering()
Returns

whether the splitting uses time ordering

Return type

bool

get_time_ordering_variable()
Returns

the name of the variable

Return type

str

is_time_ordering_ascending()
Returns

True if the ordering is set to be ascending with respect to the time-ordering variable

Return type

bool
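
A minimal sketch, assuming the handler is reached through the split_params attribute of the prediction task settings (as suggested by the deprecation notes above):

# mltask is a DSSMLTask object on a prediction task
settings = mltask.get_settings()

split_params = settings.split_params

# Use a random 70/30 train/test split
split_params.set_split_random(train_ratio=0.7)

settings.save()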

Exploration of results

class dataikuapi.dss.ml.DSSTrainedPredictionModelDetails(details, snippet, saved_model=None, saved_model_version=None, mltask=None, mltask_model_id=None)

Object to read details of a trained prediction model

Do not create this object directly, use DSSMLTask.get_trained_model_details() instead

get_roc_curve_data()
get_performance_metrics()

Returns all performance metrics for this model.

For binary classification models, this includes both “threshold-independent” metrics like AUC and “threshold-dependent” metrics like precision. Threshold-dependent metrics are returned at the threshold value that was found to be optimal during training.

To get access to the per-threshold values, use the following:

# Returns a list of tested threshold values
details.get_performance_metrics()["perCutData"]["cut"]
# Returns a list of F1 scores at the tested threshold values
details.get_performance_metrics()["perCutData"]["f1"]
# Both lists have the same length

If K-Fold cross-test was used, most metrics will have a “std” variant, which is the standard deviation across the K cross-tested folds. For example, “auc” will be accompanied by “aucstd”

Returns

a dict of performance metrics values

Return type

dict

get_assertions_metrics()

Retrieves assertions metrics computed for this trained model

Returns

an object representing assertion metrics

Return type

DSSMLAssertionsMetrics

get_hyperparameter_search_points()

Gets the list of points in the hyperparameter search space that have been tested.

Returns a list of dict. Each entry in the list represents a point.

For each point, the dict contains at least:
  • “score”: the average value of the optimization metric over all the folds at this point

  • “params”: a dict of the parameters at this point. This dict has the same structure as the params of the best parameters
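
For example, a minimal sketch that prints the best tested point, assuming the optimization metric is one where greater is better (invert the comparison for loss metrics):

# details is a DSSTrainedPredictionModelDetails object

points = details.get_hyperparameter_search_points()

best = max(points, key=lambda p: p["score"])
print("Best score %s obtained with params %s" % (best["score"], best["params"]))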

get_preprocessing_settings()

Gets the preprocessing settings that were used to train this model

Return type

dict

get_modeling_settings()

Gets the modeling (algorithms) settings that were used to train this model.

Note: the structure of this dict is not the same as the modeling params on the ML Task (which may contain several algorithms)

Return type

dict

get_actual_modeling_params()

Gets the actual / resolved parameters that were used to train this model, post hyperparameter optimization.

Returns

A dictionary, which contains at least a “resolved” key, which is a dict containing the post-optimization parameters

Return type

dict

get_trees()

Gets the trees in the model (for tree-based models)

Returns

a DSSTreeSet object to interact with the trees

Return type

dataikuapi.dss.ml.DSSTreeSet

get_coefficient_paths()

Gets the coefficient paths for Lasso models

Returns

a DSSCoefficientPaths object to interact with the coefficient paths

Return type

dataikuapi.dss.ml.DSSCoefficientPaths

get_scoring_jar_stream(model_class='model.Model', include_libs=False)

Get a scoring jar for this trained model, provided that you have the license to do so and that the model is compatible with optimized scoring. You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

Parameters
  • model_class (str) – fully-qualified class name, e.g. “com.company.project.Model”

  • include_libs (bool) – if True, also packs the required dependencies; if False, runtime will require the scoring libs given by DSSClient.scoring_libs()

Returns

a jar file, as a stream

Return type

file-like

get_scoring_pmml_stream()

Get a scoring PMML for this trained model, provided that you have the license to do so and that the model is compatible with PMML scoring. You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

Returns

a PMML file, as a stream

Return type

file-like

get_scoring_python_stream()

Download the zip containing data to use for this trained model, provided that you have the license to do so and that the model is compatible with Python scoring. You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

Returns

an archive file, as a stream

Return type

file-like
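
Since the stream must be closed after download, a minimal sketch of a safe download, assuming the model is compatible with Python scoring ("scoring_model.zip" is an arbitrary local filename):

import shutil
from contextlib import closing

# details is a DSSTrainedPredictionModelDetails object
with closing(details.get_scoring_python_stream()) as stream:
    with open("scoring_model.zip", "wb") as f:
        shutil.copyfileobj(stream, f)

The get_scoring_python() convenience method below downloads the same archive directly to a file.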

get_scoring_python(filename)

Download the zip containing data to use Python scoring for this trained model in filename, provided that you have the license to do so and that the model is compatible with Python scoring.

Parameters

filename (str) – filename of the resulting downloaded file

get_scoring_mlflow_stream()

Download the zip containing this trained model using MLflow Model format, provided that you have the license to do so and that the model is compatible with MLflow scoring. You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

Returns

an archive file, as a stream

Return type

file-like

get_scoring_mlflow(filename)

Download the zip containing data for this trained model, using MLflow Model format, provided that you have the license to do so and that the model is compatible with MLflow scoring

Parameters

filename (str) – filename to the resulting MLflow Model zip

compute_subpopulation_analyses(split_by, wait=True, sample_size=1000, random_state=1337, n_jobs=1, debug_mode=False)

Launch computation of Subpopulation analyses for this trained model.

Parameters
  • split_by (list|str) – column(s) on which subpopulation analyses are to be computed (one analysis per column)

  • wait (bool) – if True, the call blocks until the computation is finished and returns the results directly

  • sample_size (int) – number of records of the dataset to use for the computation

  • random_state (int) – random state to use to build sample, for reproducibility

  • n_jobs (int) – number of cores used for parallel training. (-1 means ‘all cores’)

  • debug_mode (bool) – if True, output all logs (slower)

Returns

if wait is True, an object containing the Subpopulation analyses, else a future to wait on the result

Return type

dataikuapi.dss.ml.DSSSubpopulationAnalyses or dataikuapi.dss.future.DSSFuture
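
A minimal sketch, where "age_group" is a hypothetical column of the dataset:

# details is a DSSTrainedPredictionModelDetails object

# "age_group" is a hypothetical column name
analyses = details.compute_subpopulation_analyses(split_by=["age_group"], sample_size=1000)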

get_subpopulation_analyses()

Retrieve all subpopulation analyses computed for this trained model

Returns

the subpopulation analyses

Return type

dataikuapi.dss.ml.DSSSubpopulationAnalyses

compute_partial_dependencies(features, wait=True, sample_size=1000, random_state=1337, n_jobs=1, debug_mode=False)

Launch computation of Partial dependencies for this trained model.

Parameters
  • features (list|str) – feature(s) on which partial dependencies are to be computed

  • wait (bool) – if True, the call blocks until the computation is finished and returns the results directly

  • sample_size (int) – number of records of the dataset to use for the computation

  • random_state (int) – random state to use to build sample, for reproducibility

  • n_jobs (int) – number of cores used for parallel training. (-1 means ‘all cores’)

  • debug_mode (bool) – if True, output all logs (slower)

Returns

if wait is True, an object containing the Partial dependencies, else a future to wait on the result

Return type

dataikuapi.dss.ml.DSSPartialDependencies or dataikuapi.dss.future.DSSFuture

get_partial_dependencies()

Retrieve all partial dependencies computed for this trained model

Returns

the partial dependencies

Return type

dataikuapi.dss.ml.DSSPartialDependencies

download_documentation_stream(export_id)

Download a model documentation, as a binary stream.

Warning: this stream will monopolize the DSSClient until closed.

Parameters

export_id – the id of the generated model documentation returned as the result of the future

Returns

The generated model documentation, as a binary stream

download_documentation_to_file(export_id, path)

Download a model documentation into the given output file.

Parameters
  • export_id – the id of the generated model documentation returned as the result of the future

  • path – the path where to download the model documentation

Returns

None

property full_id
generate_documentation(folder_id=None, path=None)

Start the model document generation from a template docx file in a managed folder, or from the default template if no folder id and path are specified.

Parameters
  • folder_id – (optional) the id of the managed folder

  • path – (optional) the path to the file from the root of the folder

Returns

A DSSFuture representing the model document generation process

generate_documentation_from_custom_template(fp)

Start the model document generation from a docx template (as a file object).

Parameters

fp (object) – A file-like object pointing to a template docx file

Returns

A DSSFuture representing the model document generation process

get_diagnostics()

Retrieves diagnostics computed for this trained model

Returns

list of diagnostics

Return type

list of type dataikuapi.dss.ml.DSSMLDiagnostic

get_origin_analysis_trained_model()

Fetches details about the analysis model from which this model was exported. Returns None if the deployed trained model does not have an origin analysis trained model.

Return type

DSSTrainedModelDetails | None

get_raw()

Gets the raw dictionary of trained model details

get_raw_snippet()

Gets the raw dictionary of trained model snippet. The snippet is a lighter version than the details.

get_train_info()

Returns various information about the train process (size of the train set, quick description, timing information)

Return type

dict

get_user_meta()

Gets the user-accessible metadata (name, description, cluster labels, classification threshold). Returns the original object, not a copy. Changes to the returned object are persisted to DSS by calling save_user_meta()

save_user_meta()
class dataikuapi.dss.ml.DSSTrainedClusteringModelDetails(details, snippet, saved_model=None, saved_model_version=None, mltask=None, mltask_model_id=None)

Object to read details of a trained clustering model

Do not create this object directly, use DSSMLTask.get_trained_model_details() instead

get_raw()

Gets the raw dictionary of trained model details

get_train_info()

Returns various information about the train process (size of the train set, quick description, timing information)

Return type

dict

get_facts()

Gets the ‘cluster facts’ data, i.e. the structure behind the screen “for cluster X, average of Y is Z times higher than average”

Return type

DSSClustersFacts

get_performance_metrics()

Returns all performance metrics for this clustering model.

Returns

a dict of performance metrics values

Return type

dict

get_preprocessing_settings()

Gets the preprocessing settings that were used to train this model

Return type

dict

get_modeling_settings()

Gets the modeling (algorithms) settings that were used to train this model.

Note: the structure of this dict is not the same as the modeling params on the ML Task (which may contain several algorithms)

Return type

dict

get_actual_modeling_params()

Gets the actual / resolved parameters that were used to train this model.

Returns

A dictionary, which contains at least a “resolved” key

Return type

dict

get_scatter_plots()

Gets the cluster scatter plot data

Returns

a DSSScatterPlots object to interact with the scatter plots

Return type

dataikuapi.dss.ml.DSSScatterPlots

download_documentation_stream(export_id)

Download a model documentation, as a binary stream.

Warning: this stream will monopolize the DSSClient until closed.

Parameters

export_id – the id of the generated model documentation returned as the result of the future

Returns

The generated model documentation, as a binary stream

download_documentation_to_file(export_id, path)

Download a model documentation into the given output file.

Parameters
  • export_id – the id of the generated model documentation returned as the result of the future

  • path – the path where to download the model documentation

Returns

None

property full_id
generate_documentation(folder_id=None, path=None)

Start the model document generation from a template docx file in a managed folder, or from the default template if no folder id and path are specified.

Parameters
  • folder_id – (optional) the id of the managed folder

  • path – (optional) the path to the file from the root of the folder

Returns

A DSSFuture representing the model document generation process

generate_documentation_from_custom_template(fp)

Start the model document generation from a docx template (as a file object).

Parameters

fp (object) – A file-like object pointing to a template docx file

Returns

A DSSFuture representing the model document generation process

get_diagnostics()

Retrieves diagnostics computed for this trained model

Returns

list of diagnostics

Return type

list of type dataikuapi.dss.ml.DSSMLDiagnostic

get_origin_analysis_trained_model()

Fetches details about the analysis model from which this model was exported. Returns None if the deployed trained model does not have an origin analysis trained model.

Return type

DSSTrainedModelDetails | None

get_raw_snippet()

Gets the raw dictionary of trained model snippet. The snippet is a lighter version than the details.

get_user_meta()

Gets the user-accessible metadata (name, description, cluster labels, classification threshold). Returns the original object, not a copy. Changes to the returned object are persisted to DSS by calling save_user_meta()

save_user_meta()

Saved models

class dataikuapi.dss.savedmodel.DSSSavedModel(client, project_key, sm_id)

A handle to interact with a saved model on the DSS instance.

Do not create this directly, use dataikuapi.dss.project.DSSProject.get_saved_model()

property id
get_settings()

Returns the settings of this saved model.

Return type

DSSSavedModelSettings

list_versions()

Get the versions of this saved model

Returns

a list of the versions, as a list of dicts. Each dict contains at least an “id” key, which can be passed to get_metric_values(), get_version_details() and set_active_version()

Return type

list

get_active_version()

Gets the active version of this saved model

Returns

a dict representing the active version, or None if no version is active. The dict contains at least an “id” key, which can be passed to get_metric_values(), get_version_details() and set_active_version()

Return type

dict

get_version_details(version_id)

Gets details for a version of a saved model

Parameters

version_id (str) – Identifier of the version, as returned by list_versions()

Returns

A DSSTrainedPredictionModelDetails representing the details of this trained model id

Return type

DSSTrainedPredictionModelDetails

set_active_version(version_id)

Sets a particular version of the saved model as the active one
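
A minimal sketch listing the versions of a saved model and activating one ("my_model_id" is a hypothetical saved model id):

# client is a DSS API client
sm = client.get_project("MYPROJECT").get_saved_model("my_model_id")

versions = sm.list_versions()
print([v["id"] for v in versions])

# Make the first listed version the active one
sm.set_active_version(versions[0]["id"])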

delete_versions(versions, remove_intermediate=True)

Delete version(s) of the saved model

Parameters
  • versions (list[str]) – list of versions to delete

  • remove_intermediate (bool) – also remove intermediate versions (default: True). In the case of a partitioned model, an intermediate version is created every time a partition has finished training.

get_origin_ml_task()

Fetches the last ML task that was deployed (exported) to this saved model. Returns None if the saved model does not have an origin ML task.

Return type

DSSMLTask | None

import_mlflow_version_from_path(version_id, path, code_env_name='INHERIT', container_exec_config_name='NONE', set_active=True, binary_classification_threshold=0.5)

Create a new version for this saved model from a path containing a MLFlow model.

Requires the saved model to have been created using dataikuapi.dss.project.DSSProject.create_mlflow_pyfunc_model().

Parameters
  • version_id (str) – Identifier of the version to create

  • path (str) – An absolute path on the local filesystem. Must be a folder, and must contain a MLFlow model

  • code_env_name (str) – Name of the code env to use for this model version. The code env must contain at least mlflow and the package(s) corresponding to the used MLFlow-compatible frameworks. If value is “INHERIT”, the default active code env of the project will be used

  • container_exec_config_name (str) – Name of the containerized execution configuration to use while creating this model version. If value is “INHERIT”, the container execution configuration of the project will be used. If value is “NONE”, local execution will be used (no container)

  • set_active (bool) – sets this new version as the active version of the saved model

  • binary_classification_threshold (float) – For binary classification, define the actual threshold for the imported version. Default to 0.5

Returns

an ExternalModelVersionHandler to interact with the new MLFlow model version

import_mlflow_version_from_managed_folder(version_id, managed_folder, path, code_env_name='INHERIT', container_exec_config_name='INHERIT', set_active=True, binary_classification_threshold=0.5)

Create a new version for this saved model from a path containing a MLFlow model in a managed folder.

Requires the saved model to have been created using dataikuapi.dss.project.DSSProject.create_mlflow_pyfunc_model().

Parameters
  • version_id (str) – Identifier of the version to create

  • managed_folder (str) – Identifier of the managed folder or dataikuapi.dss.managedfolder.DSSManagedFolder

  • path (str) – Path of the MLflow folder in the managed folder

  • code_env_name (str) – Name of the code env to use for this model version. The code env must contain at least mlflow and the package(s) corresponding to the used MLFlow-compatible frameworks. If value is “INHERIT”, the default active code env of the project will be used

  • container_exec_config_name (str) – Name of the containerized execution configuration to use for evaluating this model version. If value is “INHERIT”, the container execution configuration of the project will be used. If value is “NONE”, local execution will be used (no container)

  • set_active (bool) – sets this new version as the active version of the saved model

  • binary_classification_threshold (float) – For binary classification, define the actual threshold for the imported version. Default to 0.5

Returns

an ExternalModelVersionHandler to interact with the new MLFlow model version

create_proxy_model_version(version_id, protocol, configuration)

EXPERIMENTAL. Creates a new version of a proxy model.

This is an experimental API, subject to change. Requires the saved model to have been created using dataikuapi.dss.project.DSSProject.create_proxy_model().

Parameters
  • version_id (str) – Identifier of the version to create

  • protocol (str) – one of [“KServe”, “DSS_API_NODE”]

  • configuration (dict) – A dictionary containing the required params for the selected protocol

Returns

an ExternalModelVersionHandler to interact with the new Proxy model version

get_external_model_version_handler(version_id)

Returns a :class:ExternalModelVersionHandler to interact with an External model version (MLflow or Proxy model)

get_metric_values(version_id)

Get the values of the metrics on the version of this saved model

Returns:

a list of metric objects and their value

get_zone()

Gets the flow zone of this saved model

Return type

dataikuapi.dss.flow.DSSFlowZone

move_to_zone(zone)

Moves this object to a flow zone

Parameters

zone (object) – a dataikuapi.dss.flow.DSSFlowZone where to move the object

share_to_zone(zone)

Share this object to a flow zone

Parameters

zone (object) – a dataikuapi.dss.flow.DSSFlowZone where to share the object

unshare_from_zone(zone)

Unshare this object from a flow zone

Parameters

zone (object) – a dataikuapi.dss.flow.DSSFlowZone from where to unshare the object

get_usages()

Get the recipes referencing this model

Returns:

a list of usages

get_object_discussions()

Get a handle to manage discussions on the saved model

Returns

the handle to manage discussions

Return type

dataikuapi.discussion.DSSObjectDiscussions

delete()

Delete the saved model

class dataikuapi.dss.savedmodel.DSSSavedModelSettings(saved_model, settings)

A handle on the settings of a saved model

Do not create this class directly, instead use dataikuapi.dss.savedmodel.DSSSavedModel.get_settings()

get_raw()
property prediction_metrics_settings

The settings of evaluation metrics for a prediction saved model

save()

Saves the settings of this saved model

MLflow models

class dataikuapi.dss.savedmodel.ExternalModelVersionHandler(saved_model, version_id)

Handler to interact with an external model version (MLflow import or Proxy model)

get_settings()
set_core_metadata(target_column_name, class_labels=None, get_features_from_dataset=None, features_list=None, output_style='AUTO_DETECT', container_exec_config_name='NONE')

Sets metadata for this MLFlow model version

In addition to target_column_name, one of get_features_from_dataset or features_list must be passed in order to be able to evaluate performance

Parameters
  • target_column_name (str) – name of the target column. Mandatory in order to be able to evaluate performance

  • class_labels (list) – List of strings, ordered class labels. Mandatory in order to be able to evaluate performance on classification models

  • get_features_from_dataset (str) – Name of a dataset to get feature names from

  • features_list (list) – List of {“name”: “feature_name”, “type”: “feature_type”}

  • container_exec_config_name (str) – Name of the containerized execution configuration to use for running the evaluation process. If value is “INHERIT”, the container execution configuration of the project will be used. If value is “NONE” (default), local execution will be used (no container)

evaluate(dataset_ref, container_exec_config_name='INHERIT', selection=None, use_optimal_threshold=True)

Evaluates the performance of this model version on a particular dataset. After calling this, the “result screens” of the MLFlow model version will be available (confusion matrix, error distribution, performance metrics, …) and more information will be available when calling DSSSavedModel.get_version_details()

set_core_metadata() must be called before you can evaluate a dataset.

Parameters
  • dataset_ref (str) – Evaluation dataset to use (either a dataset name, “PROJECT.datasetName”, a DSSDataset instance or a dataiku.Dataset instance)

  • container_exec_config_name (str) – Name of the containerized execution configuration to use for running the evaluation process. If value is “INHERIT”, the container execution configuration of the project will be used. If value is “NONE”, local execution will be used (no container)

  • selection (str) – will default to HEAD_SEQUENTIAL with a maxRecords of 10_000.

  • use_optimal_threshold (boolean) – Choose between the optimized or the actual threshold. The optimized threshold has been computed according to the metric set on the saved model setting “prediction_metrics_settings[‘thresholdOptimizationMetric’]”
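
A minimal sketch of the import-then-evaluate flow, assuming the saved model was created with create_mlflow_pyfunc_model() and that the local path, dataset name and column names are hypothetical:

# sm is a DSSSavedModel created with create_mlflow_pyfunc_model()

# Import an MLflow model stored on the local filesystem as a new version
sm.import_mlflow_version_from_path("v1", "/path/to/mlflow_model", code_env_name="INHERIT")

# Retrieve the handler for the new version and declare its metadata
handler = sm.get_external_model_version_handler("v1")
handler.set_core_metadata(
    target_column_name="target",              # hypothetical target column
    class_labels=["no", "yes"],               # hypothetical class labels
    get_features_from_dataset="eval_dataset"  # hypothetical dataset name
)

# Evaluate the version on the dataset to populate the result screens
handler.evaluate("eval_dataset")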

class dataikuapi.dss.savedmodel.MLFlowVersionSettings(version_handler, data)

Handle for the settings of an imported MLFlow model version

property raw
save()

Algorithm details

This section documents which algorithms are available, and some of the settings for them.

These algorithm names can be used for dataikuapi.dss.ml.DSSMLTaskSettings.get_algorithm_settings() and dataikuapi.dss.ml.DSSMLTaskSettings.set_algorithm_enabled()

Note

This documentation does not cover all settings of all algorithms. To know which settings are available for an algorithm, use mltask_settings.get_algorithm_settings('ALGORITHM_NAME') and print the returned dictionary.

Generally speaking, when an algorithm setting is an array, the parameter can be grid-searched: all values will be tested as part of the hyperparameter optimization.

For more documentation of settings, please refer to the UI of the visual machine learning, which contains detailed documentation for all algorithm parameters.

LOGISTIC_REGRESSION

  • Type: Prediction (binary or multiclass)

  • Available on backend: PY_MEMORY

  • Main parameters:

{
    "multi_class": SingleCategoryHyperparameterSettings, # accepted valued: ['multinomial', 'ovr']
    "penalty": CategoricalHyperparameterSettings, # possible values: ["l1", "l2"]
    "C": NumericalHyperparameterSettings, # scaling: "LOGARITHMIC"
    "n_jobs": 2
}

RANDOM_FOREST_CLASSIFICATION

  • Type: Prediction (binary or multiclass)

  • Available on backend: PY_MEMORY

  • Main parameters:

{
    "n_estimators": NumericalHyperparameterSettings, # scaling: "LINEAR"
    "min_samples_leaf": NumericalHyperparameterSettings, # scaling: "LINEAR"
    "max_tree_depth": NumericalHyperparameterSettings, # scaling: "LINEAR"
    "max_feature_prop": NumericalHyperparameterSettings, # scaling: "LINEAR"
    "max_features": NumericalHyperparameterSettings, # scaling: "LINEAR"
    "selection_mode": SingleCategoryHyperparameterSettings, # accepted_values=['auto', 'sqrt', 'log2', 'number', 'prop']
    "n_jobs": 4
}

RANDOM_FOREST_REGRESSION

  • Type: Prediction (regression)

  • Available on backend: PY_MEMORY

  • Main parameters: same as RANDOM_FOREST_CLASSIFICATION

EXTRA_TREES

  • Type: Prediction (all kinds)

  • Available on backend: PY_MEMORY

RIDGE_REGRESSION

  • Type: Prediction (regression)

  • Available on backend: PY_MEMORY

LASSO_REGRESSION

  • Type: Prediction (regression)

  • Available on backend: PY_MEMORY

LEASTSQUARE_REGRESSION

  • Type: Prediction (regression)

  • Available on backend: PY_MEMORY

SVC_CLASSIFICATION

  • Type: Prediction (binary or multiclass)

  • Available on backend: PY_MEMORY

SVM_REGRESSION

  • Type: Prediction (regression)

  • Available on backend: PY_MEMORY

SGD_CLASSIFICATION

  • Type: Prediction (binary or multiclass)

  • Available on backend: PY_MEMORY

SGD_REGRESSION

  • Type: Prediction (regression)

  • Available on backend: PY_MEMORY

GBT_CLASSIFICATION

  • Type: Prediction (binary or multiclass)

  • Available on backend: PY_MEMORY

GBT_REGRESSION

  • Type: Prediction (regression)

  • Available on backend: PY_MEMORY

DECISION_TREE_CLASSIFICATION

  • Type: Prediction (binary or multiclass)

  • Available on backend: PY_MEMORY

DECISION_TREE_REGRESSION

  • Type: Prediction (regression)

  • Available on backend: PY_MEMORY

LIGHTGBM_CLASSIFICATION

  • Type: Prediction (binary or multiclass)

  • Available on backend: PY_MEMORY

LIGHTGBM_REGRESSION

  • Type: Prediction (regression)

  • Available on backend: PY_MEMORY

XGBOOST_CLASSIFICATION

  • Type: Prediction (binary or multiclass)

  • Available on backend: PY_MEMORY

XGBOOST_REGRESSION

  • Type: Prediction (regression)

  • Available on backend: PY_MEMORY

NEURAL_NETWORK

  • Type: Prediction (all kinds)

  • Available on backend: PY_MEMORY

DEEP_NEURAL_NETWORK_REGRESSION

  • Type: Prediction (regression)

  • Available on backend: PY_MEMORY

DEEP_NEURAL_NETWORK_CLASSIFICATION

  • Type: Prediction (binary or multiclass)

  • Available on backend: PY_MEMORY

KNN

  • Type: Prediction (all kinds)

  • Available on backend: PY_MEMORY

LARS

  • Type: Prediction (all kinds)

  • Available on backend: PY_MEMORY

MLLIB_LOGISTIC_REGRESSION

  • Type: Prediction (binary or multiclass)

  • Available on backend: MLLIB

MLLIB_DECISION_TREE

  • Type: Prediction (all kinds)

  • Available on backend: MLLIB

MLLIB_RANDOM_FOREST

  • Type: Prediction (all kinds)

  • Available on backend: MLLIB

MLLIB_GBT

  • Type: Prediction (all kinds)

  • Available on backend: MLLIB

MLLIB_LINEAR_REGRESSION

  • Type: Prediction (regression)

  • Available on backend: MLLIB

MLLIB_NAIVE_BAYES

  • Type: Prediction (all kinds)

  • Available on backend: MLLIB

Other

  • SCIKIT_MODEL

  • MLLIB_CUSTOM

  • SPARKLING_DEEP_LEARNING

  • SPARKLING_GBM

  • SPARKLING_RF

  • SPARKLING_GLM

  • SPARKLING_NB