Model Evaluation Stores¶
Through the public API, the Python client allows you to evaluate models. These models are typically trained in the Lab and then deployed to the Flow as Saved Models (see Machine learning for additional information). They can also be external models.
Concepts¶
With a DSS model¶
In DSS, you can evaluate a version of a Saved Model using an Evaluation Recipe. An Evaluation Recipe takes as input a Saved Model and a Dataset on which to perform this evaluation. An Evaluation Recipe can have three outputs:
an output dataset,
a metrics dataset, or
a Model Evaluation Store (MES).
By default, the active version of the Saved Model is evaluated. This can be configured in the Evaluation Recipe.
If a MES is configured as an output, a Model Evaluation (ME) is written to the MES each time the MES is built (that is, each time the Evaluation Recipe runs).
A Model Evaluation is a container for metrics of the evaluation of the Saved Model Version on the Evaluation Dataset. Those metrics include:
all available performance metrics,
the Data Drift metric.
The Data Drift metric is the accuracy of a classifier trained to recognize whether a row comes:
from the evaluation dataset, or
from the train-time test dataset of the configured version of the Saved Model.
The higher this metric, the more easily that classifier separates rows of the evaluation dataset from rows of the train-time test dataset, and therefore the more the evaluation data differs from the train-time data.
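For intuition only, here is a minimal sketch of this drift-classifier idea. It illustrates the principle, not the code DSS actually runs; the helper name and the modeling choices below are ours.
# Illustration of the drift-classifier principle (not the DSS implementation).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def drift_model_accuracy(train_time_test_df, evaluation_df):
    # Label each row with its origin: 0 = train-time test data, 1 = evaluation data
    X = pd.get_dummies(pd.concat([train_time_test_df, evaluation_df], ignore_index=True))
    y = [0] * len(train_time_test_df) + [1] * len(evaluation_df)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    # ~0.5: the two datasets are hard to tell apart (little drift)
    # ~1.0: the two datasets are easy to separate (strong drift)
    return accuracy_score(y_test, clf.predict(X_test))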
Detailed information and other tools, including a binomial test, univariate data drift, and feature drift importance, are available in the Input Data Drift tab of a Model Evaluation. Note that this tool is interactive and that displayed results are not persisted.
With an external model¶
In DSS, you can also evaluate an external model using a Standalone Evaluation Recipe. A Standalone Evaluation Recipe (SER) takes as input a dataset containing ground-truth labels, predictions, and (optionally) weights. A SER has a single output: a Model Evaluation Store.
Like the Evaluation Recipe, the Standalone Evaluation Recipe writes a Model Evaluation to the configured Model Evaluation Store each time it runs. In this case, however, the Data Drift cannot be computed, as there is no notion of reference data.
How evaluation is performed¶
The Evaluation Recipe and its counterpart for external models, the Standalone Evaluation Recipe, perform the evaluation on a sample of the Evaluation Dataset. The sampling parameters are defined in the recipe; the sample contains at most 20,000 rows.
Performance metrics are then computed on this sample.
Data drift can be computed in three ways:
at evaluation time, between the evaluation dataset and the train time test dataset;
using the API, between a Model Evaluation and another Model Evaluation, a Saved Model Version (using a sample of its train-time test dataset), or a Lab Model (using a sample of its train-time test dataset);
interactively, in the “Input data drift” tab of a Model Evaluation.
In all cases, to compute the Data Drift, the sample of the Model Evaluation and a sample of the reference data are concatenated. To balance the data, both samples are truncated to the length of the smaller one: for instance, if the reference sample is larger than the ME sample, the reference sample is truncated. A simplified sketch of this balancing step follows the list below.
Concretely, the two compared samples are:
at evaluation time, the sample of the Model Evaluation (at most 20,000 rows) and a sample of the train-time test dataset;
interactively, the sample of the reference Model Evaluation and:
the sample of the other Model Evaluation, if the compared item is an ME;
a sample of its train-time test dataset, if the compared item is a Lab Model or a Saved Model Version (SMV).
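As a simplified illustration of the balancing step described above (the helper below is ours; the actual sampling logic in DSS is more involved):
# Simplified sketch: both samples are truncated to the size of the smaller one,
# then concatenated with an "origin" label for the drift classifier.
import pandas as pd

def balance_and_concat(me_sample, reference_sample):
    n = min(len(me_sample), len(reference_sample))
    me_part = me_sample.head(n).assign(origin="evaluation")
    ref_part = reference_sample.head(n).assign(origin="reference")
    return pd.concat([me_part, ref_part], ignore_index=True)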
Limitations¶
Model Evaluation Stores cannot be used with:
clustering models,
ensembling models,
partitioned models.
Compatible prediction models have to be Python models.
Usage samples¶
Create a Model Evaluation Store¶
# client is a DSS API client
p = client.get_project("MYPROJECT")
mes = p.create_model_evaluation_store("My Mes Name")
Note that the display name of a Model Evaluation Store (“My Mes Name” in the sample above) is distinct from its unique id.
Retrieve a Model Evaluation Store¶
# client is a DSS API client
p = client.get_project("MYPROJECT")
mes = p.get_model_evaluation_store("mes_id")
List Model Evaluation Stores¶
# client is a DSS API client
p = client.get_project("MYPROJECT")
stores = p.list_model_evaluation_stores(as_type="objects")
Create an Evaluation Recipe¶
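The snippet below is a sketch of how an Evaluation Recipe can be created with the recipe builder API. The exact builder methods available for evaluation recipes (in particular how the Saved Model input and the Model Evaluation Store output are declared) depend on your DSS version; check the dataikuapi recipe reference for the version you use. The identifiers used here are placeholders.
# Hedged sketch: create an Evaluation Recipe through the recipe builder.
# "my_saved_model_id", "my_evaluation_dataset" and "my_mes_id" are placeholders,
# and the with_input()/with_output() calls for this recipe type are assumptions.
p = client.get_project("MYPROJECT")
builder = p.new_recipe("evaluation")
builder.with_input("my_saved_model_id")       # Saved Model to evaluate (assumed call)
builder.with_input("my_evaluation_dataset")   # evaluation dataset (assumed call)
builder.with_output("my_mes_id")              # Model Evaluation Store output (assumed call)
recipe = builder.build()
# The recipe settings (evaluated model version, sampling, ...) can then be
# adjusted through recipe.get_settings() and persisted with settings.save()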
Build a Model Evaluation Store and retrieve the performance and data drift metrics of the just computed ME¶
# client is a DSS API client
p = client.get_project("MYPROJECT")
mes = p.get_model_evaluation_store("M3s_1d")
mes.build()
me = mes.get_latest_model_evaluation()
full_info = me.get_full_info()
metrics = full_info.metrics
List Model Evaluations from a store¶
# client is a DSS API client
p = client.get_project("MYPROJECT")
mes = p.get_model_evaluation_store("M3s_1d")
me_list = mes.list_model_evaluations()
Retrieve an array of creation date / accuracy from a store¶
p = client.get_project("MYPROJECT")
mes = p.get_model_evaluation_store("M3s_1d")
me_list = mes.list_model_evaluations()
res = []
for me in me_list:
    full_info = me.get_full_info()
    creation_date = full_info.creation_date
    accuracy = full_info.metrics["accuracy"]
    res.append([creation_date, accuracy])
Retrieve an array of label value / precision from a store¶
The creation date of a model evaluation might not be the best key for a metric. In some cases, it is more interesting to use the labeling system, for instance to tag the version of the evaluation dataset.
If a label “myCustomLabel:evaluationDataset” was created, an array of label value / precision can be retrieved from a store with the following snippet:
p = client.get_project("MYPROJECT")
mes = p.get_model_evaluation_store("M3s_1d")
me_list = mes.list_model_evaluations()
res = []
for me in me_list:
    full_info = me.get_full_info()
    label_value = next(x for x in full_info.user_meta["labels"] if x["key"] == "myCustomLabel:evaluationDataset")
    precision = full_info.metrics["precision"]
    res.append([label_value, precision])
Compute data drift of the evaluation dataset of a Model Evaluation with the train time test dataset of its base DSS model version¶
# me1 is a model evaluation of a DSS model;
# using its base Saved Model Version as reference is implicit
drift = me1.compute_data_drift()
drift_model_result = drift.drift_model_result
drift_model_accuracy = drift_model_result.drift_model_accuracy
print("Value: {} < {} < {}".format(drift_model_accuracy.lower_confidence_interval,
                                   drift_model_accuracy.value,
                                   drift_model_accuracy.upper_confidence_interval))
print("p-value: {}".format(drift_model_accuracy.pvalue))
Compute data drift, display results and adjust parameters¶
# me1 and me2 are two compatible model evaluations (having the same prediction type) from any store
drift = me1.compute_data_drift(me2)
drift_model_result = drift.drift_model_result
drift_model_accuracy = drift_model_result.drift_model_accuracy
print("Value: {} < {} < {}".format(drift_model_accuracy.lower_confidence_interval,
drift_model_accuracy.value,
drift_model_accuracy.upper_confidence_interval))
print("p-value: {}".format(drift_model_accuracy.pvalue))
# Check sample sizes
print("Reference sample size: {}".format(drift_model_result.get_raw()["referenceSampleSize"]))
print("Current sample size: {}".format(drift_model_result.get_raw()["currentSampleSize"]))
# check columns handling
per_col_settings = drift.per_column_settings
for col_settings in per_col_settings:
    print("col {} - default handling {} - actual handling {}".format(col_settings.name, col_settings.default_column_handling, col_settings.actual_column_handling))
# recompute, with Pclass set as CATEGORICAL
from dataikuapi.dss.modelevaluationstore import DataDriftParams, PerColumnDriftParamBuilder

drift = me1.compute_data_drift(me2,
                               DataDriftParams.from_params(
                                   PerColumnDriftParamBuilder().with_column_drift_param("Pclass", "CATEGORICAL", True).build()
                               ))
...
API reference¶
There are two main sets of classes for handling model evaluation stores in Dataiku’s Python APIs:
dataiku.core.model_evaluation_store.ModelEvaluationStore and dataiku.core.model_evaluation_store.ModelEvaluation, in the dataiku package. They were initially designed for usage within DSS.
dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore and dataikuapi.dss.modelevaluationstore.DSSModelEvaluation, in the dataikuapi package. They were initially designed for usage outside of DSS.
Both sets of classes have fairly similar capabilities.
For more details on the two packages, please see Python APIs.
dataiku package API¶
class dataiku.core.model_evaluation_store.ModelEvaluationStore(lookup, project_key=None, ignore_flow=False)¶
This is a handle to interact with a model evaluation store.
Note: this class is also available as dataiku.ModelEvaluationStore

get_info(sensitive_info=False)¶
Get information about the location and settings of this model evaluation store.
Return type: dict

get_path()¶
Gets the filesystem path of this model evaluation store.

get_id()¶

get_name()¶

list_runs()¶

get_evaluation(evaluation_id)¶

get_last_metric_values()¶
Get the set of last values of the metrics on this model evaluation store, as a dataiku.ComputedMetrics object.

get_metric_history(metric_lookup)¶
Get the set of all values a given metric took on this model evaluation store.
Parameters: metric_lookup – metric name or unique identifier
class dataiku.core.model_evaluation_store.ModelEvaluation(store, evaluation_id)¶
This is a handle to interact with a model evaluation.

set_preparation_steps(steps, requested_output_schema, context_project_key=None)¶

get_schema()¶
Gets the schema of the sample in this model evaluation, as an array of objects like this one: {'type': 'string', 'name': 'foo', 'maxLength': 1000}. There is more information for the map, array and object types.

get_dataframe(columns=None, infer_with_pandas=True, parse_dates=True, bool_as_str=False, float_precision=None)¶
Read the sample in the run as a Pandas dataframe.
Pandas dataframes are fully in-memory, so you need to make sure that your dataset will fit in RAM before using this.
Keyword arguments:
infer_with_pandas – use the types detected by pandas rather than the dataset schema as detected in DSS (default True)
parse_dates – date columns in DSS’s dataset schema are parsed (default True)
bool_as_str – leave boolean values as strings (default False)
Inconsistent sampling parameters raise a ValueError.
Note about encoding:
Column labels are “unicode” objects
When a column is of string type, the content is made of utf-8 encoded “str” objects

iter_dataframes_forced_types(names, dtypes, parse_date_columns, sampling=None, chunksize=10000, float_precision=None)¶

iter_dataframes(chunksize=10000, infer_with_pandas=True, parse_dates=True, columns=None, bool_as_str=False, float_precision=None)¶
Read the model evaluation sample as Pandas dataframes, in chunks of fixed size.
Returns a generator over pandas dataframes.
Useful if the sample doesn’t fit in RAM.
dataikuapi package API¶
class dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore(client, project_key, mes_id)¶
A handle to interact with a model evaluation store on the DSS instance.
Do not create this directly, use dataikuapi.dss.DSSProject.get_model_evaluation_store()

property id¶
get_settings()¶
Returns the settings of this model evaluation store.
Return type: dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStoreSettings

get_zone()¶
Gets the flow zone of this model evaluation store.
Return type: dataikuapi.dss.flow.DSSFlowZone

move_to_zone(zone)¶
Moves this object to a flow zone.
Parameters: zone (object) – a dataikuapi.dss.flow.DSSFlowZone where to move the object

share_to_zone(zone)¶
Share this object to a flow zone.
Parameters: zone (object) – a dataikuapi.dss.flow.DSSFlowZone where to share the object

unshare_from_zone(zone)¶
Unshare this object from a flow zone.
Parameters: zone (object) – a dataikuapi.dss.flow.DSSFlowZone from where to unshare the object
get_usages()¶
Get the recipes referencing this model evaluation store.
Returns: a list of usages

get_object_discussions()¶
Get a handle to manage discussions on the model evaluation store.
Returns: the handle to manage discussions
Return type: dataikuapi.discussion.DSSObjectDiscussions

delete()¶
Delete the model evaluation store.

list_model_evaluations()¶
List the model evaluations in this model evaluation store. The list is sorted by ME creation date.
Returns: the list of the model evaluations
Return type: list of dataikuapi.dss.modelevaluationstore.DSSModelEvaluation
get_model_evaluation(evaluation_id)¶
Get a handle to interact with a specific model evaluation.
Parameters: evaluation_id (string) – the id of the desired model evaluation
Returns: a dataikuapi.dss.modelevaluationstore.DSSModelEvaluation model evaluation handle

get_latest_model_evaluation()¶
Get a handle to interact with the latest model evaluation computed.
Returns: a dataikuapi.dss.modelevaluationstore.DSSModelEvaluation model evaluation handle if the store is not empty, else None

delete_model_evaluations(evaluations)¶
Remove model evaluations from this store.
build(job_type='NON_RECURSIVE_FORCED_BUILD', wait=True, no_fail=False)¶
Starts a new job to build this model evaluation store and waits for it to complete. Raises if the job failed.

job = mes.build()
print("Job %s done" % job.id)

Parameters:
job_type – the job type. One of RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD or RECURSIVE_FORCED_BUILD
wait – wait for the build to finish before returning
no_fail – if True, does not raise if the job failed. Valid only when wait is True
Returns: the dataikuapi.dss.job.DSSJob job handle corresponding to the built job

get_last_metric_values()¶
Get the metrics of the latest model evaluation built.
Returns: a list of metric objects and their value

get_metric_history(metric)¶
Get the history of the values of the metric on this model evaluation store.
Returns: an object containing the values of the metric, cast to the appropriate type (double, boolean, …)

compute_metrics(metric_ids=None, probes=None)¶
Compute metrics on this model evaluation store. If the metrics are not specified, the metrics setup on the model evaluation store are used.
class dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStoreSettings(model_evaluation_store, settings)¶
A handle on the settings of a model evaluation store.
Do not create this class directly, instead use dataikuapi.dss.DSSModelEvaluationStore.get_settings()

get_raw()¶

save()¶
class dataikuapi.dss.modelevaluationstore.DSSModelEvaluation(model_evaluation_store, evaluation_id)¶
A handle on a model evaluation.
Do not create this class directly, instead use dataikuapi.dss.DSSModelEvaluationStore.get_model_evaluation()

get_full_info()¶
Retrieve the model evaluation with its performance data.
Returns: the model evaluation full info, as a dataikuapi.dss.modelevaluationstore.DSSModelEvaluationFullInfo

get_full_id()¶

delete()¶
Remove this model evaluation.

property full_id¶
compute_data_drift(reference=None, data_drift_params=None, wait=True)¶
Compute data drift against a reference model or model evaluation. The reference is determined automatically unless specified.
Parameters:
reference (Union[str, DSSModelEvaluation, DSSTrainedPredictionModelDetails]) – saved model version (full ID or DSSTrainedPredictionModelDetails) or model evaluation (full ID or DSSModelEvaluation) to use as reference (optional)
data_drift_params (DataDriftParams) – data drift computation settings, as a dataikuapi.dss.modelevaluationstore.DataDriftParams (optional)
wait – if True, wait for the data drift computation to complete before returning (optional)
Returns: a dataikuapi.dss.modelevaluationstore.DataDriftResult containing data drift analysis results if wait is True, or a DSSFuture handle otherwise
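For long computations, the call can be made non-blocking. The short sketch below assumes the DSSFuture behavior described above; me1 and me2 are model evaluation handles as in the usage samples.
# Non-blocking drift computation: get a DSSFuture, then wait for the result explicitly
future = me1.compute_data_drift(me2, wait=False)
drift = future.wait_for_result()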
get_metrics()¶
Get the metrics for this model evaluation. Metrics must be understood here as Metrics in DSS Metrics & Checks.
Returns: the metrics, as a JSON object

get_sample_df()¶
Get the sample of the evaluation dataset on which the evaluation was performed.
Returns: the sample content, as a pandas.DataFrame
class dataikuapi.dss.modelevaluationstore.DSSModelEvaluationFullInfo(model_evaluation, full_info)¶
A handle on the full information on a model evaluation.
Includes information such as the full id of the evaluated model, the evaluation params, the performance and drift metrics, if any, etc.
Do not create this class directly, instead use dataikuapi.dss.DSSModelEvaluation.get_full_info()

metrics¶
The performance and data drift metrics, if any.

creation_date¶
The date and time of the creation of the model evaluation, as an epoch.

user_meta¶
The user-accessible metadata (name, labels). Returns the original object, not a copy. Changes to the returned object are persisted to DSS by calling save_user_meta().

get_raw()¶

save_user_meta()¶
class dataikuapi.dss.modelevaluationstore.DataDriftParams(data)¶
Object that represents parameters for data drift computation.
Do not create this object directly, use dataikuapi.dss.modelevaluationstore.DataDriftParams.from_params() instead.

static from_params(per_column_settings, nb_bins=10, compute_histograms=True, confidence_level=0.95)¶
Creates parameters for data drift computation from per-column settings, number of bins, histogram computation and confidence level.
Parameters:
per_column_settings (dict) – a dict representing the per-column settings. You should use a PerColumnDriftParamBuilder to build it.
nb_bins (int) – (optional) number of bins in histograms (applies to all columns). Default: 10
compute_histograms (bool) – (optional) enable/disable histograms. Default: True
confidence_level (float) – (optional) used to compute the confidence interval on the drift model’s accuracy. Default: 0.95
class dataikuapi.dss.modelevaluationstore.PerColumnDriftParamBuilder¶
Builder for a map of per column drift params settings. Used as a helper before computing data drift to build the columns param expected in dataikuapi.dss.modelevaluationstore.DataDriftParams.from_params().

build()¶
Returns the built dict for per column drift params settings.

with_column_drift_param(name, handling='AUTO', enabled=True)¶
Sets the drift params settings for the given column name.
Parameters:
name (string) – the name of the column
handling (string) – (optional) the column type, should be either NUMERICAL, CATEGORICAL or AUTO (default: AUTO)
enabled (bool) – (optional) False means the column is ignored in drift computation (default: True)
class dataikuapi.dss.modelevaluationstore.DataDriftResult(data)¶
A handle on the data drift result of a model evaluation.
Do not create this class directly, instead use dataikuapi.dss.DSSModelEvaluation.compute_data_drift()

drift_model_result¶
Drift analysis based on drift modeling.

univariate_drift_result¶
Per-column drift analysis based on pairwise comparison of distributions.

per_column_settings¶
Information about the column handling that has been used (errors, types, etc).

get_raw()¶
Returns: the raw data drift result
Return type: dict
class dataikuapi.dss.modelevaluationstore.DriftModelResult(data)¶
A handle on the drift model result.
Do not create this class directly, instead use dataikuapi.dss.modelevaluationstore.DataDriftResult.drift_model_result

get_raw()¶
Returns: the raw drift model result
Return type: dict
class dataikuapi.dss.modelevaluationstore.UnivariateDriftResult(data)¶
A handle on the univariate data drift.
Do not create this class directly, instead use dataikuapi.dss.modelevaluationstore.DataDriftResult.univariate_drift_result

per_column_drift_data¶
Drift data per column, as a dict of column name -> drift data.

get_raw()¶
Returns: the raw univariate data drift
Return type: dict
class dataikuapi.dss.modelevaluationstore.ColumnSettings(data)¶
A handle on column handling information.
Do not create this class directly, instead use dataikuapi.dss.modelevaluationstore.DataDriftResult.get_per_column_settings()

actual_column_handling¶
The actual column handling (either forced via drift params or inferred from model evaluation preprocessings). It can be any of NUMERICAL, CATEGORICAL, or IGNORED.

default_column_handling¶
The default column handling (based on model evaluation preprocessing only). It can be any of NUMERICAL, CATEGORICAL, or IGNORED.

get_raw()¶
Returns: the raw column handling information
Return type: dict