Evaluating DSS models¶
To evaluate a DSS model, you must create an Evaluation recipe.
An Evaluation recipe takes as inputs:
an evaluation dataset
An Evaluation Recipe can have up to three outputs:
an Evaluation Store, containing the main Model Evaluation and all associated result screens
an output dataset, containing the input features, prediction and correctness of prediction for each record
a metrics dataset, containing just the performance metrics for this evaluation (i.e. it’s a subset of the Evaluation Store)
Any combination of those three outputs is valid.
Each time the evaluation recipe runs, a new Model Evaluation is added into the Evaluation Store.
In this screen, you can:
Select the model version to evaluate. Default is the active version.
Set the human readable name of the evaluation. Defaults to date and time.
(Advanced) Override the unique identifier of the new Model Evaluation in the output Evaluation Store. Should only be used if you want to overwrite an evaluation. Default is a random string.
Other items in the Output block relate to the output dataset. The Metrics block controls what is output in the metrics dataset.
Both the Evaluation Recipe and Standalone Evaluation Recipe allow the definition of labels that will be added to the computed Model Evaluation. Those labels may be useful to implement your own semantics. See Analyzing evaluation results for additional information.
A Model Evaluation (ME) uses a sample of at most 20000 lines. By default, the 20000 first lines of the evaluation dataset will be used. You can adjust the sampling method, but the sample used by the ME will always have a maximal length of 20000 lines.
The model must be a non-partitioned classification or regression model
Non-tabular MLflow inputs are not supported
Computer vision models are not supported
Deep Learning models are not supported
Time series forecasting models are not supported