Input Data Drift

Input Data Drift analyses the distribution of features in the evaluated data. If this distribution changes significantly, the underlying data has likely changed as well, which could signal concept drift.

Input Data Drift does not require ground truth (labels).

The Input Data Drift tab consists of three parts:

  • Global drift score

  • Univariate data drift

  • Feature drift importance

The Global drift score section uses the same drift model as the “data drift” metric displayed in the “Evaluations” tab of an Evaluation store. In addition to the accuracy of the drift model, it shows:

  • a lower and an upper bound on that accuracy

  • a binomial test on drift detection

../../_images/drift-model.png
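To make the bounds and the binomial test concrete, here is a minimal Python sketch. It assumes the drift model is evaluated on held-out rows drawn equally from both datasets, so a model that cannot tell them apart is right about half the time (null hypothesis p = 0.5). The Wilson score interval is one common choice for accuracy bounds and is an assumption here; DSS may compute its bounds differently. The function names are hypothetical.

```python
import math
from typing import Tuple


def binomial_tail_p_value(correct: int, n: int, p0: float = 0.5) -> float:
    """One-sided p-value P(X >= correct) for X ~ Binomial(n, p0).

    Terms are computed in log space so that large n does not overflow.
    """
    log_p, log_q = math.log(p0), math.log(1 - p0)
    total = 0.0
    for k in range(correct, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def wilson_bounds(correct: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the accuracy correct / n."""
    p_hat = correct / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical result: 560 of 1000 held-out rows assigned to the right dataset.
    correct, n = 560, 1000
    low, high = wilson_bounds(correct, n)
    print(f"accuracy={correct / n:.3f} bounds=({low:.3f}, {high:.3f})")
    print(f"p-value vs. chance (0.5): {binomial_tail_p_value(correct, n):.2e}")
```

A small p-value means the drift model beats chance by more than random variation would explain, which is evidence of drift even when the accuracy itself looks modest.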

The drift model is trained on the concatenation of two samples: one from the data used to train the related model version, and one from the evaluated dataset. The larger sample may be truncated to the size of the other so that each dataset provides 50% of the rows. The drift model then predicts which sample a row belongs to. The higher its accuracy, the better it can recognize where a row comes from, and so the more likely it is that the data has drifted.
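The procedure above can be sketched in plain Python. This is a minimal illustration, not the DSS implementation: rows are assumed to be numeric feature vectors, the helper names are hypothetical, and a nearest-centroid classifier stands in for whatever algorithm the real drift model uses.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Labeled = Tuple[Row, int]


def build_drift_dataset(reference: Sequence[Row], current: Sequence[Row],
                        seed: int = 0) -> List[Labeled]:
    """Truncate the larger sample so each dataset supplies 50% of the rows,
    then label each row with its origin (0 = reference, 1 = current) and shuffle."""
    rng = random.Random(seed)
    size = min(len(reference), len(current))
    rows = [(r, 0) for r in rng.sample(list(reference), size)]
    rows += [(r, 1) for r in rng.sample(list(current), size)]
    rng.shuffle(rows)
    return rows


def fit_nearest_centroid(train: List[Labeled]) -> Callable[[Row], int]:
    """Stand-in classifier: predict the origin whose mean feature vector is closest."""
    centroids = {}
    for label in (0, 1):
        members = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x: Row) -> int:
        return min(centroids,
                   key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, centroids[lbl])))

    return predict


def drift_model_accuracy(reference: Sequence[Row], current: Sequence[Row],
                         test_fraction: float = 0.3, seed: int = 0) -> float:
    """Train on part of the balanced data; report accuracy on the held-out rest."""
    rows = build_drift_dataset(reference, current, seed)
    cut = int(len(rows) * (1 - test_fraction))
    train, test = rows[:cut], rows[cut:]
    predict = fit_nearest_centroid(train)
    return sum(predict(x) == y for x, y in test) / len(test)


if __name__ == "__main__":
    rng = random.Random(42)
    reference = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
    same = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(800)]
    shifted = [[rng.gauss(2, 1), rng.gauss(0, 1)] for _ in range(800)]
    print("no drift:", drift_model_accuracy(reference, same))
    print("drift:", drift_model_accuracy(reference, shifted))
```

With identically distributed samples the accuracy stays near 0.5; shifting one feature lets the drift model separate the samples and pushes accuracy well above it.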

Univariate data drift performs this analysis feature by feature. In this section, you can analyze which features have drifted and whether each one should be handled as numerical or categorical.
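One way to score drift per feature is a distance between the two samples' distributions. The sketch below assumes the two-sample Kolmogorov-Smirnov statistic for numerical features and the total variation distance for categorical ones; these are common choices, not necessarily the measures DSS reports. It also shows why the handling choice matters: the same column is scored with a different measure depending on whether it is declared numerical or categorical. All names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    points = sorted(set(ref) | set(cur))
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in points
    )


def total_variation(reference: Sequence[str], current: Sequence[str]) -> float:
    """Half the L1 distance between the category frequency distributions."""
    ref, cur = Counter(reference), Counter(current)
    categories = set(ref) | set(cur)
    return 0.5 * sum(abs(ref[c] / len(reference) - cur[c] / len(current))
                     for c in categories)


def univariate_drift(reference_cols: Dict[str, Sequence],
                     current_cols: Dict[str, Sequence],
                     handling: Dict[str, str]) -> Dict[str, float]:
    """Score each feature with a measure that matches how it is handled."""
    scores = {}
    for name, kind in handling.items():
        ref, cur = reference_cols[name], current_cols[name]
        if kind == "numerical":
            scores[name] = ks_statistic(ref, cur)
        else:
            scores[name] = total_variation(ref, cur)
    return scores


if __name__ == "__main__":
    reference = {"age": [25, 32, 41, 38], "zip": ["75001", "75002", "75001", "69001"]}
    current = {"age": [52, 61, 47, 58], "zip": ["75001", "69001", "69001", "69001"]}
    # A zip code looks numeric but is better handled as categorical.
    print(univariate_drift(reference, current, {"age": "numerical", "zip": "categorical"}))
```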

../../_images/univariate-data-drift.png

Lastly, the Feature drift importance scatter plot shows, for each feature, its importance in the original model versus its importance in the drift model (the classifier that separates the two data samples).
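The scatter plot pairs two importance values per feature. Below is a minimal sketch of how its points could be assembled, assuming each model exposes importances as a feature-to-score dictionary (a hypothetical format). The `features_to_watch` rule and its 0.2 threshold are illustrative assumptions: one reasonable reading of the plot is that features important to both models deserve attention first, since they both drift and drive predictions.

```python
from typing import Dict, List, Tuple

Point = Tuple[str, float, float]


def drift_importance_points(original: Dict[str, float],
                            drift: Dict[str, float]) -> List[Point]:
    """One (feature, original-model importance, drift-model importance) point per feature.
    A feature missing from one model is given importance 0 for that model."""
    features = sorted(set(original) | set(drift))
    return [(f, original.get(f, 0.0), drift.get(f, 0.0)) for f in features]


def features_to_watch(points: List[Point], threshold: float = 0.2) -> List[str]:
    """Hypothetical rule: flag features that matter to both models."""
    return [f for f, x, y in points if x >= threshold and y >= threshold]


if __name__ == "__main__":
    original = {"age": 0.45, "income": 0.35, "zip": 0.05}
    drift = {"age": 0.40, "zip": 0.50, "signup_day": 0.10}
    points = drift_importance_points(original, drift)
    for feature, x, y in points:
        print(f"{feature:<12} original={x:.2f} drift={y:.2f}")
    print("watch:", features_to_watch(points))
```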

../../_images/feature-drift-importance.png