You can modify the drift detection logic by selecting a statistical test already available in the library, including PSI, K-L divergence, Jensen-Shannon distance, and Wasserstein distance. See more details about the available tests. You can also set a different confidence level or implement a custom test by defining custom options.
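As a rough sketch of how this looks in code (parameter and test names differ between Evidently versions; `all_features_stattest` and the `"psi"` test name here are assumptions to verify against your installed version):

```python
import pandas as pd

from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import DataDriftTab
from evidently.options import DataDriftOptions

# Reference and current data: two DataFrames with the same schema.
# The file names are placeholders.
reference = pd.read_csv("reference.csv")
current = pd.read_csv("current.csv")

# Override the default statistical test with PSI for all features.
options = DataDriftOptions(all_features_stattest="psi")

dashboard = Dashboard(tabs=[DataDriftTab()], options=[options])
dashboard.calculate(reference, current)
dashboard.save("data_drift_report.html")
```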
To build an intuition for which tests work better in different use cases, read our in-depth blog guide to the tradeoffs of choosing a statistical test for data drift.
How it looks
The default report includes 4 components. All plots are interactive.
1. Data Drift Summary
The report returns the share of drifting features and an aggregate Dataset Drift result.
Dataset Drift applies a rule on top of the results of the statistical tests for individual features. By default, Dataset Drift is detected if at least 50% of the features drift at a 0.95 confidence level.
To set different Dataset Drift conditions, you can define custom options.
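A minimal sketch, with the same caveat that the exact field names (`confidence`, `drift_share`) should be checked against your Evidently version:

```python
from evidently.options import DataDriftOptions

# Flag Dataset Drift only if at least 30% of the features drift,
# and require a stricter 0.99 confidence level for each feature.
options = DataDriftOptions(confidence=0.99, drift_share=0.3)
```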
2. Data Drift Table
The table lists the drifting features first, sorted by p-value. You can also sort the rows by feature name or type.
3. Data Drift by Feature
Click on a feature to explore its values mapped in a plot.
The dark green line is the mean, as seen in the reference dataset.
The green area covers one standard deviation above and below the mean.
4. Data Distribution by Feature
You can also zoom in on the distributions to understand what has changed.
You can set different Options for Data / Target drift to modify the existing components of the report. Use this to change the statistical tests used, define custom Dataset Drift conditions, or change the number of histogram bins.
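For instance, these overrides can be combined in a single options object (same assumptions about field and test names as in the earlier sketches):

```python
from evidently.options import DataDriftOptions

options = DataDriftOptions(
    all_features_stattest="wasserstein",  # statistical test for all features
    drift_share=0.3,                      # Dataset Drift condition
    confidence=0.99,                      # per-feature confidence level
    nbinsx=20,                            # number of histogram bins
)
```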
When to use this report
In production: as early monitoring of model quality. In the absence of ground truth labels, you can monitor for changes in the input data. Use it, for example, to decide when to retrain the model, whether to apply business logic on top of the model output, or whether to act on the predictions. You can combine it with monitoring of model outputs using the Numerical or Categorical Target Drift report.
In production: to debug model decay. Use the tool to explore how the input data has changed.
In A/B tests or trial use. Detect training-serving skew and get the context to interpret test results.
Before deployment. Understand drift in the offline environment. Explore past shifts in the data to define retraining needs and monitoring strategies. Here is a blog about it.
To find useful features when building a model. You can also use the tool to compare feature distributions in different classes to surface the best discriminants.
If you choose to generate a JSON profile, it will contain the results of the drift analysis in a machine-readable format.
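A minimal sketch of generating such a profile with the legacy Profile API (assuming `DataDriftProfileSection` is available in your Evidently version; the `"data_drift"` key name in the output is also an assumption to verify):

```python
import json

import pandas as pd

from evidently.model_profile import Profile
from evidently.model_profile.sections import DataDriftProfileSection

reference = pd.read_csv("reference.csv")  # placeholder file names
current = pd.read_csv("current.csv")

profile = Profile(sections=[DataDriftProfileSection()])
profile.calculate(reference, current)

# profile.json() returns the drift results as a JSON string.
result = json.loads(profile.json())
print(result["data_drift"])
```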
Data Drift Dashboard Examples
Browse our example notebooks to see sample Reports.