TL;DR: You can detect and explore changes in the target function (prediction) and detect distribution drift.
- Report: for visual analysis or metrics export, use the `TargetDriftPreset`.
- Test Suite: for pipeline checks, use a `TestColumnDrift` test and apply it to the prediction or target column. Since it is a single test, there is no need for a Preset.
You can analyze target or prediction drift:
1. To monitor the model performance without ground truth. When you do not have true labels or actuals, you can monitor Prediction Drift to react to meaningful changes. For example, to detect when there is a distribution shift in predicted values, probabilities, or classes. You can often combine it with the Data Drift analysis.
2. When you are debugging the model decay. If you observe a drop in performance, you can evaluate Target Drift to see how the behavior of the target changed and explore the shift in the relationship between the features and prediction (target).
3. Before model retraining. Before feeding fresh data into the model, you might want to verify whether retraining even makes sense. If there is no target drift and no data drift, retraining might not be necessary.
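As an illustration of the first scenario (monitoring predictions without ground truth), here is a minimal sketch that compares the shares of predicted classes in two periods using the Jensen-Shannon distance. It is written in plain Python and is not Evidently's implementation; the sample labels and the `DRIFT_THRESHOLD` value are assumptions chosen for the example.

```python
import math
from collections import Counter


def class_shares(labels, classes):
    """Return the share of each class in `labels`, in the order given by `classes`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in classes]


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance with base-2 logarithms, bounded between 0 and 1."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the sum.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


# Hypothetical predicted labels from a reference and a current period.
reference_preds = ["approve"] * 70 + ["reject"] * 30
current_preds = ["approve"] * 40 + ["reject"] * 60

classes = sorted(set(reference_preds) | set(current_preds))
distance = jensen_shannon_distance(
    class_shares(reference_preds, classes),
    class_shares(current_preds, classes),
)

DRIFT_THRESHOLD = 0.1  # illustrative cut-off (assumption); tune it for your use case
prediction_drift_detected = distance > DRIFT_THRESHOLD
```

No true labels are needed here: the check only looks at how the mix of predicted classes has moved between the two periods.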
To run drift checks as part of the pipeline, use the Test Suite. To explore and debug, use the Report.
If you want to visually explore the prediction or target drift, you can create a new Report object and use the `TargetDriftPreset`.
Aggregated visuals in plots. Starting from v0.3.2, all visuals in the Evidently Reports are aggregated by default. This helps decrease the load time and report size for larger datasets. If you work with smaller datasets or samples, you can pass an option to generate plots with raw data. You can choose whether you want it or not based on the size of your dataset.
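To see why aggregation reduces report size, consider the following sketch. It is a plain-Python illustration, not Evidently's plotting code: the function name `aggregate_to_histogram` and the sample data are hypothetical. Raw values are summarized into a fixed number of bins, so the plot has to store a handful of counts instead of every data point.

```python
import random


def aggregate_to_histogram(values, n_bins=10):
    """Summarize raw values as (bin_start, bin_end, count) tuples."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # guard against a constant column
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so that the maximum value falls into the last bin.
        idx = min(int((v - low) / width), n_bins - 1)
        counts[idx] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(n_bins)]


random.seed(0)
raw_values = [random.gauss(0, 1) for _ in range(100_000)]  # hypothetical target values
histogram = aggregate_to_histogram(raw_values, n_bins=20)

# The plot now needs 20 bars instead of 100,000 individual points.
print(len(raw_values), "raw points ->", len(histogram), "bins")
```

The trade-off is that individual points and outliers are no longer visible, which is why raw-data plots can still be preferable for small datasets.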
num_target_drift_report = Report(metrics=[TargetDriftPreset()])
`TargetDriftPreset` helps detect and explore changes in the target function and/or model predictions:
- Performs a suitable statistical test to compare the target (prediction) distributions.
- For numerical targets, calculates the correlations between each feature and the target (prediction).
- Plots the relations between each individual feature and the target (prediction).
You can generate this preset for both numerical targets (e.g. if you have a regression problem) and categorical targets (e.g. if you have a classification problem). You can explicitly specify the type of the target column in column mapping. If it is not specified, Evidently will determine the column type automatically.
- You will need two datasets. The reference dataset serves as a benchmark. Evidently analyzes the change by comparing the current production data to the reference data.
- To run this preset, you need to have target and/or prediction columns available. Input features are optional. Pass them if you want to analyze the correlations between the features and target (prediction). Evidently estimates the drift for the target and predictions in the same manner. If you pass both columns, Evidently will generate two sets of plots. If you pass only one of them (either target or predictions), Evidently will build one set of plots.
- Column mapping. Evidently can evaluate drift both for numerical and categorical targets. You can explicitly specify the type of target using the task parameter in column mapping. If it is not specified, Evidently will try to identify the target type automatically. It is recommended to use column mapping to avoid errors.
The report includes 4 components. All plots are interactive.
The report first shows the comparison of target (prediction) distributions in the current and reference datasets. You can see the result of the statistical test or the value of a distance metric.
Evidently uses the default data drift detection algorithm to select the drift detection method based on target type and the number of observations in the reference dataset.
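The sketch below shows what such a selection rule can look like. The specific cut-offs (1000 reference rows, 5 unique values) and the method assigned to each branch are assumptions based on one reading of Evidently's data drift documentation. They may differ between versions, so check them against the version you have installed. The function `default_drift_method` is hypothetical and not part of the library.

```python
def default_drift_method(column_type, n_unique, n_reference_rows):
    """Pick a drift method name from column type, cardinality, and reference size.

    The cut-offs below are assumptions for illustration, not a guaranteed
    description of Evidently's behavior.
    """
    treat_as_categorical = column_type == "cat" or n_unique <= 5
    if n_reference_rows <= 1000:
        # Smaller reference data: statistical tests that return a p-value.
        if n_unique <= 2:
            return "z-test for proportions"
        if treat_as_categorical:
            return "chi-squared test"
        return "Kolmogorov-Smirnov test"
    # Larger reference data: distance metrics compared against a threshold.
    if treat_as_categorical:
        return "Jensen-Shannon divergence"
    return "Wasserstein distance"


# A regression target with many distinct values and a large reference set.
print(default_drift_method("num", n_unique=4200, n_reference_rows=50_000))
# A three-class classification target with a small reference set.
print(default_drift_method("cat", n_unique=3, n_reference_rows=800))
```

The reasoning behind a rule like this is that statistical tests become overly sensitive on large samples, so distance metrics with a fixed threshold are often a more practical choice there.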
You can modify the drift detection logic by selecting a different method already available in the library, including PSI, K–L divergence, Jensen-Shannon distance, Wasserstein distance, and/or by setting a different threshold. See more details about setting data drift parameters. You can also implement a custom drift detection method.
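To give a sense of what one of these alternative methods computes, here is a minimal plain-Python sketch of the Population Stability Index (PSI). It is not Evidently's implementation: the equal-width binning, the `eps` smoothing for empty bins, and the sample data are simplifying assumptions made for this example.

```python
import math


def psi(reference, current, n_bins=10, eps=1e-4):
    """Population Stability Index using equal-width bins built on the reference data."""
    low, high = min(reference), max(reference)
    width = (high - low) / n_bins or 1.0  # guard against a constant column

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - low) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[max(0, min(idx, n_bins - 1))] += 1
        # Replace empty bins with a small share to keep the logarithm finite.
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Hypothetical target values: the current data is shifted upwards by 3.
reference = [i / 100 for i in range(1000)]
shifted = [v + 3 for v in reference]

print("PSI, identical data:", psi(reference, reference))
print("PSI, shifted data:", psi(reference, shifted))
```

Identical distributions give a PSI of zero, and the value grows as the bin shares diverge. Whatever threshold you compare it against is a choice you make for your use case.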
For numerical targets, the report calculates the Pearson correlation between the target (prediction) and each individual feature in the two datasets to detect a change in the relationship.
The report shows the correlations between individual features and the target (prediction) in the current and reference datasets. It helps detect shifts in the relationship.
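The idea behind this comparison can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not Evidently's code: the helper `correlation_shift`, the column name `rooms`, and the sample values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_shift(reference, current, target="target"):
    """Return {feature: (ref_corr, cur_corr, cur_corr - ref_corr)} for each non-target column."""
    result = {}
    for feature in reference:
        if feature == target:
            continue
        ref_corr = pearson(reference[feature], reference[target])
        cur_corr = pearson(current[feature], current[target])
        result[feature] = (ref_corr, cur_corr, cur_corr - ref_corr)
    return result


# Hypothetical data: column name -> list of values.
reference = {"rooms": [1, 2, 3, 4, 5], "target": [10, 20, 30, 40, 50]}
current = {"rooms": [1, 2, 3, 4, 5], "target": [50, 40, 30, 20, 10]}

shifts = correlation_shift(reference, current)
for feature, (ref_corr, cur_corr, delta) in shifts.items():
    print(f"{feature}: reference={ref_corr:.2f}, current={cur_corr:.2f}, change={delta:+.2f}")
```

A large change in a feature's correlation with the target suggests that the relationship the model learned may no longer hold, even if the feature's own distribution looks stable.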
For numerical targets, the report visualizes the target (prediction) values by index or time (if the `datetime` column is available or defined in the `column_mapping` dictionary). This plot helps explore the target behavior and compare it between the datasets.
Finally, it generates an interactive table with the visualizations of dependencies between the target and each feature.
If you click on any feature in the table, you get an overview of its behavior. The plot shows how feature values relate to the target (prediction) values and if there are differences between the datasets. It helps explore if they can explain the target (prediction) shift.
For numerical targets:
We recommend paying attention to the behavior of the most important features, since significant changes might confuse the model and cause higher errors. For example, in the Boston house pricing dataset, we can see a new segment with TAX values above 600 but a low value of the target (house price).
For categorical targets:
You can get the report output as JSON or as a Python dictionary.
- You can create a different report or test suite from scratch taking this one as an inspiration.