Feature importance in data drift
How to show feature importance in Data Drift evaluations.
You can add feature importances to the dataset-level data drift Tests and Metrics:
DataDriftTable
TestShareOfDriftedColumns
Notebook example on showing feature importance:
By default, the feature importance column is not shown. To display it, set the feature_importance parameter to True.
If you do not specify anything else, Evidently will train a random forest model using the provided dataset and derive the feature importances.
Notes:
This is only possible if your dataset contains the target column.
If you have both current and reference datasets, two different models will be trained. You will have two columns with feature importance: one for reference and one for current data.
If your dataset also contains the prediction column, you should clearly label it using Column Mapping to avoid it being treated as a feature.
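As a minimal sketch of the default behavior described above, the helper below enables the feature importance column and maps the target and prediction columns. It assumes the legacy Evidently Report API (the `evidently.report` and `evidently.metrics` modules); the column names and the helper name are hypothetical, and import paths may differ in your Evidently version.

```python
# Hypothetical column names; replace them with the ones in your dataset.
TARGET_COLUMN = "target"
PREDICTION_COLUMN = "prediction"


def run_drift_report_with_importance(reference_df, current_df):
    """Run a data drift report that shows feature importance columns.

    Both arguments are pandas DataFrames that contain the target column.
    """
    # Imported inside the function so this sketch loads even where Evidently
    # is not installed. Assumes the legacy Report API.
    from evidently import ColumnMapping
    from evidently.metrics import DataDriftTable
    from evidently.report import Report

    # Label target and prediction so that prediction is not treated as a feature.
    column_mapping = ColumnMapping(
        target=TARGET_COLUMN,
        prediction=PREDICTION_COLUMN,
    )

    # feature_importance=True turns on the importance columns in the table.
    report = Report(metrics=[DataDriftTable(feature_importance=True)])
    report.run(
        reference_data=reference_df,
        current_data=current_df,
        column_mapping=column_mapping,
    )
    return report
```

With both datasets provided, the resulting table should contain one importance column for reference and one for current data, as noted above.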
You can also pass the list of feature importances derived during the model training process. This is the recommended option.
In this case, pass it as a list using the additional_data parameter when running the Report.
If you pass only current_feature_importance, a single column appears. You can optionally also pass reference_feature_importance.
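The sketch below shows one way to assemble precomputed importances and hand them to the Report through additional_data. The keys current_feature_importance and reference_feature_importance come from the text above. The value shape used here (a mapping from column name to importance score) is an assumption, as are the helper names and the legacy `evidently.report` import paths, so check them against your Evidently version.

```python
from typing import Dict, Optional, Sequence


def importance_by_feature(
    names: Sequence[str], importances: Sequence[float]
) -> Dict[str, float]:
    """Pair each feature name with its importance score."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    return {name: float(score) for name, score in zip(names, importances)}


def build_additional_data(
    names: Sequence[str],
    current: Sequence[float],
    reference: Optional[Sequence[float]] = None,
) -> Dict[str, Dict[str, float]]:
    """Build the additional_data payload (value shape is an assumption)."""
    data = {"current_feature_importance": importance_by_feature(names, current)}
    if reference is not None:
        data["reference_feature_importance"] = importance_by_feature(names, reference)
    return data


def run_drift_report(reference_df, current_df, additional_data, column_mapping=None):
    """Run the drift table with importances supplied by the caller."""
    # Imported lazily so the helpers above work without Evidently installed.
    # Assumes the legacy Report API.
    from evidently.metrics import DataDriftTable
    from evidently.report import Report

    report = Report(metrics=[DataDriftTable(feature_importance=True)])
    report.run(
        reference_data=reference_df,
        current_data=current_df,
        column_mapping=column_mapping,
        additional_data=additional_data,
    )
    return report
```

For a scikit-learn model, the inputs would typically be the training column names and `model.feature_importances_`, for example `build_additional_data(feature_names, model.feature_importances_)`.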