Feature importance in data drift
How to show feature importance in Data Drift evaluations.
You can add feature importances to the dataset-level data drift Tests and Metrics:
DataDriftTable
TestShareOfDriftedColumns
Code example
Notebook example on showing feature importance:
Compute feature importances
By default, the feature importance column is not shown. To display it, set the feature_importance parameter to True.
If you do not specify anything else, Evidently will train a random forest model using the provided dataset and derive the feature importances.
Notes:
This is only possible if your dataset contains the target column.
If you have both current and reference datasets, two different models will be trained. You will have two columns with feature importance: one for reference and one for current data.
If your dataset also contains the prediction column, you should clearly label it using Column Mapping to avoid it being treated as a feature.
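The steps above can be sketched roughly as follows. This is a minimal illustration, not the official example: it assumes the Report-based API with the import paths evidently.report.Report, evidently.metrics.DataDriftTable and evidently.ColumnMapping, which may differ between Evidently versions. The column names "target" and "prediction" and the helper build_drift_report are hypothetical.

```python
# Hypothetical column roles; replace with the names used in your data.
COLUMN_ROLES = {"target": "target", "prediction": "prediction"}


def build_drift_report(reference_df, current_df):
    """Run a data drift table with computed feature importances.

    Both inputs are pandas DataFrames that contain the target column.
    """
    # Imported lazily so the sketch loads even without Evidently installed.
    # These import paths are an assumption and may vary by version.
    from evidently import ColumnMapping
    from evidently.metrics import DataDriftTable
    from evidently.report import Report

    # Label target and prediction so prediction is not treated as a feature.
    column_mapping = ColumnMapping(**COLUMN_ROLES)

    # feature_importance=True asks Evidently to derive the importances itself.
    report = Report(metrics=[DataDriftTable(feature_importance=True)])
    report.run(
        reference_data=reference_df,
        current_data=current_df,
        column_mapping=column_mapping,
    )
    return report
```

With both datasets supplied, the resulting table would be expected to show two importance columns, one for reference and one for current data.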
Pass your own importances
You can also pass the list of feature importances derived during the model training process. This is a recommended option.
In this case, pass it as a list using the additional_data parameter when running the Report.
If you pass only current_feature_importance, a single column will appear. You can also optionally pass reference_feature_importance.
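A rough sketch of passing precomputed importances is shown below. It assumes the importances are supplied under the keys current_feature_importance and reference_feature_importance inside additional_data, each as a mapping from feature name to score. That structure is an assumption, so check it against the notebook example for your Evidently version. The feature names, scores, and the helpers build_additional_data and run_report are hypothetical.

```python
# Hypothetical feature names and scores, for example taken from the
# feature_importances_ attribute of a model trained elsewhere.
feature_names = ["age", "income", "tenure"]
current_scores = [0.5, 0.3, 0.2]
reference_scores = [0.6, 0.25, 0.15]


def build_additional_data(names, current, reference=None):
    """Assemble the additional_data argument (structure is an assumption)."""
    if len(names) != len(current):
        raise ValueError("names and current scores must have equal length")
    data = {"current_feature_importance": dict(zip(names, current))}
    if reference is not None:
        if len(names) != len(reference):
            raise ValueError("names and reference scores must have equal length")
        data["reference_feature_importance"] = dict(zip(names, reference))
    return data


additional_data = build_additional_data(
    feature_names, current_scores, reference_scores
)


def run_report(reference_df, current_df):
    """Run the drift table with the importances assembled above."""
    # Imported lazily; these import paths are an assumption.
    from evidently.metrics import DataDriftTable
    from evidently.report import Report

    report = Report(metrics=[DataDriftTable(feature_importance=True)])
    report.run(
        reference_data=reference_df,
        current_data=current_df,
        additional_data=additional_data,
    )
    return report
```

Omitting the reference scores leaves only the current_feature_importance entry, which corresponds to the single-column case described above.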