Jupyter notebooks
How to use Evidently from a Jupyter notebook
Take the following steps to create and display a Dashboard in a Jupyter notebook, export the report as an HTML file, or generate a JSON Profile.
To display dashboards in a Jupyter notebook, make sure you have installed the Jupyter nbextension.

1. Prepare your data as pandas DataFrames

To analyze data or target drift, you always need two datasets. For the model performance reports, the second dataset is optional.
    The first dataset is reference. This can be training or earlier production data.
    The second dataset is current. It should include the recent production data.
You can prepare the datasets as two pandas DataFrames. The structure of both datasets should be identical. Performance or drift will be evaluated by comparing the current data to the reference data.
You can also prepare a single pandas DataFrame to generate a comparative report. When calling the dashboard, specify which rows belong to the reference dataset and which to the current dataset.
Model Performance reports can be generated for a single dataset, with no comparison performed. In this case, you can simply pass a single DataFrame.
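As a minimal sketch of working from a single DataFrame, the snippet below splits one DataFrame into reference and current parts by date. The column names (date, temp, target) and the cutoff date are made up for illustration; any row selection that pandas supports would work the same way.

```python
import pandas as pd

# Toy data: the column names and values are illustrative assumptions.
data = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=10, freq="D"),
    "temp": [3.1, 2.8, 4.0, 5.2, 4.9, 6.3, 7.1, 6.8, 8.0, 8.4],
    "target": [10, 12, 15, 18, 17, 21, 24, 23, 27, 29],
})

# Hypothetical cutoff: earlier rows are the reference, later rows are current.
cutoff = pd.Timestamp("2021-01-08")
reference_data = data[data["date"] < cutoff]
recent_data = data[data["date"] >= cutoff]

# Both parts keep the same columns, so their structure is identical.
same_structure = list(reference_data.columns) == list(recent_data.columns)
print(len(reference_data), len(recent_data), same_structure)
```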

Dataset structure

The data structure is different depending on the report type.
    For the Data Drift report, include the input features only.
    For the Target Drift reports, include the input features and the column with the Target and/or the Prediction.
    For the Model Performance reports, include the input features, the column with the Target, and the column with the Prediction.
If you include more columns than needed for a given report, they will be ignored.
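Below is a summary of the data requirements:

Report Type | Feature columns | Target column | Prediction column | Works with a single dataset
Data Drift | Required | No | No | No
Numerical Target Drift | Required | Target and/or Prediction required | Target and/or Prediction required | No
Categorical Target Drift | Required | Target and/or Prediction required | Target and/or Prediction required | No
Regression Performance | Required | Required | Required | Yes
Classification Performance | Required | Required | Required | Yes
Probabilistic Classification Performance | Required | Required | Required | Yes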
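The requirements for the target and prediction columns can be expressed as a small check. This is only a sketch: the helper utility_columns_ok and the report-kind labels are hypothetical and not part of Evidently.

```python
def utility_columns_ok(report_kind: str, has_target: bool, has_prediction: bool) -> bool:
    """Check target/prediction availability against the rules described above.

    `report_kind` uses made-up labels: 'data_drift', 'target_drift',
    'model_performance'.
    """
    if report_kind == "data_drift":
        # Only the input features are needed.
        return True
    if report_kind == "target_drift":
        # Target and/or prediction is required.
        return has_target or has_prediction
    if report_kind == "model_performance":
        # Both target and prediction are required.
        return has_target and has_prediction
    raise ValueError(f"Unknown report kind: {report_kind}")


print(utility_columns_ok("target_drift", has_target=False, has_prediction=True))
print(utility_columns_ok("model_performance", has_target=True, has_prediction=False))
```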

DataFrame requirements

Make sure the data complies with the following expectations.
1) All column names are strings.
2) All feature columns that are analyzed for drift have a numerical type (np.number).
    All non-numerical columns will be ignored. Categorical data can be encoded as numerical labels and specified in the column mapping.
    The datetime column is the only exception. If available, it will be used as the x-axis in the data plots.
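A quick way to check these expectations before building a report is sketched below with plain pandas and NumPy. The toy DataFrame and its column names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy data with one numerical, one text, and one datetime column.
df = pd.DataFrame({
    "temp": [3.1, 2.8, 4.0],
    "season": ["winter", "winter", "spring"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
})

# Expectation 1: every column name is a string.
all_names_are_strings = all(isinstance(name, str) for name in df.columns)

# Expectation 2: find the columns with a numerical dtype.
numerical_columns = df.select_dtypes(include=[np.number]).columns.tolist()
non_numerical_columns = [c for c in df.columns if c not in numerical_columns]

print(all_names_are_strings, numerical_columns, non_numerical_columns)
```

Here season would need to be encoded before it can be analyzed for drift, while date can serve as the datetime column.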

2. Pass the column_mapping into Dashboard

If the column_mapping is not specified or set as None, we use the default mapping strategy:
    All features will be treated as numerical.
    The column named 'id' will be treated as an ID column.
    The column named 'datetime' will be treated as a datetime column.
    The column named 'target' will be treated as a target function.
    The column named 'prediction' will be treated as a model prediction.
ID, datetime, target, and prediction are utility columns. Requirements are different depending on the report type:
    For the Data Drift report, these columns are not required. If you specify id, target, and prediction, they will be excluded from the data drift report. If you specify the datetime, it will be used in data plots.
    For the Target Drift reports, we expect either the target or the prediction column or both. ID and datetime are optional.
    For Model Performance reports, both the target and the prediction column are required. ID and datetime are optional.
You can create a column_mapping to specify if your dataset includes the utility columns, and split the features into numerical and categorical types.
column_mapping is a Python dictionary with the following format:
column_mapping = {}

column_mapping['target'] = 'y'  # 'y' is the name of the column with the target function
column_mapping['prediction'] = 'pred'  # 'pred' is the name of the column(s) with model predictions
column_mapping['id'] = None  # there is no ID column in the dataset
column_mapping['datetime'] = 'date'  # 'date' is the name of the column with datetime

column_mapping['numerical_features'] = ['temp', 'atemp', 'humidity']  # list of numerical features
column_mapping['categorical_features'] = ['season', 'holiday']  # list of categorical features
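Before passing the mapping on, it can help to confirm that every mapped name actually exists in the data. The helper below is a hypothetical sketch, not an Evidently function; it works on a plain list of column names, such as list(df.columns).

```python
def missing_mapped_columns(column_mapping: dict, columns: list) -> list:
    """Return the mapped column names that are absent from `columns`."""
    names = []
    for key in ("target", "prediction", "id", "datetime"):
        value = column_mapping.get(key)
        if value is None:
            continue
        # 'prediction' may hold a single name or a list of names.
        names.extend(value if isinstance(value, list) else [value])
    names.extend(column_mapping.get("numerical_features") or [])
    names.extend(column_mapping.get("categorical_features") or [])
    return [name for name in names if name not in columns]


column_mapping = {
    "target": "y",
    "prediction": "pred",
    "id": None,
    "datetime": "date",
    "numerical_features": ["temp", "atemp", "humidity"],
    "categorical_features": ["season", "holiday"],
}

# Example column list in which 'holiday' is deliberately absent.
columns = ["y", "pred", "date", "temp", "atemp", "humidity", "season"]
missing = missing_mapped_columns(column_mapping, columns)
print(missing)
```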

NOTE: Categorical features in Data Drift

Though the data drift tool works only with numerical data, you can also estimate drift for categorical features. To do that, encode the categorical data with numerical labels. Other strategies for representing categorical data as numbers, such as one-hot encoding, also work.
Then create the column_mapping dict and list all encoded categorical features under the categorical_features key, like:
column_mapping['categorical_features'] = ['encoded_cat_feature_1',
                                          'encoded_cat_feature_2']
Categorical features will then be treated accordingly. The data drift report will use the chi-squared test for them.
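One way to produce such numerical labels is sketched below with pandas. The season column and its values are illustrative. The category list is shared between both datasets so that the same label receives the same code in the reference and in the current data.

```python
import pandas as pd

# Toy datasets; the 'season' column and its values are assumptions.
reference_data = pd.DataFrame({"season": ["winter", "spring", "summer", "winter"]})
recent_data = pd.DataFrame({"season": ["summer", "summer", "fall", "winter"]})

# One shared, sorted list of categories for both datasets.
categories = sorted(set(reference_data["season"]) | set(recent_data["season"]))

for df in (reference_data, recent_data):
    # Each label is replaced by its position in the shared category list.
    df["encoded_season"] = pd.Categorical(df["season"], categories=categories).codes

column_mapping = {"categorical_features": ["encoded_season"]}
print(reference_data["encoded_season"].tolist(), recent_data["encoded_season"].tolist())
```

Encoding each dataset separately could assign different codes to the same label, which would distort the comparison.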
NOTE: Column names in Probabilistic Classification
The tool expects your DataFrame(s) to contain columns whose names match the ones in the 'prediction' list. Each column should hold the predicted probability, in the range [0, 1], for the corresponding class.
column_mapping['prediction'] = ['class_name1', 'class_name2', 'class_name3']  # and so on, one name per class
NOTE: Column order in Binary Classification
For binary classification, class order matters. The tool expects that the target (so-called positive) class is the first in the column_mapping['prediction'] list.
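The two notes above can be illustrated with a small binary example. The class names churn and stay, and the toy probabilities, are invented for this sketch; churn plays the role of the positive class and is therefore listed first.

```python
import pandas as pd

# Hypothetical binary problem: one probability column per class.
data = pd.DataFrame({
    "target": ["churn", "stay", "stay"],
    "churn": [0.8, 0.3, 0.1],  # predicted probability of the positive class
    "stay": [0.2, 0.7, 0.9],   # predicted probability of the negative class
})

column_mapping = {
    "target": "target",
    "prediction": ["churn", "stay"],  # positive class first
}

# Sanity checks: each listed class has a column, and all values lie in [0, 1].
prediction_columns = column_mapping["prediction"]
columns_present = set(prediction_columns) <= set(data.columns)
probabilities = data[prediction_columns]
probabilities_in_range = bool(((probabilities >= 0) & (probabilities <= 1)).all().all())
print(columns_present, probabilities_in_range)
```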


3. Generate the report

You can choose one or several of the following Tabs.
    DataDriftTab to estimate the data drift
    NumTargetDriftTab to estimate target drift for the numerical target (for problem statements with the numerical target function: regression, probabilistic classification or ranking, etc.)
    CatTargetDriftTab to estimate target drift for the categorical target (for problem statements with the categorical target function: binary classification, multi-class classification, etc.)
    RegressionPerformanceTab to explore the performance of a regression model.
    ClassificationPerformanceTab to explore the performance of a classification model
    ProbClassificationPerformanceTab to explore the performance of a probabilistic classification model and the quality of the model calibration
You can generate the report without specifying the column_mapping:
from evidently.dashboard import Dashboard
from evidently.tabs import DataDriftTab

drift_dashboard = Dashboard(tabs=[DataDriftTab])
drift_dashboard.calculate(reference_data, recent_data)
And with the column_mapping specification:
drift_dashboard_with_mapping = Dashboard(tabs=[DataDriftTab])
drift_dashboard_with_mapping.calculate(reference_data, recent_data,
                                       column_mapping=column_mapping)

4. Explore the dashboard in the Jupyter notebook

You can display the chosen Tabs in a single Dashboard directly in the notebook.
drift_dashboard.show()
If the report is not displayed, this might be due to the dataset size. The dashboard contains the data necessary to generate interactive plots and can become large. The exact limit depends on your infrastructure. In this case, we suggest applying sampling to your dataset. In a Jupyter notebook, that can be done directly with pandas. You can also generate a JSON profile instead (see step 6).
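The sampling suggestion can be sketched with pandas as follows. The toy data and the sample size of 200 rows are arbitrary assumptions; choose a size that your environment can render.

```python
import pandas as pd

# Toy datasets standing in for large reference and current data.
reference_data = pd.DataFrame({"temp": range(1000)})
recent_data = pd.DataFrame({"temp": range(500)})

sample_size = 200  # arbitrary value chosen for the example

# random_state makes the sample reproducible between notebook runs.
reference_sample = reference_data.sample(n=sample_size, random_state=0)
recent_sample = recent_data.sample(n=sample_size, random_state=0)

print(len(reference_sample), len(recent_sample))
```

The sampled DataFrames can then be used in place of the full datasets when calculating the dashboard.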

5. Export the report as an HTML file

You can save the report as an HTML file, and open it in your browser.
drift_dashboard.save("reports/my_report.html")
If you get a security alert, press "trust HTML".
You need to specify both the path and the file name for the report. The report will not open automatically. To explore it, open the file from the destination folder.
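If the destination folder might not exist yet, it can be created beforehand with the standard library. This sketch assumes that the folder is not created automatically on save, which is not confirmed here.

```python
from pathlib import Path

# Same destination as in the example above.
report_path = Path("reports") / "my_report.html"

# Create the 'reports' folder if it is missing; do nothing if it already exists.
report_path.parent.mkdir(parents=True, exist_ok=True)

# The path can then be passed on as a string, for example:
# drift_dashboard.save(str(report_path))
print(report_path)
```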

6. Create a JSON profile

Alternatively, you can generate and view the output as a JSON profile.
from evidently.model_profile import Profile
from evidently.profile_sections import DataDriftProfileSection

data_drift_profile = Profile(sections=[DataDriftProfileSection])
data_drift_profile.calculate(reference_data, recent_data,
                             column_mapping=column_mapping)
data_drift_profile.json()
For each profile, you should specify sections to include. They work just like Tabs. You can choose among:
    DataDriftProfileSection to estimate the data drift,
    NumTargetDriftProfileSection to estimate target drift for numerical target,
    CatTargetDriftProfileSection to estimate target drift for categorical target,
    ClassificationPerformanceProfileSection to explore the performance of a classification model,
    ProbClassificationPerformanceProfileSection to explore the performance of a probabilistic classification model,
    RegressionPerformanceProfileSection to explore the performance of a regression model.
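Assuming json() returns a JSON-formatted string, the output can be turned into a Python dictionary for further processing. The string below is a stand-in with made-up keys; it does not reflect the real profile schema.

```python
import json

# Stand-in for the string returned by data_drift_profile.json().
# The keys are invented for the example and are NOT the real schema.
profile_json = '{"example_section": {"example_metric": 0.42}}'

# Parse the string into nested dictionaries and lists.
profile = json.loads(profile_json)

# Inspect the top-level keys to see which sections are present.
top_level_keys = list(profile.keys())
print(top_level_keys)
```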