Quick start

You can generate the reports and profiles using a Jupyter notebook or the terminal. Both options are described below.

If you prefer a video version, here is a 6-minute Quick Start on how to generate Data and Target Drift reports in a Jupyter notebook.

Prepare the data

To generate the reports using a Jupyter notebook, prepare the data as pandas DataFrames. To use the terminal, prepare it as csv files.

You can prepare your data as two pandas DataFrames or csv files. The first should include your reference data, and the second your current production data. The structure of both datasets should be identical.

You can also generate comparative reports from a single DataFrame or csv file. In this case, you will need to indicate which rows belong to the reference data and which to the current production data.

Model Performance reports can be generated for a single dataset, with no comparison performed. In this case, you can simply pass a single DataFrame or csv file.

The data structure is different depending on the report type.

  • For the Data Drift report, include the input features only.

  • For the Target Drift reports, include the input features and the column with the Target and/or the Prediction.

  • For the Model Performance reports, include the input features, the column with the Target, and the column with the Prediction.

If you include more column types than are expected for a given report, they will be ignored.

Below is a summary of the data requirements.

| Report Type | Feature columns | Target column | Prediction column | Works with a single dataset |
| --- | --- | --- | --- | --- |
| Data Drift | Required | No | No | No |
| Numerical Target Drift | Required | Target and/or Prediction required | Target and/or Prediction required | No |
| Categorical Target Drift | Required | Target and/or Prediction required | Target and/or Prediction required | No |
| Regression Performance | Required | Required | Required | Yes |
| Classification Performance | Required | Required | Required | Yes |
| Probabilistic Classification Performance | Required | Required | Required | Yes |
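As a sketch of this setup, one DataFrame split in two yields a reference and a current dataset with identical structure. The column names here are made up for illustration and are not part of any Evidently API:

```python
import pandas as pd

# Toy dataset standing in for real model data (illustrative column names).
df = pd.DataFrame({
    "feature_a": range(100),
    "feature_b": [x * 0.5 for x in range(100)],
})

# First half as reference, second half as current:
# both frames share the same columns, as the reports require.
reference_data = df.iloc[:50]
current_data = df.iloc[50:]
```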

Decide on the output format

Calculation results are available in the following formats:

  • An interactive visual Dashboard displayed inside the Jupyter notebook.

  • An exportable HTML report. The same content as the Dashboard, but as a standalone file.

  • A JSON profile that includes a summary of metrics and statistical test results.

Dashboards are best for ad-hoc analysis, performance debugging, and team sharing.

Profiles are best for integration into prediction pipelines or with external visualization tools.

You can proceed to work with Jupyter notebook or generate JSON profiles and HTML reports via Terminal.

| Output format | Jupyter notebook | Terminal |
| --- | --- | --- |
| Dashboard | + | - |
| HTML report | + | + |
| JSON profile | + | + |

All options are described below.

Jupyter notebook

Generating dashboards and HTML reports

After installing the tool, import Evidently and the required tabs:

import pandas as pd
from sklearn import datasets
from evidently.dashboard import Dashboard
from evidently.tabs import DataDriftTab

Create a Pandas DataFrame with the dataset to analyze:

iris = datasets.load_iris()
iris_frame = pd.DataFrame(iris.data, columns = iris.feature_names)

Dashboard generates an interactive report that includes the selected Tabs.

You can choose the following Tabs:

  • DataDriftTab to estimate the data drift

  • NumTargetDriftTab to estimate target drift for the numerical target

  • CatTargetDriftTab to estimate target drift for the categorical target

  • RegressionPerformanceTab to explore the performance of a regression model

  • ClassificationPerformanceTab to explore the performance of a classification model

  • ProbClassificationPerformanceTab to explore the performance of a probabilistic classification model

To generate the Data Drift report and save it as HTML, run:

iris_data_drift_report = Dashboard(tabs=[DataDriftTab])
iris_data_drift_report.calculate(iris_frame[:75], iris_frame[75:],
    column_mapping=None)
iris_data_drift_report.save("reports/my_report.html")

To generate the Data Drift and the Categorical Target Drift reports, first add a target column to the initial dataset:

iris_frame['target'] = iris.target

Then run:

from evidently.tabs import CatTargetDriftTab

iris_data_and_target_drift_report = Dashboard(tabs=[DataDriftTab, CatTargetDriftTab])
iris_data_and_target_drift_report.calculate(iris_frame[:75], iris_frame[75:],
    column_mapping=None)
iris_data_and_target_drift_report.save("reports/my_report_with_2_tabs.html")

If you get a security alert, press "trust html". The HTML report does not open automatically. To explore it, you should open it from the destination folder.
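Several examples below pass reference_data, current_data, and a column_mapping that are not defined in the snippets. Judging by the JSON configuration examples later on this page, column_mapping is a plain dictionary describing your dataset's schema; the column names in this sketch are placeholders for your own:

```python
# Placeholder column names -- replace them with the columns of your dataset.
column_mapping = {
    "target": "target",            # column with the actual values
    "prediction": "prediction",    # column with the model predictions
    "datetime": None,              # optional datetime column, if present
    "numerical_features": ["feature_a", "feature_b"],
    "categorical_features": ["feature_c"],
}
```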

To generate the Regression Model Performance report, run:

from evidently.tabs import RegressionPerformanceTab

regression_model_performance = Dashboard(tabs=[RegressionPerformanceTab])
regression_model_performance.calculate(reference_data, current_data,
    column_mapping=column_mapping)
regression_model_performance.show()

To generate the Regression Model Performance report for a single DataFrame, run:

regression_single_model_performance = Dashboard(tabs=[RegressionPerformanceTab])
regression_single_model_performance.calculate(reference_data, None,
    column_mapping=column_mapping)
regression_single_model_performance.show()

To generate the Classification Model Performance report, run:

from evidently.tabs import ClassificationPerformanceTab

classification_performance_report = Dashboard(tabs=[ClassificationPerformanceTab])
classification_performance_report.calculate(reference_data, current_data,
    column_mapping=column_mapping)
classification_performance_report.show()

To generate the Probabilistic Classification Model Performance report, run:

from evidently.tabs import ProbClassificationPerformanceTab

classification_performance_report = Dashboard(tabs=[ProbClassificationPerformanceTab])
classification_performance_report.calculate(reference_data, current_data,
    column_mapping=column_mapping)
classification_performance_report.show()

You can also generate either of the Classification reports for a single DataFrame. In this case, run:

classification_single_model_performance = Dashboard(tabs=[ClassificationPerformanceTab])
classification_single_model_performance.calculate(reference_data, None,
column_mapping=column_mapping)
classification_single_model_performance.show()

or

prob_classification_single_model_performance = Dashboard(tabs=[ProbClassificationPerformanceTab])
prob_classification_single_model_performance.calculate(reference_data, None,
column_mapping=column_mapping)
prob_classification_single_model_performance.show()

Generating JSON profiles

After installing the tool, import the Evidently Profile and the required sections:

import pandas as pd
from sklearn import datasets
from evidently.model_profile import Profile
from evidently.profile_sections import DataDriftProfileSection
iris = datasets.load_iris()
iris_frame = pd.DataFrame(iris.data, columns = iris.feature_names)

To generate the Data Drift profile, run:

iris_data_drift_profile = Profile(sections=[DataDriftProfileSection])
iris_data_drift_profile.calculate(iris_frame, iris_frame, column_mapping=None)
iris_data_drift_profile.json()
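The json() call returns the profile as a JSON string, which you can parse with the standard library for use in a pipeline. The keys in this sketch are a stand-in so the example is self-contained; the exact structure depends on the Evidently version and the sections you include:

```python
import json

# Stand-in for the string returned by profile.json().
profile_json = '{"data_drift": {"data": {"metrics": {"n_features": 4}}}}'

# Parse into a dict keyed by section name and drill into the metrics.
profile = json.loads(profile_json)
metrics = profile["data_drift"]["data"]["metrics"]
print(metrics["n_features"])
```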

To generate the Data Drift and the Categorical Target Drift profile, run:

from evidently.profile_sections import CatTargetDriftProfileSection

iris_target_and_data_drift_profile = Profile(sections=[DataDriftProfileSection, CatTargetDriftProfileSection])
iris_target_and_data_drift_profile.calculate(iris_frame[:75], iris_frame[75:], column_mapping=None)
iris_target_and_data_drift_profile.json()

You can also generate a Regression Model Performance profile for a single DataFrame. In this case, run:

from evidently.profile_sections import RegressionPerformanceProfileSection

regression_single_model_performance = Profile(sections=[RegressionPerformanceProfileSection])
regression_single_model_performance.calculate(reference_data, None, column_mapping=column_mapping)
regression_single_model_performance.json()

To generate the Classification Model Performance profile, run:

from evidently.profile_sections import ClassificationPerformanceProfileSection

classification_performance_profile = Profile(sections=[ClassificationPerformanceProfileSection])
classification_performance_profile.calculate(reference_data, current_data, column_mapping=column_mapping)
classification_performance_profile.json()

To generate the Probabilistic Classification Model Performance profile, run:

from evidently.profile_sections import ProbClassificationPerformanceProfileSection

prob_classification_performance_profile = Profile(sections=[ProbClassificationPerformanceProfileSection])
prob_classification_performance_profile.calculate(reference_data, current_data, column_mapping=column_mapping)
prob_classification_performance_profile.json()

You can also generate either of the Classification profiles for a single DataFrame. In this case, run:

classification_single_model_performance = Profile(sections=[ClassificationPerformanceProfileSection])
classification_single_model_performance.calculate(reference_data, None, column_mapping=column_mapping)
classification_single_model_performance.json()

or

prob_classification_single_model_performance = Profile(sections=[ProbClassificationPerformanceProfileSection])
prob_classification_single_model_performance.calculate(reference_data, None, column_mapping=column_mapping)
prob_classification_single_model_performance.json()

Terminal

To generate the HTML report, run the following command in bash:

$ python -m evidently calculate dashboard --config config.json
--reference reference.csv --current current.csv --output output_folder --report_name output_file_name

To generate a JSON profile, run the following command in bash:

$ python -m evidently calculate profile --config config.json
--reference reference.csv --current current.csv --output output_folder --report_name output_file_name

Here:

  • reference is the path to the reference data,

  • current is the path to the current data,

  • output is the path to the output folder,

  • config is the path to the configuration file,

  • pretty_print is an optional flag to print the JSON profile with indents (for the profile only).

You can choose the following Tabs (as dashboard_tabs) or Sections (as profile_sections):

  • data_drift to estimate the data drift,

  • num_target_drift to estimate target drift for the numerical target

  • cat_target_drift to estimate target drift for the categorical target

  • regression_performance to explore the performance of a regression model

  • classification_performance to explore the performance of a classification model

  • prob_classification_performance to explore the performance of a probabilistic classification model

To configure the report, you need to create the config.json file. This file specifies how to read your input data and which report to generate.

Configuration examples

Here is an example of a simple configuration, where we have comma-separated csv files with headers and there is no date column in the data.

Dashboard:

{
  "data_format": {
    "separator": ",",
    "header": true,
    "date_column": null
  },
  "column_mapping": {},
  "dashboard_tabs": ["cat_target_drift"]
}

Profile:

{
  "data_format": {
    "separator": ",",
    "header": true,
    "date_column": null
  },
  "column_mapping": {},
  "profile_sections": ["data_drift"],
  "pretty_print": true
}
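If you generate config.json from Python with the standard library instead of writing it by hand, JSON syntax slips (trailing commas, unquoted keys) are impossible. This sketch mirrors the simple Profile configuration above:

```python
import json

# Mirrors the simple Profile configuration: no date column, data_drift section.
config = {
    "data_format": {"separator": ",", "header": True, "date_column": None},
    "column_mapping": {},
    "profile_sections": ["data_drift"],
    "pretty_print": True,
}

# Write the file that the terminal command reads via --config.
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)

# Read it back to confirm the file round-trips cleanly.
with open("config.json") as f:
    loaded = json.load(f)
```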

Here is an example of a more complex configuration, with comma-separated csv files that have headers and a datetime column. We also specify the column_mapping dictionary, adding information about the datetime, target, and numerical_features.

Dashboard:

{
  "data_format": {
    "separator": ",",
    "header": true,
    "date_column": "datetime"
  },
  "column_mapping": {
    "datetime": "datetime",
    "target": "target",
    "numerical_features": ["mean radius", "mean texture", "mean perimeter",
      "mean area", "mean smoothness", "mean compactness", "mean concavity",
      "mean concave points", "mean symmetry"]
  },
  "dashboard_tabs": ["cat_target_drift"]
}

Profile:

{
  "data_format": {
    "separator": ",",
    "header": true,
    "date_column": null
  },
  "column_mapping": {
    "target": "target",
    "numerical_features": ["mean radius", "mean texture", "mean perimeter",
      "mean area", "mean smoothness", "mean compactness", "mean concavity",
      "mean concave points", "mean symmetry"]
  },
  "profile_sections": ["data_drift", "cat_target_drift"],
  "pretty_print": true
}