Tutorial - Reports and Tests
In this tutorial, you will use the Evidently open-source Python library to evaluate data stability and data drift on tabular data. You will run batch checks on a toy dataset and generate visual Reports and Test Suites in your Python environment.
We recommend going through this tutorial once to understand the basic functionality. Once you complete it, you will be ready to use all Evidently evaluations, including checks for ML model quality or text data.
You can later run the Evidently Reports and Test Suites independently or use them as a logging layer for Evidently ML Monitoring. You can choose between self-hosting the ML monitoring dashboard or sending the Reports and Test Suites to the Evidently Cloud platform to monitor metrics over time.
To complete the tutorial, you need basic knowledge of Python. You should be able to complete it in about 15 minutes.
You can reproduce the steps in Jupyter notebooks or Colab or open and run a sample notebook from the links below.
Colab:
Jupyter notebook:
Video version:
You will go through the following steps:
Install Evidently
Prepare the input data
Generate a pre-built data drift report
Customize the report
Run and customize data stability tests
If you're having problems or getting stuck, reach out on Discord.
To install Evidently using the pip package manager, run:
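```
pip install evidently
```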
If you are using Google Colab, Kaggle Kernel, Deepnote or Databricks notebooks, run the following command in the notebook cell:
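```
!pip install evidently
```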
To install Evidently in Jupyter notebook on Windows, run:
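```
pip install evidently
```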
After installing the tool, import evidently and the required components. In this tutorial, you will use several test suites and reports. Each corresponds to a specific type of analysis. You should also import pandas, numpy, and the toy california_housing dataset.
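A minimal set of imports for the steps below is sketched here; the module paths assume the Report / Test Suite API and may differ slightly between Evidently versions:

```python
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing

# Reports and Metrics
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
from evidently.metrics import ColumnSummaryMetric, ColumnQuantileMetric, ColumnDriftMetric

# Test Suites and Test Presets
from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset, NoTargetPerformanceTestPreset
```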
In the example, you will work with a toy dataset. In practice, you should use the model prediction logs. They can include input data, model predictions, and true labels or actuals, if available.
To prepare the data for analysis, create a pandas.DataFrame:
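For example, you can load the toy dataset from scikit-learn:

```python
# Load the California housing data as a pandas DataFrame
data = fetch_california_housing(as_frame=True)
housing_data = data.frame
```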
Rename one of the columns to “target” and create a “prediction” column. This way, the dataset will resemble the model application logs with known labels.
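One way to do this, assuming the DataFrame and imports above (the original target column in this dataset is MedHouseVal, and the "prediction" here is simply the target with added noise):

```python
# Rename the original target column and imitate model predictions with added noise
housing_data.rename(columns={'MedHouseVal': 'target'}, inplace=True)
housing_data['prediction'] = housing_data['target'].values + np.random.normal(0, 5, housing_data.shape[0])
```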
Split the dataset by taking 5000 objects for reference and current datasets.
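For example:

```python
# Sample 5000 rows each for the reference and current datasets
reference = housing_data.sample(n=5000, replace=False)
current = housing_data.sample(n=5000, replace=False)
```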
The first, the reference dataset, is the baseline. This is often the data used in model training or earlier production data. The second is the current production data. Evidently will compare the current data to the reference.
If you work with your own data, you can prepare two datasets with an identical schema. You can also take a single dataset and explicitly identify rows for reference and current data.
Column mapping. In this example, we proceed directly to the analysis. In other cases, you might need a ColumnMapping object to help Evidently process the input data correctly. For example, you can point to encoded categorical features or specify the name of the target column. Consult the Column Mapping section for help.
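A minimal sketch of such a mapping; the column names here are hypothetical, so adjust them to your dataset:

```python
from evidently import ColumnMapping

# Hypothetical mapping: replace the names with the columns of your dataset
column_mapping = ColumnMapping(
    target='target',
    prediction='prediction',
    categorical_features=['region'],  # explicitly declared categorical columns
)
```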
Evidently Reports help explore and debug data and model quality. They calculate various metrics and generate a dashboard with rich visuals.
To start, you can use Metric Presets. These are pre-built Reports that group relevant metrics to evaluate a specific aspect of the model performance.
Let's start with Data Drift. This Preset compares the distributions of the model features and shows which of them have drifted. When you do not have ground truth labels or actuals, evaluating input data drift can help you understand whether an ML model still operates in a familiar environment.
To get the Report, create a corresponding Report object, list the preset to include, and point to the reference and current datasets created in the previous step:
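A sketch of this step, assuming the imports and the reference and current DataFrames created above:

```python
# Build a Report from the Data Drift preset and run it on the two datasets
report = Report(metrics=[
    DataDriftPreset(),
])

report.run(reference_data=reference, current_data=current)
report  # in a notebook cell, this renders the interactive report
```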
It will display the HTML report directly in the notebook.
First, you can see the Data Drift summary.
If you click on individual features, it will show additional plots to explore.
Aggregated visuals in plots. Starting from v0.3.2, all visuals in Evidently Reports are aggregated by default. This helps decrease the load time and report size for larger datasets. If you work with smaller datasets or samples, you can pass an option to generate plots with raw data. Choose whether to enable it based on the size of your dataset.
Evidently Reports are very configurable. You can define which Metrics to include and how to calculate them.
To create a custom Report, you need to list individual Metrics. Evidently has dozens of Metrics that evaluate anything from descriptive feature statistics to model quality. You can calculate Metrics on the column level (e.g., mean value of a specific column) or dataset-level (e.g., share of drifted features in the dataset).
In this example, you can list several Metrics that evaluate individual statistics for the defined column.
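For instance, a sketch with a few column-level Metrics applied to one of the housing columns (the column and Metric choice here are illustrative):

```python
# A custom Report built from individual column-level Metrics
report = Report(metrics=[
    ColumnSummaryMetric(column_name='AveRooms'),
    ColumnQuantileMetric(column_name='AveRooms', quantile=0.25),
    ColumnDriftMetric(column_name='AveRooms'),
])

report.run(reference_data=reference, current_data=current)
report
```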
You will see a combined report that includes multiple Metrics:
If you want to generate multiple column-level Metrics, there is a helper function. For example, in order to calculate the same quantile value for all the columns in the list, you can use the generator:
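A sketch of the column metric generator; the import path is an assumption and may differ between Evidently versions:

```python
from evidently.metrics.base_metric import generate_column_metrics

# Generate the same quantile Metric for every column in the list
report = Report(metrics=[
    generate_column_metrics(
        ColumnQuantileMetric,
        parameters={'quantile': 0.25},
        columns=['AveRooms', 'AveBedrms'],
    ),
])

report.run(reference_data=reference, current_data=current)
report
```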
You can easily combine individual Metrics, Presets and metric generators in a single list:
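For example, a single Report can mix an individual Metric, a generator (here applied to all numerical columns via 'num'), and a Preset:

```python
# Combine an individual Metric, a metric generator, and a Preset in one Report
report = Report(metrics=[
    ColumnSummaryMetric(column_name='AveRooms'),
    generate_column_metrics(ColumnQuantileMetric, parameters={'quantile': 0.25}, columns='num'),
    DataDriftPreset(),
])

report.run(reference_data=reference, current_data=current)
report
```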
Available Metrics and Presets. You can refer to the All Metrics reference table to browse available Metrics and Presets or use one of the example notebooks in the Examples section.
You can render the visualizations in the notebook as shown above. There are also alternative options.
If you only want to log the metrics and test results, you can get the output as a Python dictionary.
You can also get the output as JSON.
You can also save HTML or JSON externally and specify a path and file name:
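A sketch of these export options for the report object created above; the file names are arbitrary:

```python
# Get the computed results without rendering the visuals
report_dict = report.as_dict()   # Python dictionary
report_json = report.json()      # JSON string

# Save the rendered report or the JSON output to a file
report.save_html('data_drift_report.html')
report.save_json('data_drift_report.json')
```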
You can also save the output as an Evidently JSON snapshot. This will allow you to visualize the model or data quality over time using the Evidently ML monitoring dashboard.
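For example (the snapshot file name is arbitrary, and the exact saving method may vary by version):

```python
# Save the Report as a snapshot for the ML monitoring dashboard
report.save('data_drift_snapshot.json')
```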
Building a live ML monitoring dashboard. To better understand how the ML monitoring dashboard works, we recommend going through the ML Monitoring Quickstart after completing this tutorial.
Reports help visually explore the data or model quality or share results with the team. However, it is less convenient if you want to run your checks automatically and only react to meaningful issues.
To integrate Evidently checks in the prediction pipeline, you can use the Test Suites functionality. They are also better suited to handle large datasets.
Test Suites help compare the two datasets in a structured way. A Test Suite contains several individual tests. Each Test compares a specific value against a defined condition and returns an explicit pass/fail result. You can apply Tests to the whole dataset or individual columns.
Just like with Reports, you can create a custom Test Suite or use one of the Presets.
Let's create a custom one! Imagine you received a new batch of data. Before generating the predictions, you want to check if the quality is good enough to run your model. You can combine several Tests to check missing values, duplicate columns, and so on.
You need to create a TestSuite object and list the individual Tests to include:
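A sketch of such a custom suite; the Test names assume the current Test API and cover the checks described above:

```python
from evidently.tests import (
    TestNumberOfColumnsWithMissingValues,
    TestNumberOfRowsWithMissingValues,
    TestNumberOfConstantColumns,
    TestNumberOfDuplicatedRows,
    TestNumberOfDuplicatedColumns,
    TestColumnsType,
    TestNumberOfDriftedColumns,
)

# A custom Test Suite to check data quality before generating predictions
data_stability = TestSuite(tests=[
    TestNumberOfColumnsWithMissingValues(),
    TestNumberOfRowsWithMissingValues(),
    TestNumberOfConstantColumns(),
    TestNumberOfDuplicatedRows(),
    TestNumberOfDuplicatedColumns(),
    TestColumnsType(),
    TestNumberOfDriftedColumns(),
])

data_stability.run(reference_data=reference, current_data=current)
data_stability  # renders the test results in the notebook
```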
You will get a summary with the test results:
You can also use Test Presets. For example, the NoTargetPerformance preset combines multiple checks related to data stability, drift, and data quality to help evaluate the model without ground truth labels.
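For example, assuming the Test Preset imports shown earlier:

```python
# Run a pre-built Test Suite for models without ground truth labels
no_target_performance = TestSuite(tests=[
    NoTargetPerformanceTestPreset(),
])

no_target_performance.run(reference_data=reference, current_data=current)
no_target_performance
```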
You can group the outputs by test status, feature, test group, and type. By clicking on “details,” you will see related plots or tables.
If some of the Tests fail, you can use the supporting visuals for debugging:
Just like with Reports, you can combine individual Tests and Presets in a single Test Suite and use the column test generator to create multiple column-level Tests:
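A sketch of such a mixed suite; the generator import path and Test names are assumptions that may differ between versions:

```python
from evidently.tests.base_test import generate_column_tests
from evidently.tests import TestColumnShareOfMissingValues, TestMeanInNSigmas

suite = TestSuite(tests=[
    TestColumnShareOfMissingValues(column_name='HouseAge'),
    generate_column_tests(TestMeanInNSigmas, columns='num'),  # apply to all numerical columns
    DataStabilityTestPreset(),
])

suite.run(reference_data=reference, current_data=current)
suite
```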
Available Tests and Presets. You can refer to the All Tests reference table to browse available Tests and Presets. To see interactive examples, refer to the notebooks in the examples section.
You can also export the output in other formats.
To integrate Evidently checks in the prediction pipeline, you can get the output as JSON or a Python dictionary:
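A sketch of how this can look in a pipeline, assuming the suite created above; the dictionary keys are an assumption, so inspect the output for your version:

```python
# Get the test results as a dictionary or a JSON string
results = suite.as_dict()
results_json = suite.json()

# Example of a conditional workflow (key names may vary between versions)
if not results.get('summary', {}).get('all_passed', True):
    print('Some tests failed: trigger an alert or generate a Report for debugging')
```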
You can extract necessary information from the JSON or Python dictionary output and design a conditional workflow around it. For example, if tests fail, you can trigger an alert, retrain the model or generate the Report.
You can also save the output as an Evidently JSON snapshot. This will allow you to visualize the test results over time using the Evidently ML monitoring dashboard.
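For example (as with Reports, the exact saving method may vary by version):

```python
# Save the Test Suite as a snapshot for the ML monitoring dashboard
suite.save('data_stability_snapshot.json')
```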
Explore available evaluations
In this tutorial, you explored some of the data quality and data drift checks on tabular data. Evidently also supports evaluations on text data and model quality checks.
The easiest way to understand what else is there is to look at Presets. Both Tests and Reports have multiple Presets. Some, like Data Quality, require only input data. You can use them even without the reference dataset. When you have the true labels, you can run Presets like Regression Performance, Ranking Performance and Classification Performance to evaluate the model quality and errors.
To understand the contents of each Preset, head to the Preset overview. If you want to see the pre-rendered examples of the reports, browse Colab notebooks in the Examples section. You can also design custom Reports and Test Suites from individual Metrics and Tests.
Learn how to get a Monitoring Dashboard
If you want to track the results of different checks over time, you can get an ML monitoring dashboard. Go through the ML monitoring quickstart (Evidently Cloud) - recommended - or the ML monitoring quickstart (Self-hosting) to see how to monitor metrics over time.
Explore available integrations
To explore how to integrate Evidently with other tools, refer to the Integrations. For example, if you run predictions in batches, you can use a tool like Airflow to orchestrate the process.
Go through the steps in more detail
To better understand working with Reports and Test Suites, refer to the User Guide section of the docs. A good next step is to explore how to pass custom test parameters to define your own test conditions.
Evidently is in active development, so expect things to change and evolve. You can subscribe to the user newsletter or follow our releases on GitHub to stay updated about the latest functionality.
We run a Discord community to connect with our users and chat about ML in production topics.
In case you have feedback or need help, just ask in Discord or open a GitHub issue.
And if you want to support a project, give us a star on GitHub!