Tutorial - Data & ML Monitoring
Get started with Evidently Cloud. Run checks and customize a Dashboard in 15 minutes.
In this tutorial, you'll set up production data and ML monitoring for a toy ML model. You'll run evaluations in Python and access a web dashboard in Evidently Cloud.
The tutorial consists of three parts:
Overview of the architecture (2 min).
Launching a pre-built demo Dashboard (2-3 min).
Setting up monitoring for a new toy dataset (10 min).
You'll need basic knowledge of Python. Once you connect the data, you can continue in the web interface.
Want a very simple example first? Check this Evidently Cloud "Hello World" instead.
Video version:
If you're having problems or getting stuck, reach out on Discord.
Evidently Cloud helps you monitor the performance of ML-powered systems in production: from tracking the quality of incoming data to the accuracy of model predictions.
The monitoring setup consists of two components:
Open-source Evidently Python library. You perform evaluations in your environment. Each run produces a JSON snapshot with statistics, metrics, or test results for a specific period. You then send these snapshots to Evidently Cloud using an API key.
Evidently Cloud web app. After sending the data, you can access it in the Evidently Cloud UI. You can view individual evaluation results, build a Dashboard with trends over time, and set up alerts to notify on issues.
You can run batch monitoring jobs (e.g., hourly, daily, weekly) or use Evidently Collector for near real-time checks. This tutorial shows a batch workflow.
Data security by design. By default, Evidently Cloud does not store raw data or model inferences. Snapshots contain only data aggregates (e.g., histograms of data distributions and descriptive stats) and metadata with test results. This hybrid architecture helps avoid data duplication and preserves data privacy.
Let's quickly look at an example monitoring Dashboard.
If you do not have one yet, create an Evidently Cloud account.
Go to the main page, click on the "plus" sign, and create a new Team, for example, a "personal" Team.
Click on "Generate Demo Project" inside your Team. It will create a Project for a toy regression model that forecasts bike demand.
It'll take a few moments to populate the data. In the background, Evidently will run the code to generate Reports and Test Suites for 20 days. Once it's ready, open the Project to see a monitoring Dashboard.
Dashboard Tabs will show data quality, data drift, and model quality over time.
You can customize the choice of Panels and Tabs for your Project – this is just an example.
You can also see individual snapshots if you navigate to the "Reports" or "Test Suites" section using the left menu. They display the performance on a given day and act as a data source for the monitoring Panels.
Now, let's see how you can create something similar for your dataset!
You'll use a toy dataset to mimic a production ML model. You'll follow these steps:
Prepare a tabular dataset.
Run data quality and data drift Reports in daily batches.
Send them to Evidently Cloud.
Get a Dashboard to track metrics over time.
(Optional) Add custom monitoring panels.
(Optional) Run Test Suites for continuous testing.
In the example, you'll track data quality and drift. ML monitoring often starts here because true labels for assessing model quality come with a delay. Until then, you monitor the incoming data and predictions.
However, the core workflow this tutorial covers will work for any evaluation. You can later expand it to monitor ML model quality and text-based LLM systems.
To complete the tutorial, use the provided code snippets or run a sample notebook.
Jupyter notebook:
Or click to open in Colab.
Evidently is available as a PyPI package. Run this command to install it:
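```bash
pip install evidently
```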
You can also install Evidently from Conda:
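```bash
# Evidently is distributed via the conda-forge channel
conda install -c conda-forge evidently
```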
You'll need several components to complete the tutorial. Import the components to prepare the toy data:
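```python
import datetime  # used later to timestamp the daily snapshots

import pandas as pd
from sklearn import datasets  # one way to fetch the "adult" dataset from OpenML
```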
Import the components to compute and send the snapshots:
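```python
# The imports below assume the Evidently 0.4.x API used throughout this tutorial
from evidently.ui.workspace.cloud import CloudWorkspace

from evidently.report import Report
from evidently.metric_preset import DataQualityPreset, DataDriftPreset

from evidently.test_suite import TestSuite
from evidently.test_preset import DataDriftTestPreset
from evidently.tests import (
    TestNumberOfConstantColumns,
    TestShareOfMissingValues,
    TestNumberOfEmptyRows,
    TestNumberOfEmptyColumns,
    TestNumberOfDuplicatedColumns,
)
```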
Optional. Import the components to design monitoring Panels via the API. You can skip this step and add the Panels in the UI instead.
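The dashboard classes live in the UI module of the library. The import paths below assume the 0.4.x API and may differ between versions:

```python
from evidently.ui.dashboards import (
    DashboardPanelPlot,
    PanelValue,
    PlotType,
    ReportFilter,
)
from evidently.renderers.html_widgets import WidgetSize
```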
You'll use the adult dataset from OpenML. Import it as a pandas DataFrame.
Split it into two datasets: adult_ref (the reference dataset) and adult_prod (current production data).
We'll base the split on the "education" feature to introduce some artificial drift for demo purposes. Current data will include people with education levels unseen in the reference dataset. Here's how you can do it:
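```python
# Fetch the "adult" dataset from OpenML as a pandas DataFrame
adult_data = datasets.fetch_openml(name="adult", version=2, as_frame="auto")
adult = adult_data.frame

# Illustrative split: keep some education levels out of the reference data,
# so the "production" data contains categories the reference has never seen
adult_ref = adult[~adult.education.isin(["Some-college", "HS-grad", "Bachelors"])]
adult_prod = adult[adult.education.isin(["Some-college", "HS-grad", "Bachelors"])]
```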
What is a reference dataset? You need one to evaluate distribution drift. Here, you compare the current data against a past period, like an earlier data batch. You must provide this reference to compute the distance between two datasets. A reference dataset is optional when you compute descriptive stats or model quality metrics.
Preview the dataset. It resembles a binary classification use case with "class" as the prediction column.
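```python
adult_prod.head()
```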
Now, let's start monitoring!
Get the API token. To connect to Evidently Cloud, you need an access token. Use the "key" sign in the left menu to get to the token page, and click "generate token."
To connect to the Evidently Cloud workspace, run:
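```python
ws = CloudWorkspace(
    token="YOUR_API_TOKEN",  # the token you generated in the UI
    url="https://app.evidently.cloud",
)
```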
Now, you need to create a new Project. You can do this programmatically or in the UI.
Click on the “plus” sign on the home page. Create a Team if you do not have one yet. Type your Project name and description.
After creating a Project, click its name to open the Dashboard. Since there's no data yet, it will be empty.
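To create the Project programmatically instead, a minimal sketch (assuming the create_project method and the Team ID visible in the UI):

```python
project = ws.create_project("My tutorial project", team_id="YOUR_TEAM_ID")
project.description = "Data and ML monitoring tutorial"
project.save()
```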
To send data to this Project, you'll need to connect to it from your Python environment using the get_project method. You can find your Project ID above the monitoring Dashboard.
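For example:

```python
project = ws.get_project("PROJECT_ID")  # copy the ID shown above the Dashboard
```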
What is a Project? Projects help organize monitoring for different use cases. Each Project has a shared Dashboard and alerting. You can create a Project for a single ML model or dataset or put related models together and use Tags to distinguish them.
To send snapshots, first compute them using the Evidently Python library. Here's the process:
Prepare the data batch to evaluate.
Create a Report or TestSuite object.
Define the Metrics or Tests to include.
Pass optional parameters, like data drift detection method or test conditions.
Compute and send the snapshot to the Project.
What are Reports and Test Suites? These are pre-built evaluations available in the open-source Evidently Python library. They cover 100+ checks for data quality, data drift, and model quality. You can check out the open-source Evidently Tutorial for an introduction. A snapshot is a "JSON version" of a Report or Test Suite.
Let’s start with data quality and drift checks using Presets. This will help you observe how model inputs and outputs are changing. For each batch of data, you'll generate:
Data Quality Preset. It captures stats like feature ranges and missing values.
Data Drift Preset. This compares current and reference data distributions. You will use the PSI (Population Stability Index) method with a 0.3 threshold to flag significant drift.
To create a single Report using the first 100 rows of our "production" data:
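```python
report = Report(
    metrics=[
        DataQualityPreset(),
        DataDriftPreset(stattest="psi", stattest_threshold=0.3),
    ],
)

report.run(reference_data=adult_ref, current_data=adult_prod.iloc[0:100, :])
```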
Defining the dataset. To specify the dataset to evaluate, you pass it as current_data inside the run method. Our example uses the slice adult_prod.iloc[0 : 100, :] to select the first 100 rows from the adult_prod dataset. In practice, simply pass your data: current_data=your_batch_name.
To send this Report to Evidently Cloud, use the add_report method.
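```python
ws.add_report(project.id, report)
```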
You can now view the Report in the Evidently Cloud web app. Go to the "Reports" section via the left menu and click to open the first Report. You can also download it as an HTML or JSON.
In production, you can run evaluations on a schedule (e.g., daily or hourly) each time passing a new batch of data. Once you have multiple snapshots in the Project, you can plot trends on a monitoring Dashboard.
To simulate production use, let’s create a script to compute multiple Reports, taking 100 rows per "day":
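```python
def create_report(i: int):
    report = Report(
        metrics=[
            DataQualityPreset(),
            DataDriftPreset(stattest="psi", stattest_threshold=0.3),
        ],
        # shift the timestamp so that each snapshot lands on its own "day"
        timestamp=datetime.datetime.now() + datetime.timedelta(days=i),
    )
    report.run(
        reference_data=adult_ref,
        current_data=adult_prod.iloc[100 * i : 100 * (i + 1), :],
    )
    return report

for i in range(0, 10):
    report = create_report(i=i)
    ws.add_report(project.id, report)
```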
The loop variable i defines how many daily batches to process: with 10 iterations, you generate and send Reports for 10 days.
Run the script to compute and send 10 daily snapshots. Go to the "Reports" section to view them.
However, each such Report is static. To see trends, you need a monitoring Dashboard!
Want to reuse this script for your data? If you replace the toy dataset with your own data, increase i, or add more metrics, it's best to send Reports one by one instead of in a single loop. Otherwise, you might hit Rate Limits when sending many Reports together. For free trial users, the limit for a single data upload is 50MB; for paying users, it is 500MB. Snapshot size varies based on the metrics and tests included.
The monitoring Dashboard helps you observe trends. It pulls selected values from individual Reports and shows them over time. You can add multiple monitoring Panels and organize them by Tabs.
For a simple start, you can use Tab templates, which are pre-built combinations of monitoring Panels:
Data Quality Tab: displays data quality metrics (nulls, duplicates, etc.).
Columns Tab: shows descriptive statistics for each column over time.
Data Drift Tab: shows the share of drifting features over time.
To add pre-built Tabs, enter "Edit" mode in the top right corner of the Dashboard. Click the plus sign to add a new Tab and choose the template.
You can also add individual monitoring Panels one by one. You can:
Add them to an existing or a new Tab.
Choose the Panel type, including Line Plot, Bar Plot, Histogram, Counter, etc.
Customize Panel name, legend, etc.
You can only view values stored inside the snapshots. In our example, they relate to data drift and quality. You can't see model quality metrics yet, since there is no such data. If you add a model quality Panel, it will be empty. To populate it, add more snapshots, for example, with ClassificationPreset().
Say you want to add a new “Summary” Tab with a couple of Panels that show:
Inferences over time.
The share of drifting features over time.
You can add Panels either in the UI or using the Python API.
Enter the “edit” mode on the Dashboard, and use the “add Tab” and “add Panel” buttons to add a new Panel. Follow the prompts to point to a specific measurement.
To view inferences over time, plot the value current.number_of_rows inside the DatasetSummaryMetric.
To view the share of drifting columns, plot the value share_of_drifted_columns inside the DatasetDriftMetric. Choose a Panel type, for example, a LINE or BAR plot, and add your legend.
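Alternatively, you can define the same Panels in Python. Here is a sketch using the dashboards module imported earlier; the class names and string field paths follow the 0.4.x API and may differ in your version:

```python
# A Panel showing the number of rows (inferences) per snapshot
project.dashboard.add_panel(
    DashboardPanelPlot(
        title="Inferences",
        filter=ReportFilter(metadata_values={}, tag_values=[]),
        values=[
            PanelValue(
                metric_id="DatasetSummaryMetric",
                field_path="current.number_of_rows",
                legend="count",
            ),
        ],
        plot_type=PlotType.BAR,
        size=WidgetSize.HALF,
    ),
)

# A Panel showing the share of drifted columns over time
project.dashboard.add_panel(
    DashboardPanelPlot(
        title="Share of drifted columns",
        filter=ReportFilter(metadata_values={}, tag_values=[]),
        values=[
            PanelValue(
                metric_id="DatasetDriftMetric",
                field_path="share_of_drifted_columns",
                legend="share",
            ),
        ],
        plot_type=PlotType.LINE,
        size=WidgetSize.HALF,
    ),
)

project.save()
```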
How to add and modify Panels? Check the detailed instructions on how to design monitoring panels. You can also add text-only Panels and counters.
You just created a Dashboard to track individual metric values. Another option is to run your evaluations as Tests and track their outcomes.
To do this, use Test Suites instead of Reports. Each Test in a Test Suite checks a specific condition, e.g., “the share of missing values in this column should be less than 10%”. You can bundle many Tests together and track which passed or failed. You can combine dataset- and column-level tests.
What Tests are available? Choose from 50+ Tests, use Presets and create custom Test Suites.
Let’s create a Test Suite that includes:
Data Drift Test Preset. It will generate a data drift check for all columns in the dataset using the same PSI method with a 0.3 threshold.
Individual data quality tests. They will check for missing values, empty rows and columns, duplicate columns, and constant columns. You can set test conditions using parameters like eq (equal) or lte (less than or equal). If you don't specify the conditions, Evidently will auto-generate them based on the reference data.
Here's a script that again simulates generating Test Suites for 10 days in a row:
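```python
def create_test_suite(i: int):
    suite = TestSuite(
        tests=[
            DataDriftTestPreset(stattest="psi", stattest_threshold=0.3),
            # example conditions; drop the parameters to let Evidently
            # auto-generate conditions from the reference data
            TestShareOfMissingValues(lte=0.05),
            TestNumberOfConstantColumns(eq=0),
            TestNumberOfEmptyRows(eq=0),
            TestNumberOfEmptyColumns(eq=0),
            TestNumberOfDuplicatedColumns(eq=0),
        ],
        timestamp=datetime.datetime.now() + datetime.timedelta(days=i),
    )
    suite.run(
        reference_data=adult_ref,
        current_data=adult_prod.iloc[100 * i : 100 * (i + 1), :],
    )
    return suite
```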
To send Test Suites to Evidently Cloud, use the add_test_suite method.
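```python
for i in range(0, 10):
    test_suite = create_test_suite(i=i)
    ws.add_test_suite(project.id, test_suite)
```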
To visualize the results, add a new Dashboard Tab ("Data tests") and test-specific monitoring Panels.
Enter the “edit” Dashboard mode and click the “add Tab” and “add Panel” buttons. Choose the “Test Plot” Panel type with the "detailed" option and a 1D (daily) aggregation level. You can add:
One Panel with all column drift checks. Choose the TestColumnDrift test for "all" columns.
One Panel with dataset-level data quality checks. Choose TestNumberOfConstantColumns, TestShareOfMissingValues, TestNumberOfEmptyRows, TestNumberOfEmptyColumns, and TestNumberOfDuplicatedColumns from the dropdown.
You'll see Dashboards with Test results over time in the new Tab. Head to the "Test Suites" section in the left menu for individual Test Suites. This helps debug Test outcomes.
To go through all the steps in more detail, refer to the complete Monitoring User Guide. Here are some of the things you might want to explore next:
Build your batch or real-time workflow. For batch evaluations, you can run regular monitoring jobs, for example, using a tool like Airflow or a script to orchestrate them. If you have a live ML service, you can use the Evidently collector service to collect incoming production data and manage the computations.
Add alerts. You can enable email, Slack, or Discord alerts when Tests fail or specific values are out of bounds.
Use Tags. You can add Metadata or Tags to your snapshots and filter monitoring Panels. For instance, build individual monitoring Panels for two model versions.
Need help? Ask in our Discord community.