Data and ML checks
Run a simple evaluation for tabular data
Need help? Ask on Discord.
Evidently helps you run tests and evaluations for your production ML systems. This includes:
- evaluating prediction quality (e.g. classification or regression accuracy)
- checking input data quality (e.g. missing values, out-of-range features)
- detecting data and prediction drift.
Evaluating distribution shifts (data drift) in ML inputs and predictions is a typical use case that helps you detect shifts in the model quality and environment even without ground truth labels.
In this Quickstart, you’ll run a simple data drift report in Python and view the results in Evidently Cloud. If you want to stay fully local, you can do that too: just skip a couple of steps.
1. Set up your environment
For a fully local flow, skip steps 1.1 and 1.3.
1.1. Set up Evidently Cloud
- Sign up for a free Evidently Cloud account.
- Create an Organization when you log in for the first time, and get the ID of your organization.
- Get an API token: click the Key icon in the left menu, then generate and save the token.
1.2. Installation and imports
Install the Evidently Python library:
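For example, with pip:

```
pip install evidently
```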
Components to run the evals:
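A minimal set of imports for this tutorial, assuming the current Evidently API (exact module paths can differ between versions); pandas and scikit-learn are only used to prepare the toy dataset below:

```python
import pandas as pd
from sklearn import datasets

from evidently import Dataset, DataDefinition, Report
from evidently.presets import DataDriftPreset
```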
Components to connect with Evidently Cloud:
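And the Cloud workspace client (again, the import path is version-dependent):

```python
from evidently.ui.workspace import CloudWorkspace
```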
1.3. Create a Project
Connect to Evidently Cloud using your API token:
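A sketch, using a placeholder token:

```python
ws = CloudWorkspace(
    token="YOUR_API_TOKEN",           # the API token you generated in step 1.1
    url="https://app.evidently.cloud",
)
```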
Create a Project within your Organization, or connect to an existing Project:
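For example (the Project name and description are placeholders):

```python
# create a new Project inside your Organization
project = ws.create_project("Data drift quickstart", org_id="YOUR_ORG_ID")
project.description = "Toy data drift checks"
project.save()

# or connect to an existing Project by its ID
# project = ws.get_project("YOUR_PROJECT_ID")
```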
2. Prepare a toy dataset
Let’s import a toy dataset with tabular data:
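One option is the "adult" census dataset fetched from OpenML via scikit-learn, used here purely as demo data:

```python
adult_data = datasets.fetch_openml(name="adult", version=2, as_frame=True)
adult = adult_data.frame
```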
Let’s split the data into two parts and introduce some artificial drift for demo purposes. The prod data will include people with education levels unseen in the reference dataset:
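One way to do this is to split on the `education` column; the exact categories chosen are arbitrary:

```python
# reference data: rows with education levels NOT in the selected list
adult_ref = adult[~adult.education.isin(["Some-college", "HS-grad", "Bachelors"])]

# "prod" data: only the education levels that are missing from the reference
adult_prod = adult[adult.education.isin(["Some-college", "HS-grad", "Bachelors"])]
```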
Map the column types:
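A sketch using a `DataDefinition` schema; the column lists below follow the adult dataset, and the parameter names are assumptions to verify against your version:

```python
schema = DataDefinition(
    numerical_columns=[
        "age", "fnlwgt", "education-num",
        "capital-gain", "capital-loss", "hours-per-week",
    ],
    categorical_columns=[
        "education", "occupation", "workclass", "marital-status",
        "relationship", "race", "sex", "native-country",
    ],
)
```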
Create Evidently Datasets to work with:
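Assuming the `Dataset.from_pandas` constructor and the `schema` defined above:

```python
eval_data_1 = Dataset.from_pandas(adult_prod, data_definition=schema)
eval_data_2 = Dataset.from_pandas(adult_ref, data_definition=schema)
```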
`eval_data_2` will be our reference dataset that we compare against.
3. Get a Report
Let’s generate a Data Drift preset that checks for statistical distribution changes in all columns between the two datasets.
You can customize the drift detection parameters by choosing different methods and thresholds. In our case, we proceed as is, so the default tests selected by Evidently will apply.
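A minimal sketch; here the first argument to `run()` is treated as the current data and the second as the reference (check the signature for your version):

```python
report = Report([DataDriftPreset()])
my_eval = report.run(eval_data_1, eval_data_2)
```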
4. Explore the results
Local preview. In a Python environment like a Jupyter notebook or Colab, run:
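(Here `my_eval` is the Report result created in the previous step.)

```python
my_eval
```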
This will render the Report directly in the notebook cell. You can also get the output as JSON or a Python dictionary, or save it as an external HTML file.
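For example, assuming these export helpers exist on the result object in your version:

```python
my_eval.json()                          # JSON string
my_eval.dict()                          # Python dictionary
my_eval.save_html("drift_report.html")  # standalone HTML file
```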
Local Reports are great for one-off evaluations. To run continuous monitoring (e.g. track the share of drifting features over time), keep a history of the results, and collaborate with others, upload the results to the Evidently Platform.
Upload the Report with summary results:
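A sketch, assuming the `add_run` method on the workspace client (set `include_data=True` if you also want to upload the underlying data):

```python
ws.add_run(project.id, my_eval, include_data=False)
```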
View the Report. Go to Evidently Cloud, open your Project, navigate to “Reports” in the left menu, and open the Report. You will see the summary with scores and Test results.
5. Get a Dashboard (Optional)
As you run repeated evals, you may want to track the results over time by creating a Dashboard. Evidently lets you configure the dashboard in the UI or using dashboards-as-code.
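A rough dashboards-as-code sketch. The panel classes, import paths, and arguments below are illustrative assumptions; refer to the Dashboards docs for the exact API in your version:

```python
# illustrative only: panel names and arguments may differ in your version
from evidently.sdk.panels import DashboardPanelPlot
from evidently.sdk.models import PanelMetric

project.dashboard.add_panel(
    DashboardPanelPlot(
        title="Share of drifted columns",
        size="half",
        values=[
            PanelMetric(
                legend="share",
                metric="DriftedColumnsCount",
                metric_labels={"value_type": "share"},
            ),
        ],
        plot_params={"plot_type": "line"},
    ),
)
project.save()
```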
This will create a Dashboard that you can access in the Dashboard tab (left menu).
For now, you will see only one datapoint, but as you add more Reports (e.g. daily or weekly), you’ll be able to track the results over time.
What’s next?
- See available Evidently Metrics: All Metric Table
- Understand how you can add conditional tests to your Reports: Tests.
- Explore options for Dashboard design: Dashboards
Alternatively, try the `DataSummaryPreset`, which generates a summary of all columns in the dataset and runs auto-generated Tests to check data quality and core descriptive statistics.
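A quick sketch, reusing the datasets created above:

```python
from evidently.presets import DataSummaryPreset

summary_report = Report([DataSummaryPreset()])
summary_eval = summary_report.run(eval_data_1, eval_data_2)
summary_eval
```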