Run a Test Suite
How to run Test Suites using the Evidently Python library.
Code examples
Check the sample notebooks for examples of how to generate Test Suites.
Imports
After installing Evidently, import the TestSuite component and the necessary test_presets or tests you plan to use:
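For instance, an import block along these lines (which Presets and individual Tests you import depends on your use case; the ones shown here are placeholders):

```python
import pandas as pd

from evidently.test_suite import TestSuite
from evidently.test_preset import DataQualityTestPreset, DataStabilityTestPreset
from evidently.tests import TestShareOfMissingValues, TestShareOfOutRangeValues
```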
How it works
Here is the general flow.
1. Input data. Prepare your data as a Pandas DataFrame. This will be your current data to test. You may also pass a reference dataset to generate Test conditions from it or to run data distribution Tests. Check the input data requirements.
2. Schema mapping. Define your data schema using Column Mapping. Optional, but highly recommended.
3. Define the Test Suite. Create a TestSuite object and pass the selected tests.
4. Set the parameters. Optionally, specify Test conditions and mark certain Tests as non-critical.
5. Run the Test Suite. Execute the Test Suite on your current_data. If applicable, pass the reference_data and column_mapping.
6. Get the results. View the results in a Jupyter notebook, export the summary, or send them to the Evidently Platform.
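Putting the steps together, a minimal end-to-end sketch might look like this (the toy DataFrames and the choice of DataStabilityTestPreset are illustrative):

```python
import pandas as pd

from evidently import ColumnMapping
from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset

# 1-2. Prepare current/reference data and (optionally) define the schema
cur = pd.DataFrame({"age": [22, 35, 58], "salary": [40.0, 55.0, 70.0]})
ref = pd.DataFrame({"age": [25, 40, 60], "salary": [42.0, 50.0, 68.0]})
mapping = ColumnMapping(numerical_features=["age", "salary"])

# 3-4. Define the Test Suite with the selected Tests or Presets
suite = TestSuite(tests=[DataStabilityTestPreset()])

# 5. Run it on the current data, with the optional reference and mapping
suite.run(current_data=cur, reference_data=ref, column_mapping=mapping)

# 6. Get the results, e.g., as a Python dictionary
suite.as_dict()
```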
You can use Test Presets or create a custom Test Suite.
Test Presets
Test Presets are pre-built Test Suites that generate Tests for a specific aspect of the data or model performance.
Evidently also automatically generates Test conditions in two ways:
Based on the reference dataset. If you provide a reference, Evidently derives conditions from it. For example, TestShareOfOutRangeValues will fail if over 10% of current values fall outside the min-max range seen in the reference. The 10% threshold is an encoded heuristic.
Based on heuristics. Without a reference, Evidently uses heuristics. For example, TestAccuracyScore() fails if the model performs worse than a dummy model created by Evidently. Data quality Tests like TestNumberOfEmptyRows() or TestNumberOfMissingValues() assume both should be zero.
Reference: Check the default Test conditions in the All tests table.
Example 1. To apply the DataQualityTestPreset to a single curr dataset, with conditions generated based on heuristics:
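For example (a sketch, assuming curr is your current dataset as a pandas DataFrame):

```python
from evidently.test_suite import TestSuite
from evidently.test_preset import DataQualityTestPreset

data_quality = TestSuite(tests=[DataQualityTestPreset()])
data_quality.run(current_data=curr, reference_data=None, column_mapping=None)
```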
Available Test Presets. There are other Presets: for example, DataStabilityTestPreset, DataDriftTestPreset, or RegressionTestPreset. See all Presets. For an interactive preview, check the example notebooks.
To get the visual report with Test results, call the object in Jupyter notebook or Colab:
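For example, with the data_quality suite from Example 1:

```python
data_quality  # rendering the object in a notebook cell displays the Test results
```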
To get the Test results summary, generate a Python dictionary:
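For example, with the same suite:

```python
data_quality.as_dict()
```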
There are more output formats! You can also export the results as HTML, JSON, a pandas DataFrame, and more. Refer to the Output Formats section for details.
Example 2. To apply the DataStabilityTestPreset, with conditions generated from the reference, pass the reference_data:
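A sketch, assuming curr and ref are your current and reference DataFrames:

```python
from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset

data_stability = TestSuite(tests=[DataStabilityTestPreset()])
data_stability.run(current_data=curr, reference_data=ref, column_mapping=None)
```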
Example 3. To apply the NoTargetPerformanceTestPreset with additional parameters:
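A sketch; the column names are illustrative, and the drift-related parameter names (stattest, stattest_threshold) follow the library's drift options and may vary by version:

```python
from evidently.test_suite import TestSuite
from evidently.test_preset import NoTargetPerformanceTestPreset

no_target_performance = TestSuite(tests=[
    NoTargetPerformanceTestPreset(
        columns=["education-num", "hours-per-week"],  # limit column-level Tests to these columns
        stattest="psi",  # data drift detection method
        stattest_threshold=0.3,  # drift decision threshold
    )
])
no_target_performance.run(current_data=curr, reference_data=ref)
```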
By selecting specific columns for the Preset, you reduce the number of generated column-level Tests. When you specify the data drift detection method and threshold, they override the defaults.
Refer to the All tests table to see available parameters and defaults for each Test and Test Preset.
Custom Test Suite
You can use Presets as a starting point, but eventually, you'll want to design a Test Suite to pick specific Tests and set conditions more precisely. Here’s how:
Choose individual Tests. Select the Tests you want to include in your Test Suite.
Pass Test parameters. Set custom parameters for applicable Tests. (Optional).
Set custom conditions. Define when Tests should pass or fail. (Optional).
Mark Test criticality. Mark non-critical Tests to give a Warning instead of Fail. (Optional).
1. Choose tests
First, decide which Tests to include. Tests can be either dataset-level or column-level.
Reference: see All tests table. To see interactive examples, refer to the Example notebooks.
Row-level evaluations: To test row-level scores for text data, read more about Text Descriptors.
Dataset-level Tests. Some Tests apply to the entire dataset, such as checking the share of drifting features or model accuracy. To add them to a Test Suite, create a TestSuite object and list the tests one by one:
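For example (Test names as in the All tests table; curr and ref are your DataFrames):

```python
from evidently.test_suite import TestSuite
from evidently.tests import (
    TestNumberOfColumnsWithMissingValues,
    TestNumberOfRowsWithMissingValues,
    TestNumberOfDuplicatedRows,
    TestShareOfDriftedColumns,
)

tests = TestSuite(tests=[
    TestNumberOfColumnsWithMissingValues(),
    TestNumberOfRowsWithMissingValues(),
    TestNumberOfDuplicatedRows(),
    TestShareOfDriftedColumns(),
])
tests.run(current_data=curr, reference_data=ref)
```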
Column-level Tests. Some Tests focus on individual columns, like checking if a specific column's values stay within a range. To include column-level Tests, pass the name of the column to each Test:
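For example (the column names are illustrative):

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestColumnValueMin, TestShareOfOutRangeValues

suite = TestSuite(tests=[
    TestColumnValueMin(column_name="hours-per-week"),
    TestShareOfOutRangeValues(column_name="age"),
])
suite.run(current_data=curr, reference_data=ref)
```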
Generating many column-level Tests: To simplify listing many Tests at once, use the generator helper function.
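A sketch of the generator, assuming the helper lives in evidently.tests.base_test as in recent library versions:

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestColumnShareOfMissingValues
from evidently.tests.base_test import generate_column_tests

# generate the same Test for every column in the dataset
suite = TestSuite(tests=[
    generate_column_tests(TestColumnShareOfMissingValues, columns="all"),
])
```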
Combining Tests. You can combine column-level and dataset-level Tests in a single Test Suite. You can also include Presets and individual Tests together.
2. Set Test parameters
Tests can have optional or required parameters.
Example 1. To test a quantile value, you must specify the quantile (a required parameter):
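For example (the column name is illustrative):

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestValueQuantile

suite = TestSuite(tests=[
    # quantile is required: check the 25th percentile of the column
    TestValueQuantile(column_name="hours-per-week", quantile=0.25),
])
```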
Example 2. To override the default drift detection method, pass the chosen statistical method (optional), or modify the Mean Value Test to use 3 sigmas:
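A sketch of both options (the column name is illustrative):

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestColumnDrift, TestMeanInNSigmas

suite = TestSuite(tests=[
    # override the default drift detection method with PSI
    TestColumnDrift(column_name="age", stattest="psi"),
    # check that the current mean stays within 3 sigmas of the reference
    TestMeanInNSigmas(column_name="age", n_sigmas=3),
])
```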
Example 3. To change the decision threshold for probabilistic classification to 0.8:
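A sketch; probas_threshold is the parameter name used for the classification decision threshold in recent versions and may differ in older ones:

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestPrecisionScore, TestRecallScore

suite = TestSuite(tests=[
    TestPrecisionScore(probas_threshold=0.8),
    TestRecallScore(probas_threshold=0.8),
])
```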
Reference: you can browse available Test parameters and defaults in the All tests table.
3. Set Test conditions
You can set up your Test conditions in two ways:
Automatic. If you don’t specify individual conditions, the defaults (reference or heuristic-based) will apply, just like in Test Presets.
Manual. You can define when exactly a Test should pass or fail. For example, set a lower boundary for the expected model precision. If the condition is violated, the Test fails.
You can mix both approaches in the same Test Suite, where some Tests run with defaults and others with custom conditions.
Use the following parameters to set Test conditions:
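eq: equal to the given value
not_eq: not equal to the given value
gt: greater than
gte: greater than or equal
lt: less than
lte: less than or equal
is_in: equal to one of the values in the given list
not_in: not equal to any of the values in the given list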
Example 1. To test that no values are out of range, and that less than (lt) 20% of values are missing:
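For example (the column name is illustrative):

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfOutRangeValues, TestColumnShareOfMissingValues

suite = TestSuite(tests=[
    # no current values outside the reference-derived range
    TestShareOfOutRangeValues(column_name="age", eq=0),
    # less than 20% of values in the column are missing
    TestColumnShareOfMissingValues(column_name="age", lt=0.2),
])
```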
Example 2. You can specify both the Test condition and parameters together. In Example 1 above, Evidently automatically derives the feature range from the reference. You can instead set the range manually (e.g., between 2 and 10). The Test fails if any value falls outside this range:
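A sketch with a manually set range:

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfOutRangeValues

suite = TestSuite(tests=[
    # parameters (left/right range) plus the condition (eq=0) in one Test
    TestShareOfOutRangeValues(column_name="age", left=2, right=10, eq=0),
])
```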
Example 3. To test that precision and recall are over 90%, with a set decision threshold for the classification model:
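A sketch; as above, probas_threshold is assumed to be the decision threshold parameter:

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestPrecisionScore, TestRecallScore

suite = TestSuite(tests=[
    # precision and recall above 0.9 at a 0.8 decision threshold
    TestPrecisionScore(probas_threshold=0.8, gt=0.9),
    TestRecallScore(probas_threshold=0.8, gt=0.9),
])
```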
Custom conditions with Approx
If you want to set an upper and/or lower limit on the value, you can use approx instead of calculating the value itself. You can set a relative or absolute range.
To use approx, first import this component:
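A sketch of the import, assuming the helper's location in evidently.tests.utils:

```python
from evidently.tests.utils import approx
```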
Example 1. Here is how you can set the upper boundary as 5+10%:
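For example, applied to a mean value Test (the Test and column name are illustrative):

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestColumnValueMean
from evidently.tests.utils import approx

suite = TestSuite(tests=[
    # upper boundary: the mean must be at most 5 + 10%, i.e., <= 5.5
    TestColumnValueMean(column_name="age", lte=approx(5, relative=0.1)),
])
```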
Example 2. Here is how you can set the boundary as 5 +/-10%:
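With the same illustrative Test:

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestColumnValueMean
from evidently.tests.utils import approx

suite = TestSuite(tests=[
    # the mean must be within 5 +/- 10%, i.e., between 4.5 and 5.5
    TestColumnValueMean(column_name="age", eq=approx(5, relative=0.1)),
])
```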
4. Set Test criticality
By default, all Tests will return a Fail
if the Test condition is not fulfilled. If you want to get a Warning
instead, use the is_critical
parameter and set it to False
. Example:
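A sketch (the Test and column name are illustrative):

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfOutRangeValues

suite = TestSuite(tests=[
    # a violated condition now returns a Warning instead of a Fail
    TestShareOfOutRangeValues(column_name="age", eq=0, is_critical=False),
])
```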
For a notebook example on setting Test criticality, see the sample notebooks.