Core Concepts

This is an explanatory page that describes the key features and concepts in Evidently.

TL;DR

Evidently helps evaluate, test and monitor ML models in production.

  • A Metric is a core component of Evidently. You can combine multiple Metrics in a Report. Reports are best for visual analysis and debugging of your models and data.

  • A Test is a metric with a condition. Each test returns a pass or fail result. You can combine multiple Tests in a Test Suite. Test Suites are best for automated model checks as part of an ML pipeline.

For both Tests and Metrics, Evidently has Presets. These are pre-built combinations of metrics or checks that fit a specific use case.

  • A Snapshot is a JSON version of a Report or a Test Suite that contains measurements and test results for a specific period. You can log Snapshots over time and run an Evidently Monitoring Dashboard for continuous monitoring.

Metrics and Reports

What is a Metric?

A Metric is a component that evaluates a specific aspect of the data or model quality.

A Metric can be, literally, a single metric (for example, DatasetMissingValuesMetric() returns the share of missing values). It can also be a combination of metrics (for example, DatasetSummaryMetric() calculates various descriptive statistics for the dataset). Metrics exist on the dataset level and on the column level.
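
For illustration, here is a minimal sketch of how dataset-level and column-level Metrics are instantiated. The import path and the ColumnValueRangeMetric name follow the Evidently Python API; the column name and bounds are hypothetical examples.

from evidently.metrics import (
    DatasetMissingValuesMetric,
    DatasetSummaryMetric,
    ColumnValueRangeMetric,
)

# Dataset-level Metrics evaluate the dataset as a whole
missing_values = DatasetMissingValuesMetric()
summary = DatasetSummaryMetric()

# Column-level Metrics take the name of the column to evaluate
# ("age" and the bounds are illustrative)
value_range = ColumnValueRangeMetric(column_name="age", left=18, right=80)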

Each Metric has a visual render. Some visualizations simply return the values:

Others have rich visualizations. Here is an example of a dataset-level Metric that evaluates the error of a regression model:

Here is an example of a column-level Metric that evaluates the value range of a certain feature:

Evidently contains 35+ Metrics (with 100+ different measurements) related to data quality, integrity, drift and model performance. You can also implement a custom one.

What is a Report?

A Report is a combination of different Metrics that evaluate data or ML model quality.

You can display an interactive report inside a Jupyter notebook or export it as an HTML file:
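
Here is a minimal sketch of building and rendering a Report. The method names (run, show, save_html) follow the Evidently Python API; the data paths are placeholders for your own datasets.

import pandas as pd
from evidently.report import Report
from evidently.metrics import DatasetSummaryMetric, DatasetMissingValuesMetric

# Two pandas DataFrames with the same schema (placeholder paths)
reference_data = pd.read_csv("reference.csv")
current_data = pd.read_csv("current.csv")

report = Report(metrics=[DatasetSummaryMetric(), DatasetMissingValuesMetric()])

# Pass one dataset (reference_data=None) or two for a side-by-side comparison
report.run(reference_data=reference_data, current_data=current_data)

report.show()                     # interactive output inside a Jupyter notebook
report.save_html("report.html")   # export as an HTML file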

The Report output is also available as JSON or a Python dictionary. This "text" version returns the calculated values and, optionally, some other useful information such as histogram bins. You can also define what to include. Example:

{'timestamp': '2022-10-26 17:46:47.214403',
 'metrics': {'DatasetDriftMetric': {'threshold': 0.5,
   'number_of_columns': 15,
   'number_of_drifted_columns': 5,
   'share_of_drifted_columns': 0.3333333333333333,
   'dataset_drift': False},
  'DataDriftTable': {'number_of_columns': 15,
   'number_of_drifted_columns': 5,
   'share_of_drifted_columns': 0.3333333333333333,
   'dataset_drift': False,
   'drift_by_columns': {'age': {'column_name': 'age',
     'column_type': 'num',
     'stattest_name': 'Wasserstein distance (normed)',
     'drift_score': 0.18534692319042428,
     'drift_detected': True,
     'threshold': 0.1}}}}}
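
Here is a sketch of how output like the example above might be produced. The as_dict() and json() methods follow the Evidently Python API; reference_data and current_data are the same pandas DataFrames as in the earlier sketch.

from evidently.report import Report
from evidently.metrics import DatasetDriftMetric, DataDriftTable

report = Report(metrics=[DatasetDriftMetric(), DataDriftTable()])
report.run(reference_data=reference_data, current_data=current_data)

report_dict = report.as_dict()   # Python dictionary, as shown above
report_json = report.json()      # the same output as a JSON string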

You can also export the Report output as an Evidently snapshot. This is a more complete JSON that allows recreating the original HTML Report. Use this option if you want to enable logging and continuous monitoring of the model or data performance.
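
A sketch of saving and loading a snapshot, assuming the save() and load() methods available in recent Evidently versions:

# 'report' is the Report object from the earlier sketch
# Save it as an Evidently snapshot: a richer JSON that can recreate the HTML
report.save("report_snapshot.json")

# Later, restore the full Report from the snapshot
from evidently.report import Report
restored = Report.load("report_snapshot.json")
restored.save_html("restored_report.html")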

You can read more about Monitoring here:

Monitoring overview

You can calculate most Reports for a single dataset. If you pass two datasets, they will show a side-by-side comparison.

You can generate a Report by listing individual Metrics to include in it. You can also run one of the Presets that cover a specific aspect of the model or data performance.

What is a Metric Preset?

A Metric Preset is a pre-built Report that combines Metrics for a particular use case.

You can think of it as a template. For example, there is a Preset to check for Data Drift (DataDriftPreset), Data Quality (DataQualityPreset), or Regression Performance (RegressionPreset).
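
A sketch of running a Preset instead of listing individual Metrics (the Preset names are those mentioned above; the import path follows the Evidently Python API):

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, DataQualityPreset

report = Report(metrics=[DataDriftPreset(), DataQualityPreset()])
report.run(reference_data=reference_data, current_data=current_data)
report.show()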

You can explore all Metrics and Presets here:

All metrics

When to use Reports

You can use Reports at different stages of the ML lifecycle: from exploratory data analysis and model validation to production monitoring and debugging.

Debugging and exploration. Reports are best for visual analysis of the data or model performance. For example, during model quality evaluation on the training set, when debugging model quality decay, or when comparing two models.

Metric logging. You can also add a model or data evaluation step to the ML pipeline, get the output as JSON or an Evidently snapshot, and log it to later visualize and track model and data performance over time.

Reporting and documentation. You can also use Evidently reports to share results with the team and stakeholders or log them as documentation. For example, you can record the model performance results after training.

Tests and Test Suites

What is a Test?

Tests help perform structured data and ML model performance checks. They explicitly verify expectations about your data and model.

A Test is a Metric with a condition. It calculates a value and compares it against the defined threshold.

If the condition is satisfied, the Test returns a success.

If you choose to get a visual output with the test results, you will see the current value of the metric and the test condition. If you expand the Test, you will see a supporting visualization.

Here is an example of a column-level Test that evaluates the mean value stability:

Here is an example of a dataset-level Test that evaluates model error:

If the condition is not satisfied, the Test returns a fail:

If the Test execution fails, it will return an error.

Evidently contains 70+ individual Tests that cover different aspects of model and data quality.

You can set test conditions on your own or pass the reference dataset to auto-generate test conditions. You can also run most of the Tests using defaults even if you do not pass a reference dataset: the tests will use heuristics and dummy models.
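
A sketch of the two ways to define a Test condition. The Test names and condition parameters (such as lte) follow the Evidently Python API; the threshold and column name are illustrative. Tests are executed as part of a Test Suite, described below.

from evidently.tests import TestShareOfMissingValues, TestColumnValueMean

# Explicit condition: fail if more than 5% of values are missing
TestShareOfMissingValues(lte=0.05)

# No condition set: it is derived from the reference dataset,
# or from built-in defaults and heuristics if no reference is passed
TestColumnValueMean(column_name="age")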

What is a Test Suite?

In most cases, you’d want to run more than one check.

You can list multiple Tests and execute them together in a Test Suite. You will see a summary of the results:

If you include a lot of Tests, you can navigate the output by groups:

Test Suite output is available as an interactive HTML report, JSON, Python dictionary, or an Evidently snapshot for logging and monitoring.

You can create your Test Suite from individual Tests or use one of the existing Presets.
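
A minimal sketch of composing and running a Test Suite from individual Tests. The method and Test names follow the Evidently Python API; the thresholds are illustrative, and reference_data and current_data are pandas DataFrames as in the earlier sketches.

from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfMissingValues, TestNumberOfDriftedColumns

suite = TestSuite(tests=[
    TestShareOfMissingValues(lte=0.05),
    TestNumberOfDriftedColumns(lt=3),
])
suite.run(reference_data=reference_data, current_data=current_data)

suite.show()                        # interactive summary in a notebook
suite.save_html("test_suite.html")  # export as an HTML file
results = suite.as_dict()           # Python dictionary with per-test statuses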

What is a Test Preset?

A Test Preset is a pre-built Test Suite that combines checks for a particular use case.

You can think of it as a template to start with. For example, there is a Preset to check for Data Quality (DataQualityTestPreset), Data Stability (DataStabilityTestPreset), or Regression model performance (RegressionTestPreset).
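
A sketch of running a Test Preset (the Preset names are those mentioned above):

from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset, DataQualityTestPreset

suite = TestSuite(tests=[DataStabilityTestPreset(), DataQualityTestPreset()])
suite.run(reference_data=reference_data, current_data=current_data)
suite.show()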

You can explore all Tests and Presets here:

All tests

When to use Test Suites

For test-based monitoring of production ML models: tests are best suited for integration in ML prediction pipelines. You can easily integrate Evidently Tests with workflow management tools like Airflow.

You can use them to perform batch checks for your data or models.

For example, you can run the tests when you:

  • get a new batch of input data

  • generate a new set of predictions

  • receive a new batch of labeled data

  • want to check your model on a schedule

You can then build a conditional workflow based on the result of the tests: for example, generate a visual report for debugging, trigger model retraining, or send an alert. You can also visualize the test results over time in the Evidently Monitoring UI.
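
A sketch of such a conditional step. It assumes the dictionary output of as_dict() contains a "summary" section with an "all_passed" flag; the alerting helper is hypothetical.

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset

suite = TestSuite(tests=[DataStabilityTestPreset()])
suite.run(reference_data=reference_data, current_data=current_data)

results = suite.as_dict()
if not results["summary"]["all_passed"]:   # assumed key in the Test Suite output
    # React to the failure: generate a visual Report for debugging...
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference_data, current_data=current_data)
    report.save_html("debug_report.html")
    # ...and/or send an alert (send_alert is a hypothetical helper)
    send_alert("Evidently data checks failed")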

During model development: you can also use tests during model development and validation. For example, you can run tests to evaluate the data quality of a new feature set or to compare model performance on the test set to its performance in training.

Test Suites or Reports?

Reports and Test Suites are complementary interfaces.

Reports are best for debugging, exploratory and ad hoc analytics. They focus on interactive visualizations and do not require setting any expectations upfront. You can use them, for example, when you have just put a model in production and want to closely monitor its performance. It is best to use Reports on smaller datasets or to sample your data first.

Test Suites are best for automation. Use them when you can set up expectations upfront (or derive them from the reference dataset). Tests force you to think through what you expect from your data and models, and you can run them at scale, only reacting to the failure alerts. You can use Test Suites on larger datasets since they do not include heavy visuals.

You can also use both Reports and Test Suites. For example, run tests for automated model checks and if tests fail, use Reports for visual debugging.
