What is Evidently?
Evidently helps evaluate, test, and monitor data and ML-powered systems.
Predictive tasks: classification, regression, ranking, recommendations.
Generative tasks: chatbots, RAGs, Q&A, summarization.
Data monitoring: data quality and data drift for text, tabular data, embeddings.
Evidently is available both as an open-source Python library and as the Evidently Cloud platform.
Get started
Evidently Cloud
An AI evaluation and observability platform built on top of the Evidently Python library. Includes advanced features, collaboration, and support.
Evidently Open-Source
An open-source Python library with 20M+ downloads. Helps evaluate, test, and monitor data, ML, and LLM-powered systems.
You can explore more in-depth Examples and Tutorials.
How it works
Evidently helps evaluate and track the quality of ML-based systems, from experimentation to production.
Evidently is both a library of 100+ ready-made evaluations and a framework to easily implement your own: from Python functions to LLM judges.
Evidently has a modular architecture, and you can start with ad hoc checks without complex installations. There are three interfaces: you can get a visual Report to see a summary of evaluation metrics, run conditional checks with a Test Suite to get a pass/fail outcome, or plot the evaluation results over time on a Monitoring Dashboard.
Reports
Reports compute different metrics on data and ML quality. You can use Reports for visual analysis and debugging, or as a computation layer for the monitoring dashboard.
You can be as hands-off or hands-on as you like: start with Presets, and customize metrics as you go.
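To make this concrete, here is a minimal sketch of running a Report with Presets. It uses the legacy Report API (e.g., Evidently 0.4.x; imports and names moved in later releases), and `reference_df`/`current_df` are placeholder DataFrames:

```python
import pandas as pd

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, DataQualityPreset

# Placeholder data: reference_df is the baseline, current_df is new production data.
reference_df = pd.DataFrame({"age": [25, 32, 47, 51], "country": ["US", "DE", "US", "DE"]})
current_df = pd.DataFrame({"age": [26, 58, 71, 44], "country": ["US", "FR", "FR", "DE"]})

# Start with Presets; you can swap in individual metrics later to customize.
report = Report(metrics=[DataQualityPreset(), DataDriftPreset()])
report.run(reference_data=reference_df, current_data=current_df)

report.save_html("report.html")  # or report.show() in a notebook
```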
Test Suites
Tests verify whether computed metrics satisfy defined conditions. Each Test returns a pass or fail result.
This interface helps automate your evaluations for regression testing, checks during CI/CD, or validation steps in data pipelines.
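A minimal sketch of a Test Suite under the same assumptions (legacy API, placeholder data); it combines a Preset with an individual Test that carries an explicit condition:

```python
import pandas as pd

from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset
from evidently.tests import TestNumberOfMissingValues

reference_df = pd.DataFrame({"age": [25, 32, 47, 51]})
current_df = pd.DataFrame({"age": [26, None, 71, 44]})

# Combine a Preset with an individual Test and an explicit pass/fail condition.
suite = TestSuite(tests=[
    DataStabilityTestPreset(),
    TestNumberOfMissingValues(lte=0),  # fail if any values are missing
])
suite.run(reference_data=reference_df, current_data=current_df)

# Machine-readable outcome for CI/CD or pipeline gating.
result = suite.as_dict()
print(result["summary"]["all_passed"])
```

The `as_dict()` output makes it straightforward to gate a CI/CD job or a pipeline step on the overall pass/fail result.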
ML monitoring dashboard
The monitoring dashboard helps visualize ML system performance over time and detect issues. You can track key metrics and test outcomes.
You can use Evidently Cloud or self-host. Evidently Cloud offers extra features like user authentication and roles, built-in alerting, and a no-code interface.
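For self-hosting, a minimal sketch using the legacy local workspace API (assuming Evidently 0.4.x; the workspace path and project name are placeholders) might look like this:

```python
import pandas as pd

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
from evidently.ui.workspace import Workspace

# A local (self-hosted) workspace: snapshots are stored on disk and served
# by the `evidently ui` command pointed at the same directory.
ws = Workspace.create("evidently_workspace")
project = ws.create_project("Demo model monitoring")

reference_df = pd.DataFrame({"age": [25, 32, 47, 51]})
current_df = pd.DataFrame({"age": [26, 58, 71, 44]})

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_df, current_data=current_df)

# Each added report becomes a snapshot; dashboard panels plot metrics over them.
ws.add_report(project.id, report)
```

Running `evidently ui` against the same workspace directory would then serve the dashboard locally.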
What can you evaluate?
Evidently Reports, Test Suites, and the ML Monitoring dashboard rely on a shared set of metrics. Here are some examples of what you can evaluate.
| Evaluation group | Examples |
|---|---|
| Tabular Data Quality | Missing values, duplicates, empty rows or columns, min-max ranges, new categorical values, correlation changes, etc. |
| Text Descriptors | Text length, out-of-vocabulary words, share of special symbols, regular expression matches. |
| Data Distribution Drift | Statistical tests and distance metrics to compare distributions of model predictions, numerical and categorical features, text data, or embeddings. |
| Classification Quality | Accuracy, precision, recall, ROC AUC, confusion matrix, class separation quality, classification bias. |
| Regression Quality | MAE, ME, RMSE, error distribution, error normality, error bias per group and feature. |
| Ranking and Recommendations | NDCG, MAP, MRR, Hit Rate, recommendation serendipity, novelty, diversity, popularity bias. |
| LLM Output Quality | Model-based scoring with external models and LLMs to detect toxicity, assess sentiment, evaluate retrieval relevance, etc. |
You can also implement custom checks as Python functions or define your own prompts for LLM-as-a-judge.
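As a sketch of the custom-check idea in plain Python (the `share_of_caps` function is hypothetical, and the exact wrapper for registering it as an Evidently descriptor depends on the library version):

```python
import pandas as pd

def share_of_caps(text: str) -> float:
    """Hypothetical custom descriptor: share of uppercase characters in a text."""
    if not text:
        return 0.0
    return sum(ch.isupper() for ch in text) / len(text)

# Applied row by row, this yields a numeric column that Evidently can then
# summarize, test, and monitor like any other descriptor.
texts = pd.Series(["HELLO there", "all lowercase", "MOSTLY CAPS!"])
print(texts.apply(share_of_caps).tolist())
```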
Community and support
Evidently is in active development, and we are happy to receive and incorporate feedback. If you have questions or ideas, or want to hang out and chat about doing ML in production, join our Discord community!
User newsletter
To get updates on new features, integrations and code tutorials, sign up for the Evidently User Newsletter.