Quickstarts
If you are new, start here.
LLM quickstart
Evaluate the quality of text outputs.
ML quickstart
Test tabular data quality and data drift (see the sketch after this list).
Tracing quickstart
Collect inputs and outputs from your AI app.
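To show what the ML quickstart covers, here is a minimal sketch of a tabular data drift check. It assumes the pre-1.0 Evidently API (`evidently.report.Report` with `DataDriftPreset`); the Iris dataset and the 50/50 split are purely illustrative, and newer Evidently releases may expose a different interface.

```python
from sklearn import datasets

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Toy setup: split one dataset into a "reference" and a "current" sample
iris = datasets.load_iris(as_frame=True).frame
reference, current = iris.iloc[:75], iris.iloc[75:]

# Compare the two samples column by column with the built-in drift preset
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)

report.save_html("data_drift_report.html")  # open in a browser to inspect
```

The same `Report` object also supports `as_dict()` and `json()` if you want raw metric values instead of an HTML dashboard.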
LLM Tutorials
End-to-end examples of specific workflows and use cases.
LLM as a judge
How to create and evaluate an LLM judge against human labels (see the judge sketch after this list).
RAG evaluation
A walkthrough of different RAG evaluation metrics.
LLM as a jury
Using multiple LLMs to evaluate the same output.
LLM evaluation methods
A walkthrough of different LLM evaluation methods. [CODE + VIDEO]
Descriptor cookbook
A walkthrough of different descriptors (deterministic, ML, etc.) in a single notebook.
LLM judge prompt optimization (1)
Optimize a multi-class classifier using target labels.
LLM judge prompt optimization (2)
Optimize a binary classifier using target labels and free-form feedback.
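As a preview of the LLM-as-a-judge workflow, here is a minimal sketch of a custom binary judge. It assumes the pre-1.0 Evidently API (`LLMEval` with `BinaryClassificationPromptTemplate` inside a `TextEvals` preset); the criteria text, the `gpt-4o-mini` model choice, and the sample data are illustrative, and the interface differs in newer releases.

```python
import pandas as pd

from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import LLMEval
from evidently.features.llm_judge import BinaryClassificationPromptTemplate

# A custom binary judge: label each response as "concise" or "verbose"
conciseness = LLMEval(
    subcolumn="category",
    template=BinaryClassificationPromptTemplate(
        criteria="A concise response is brief but complete, with no filler.",
        target_category="concise",
        non_target_category="verbose",
        uncertainty="unknown",
        include_reasoning=True,
    ),
    provider="openai",   # expects OPENAI_API_KEY in the environment
    model="gpt-4o-mini",
    display_name="Conciseness",
)

data = pd.DataFrame({"response": [
    "Paris is the capital of France.",
    "Well, that is a truly excellent question, let me think about it...",
]})

report = Report(metrics=[TextEvals(column_name="response",
                                   descriptors=[conciseness])])
report.run(reference_data=None, current_data=data)
report.save_html("llm_judge_report.html")
```

To evaluate the judge itself, as the tutorial does, run it over responses that already have human labels and compare the judge's verdicts against those labels.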
ML tutorials
End-to-end examples of specific workflows and use cases.
Integrations
End-to-end examples of integrating Evidently with other tools and platforms.
GitHub Actions
Running Evidently evals as part of a CI/CD workflow. Native GitHub Action integration for regression testing.
Different LLM providers as judges
Examples of using different external evaluator LLMs as LLM judges: OpenAI, Gemini, Google Vertex, Mistral, Ollama.
Evidently + Grafana: LLM evals
Visualize Evidently LLM evaluation metrics with Grafana (uses Postgres as the database).
Evidently + Grafana: Data drift
Visualize Evidently data drift evaluations on a Grafana dashboard (uses Postgres as the database; see the pipeline sketch after this section).
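Both Grafana recipes follow the same pipeline: run an Evidently evaluation on a schedule, write the resulting metric values to Postgres, and point a Grafana panel at that table. A minimal sketch, assuming the pre-1.0 Evidently API and a local Postgres instance; the connection string, table name, and the exact keys inside `report.as_dict()` are illustrative.

```python
from datetime import datetime, timezone

import pandas as pd
from sqlalchemy import create_engine

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Illustrative data: current values shifted relative to the reference
reference_df = pd.DataFrame({"value": range(100)})
current_df = pd.DataFrame({"value": range(50, 150)})

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_df, current_data=current_df)

# Pull a headline number out of the result dictionary
drift_result = report.as_dict()["metrics"][0]["result"]

row = pd.DataFrame([{
    "timestamp": datetime.now(timezone.utc),
    "share_of_drifted_columns": drift_result["share_of_drifted_columns"],
}])

# Append to the Postgres table the Grafana dashboard reads from
engine = create_engine("postgresql://user:password@localhost:5432/evals")
row.to_sql("evidently_metrics", engine, if_exists="append", index=False)
```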
Deployment
LLM Evaluation Course - Video Tutorials
We have an applied LLM evaluation course that walks through the core evaluation workflows. Each tutorial consists of a code example and a video walkthrough.
📥 Sign up for the course
📹 See the complete YouTube playlist

| Tutorial | Description | Code example |
|---|---|---|
| Intro to LLM Evals | Introduction to LLM evaluation: concepts, goals, and motivations behind evaluating LLM outputs. | – |
| LLM Evaluation Methods | An overview of different LLM evaluation methods. | Open Notebook |
| LLM as a Judge | Tutorial on creating and tuning LLM judges aligned with human preferences. | Open Notebook |
| Classification Evaluation | Tutorial on evaluating LLMs and a simple predictive ML baseline on a multi-class classification task (see the sketch below the table). | Open Notebook |
| Content Generation with LLMs | Tutorial on how to use LLMs to write tweets and evaluate how engaging they are. Introduces the concept of tracing. | Open Notebook |
| RAG evaluations | A walkthrough of different RAG evaluation metrics. | Open Notebook |
| AI agent evaluations | Tutorial on how to build a simple Q&A agent and evaluate tool choice and answer correctness. | Open Notebook |
| Adversarial testing | Tutorial on how to run scenario-based risk testing on forbidden topics and brand risks. | Open Notebook |
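For a taste of the classification tutorial, here is a minimal sketch of scoring multi-class predictions against target labels. It assumes the pre-1.0 Evidently API (`ClassificationPreset` plus a `ColumnMapping`); the label set and sample data are illustrative.

```python
import pandas as pd

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import ClassificationPreset

# Illustrative multi-class task: true labels vs. model-predicted labels
data = pd.DataFrame({
    "target":     ["billing", "tech", "billing", "other", "tech"],
    "prediction": ["billing", "tech", "other",   "other", "billing"],
})

# Tell Evidently which column holds the labels and which the predictions
mapping = ColumnMapping(target="target", prediction="prediction")

report = Report(metrics=[ClassificationPreset()])
report.run(reference_data=None, current_data=data, column_mapping=mapping)
report.save_html("classification_report.html")
```

The resulting report includes accuracy, per-class quality metrics, and a confusion matrix, which you can compare against the ML baseline the tutorial builds.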