Tutorials and guides
End-to-end code examples.
We have an applied course on LLM evaluations! Free video course with 10+ tutorials. Sign up.
Quickstarts
If you are new, start here.
LLM quickstart
Evaluate the quality of text outputs.
ML quickstart
Test tabular data quality and data drift.
Tracing quickstart
Collect inputs and outputs from your AI app.
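Conceptually, tracing means wrapping your app's functions so that every call's inputs, outputs, and latency get recorded for later evaluation. As a rough stdlib-only sketch of the idea (this is not Evidently's actual tracing API — see the Tracing quickstart for that; the `trace` decorator and `answer` function here are hypothetical):

```python
import functools
import json
import time

TRACES = []  # in-memory store; a real tracing setup would export these


def trace(fn):
    """Record each call's inputs, output, and latency."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = fn(*args, **kwargs)
        TRACES.append({
            "function": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_s": round(time.time() - start, 4),
        })
        return result
    return wrapper


@trace
def answer(question: str) -> str:
    # stand-in for a real LLM call
    return f"Echo: {question}"


answer("What is tracing?")
print(json.dumps(TRACES[0], indent=2))
```

Each traced call becomes a structured record you can later score or inspect; dedicated tracing tools add export, sessions, and UI on top of this basic pattern.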
LLM Tutorials
End-to-end examples of specific workflows and use cases.
LLM judges
Create and evaluate an LLM judge. (Python)
Regression testing
Test LLM outputs against expected responses.
RAG evaluation
A walkthrough of different RAG evaluation metrics.
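As a rough illustration of the regression-testing workflow listed above, a toy check might compare new LLM outputs against expected responses using a similarity threshold. This is a conceptual stdlib sketch, not the library's API; the threshold value is arbitrary and would need tuning:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude text similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Expected (reference) answers from a previous, approved run
expected = {
    "What is the capital of France?": "The capital of France is Paris.",
}

# Outputs from the new version of the app under test
new_outputs = {
    "What is the capital of France?": "Paris is the capital of France.",
}

THRESHOLD = 0.6  # arbitrary cutoff for this sketch

for question, reference in expected.items():
    score = similarity(new_outputs[question], reference)
    status = "PASS" if score >= THRESHOLD else "FAIL"
    print(f"{status} ({score:.2f}): {question}")
```

Real regression testing typically replaces the string-similarity check with semantic matching or an LLM judge, but the pass/fail structure stays the same.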
LLM Evaluation Course - Video Tutorials
We offer an applied LLM evaluation course that walks through the core evaluation workflows. Each lesson includes a code example and a video tutorial walkthrough.
📹 See complete YouTube playlist
Tutorial | Description | Code example | Video |
---|---|---|---|
Intro to LLM Evals | Introduction to LLM evaluation: concepts, goals, and motivations behind evaluating LLM outputs. | – | |
LLM Evaluation Methods | Tutorial with an overview of LLM evaluation methods. | Open Notebook | |
LLM as a Judge | Tutorial on creating and tuning LLM judges aligned with human preferences. | Open Notebook | |
Classification Evaluation | Tutorial on evaluating LLMs and a simple predictive ML baseline on a multi-class classification task. | Open Notebook | |
Content Generation with LLMs | Tutorial on how to use LLMs to write tweets and evaluate how engaging they are. Introduces the concept of tracing. | Open Notebook | |
RAG evaluations | Tutorial on evaluating RAG systems. | Open Notebook | |
AI agent evaluations | Tutorial on how to build a simple Q&A agent and evaluate tool choice and answer correctness. | Open Notebook | |
Adversarial testing | Tutorial on how to run scenario-based risk testing on forbidden topics and brand risks. | Open Notebook | |
More examples
You can also find more examples in the Example Repository.