Quickstarts

If you are new, start here.

LLM Tutorials

End-to-end examples of specific workflows and use cases.

Integrations

End-to-end examples of integrating Evidently with other tools and platforms.

LLM Evaluation Course - Video Tutorials

We have an applied LLM evaluation course that walks through the core evaluation workflows. Most tutorials include a code example and a video walkthrough.

📥 Sign up for the course

📹 See the complete YouTube playlist

Intro to LLM Evals
Introduction to LLM evaluation: concepts, goals, and motivations behind evaluating LLM outputs.
  • Video

LLM Evaluation Methods
Tutorial with an overview of evaluation methods.
  • Part 1. Anatomy of a single evaluation. Covers the basic LLM evaluation API and setup.
  • Part 2. Reference-based evaluation: exact match, semantic similarity, BERTScore, and LLM judges.
  • Part 3. Reference-free evaluation: text statistics, regex, ML models, LLM judges, and session-level evaluators.
  • Code example: Open Notebook
  • Video 1
  • Video 2
  • Video 3

LLM as a Judge
Tutorial on creating and tuning LLM judges aligned with human preferences.
  • Code example: Open Notebook
  • Video

Classification Evaluation
Tutorial on evaluating LLMs and a simple predictive ML baseline on a multi-class classification task.
  • Code example: Open Notebook
  • Video

Content Generation with LLMs
Tutorial on using LLMs to write tweets and evaluating how engaging they are. Also introduces the concept of tracing.
  • Code example: Open Notebook
  • Video

RAG Evaluations
  • Part 1. Theory on evaluating RAG systems: retrieval, generation quality, and synthetic data.
  • Part 2. Tutorial on building a toy RAG application and evaluating correctness and faithfulness.
  • Code example: Open Notebook
  • Video 1
  • Video 2

AI Agent Evaluations
Tutorial on building a simple Q&A agent and evaluating tool choice and answer correctness.
  • Code example: Open Notebook
  • Video

Adversarial Testing
Tutorial on running scenario-based risk testing for forbidden topics and brand risks.
  • Code example: Open Notebook
  • Video

More examples

You can also find more examples in the Example Repository.