Quickstarts

If you are new, start here.

LLM Tutorials

End-to-end examples of specific workflows and use cases.

LLM Evaluation Course - Video Tutorials

We have a free applied LLM evaluation course that walks through the core evaluation workflows. Each tutorial consists of a code example and a video walkthrough.

📥 Sign up for the course

📹 See the complete YouTube playlist

| Tutorial | Description | Code example | Video |
| --- | --- | --- | --- |
| Intro to LLM Evals | Introduction to LLM evaluation: concepts, goals, and motivations behind evaluating LLM outputs. | | |
| LLM Evaluation Methods | Tutorial with an overview of methods. Part 1: Anatomy of a single evaluation, covering the basic LLM evaluation API and setup. Part 2: Reference-based evaluation: exact match, semantic similarity, BERTScore, and LLM judge. Part 3: Reference-free evaluation: text statistics, regex, ML models, LLM judges, and session-level evaluators. | Open Notebook | |
| LLM as a Judge | Tutorial on creating and tuning LLM judges aligned with human preferences. | Open Notebook | |
| Classification Evaluation | Tutorial on evaluating LLMs and a simple predictive ML baseline on a multi-class classification task. | Open Notebook | |
| Content Generation with LLMs | Tutorial on how to use LLMs to write tweets and evaluate how engaging they are. Introduction to the concept of tracing. | Open Notebook | |
| RAG evaluations | Part 1: Theory on how to evaluate RAG systems: retrieval, generation quality, and synthetic data. Part 2: Tutorial on building a toy RAG application and evaluating correctness and faithfulness. | Open Notebook | |
| AI agent evaluations | Tutorial on how to build a simple Q&A agent and evaluate tool choice and answer correctness. | Open Notebook | |
| Adversarial testing | Tutorial on how to run scenario-based risk testing on forbidden topics and brand risks. | Open Notebook | |

More examples

You can also find more examples in the Example Repository.