End-to-end code examples.
Tutorial | Description | Code example | Video |
---|---|---|---|
Intro to LLM Evals | Introduction to LLM evaluation: concepts, goals, and motivations behind evaluating LLM outputs. | – | |
LLM Evaluation Methods | Tutorial with an overview of LLM evaluation methods. | Open Notebook | |
LLM as a Judge | Tutorial on creating and tuning LLM judges aligned with human preferences. | Open Notebook | |
Classification Evaluation | Tutorial on evaluating LLMs and a simple predictive ML baseline on a multi-class classification task. | Open Notebook | |
Content Generation with LLMs | Tutorial on how to use LLMs to write tweets and evaluate how engaging they are. Introduction to the concept of tracing. | Open Notebook | |
RAG evaluations | | Open Notebook | |
AI agent evaluations | Tutorial on how to build a simple Q&A agent and evaluate tool choice and answer correctness. | Open Notebook | |
Adversarial testing | Tutorial on how to run scenario-based risk testing on forbidden topics and brand risks. | Open Notebook | |