You can also customize existing evals with parameters, such as defining custom LLM judges or using regex-based metrics like Contains for word lists. See available descriptors.
This guide assumes you know how to use built-in descriptors.
Imports
Toy data to run the example
To generate toy data and create a Dataset object:
Single column check
You can define a CustomColumnDescriptor that will:
- Take any column from your dataset and evaluate each value inside it,
- Return a single column with numerical (num) scores or categorical (cat) labels.
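As a rough illustration of the logic behind a single-column check, the sketch below labels each value in a column as empty or non-empty. It uses plain Python only: the function name is_empty, the EMPTY / NON EMPTY labels, and the sample values are hypothetical, and a list of strings stands in for the dataset column. Wrapping the function in CustomColumnDescriptor is not shown here, since the exact signature depends on your library version.

```python
from typing import List, Optional, Tuple


def is_empty(values: List[Optional[str]]) -> Tuple[str, List[str]]:
    """Label every value of one column.

    Returns the score type ("cat" for categorical labels) together with
    one label per input row, mirroring the "one column in, one column out"
    contract described above. This is a stand-in, not the library's API.
    """
    labels = [
        "EMPTY" if (value is None or str(value).strip() == "") else "NON EMPTY"
        for value in values
    ]
    return "cat", labels


# Hypothetical toy column
answers = ["Paris", "", "42"]
kind, labels = is_empty(answers)
print(kind, labels)
```

A numerical check would follow the same shape, returning "num" and one number per row (for example, the length of each value).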
Multi-column check
Alternatively, you can define a CustomDescriptor that:
- Takes one or many named columns from your dataset,
- Returns one or many transformed columns.
For example, you can compare the target_answer and answer columns and return a label:
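A minimal sketch of what such a multi-column check could compute, assuming the comparison is a strict equality test. The function name exact_match, the MATCH / MISMATCH labels, and the sample rows are hypothetical, and a dict of lists stands in for the dataset. Registering the function through CustomDescriptor is left out because the exact signature depends on your library version.

```python
from typing import Dict, List


def exact_match(columns: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Compare two named columns row by row and return one label column."""
    target = columns["target_answer"]
    answer = columns["answer"]
    if len(target) != len(answer):
        raise ValueError("target_answer and answer must have the same length")
    # Strict equality is an assumption; swap in any comparison you need.
    return {
        "exact_match": [
            "MATCH" if t == a else "MISMATCH" for t, a in zip(target, answer)
        ]
    }


# Hypothetical toy rows
toy = {
    "target_answer": ["Paris", "4", "blue"],
    "answer": ["Paris", "5", "blue"],
}
result = exact_match(toy)
print(result)
```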
You can also use a CustomDescriptor to run evals for multiple columns and return multiple scores.
As a fun example, let's reverse all words in the question and answer columns:
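One possible reading of this example, sketched in plain Python: reverse the characters of every word while keeping word order, and return one new column per input column. The helper names reverse_words and reverse_columns, the reversed_ output prefix, and the sample rows are all hypothetical, and a dict of lists stands in for the dataset. Hooking this into CustomDescriptor is not shown.

```python
from typing import Dict, List, Sequence


def reverse_words(text: str) -> str:
    """Reverse the characters of each whitespace-separated word."""
    return " ".join(word[::-1] for word in text.split())


def reverse_columns(
    columns: Dict[str, List[str]],
    names: Sequence[str] = ("question", "answer"),
) -> Dict[str, List[str]]:
    """Take several named columns and return several transformed columns."""
    return {
        f"reversed_{name}": [reverse_words(value) for value in columns[name]]
        for name in names
    }


# Hypothetical toy rows
toy = {
    "question": ["How are you"],
    "answer": ["Fine thanks"],
}
reversed_cols = reverse_columns(toy)
print(reversed_cols)
```

Returning a dict keyed by the new column name keeps each output aligned with the input it came from, which is the "many columns in, many columns out" pattern described above.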