Evaluate text outputs in under 5 minutes
A descriptor adds a new score or label to each row in your dataset.
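To make the idea of a descriptor concrete, here is a conceptual sketch in plain Python. It is not Evidently's API. The helpers add_descriptor and includes_any_word, the sample rows, and the column names are all hypothetical. Each helper maps one text to a label (True/False) or a score (the text length), and the result is stored as a new column on every row, leaving the original rows unchanged.

```python
# Conceptual sketch only: these helpers are hypothetical and are NOT Evidently's API.
# A "descriptor" here is any function that maps one text to a score or a label.

def includes_any_word(text, words):
    """Return True if any of `words` appears as a whole word in `text` (case-insensitive)."""
    tokens = {token.strip(".,!?;:\"'()").lower() for token in text.split()}
    return any(word.lower() in tokens for word in words)


def add_descriptor(rows, column, text_key, fn):
    """Return copies of `rows`, each with a new `column` computed from row[text_key]."""
    return [{**row, column: fn(row[text_key])} for row in rows]


# Hypothetical sample dataset: one dict per row.
rows = [
    {"question": "How do I reset my password?", "answer": "Sorry, I cannot help with that."},
    {"question": "What is the capital of France?", "answer": "Paris is the capital of France."},
]

# Add a label column (keyword check) and a score column (text length).
scored = add_descriptor(
    rows,
    column="mentions_sorry",
    text_key="answer",
    fn=lambda text: includes_any_word(text, ["sorry", "apologize"]),
)
scored = add_descriptor(scored, column="length", text_key="answer", fn=len)

for row in scored:
    print(row["answer"], "->", row["mentions_sorry"], row["length"])
```

The keyword check is a rough stand-in for a rule-based descriptor; a real library may tokenize and match words differently.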
For the LLM-as-a-judge evaluation, we’ll use OpenAI GPT-4o mini. Set your OpenAI API key as an environment variable:
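A minimal sketch, assuming you run the evaluation from a Python script or notebook. The official OpenAI Python client reads the key from the OPENAI_API_KEY environment variable by default. The value below is a placeholder; replace it with your own key.

```python
import os

# Placeholder value: replace with your own OpenAI API key.
# Setting it through os.environ only affects the current Python process.
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
```

Avoid committing a real key to source control; exporting the variable in your shell before starting Python works as well.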
If you prefer not to use an LLM judge, you can use a rule-based descriptor such as IncludesWords instead.
Add test conditions
How to add test conditions
Custom LLM judge
How to create a custom LLM evaluator