- You know how to generate Reports or Test Suites for text data using Descriptors.
- You know how to pass custom parameters for Reports or Test Suites.
- You know how to specify text data in column mapping.
You can use external LLMs to score your text data. This method lets you evaluate texts based on any custom criteria that you define in a prompt.
The LLM “judge” must return a numerical score or a category for each text in a column. You can then view the scores, analyze their distribution, or run conditional tests through the usual Descriptor interface.
Evidently currently supports scoring data using OpenAI LLMs. Use the OpenAIPrompting() descriptor to define your prompt and criteria.
Code example
You can refer to an end-to-end example with different Descriptors:
OpenAI key. Add the OpenAI token as an environment variable: see docs. Note that you will incur costs when running this eval.
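For example, you can set the key in your environment before running the evaluation. This is a minimal sketch: `OPENAI_API_KEY` is the variable name the OpenAI client reads by default, and the token value is a placeholder.

```python
import os

# Set the OpenAI API key as an environment variable.
# The OpenAI client reads OPENAI_API_KEY by default.
os.environ["OPENAI_API_KEY"] = "your-api-key"  # placeholder: use your own token
```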
To import the Descriptor:
```python
from evidently.descriptors import OpenAIPrompting
```
Define a prompt. This is a simplified example:
```python
pii_prompt = """Please identify whether the below text contains personally identifiable information, such as name, address, date of birth, or other.

Text: REPLACE

Use the following categories for PII identification:
1 if text contains PII
0 if text does not contain PII
0 if the provided data is not sufficient to make a clear determination

Return only one category."""
```
The prompt has a REPLACE placeholder that will be filled with the texts you want to evaluate. Evidently will take the content of each row in the selected column, insert it at the placeholder position in the prompt, and pass the result to the LLM for scoring.
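To see what the substitution produces, you can fill the placeholder manually. This snippet only illustrates the mechanics; Evidently performs the equivalent step for you, and the sample row text is made up:

```python
# Illustration only: how the content of one row fills the placeholder.
row_text = "John Smith was born on 4 July 1985."  # hypothetical row value
filled_prompt = pii_prompt.replace("REPLACE", row_text)
print(filled_prompt)  # the text the LLM receives for scoring
```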
To compute the score for the `response` column and get a summary Report:
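Here is a sketch of how this step can look, following the usual Report workflow from the earlier how-to sections. The parameter values (the model name, `feature_type`, `display_name`) and the toy data are illustrative assumptions; adjust them to your setup:

```python
import pandas as pd

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import OpenAIPrompting

# Toy data with a "response" column to score (illustrative only).
df = pd.DataFrame({"response": [
    "My name is John Smith and I live at 5 Main Street.",
    "The weather is nice today.",
]})

# Declare "response" as a text column, as covered in the column mapping how-to.
column_mapping = ColumnMapping(text_features=["response"])

report = Report(metrics=[
    TextEvals(column_name="response", descriptors=[
        OpenAIPrompting(
            prompt=pii_prompt,                # the template defined above
            prompt_replace_string="REPLACE",  # placeholder to fill with each row
            model="gpt-3.5-turbo-instruct",   # assumed model choice
            feature_type="num",               # the judge returns numerical categories (0/1)
            display_name="PII in response",
        ),
    ]),
])
report.run(reference_data=None, current_data=df, column_mapping=column_mapping)
report  # renders the summary Report in a notebook
```

You can plug the same descriptor into a Test Suite to run conditional checks on the returned scores.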