Define and run the eval

To log the evaluation results to the Evidently Platform, first connect to Evidently Cloud or your local workspace and create a Project. This step is optional: you can also run evals locally in Python.
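
A minimal sketch of connecting to Evidently Cloud and creating a Project (the token is a placeholder, and the exact import path can differ between Evidently versions; a self-hosted or local Workspace works similarly):

from evidently.ui.workspace import CloudWorkspace

ws = CloudWorkspace(
    token="YOUR_API_TOKEN",  # placeholder: your Evidently Cloud API key
    url="https://app.evidently.cloud",
)
project = ws.create_project("My eval project")  # Cloud may also require an org_id

The ws and project objects are reused in the last step to upload the results.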

1. Prepare the input data

Get your data in a table, such as a pandas.DataFrame. More on data requirements. You can also load data from the Evidently Platform, like tracing data you captured or synthetic datasets.
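
For instance, a toy input table with one question-answer pair per row (the "Question" and "Answer" column names are only examples, reused in the snippets below):

import pandas as pd

source_df = pd.DataFrame({
    "Question": ["What is Evidently?", "How do I run an eval?"],
    "Answer": [
        "An open-source library for evaluating ML and LLM systems.",
        "Create a Dataset, add descriptors, then run a Report.",
    ],
})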

2. Create a Dataset object

Create a Dataset object with DataDefinition() that specifies column roles and types. You can also use default type detection. How to set Data Definition.
from evidently import Dataset, DataDefinition

# Wrap the DataFrame; column types are auto-detected by default
eval_data = Dataset.from_pandas(
    source_df,
    data_definition=DataDefinition()
)
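
If you prefer to set column types explicitly instead of relying on detection, you can pass them to DataDefinition. A sketch assuming both example columns contain raw text:

eval_data = Dataset.from_pandas(
    source_df,
    data_definition=DataDefinition(text_columns=["Question", "Answer"])
)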

3. (Optional) Add descriptors

For LLM and text evals, define row-level descriptors to compute. Here, you can use a variety of methods, from deterministic checks to LLM judges. Optionally, add row-level tests to get explicit pass/fail outcomes on set conditions, as shown after the snippet below. How to use Descriptors.
from evidently.descriptors import TextLength, Sentiment

# Row-level scores computed for each row of the dataset
eval_data.add_descriptors(descriptors=[
    TextLength("Question", alias="Length"),
    Sentiment("Answer", alias="Sentiment")
])
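
For example, to attach a row-level pass/fail condition to a descriptor, you can pass test conditions directly. A sketch assuming descriptors accept a tests argument (check the Descriptors docs for the exact syntax):

from evidently.tests import lte

# Each row passes if the answer is at most 100 characters long
eval_data.add_descriptors(descriptors=[
    TextLength("Answer", alias="AnswerLength", tests=[lte(100)]),
])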

4. Configure Report

For dataset-level evals (classification, data drift) or to summarize descriptors, create a Report with chosen metrics or presets. How to configure Reports.
from evidently import Report
from evidently.presets import DataSummaryPreset

report = Report([
    DataSummaryPreset()
])

5. (Optional) Add Test conditions

Add dataset-level Pass/Fail conditions, for example, to check that all texts in the dataset are under 100 characters long. How to configure Tests.
from evidently.metrics import MaxValue
from evidently.tests import lt

report = Report([
    DataSummaryPreset(),
    # Fails unless the maximum "Length" descriptor value is under 100
    MaxValue(column="Length", tests=[lt(100)]),
])

6. (Optional) Add Tags and Timestamps

Add tags or metadata to identify specific evaluation runs or datasets, or override the default timestamp. How to add metadata.
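
A sketch, assuming tags and a custom timestamp can be passed when running the Report (the parameter names here are assumptions; verify them on the linked page):

from datetime import datetime

my_eval = report.run(
    eval_data,
    None,
    timestamp=datetime(2025, 1, 1),  # assumed: overrides the default run timestamp
    tags=["quickstart"],             # assumed: labels this evaluation run
)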

7. Run the Report

To execute the eval, run the Report on the Dataset (or on two Datasets to compare them).
# The second argument is an optional reference dataset; pass None to evaluate a single dataset
my_eval = report.run(eval_data, None)

8. Explore the results

Upload the results to your Project on the Evidently Platform, or view them directly in Python.

# Upload the evaluation results to your Project
ws.add_run(project.id, my_eval, include_data=True)

# Or render the Report in a notebook, or get the results as JSON
my_eval
# my_eval.json()

Quickstarts

See these end-to-end examples:

LLM quickstart

Evaluate the quality of text outputs.

ML quickstart

Test tabular data quality and data drift.