Create a Dataset object with a DataDefinition, which maps:
- Column types (e.g., categorical, numerical, text).
- Column roles (e.g., id, prediction, target).
Basic flow
Step 1. Imports. Import the following modules:
Your data can have a flexible structure with any mix of categorical, numerical or text columns. Check the Reference table for data requirements in specific evaluations.
Create the Dataset object using Dataset.from_pandas with data_definition:
To map columns automatically, pass an empty DataDefinition(). Evidently will map columns:
- By type (numerical, categorical).
- By matching column names to roles (e.g., a column “target” treated as target).
How to set the data definition manually? See the section below for available options.
Special cases
Working directly with pandas.DataFrame. You can sometimes pass a pandas.DataFrame directly to report.run() without creating the Dataset object. This works for checks like numerical/categorical data summaries or drift detection. However, it’s best to always create a Dataset object explicitly for clarity and control.
Working with two datasets. If you’re working with current and reference datasets (e.g., for drift detection), create a Dataset object for each. Both must have an identical data definition.
Data definition
This page shows all the available mapping options. You only need to use the ones relevant to your evaluation scenario. For example, you don’t need columns like target/prediction to run data quality or LLM checks.
Column types
Knowing the column type helps compute correct statistics, visualizations, and pick default tests.
Text data
If you run LLM evaluations, simply specify the columns with inputs/outputs as text.
Mapping text columns is optional but useful. You can generate text descriptors without explicit mapping. But it’s a good idea to map text columns since you may later run other evals which vary by column type.
Tabular data
Map numerical, categorical or datetime columns:
If you exclude certain columns in mapping, they’ll be ignored in all evaluations.
Default column types
If you do not pass explicit mapping, the following defaults apply:
Column Type | Description | Automated Mapping |
---|---|---|
numerical_columns | | All columns with numeric types (np.number). |
datetime_columns | | All columns with DateTime format (np.datetime64). |
categorical_columns | | All non-numeric/non-datetime columns. |
text_columns | | No automated mapping. |
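The default rules in the table can be illustrated with a small sketch. This is not Evidently's implementation: `infer_column_types` is a hypothetical helper that looks at plain Python values rather than NumPy dtypes, and the column names are made up for the example. It only shows the order of the rules: numeric first, then datetime, then everything else as categorical, with text never assigned automatically.

```python
from datetime import date, datetime
from numbers import Number


def infer_column_types(columns):
    """Assign each column to a default type bucket based on its values.

    `columns` maps a column name to a list of plain Python values.
    """
    mapping = {
        "numerical_columns": [],
        "datetime_columns": [],
        "categorical_columns": [],
        "text_columns": [],  # no automated mapping: always left empty here
    }
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Assumption of this sketch: booleans are not treated as numeric.
        is_numeric = bool(present) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in present
        )
        is_datetime = bool(present) and all(
            isinstance(v, (date, datetime)) for v in present
        )
        if is_numeric:
            mapping["numerical_columns"].append(name)
        elif is_datetime:
            mapping["datetime_columns"].append(name)
        else:
            mapping["categorical_columns"].append(name)
    return mapping


# Hypothetical example data.
example = {
    "Age": [34, 41, 29],
    "Salary": [52000.0, 61000.5, 48000.0],
    "Joining_Date": [datetime(2021, 3, 1), datetime(2019, 7, 15), datetime(2022, 1, 10)],
    "Department": ["Sales", "HR", "Sales"],
    "Latest_Review": ["Great quarter", "Needs support", "Solid work"],
}
default_types = infer_column_types(example)
```

In this sketch the free-text column `Latest_Review` ends up among the categorical columns, which is why text columns are worth mapping explicitly.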
ID and timestamp
If you have a timestamp or ID column, it’s useful to identify them.
Column role | Description | Automated mapping |
---|---|---|
id_column | | Column named “id” |
timestamp | | Column named “timestamp” |
How is timestamp different from datetime_columns?
- DateTime is a column type. You can have many DateTime columns in the dataset. For example, conversation start / end time or features like “date of last contact.”
- Timestamp is a role. You can have a single timestamp column. It often represents the time when a data input was recorded. Use it if you want to see it as index on the plots.
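The difference between a role and a type can be sketched in a few lines. The helper `guess_roles` and the column names below are hypothetical, and the exact, case-sensitive name match is an assumption of this sketch rather than documented behavior. It fills the `id_column` and `timestamp` roles by name and leaves every other column, including other datetime columns, without a role.

```python
# Role -> column name that triggers the automated mapping (from the table above).
AUTO_ROLE_NAMES = {"id_column": "id", "timestamp": "timestamp"}


def guess_roles(column_names):
    """Return the roles that can be filled by an exact column-name match."""
    available = set(column_names)
    return {role: name for role, name in AUTO_ROLE_NAMES.items() if name in available}


# Hypothetical dataset columns: two datetime features plus an id and a timestamp.
columns = ["id", "timestamp", "conversation_start", "conversation_end", "question"]
roles = guess_roles(columns)

# Columns without a role keep only their type (datetime, text, and so on).
unmapped = [c for c in columns if c not in roles.values()]
```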
LLM evals
When you generate text descriptors and add them to the dataset, they are automatically mapped as descriptors in Data Definition. This means they will be included in the TextEvals preset or treated as descriptors when you plot them on the dashboard.
However, if you computed some scores or metadata externally and want to treat them as descriptors, you can map them explicitly:
Regression
To run regression quality checks, you must map the columns with:
- Target: actual values.
- Prediction: predicted values.
Classification
To run classification checks, you must map the columns with:
- Target: true label.
- Prediction: predicted labels/probabilities.
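A prediction can be a label or a set of probabilities. The sketch below shows how the two relate when probabilities are stored in one column per class, with each column named after its class label as a string. The helpers `check_proba_columns` and `predicted_label` are hypothetical and only illustrate the naming requirement; they are not part of Evidently.

```python
def check_proba_columns(class_labels, proba_columns):
    """Return the class labels that have no probability column of the same name."""
    expected = [str(label) for label in class_labels]
    return [name for name in expected if name not in proba_columns]


def predicted_label(row, proba_columns):
    """Pick the label whose probability column holds the highest value."""
    return max(proba_columns, key=lambda name: row[name])


# Hypothetical rows: string targets and one probability column per class.
rows = [
    {"target": "0", "0": 0.7, "1": 0.2, "2": 0.1},
    {"target": "2", "0": 0.1, "1": 0.3, "2": 0.6},
]
proba_columns = ["0", "1", "2"]

missing = check_proba_columns([0, 1, 2], proba_columns)
labels = [predicted_label(row, proba_columns) for row in rows]
```

Because the column names are the labels themselves, the derived label is directly comparable to the string target without any extra lookup.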
Multiclass
Example mapping:
When you have multiclass classification with predicted probabilities in separate columns, the column names in prediction_probas must exactly match the class labels. For example, if your classes are 0, 1, and 2, your probability columns must be named: “0”, “1”, “2”. Values in target and prediction columns should be strings.
Binary
Example mapping:
Ranking
RecSys
To evaluate recommender system performance, you must map the columns with:
- Prediction: this could be predicted score or rank.
- Target: relevance labels (e.g., this could be an interaction result like user click or upvote, or a true relevance label). The target can be either:
  - a binary label (where 1 is a positive outcome), or
  - any scores (positive values, where a higher value corresponds to a better match or a more valuable user action).
Example input when the prediction is a score:
user_id | item_id | prediction (score) | target (relevance) |
---|---|---|---|
user_1 | item_1 | 1.95 | 0 |
user_1 | item_2 | 0.8 | 1 |
user_1 | item_3 | 0.05 | 0 |
Example input when the prediction is a rank:
user_id | item_id | prediction (rank) | target (relevance) |
---|---|---|---|
user_1 | item_1 | 1 | 0 |
user_1 | item_2 | 2 | 1 |
user_1 | item_3 | 3 | 0 |
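The two example inputs describe the same recommendations in different forms. As a minimal sketch, assuming that a higher score means a better position (which is what the example values suggest), the hypothetical helper `scores_to_ranks` below converts the score-based rows into the rank-based ones, ranking items separately for each user.

```python
from collections import defaultdict


def scores_to_ranks(rows):
    """Replace each score with a 1-based rank per user (highest score first)."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user_id"]].append(row)

    ranked = []
    for user_rows in by_user.values():
        ordered = sorted(user_rows, key=lambda r: r["prediction"], reverse=True)
        for rank, row in enumerate(ordered, start=1):
            # Copy the row so the input stays untouched.
            ranked.append({**row, "prediction": rank})
    return ranked


# The score-based example input from the first table.
scored = [
    {"user_id": "user_1", "item_id": "item_1", "prediction": 1.95, "target": 0},
    {"user_id": "user_1", "item_id": "item_2", "prediction": 0.8, "target": 1},
    {"user_id": "user_1", "item_id": "item_3", "prediction": 0.05, "target": 0},
]
ranked = scores_to_ranks(scored)
```

The target column is carried over unchanged, since relevance labels do not depend on whether the prediction is expressed as a score or a rank.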