To run evaluations, you must create a Dataset object with a DataDefinition, which maps:

  • Column types (e.g., categorical, numerical, text).
  • Column roles (e.g., id, prediction, target).

This allows Evidently to process the data correctly. Some evaluations need specific columns and will fail if they’re missing. You can define the mapping using the Python API or by assigning columns visually when uploading data to the Evidently platform.

Basic flow

Step 1. Imports. Import the following modules:

import pandas as pd

from evidently import Dataset
from evidently import DataDefinition

Step 2. Prepare your data. Use a pandas.DataFrame.

Your data can have a flexible structure with any mix of categorical, numerical, or text columns. Check the Reference table for data requirements of specific evaluations.
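
For example, a small illustrative DataFrame (the column names here are made up to match the examples later on this page):

source_df = pd.DataFrame({
    "Latest_Review": ["Great product!", "Too slow for my needs."],
    "Age": [34, 41],
    "Salary": [72000, 98000],
    "Department": ["Sales", "Engineering"],
    "Joining_Date": pd.to_datetime(["2021-03-01", "2019-07-15"]),
})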

Step 3. Create a Dataset object. Use Dataset.from_pandas with data_definition:

eval_data = Dataset.from_pandas(
    pd.DataFrame(source_df),
    data_definition=DataDefinition()
)

To map columns automatically, pass an empty DataDefinition(). Evidently will map columns:

  • By type (numerical, categorical).
  • By matching column names to roles (e.g., a column named “target” is treated as the target).

Automation works in many cases, but manual mapping is more accurate. It is also necessary for evaluating prediction quality or handling text columns.

How do you set the data definition manually? See the sections below for available options.

Step 4. Run evals. Once the Dataset object is ready, you can add Descriptors and run Reports.
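
For instance, here is a minimal sketch that adds a text descriptor and runs a preset Report (imports shown as in recent Evidently versions; check the Reference for your exact version):

from evidently import Report
from evidently.descriptors import TextLength
from evidently.presets import TextEvals

# Add a descriptor column computed from the mapped text column
eval_data.add_descriptors(descriptors=[TextLength("Latest_Review")])

# Run a preset report on the dataset
report = Report([TextEvals()])
my_eval = report.run(eval_data)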

Special cases

Working directly with pandas.DataFrame. You can sometimes pass a pandas.DataFrame directly to report.run() without creating the Dataset object. This works for checks like numerical/categorical data summaries or drift detection. However, it’s best to always create a Dataset object explicitly for clarity and control.

Working with two datasets. If you’re working with current and reference datasets (e.g., for drift detection), create a Dataset object for each. Both must have an identical data definition.
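
A sketch of a drift check over two datasets (current_df and reference_df are hypothetical DataFrames with the same schema; preset names as in recent Evidently versions):

from evidently import Report
from evidently.presets import DataDriftPreset

definition = DataDefinition(numerical_columns=["Age", "Salary"])

current = Dataset.from_pandas(current_df, data_definition=definition)
reference = Dataset.from_pandas(reference_df, data_definition=definition)

# Pass the current dataset first and the reference second
report = Report([DataDriftPreset()])
my_eval = report.run(current, reference)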

Data definition

This section shows all the available mapping options. You only need the ones that apply to your evaluation scenario. For example, you don’t need columns like target/prediction to run data quality or LLM checks.

Column types

Knowing the column type helps Evidently compute correct statistics, build the right visualizations, and pick default tests.

Text data

If you run LLM evaluations, simply specify the columns with inputs/outputs as text.

definition = DataDefinition(
    text_columns=["Latest_Review"]
)
    
eval_data = Dataset.from_pandas(
    pd.DataFrame(source_df),
    data_definition=definition
)

This mapping is optional but useful. You can generate text descriptors without explicit mapping, but it’s a good idea to map text columns since you may later run other evaluations that vary by column type.

Tabular data

Map numerical, categorical or datetime columns:

definition = DataDefinition(
    text_columns=["Latest_Review"],
    numerical_columns=["Age", "Salary"],
    categorical_columns=["Department"],
    datetime_columns=["Joining_Date"]
)
    
eval_data = Dataset.from_pandas(
    pd.DataFrame(source_df),
    data_definition=definition
)

Explicit mapping helps avoid mistakes like misclassifying numerical columns with few unique values as categorical.

If you exclude certain columns from the mapping, they’ll be ignored in all evaluations.

Default column types

If you do not pass an explicit mapping, the following defaults apply:

| Column type | Description | Automated mapping |
|---|---|---|
| numerical_columns | Columns with numeric values. | All columns with numeric types (np.number). |
| datetime_columns | Columns with datetime values. Ignored in data drift calculations. | All columns with DateTime format (np.datetime64). |
| categorical_columns | Columns with categorical values. | All non-numeric/non-datetime columns. |
| text_columns | Text columns. Mapping required for text data drift detection. | No automated mapping. |

ID and timestamp

If you have timestamp or ID columns, it’s useful to identify them:

definition = DataDefinition(
    id_column="Id",
    timestamp="Date"
)

| Column role | Description | Automated mapping |
|---|---|---|
| id_column | Identifier column. Ignored in data drift calculations. | Column named “id”. |
| timestamp | Timestamp column. Ignored in data drift calculations. | Column named “timestamp”. |

How is timestamp different from datetime_columns?

  • DateTime is a column type. You can have many DateTime columns in the dataset, for example, conversation start/end time or features like “date of last contact.”
  • Timestamp is a role. You can have a single timestamp column. It often represents the time when a data input was recorded. Use it if you want to see it as the index on plots (see the sketch below).
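
For illustration, a definition that uses both (the column names are hypothetical):

definition = DataDefinition(
    timestamp="recorded_at",  # role: the single index column
    datetime_columns=["conversation_start", "conversation_end"]  # type: any number of columns
)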

LLM evals

When you generate text descriptors and add them to the dataset, they are automatically mapped as descriptors in Data Definition. This means they will be included in the TextEvals preset or treated as descriptors when you plot them on the dashboard.

However, if you computed some scores or metadata externally and want to treat them as descriptors, you can map them explicitly:

definition = DataDefinition(
    numerical_descriptors=["chat_length", "user_rating"],
    categorical_descriptors=["upvotes", "model_type"]
)

Regression

To run regression quality checks, you must map the columns with:

  • Target: actual values.
  • Prediction: predicted values.

You can have several regression results in the dataset, for example, when you log predictions from multiple models. Pass the mappings in a list (see the sketch after the defaults below).

Example mapping:

from evidently import Regression

definition = DataDefinition(
    regression=[Regression(target="y_true", prediction="y_pred")]
)

Defaults:

    target: str = "target"
    prediction: str = "prediction"
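
If you log results for more than one model, a sketch with hypothetical prediction column names:

definition = DataDefinition(
    regression=[
        Regression(target="y_true", prediction="y_pred_model_a"),
        Regression(target="y_true", prediction="y_pred_model_b"),
    ]
)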

Classification

To run classification checks, you must map the columns with:

  • Target: true label.
  • Prediction: predicted labels/probabilities.

There are two different mapping options, for binary and multi-class classification. You can also have several classification results in the dataset. (Pass the mappings in a list.)

Multiclass

Example mapping:

from evidently import MulticlassClassification

data_def = DataDefinition(
    classification=[MulticlassClassification(
        target="target",
        prediction_labels="prediction",
        prediction_probas=["0", "1", "2"],  # If probabilistic classification
        labels={"0": "class_0", "1": "class_1", "2": "class_2"}  # Optional, for display only
    )]
)

Available options and defaults:

    target: str = "target"
    prediction_labels: str = "prediction"
    prediction_probas: Optional[List[str]] = None  # if probabilistic classification
    labels: Optional[Dict[Label, str]] = None

When you have multiclass classification with predicted probabilities in separate columns, the column names in prediction_probas must exactly match the class labels. For example, if your classes are 0, 1, and 2, your probability columns must be named: “0”, “1”, “2”. Values in target and prediction columns should be strings.
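
To make the expected shape concrete, a hypothetical input for three classes:

df = pd.DataFrame({
    "target":     ["0", "2", "1"],   # true labels as strings
    "prediction": ["0", "2", "1"],   # predicted labels as strings
    "0": [0.7, 0.2, 0.1],            # probability of class "0"
    "1": [0.2, 0.3, 0.8],            # probability of class "1"
    "2": [0.1, 0.5, 0.1],            # probability of class "2"
})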

Binary

Example mapping:

from evidently import BinaryClassification

definition = DataDefinition(
    classification=[BinaryClassification(
        target="target",
        prediction_labels="prediction")],
    categorical_columns=["target", "prediction"])

Available options and defaults:

    target: str = "target"
    prediction_labels: Optional[str] = None
    prediction_probas: Optional[str] = "prediction"  # if probabilistic classification
    pos_label: Label = 1  # name of the positive label
    labels: Optional[Dict[Label, str]] = None
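
For probabilistic binary classification, a sketch where the hypothetical predicted_proba column holds the probability of the positive class:

definition = DataDefinition(
    classification=[BinaryClassification(
        target="target",
        prediction_probas="predicted_proba",
        pos_label=1
    )]
)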

Ranking

RecSys

To evaluate recommender system performance, you must map the columns with:

  • Prediction: a predicted score or rank.
  • Target: a relevance label (e.g., an interaction result like a user click or upvote, or a true relevance label).

The target column can contain either:

  • a binary label (where 1 is a positive outcome), or
  • a score (positive values, where a higher value corresponds to a better match or a more valuable user action).

Here are examples of the expected data inputs.

If the system prediction is a score (expected by default):

| user_id | item_id | prediction (score) | target (relevance) |
|---|---|---|---|
| user_1 | item_1 | 1.95 | 0 |
| user_1 | item_2 | 0.8 | 1 |
| user_1 | item_3 | 0.05 | 0 |

If the model prediction is a rank:

| user_id | item_id | prediction (rank) | target (relevance) |
|---|---|---|---|
| user_1 | item_1 | 1 | 0 |
| user_1 | item_2 | 2 | 1 |
| user_1 | item_3 | 3 | 0 |

Example mapping:

from evidently import Recsys

definition = DataDefinition(
    ranking=[Recsys()]
)

Available options and defaults:

    user_id: str = "user_id"  # column with user IDs
    item_id: str = "item_id"  # column with ranked items
    target: str = "target"
    prediction: str = "prediction"
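
Putting it together, a sketch with all parameters spelled out (they match the defaults above, so a bare Recsys() works if your columns use the default names; recsys_df is a hypothetical DataFrame):

definition = DataDefinition(
    ranking=[Recsys(
        user_id="user_id",
        item_id="item_id",
        target="target",
        prediction="prediction"
    )]
)

eval_data = Dataset.from_pandas(recsys_df, data_definition=definition)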