How to map the input data.
To run an evaluation, create a `Dataset` object with a `DataDefinition`, which maps:

- Column types (numerical, categorical, datetime, text), so that Evidently applies the right analysis methods.
- Column roles (such as id, timestamp, target, prediction), so that Evidently knows how to interpret each column.

Create the `Dataset` from a pandas DataFrame using `Dataset.from_pandas` with the `data_definition` argument. If you pass an empty `DataDefinition()`, Evidently will map columns automatically based on their dtypes and names (see the tables below).

You can also pass a `pandas.DataFrame` directly to `report.run()` without creating the `Dataset` object. This works for checks like numerical/categorical data summaries or drift detection. However, it’s best to always create a `Dataset` object explicitly for clarity and control.
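For example, here is a minimal sketch of creating a `Dataset` with automatic column mapping. It assumes the current Evidently API, where `Dataset` and `DataDefinition` are imported from the top-level `evidently` package; adjust imports to your installed version.

```python
import pandas as pd
from evidently import Dataset, DataDefinition

df = pd.DataFrame({
    "age": [25, 32, 47],            # numeric dtype -> numerical column
    "country": ["US", "DE", "FR"],  # object dtype  -> categorical column
})

# An empty DataDefinition() lets Evidently map column types automatically.
eval_data = Dataset.from_pandas(df, data_definition=DataDefinition())
```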
Working with two datasets. If you’re working with current and reference datasets (e.g., for drift detection), create a `Dataset` object for each. Both must have an identical data definition.
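A sketch of the two-dataset flow, reusing one definition for both. The `Report` and `DataDriftPreset` import paths and the `report.run(current, reference)` argument order are assumptions based on the current API; verify them for your version.

```python
import pandas as pd
from evidently import Dataset, DataDefinition, Report
from evidently.presets import DataDriftPreset

reference_df = pd.DataFrame({"age": [25, 32, 47], "country": ["US", "DE", "FR"]})
current_df = pd.DataFrame({"age": [29, 51, 38], "country": ["US", "US", "FR"]})

definition = DataDefinition()  # one definition shared by both datasets

reference = Dataset.from_pandas(reference_df, data_definition=definition)
current = Dataset.from_pandas(current_df, data_definition=definition)

report = Report([DataDriftPreset()])
my_eval = report.run(current, reference)
```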
| Column type | Description | Automated mapping |
|---|---|---|
| `numerical_columns` | Columns with numerical values. | All columns with numeric types (`np.number`). |
| `datetime_columns` | Columns with datetime values. | All columns with DateTime format (`np.datetime64`). |
| `categorical_columns` | Columns with categorical values. | All non-numeric/non-datetime columns. |
| `text_columns` | Columns with raw text data. | No automated mapping. |
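If the automated mapping is not what you want, you can list columns per type explicitly. The sketch below uses the type names from the table as `DataDefinition` arguments; treating them as constructor parameters is an assumption, so check the `DataDefinition` reference for your version.

```python
import pandas as pd
from evidently import Dataset, DataDefinition

df = pd.DataFrame({
    "age": [25, 32],
    "plan": [1, 2],  # numeric dtype, but actually a categorical code
    "signup": pd.to_datetime(["2024-01-01", "2024-02-15"]),
    "review": ["great app", "too slow on mobile"],
})

definition = DataDefinition(
    numerical_columns=["age"],
    categorical_columns=["plan"],   # override the automatic numeric mapping
    datetime_columns=["signup"],
    text_columns=["review"],        # text columns are never mapped automatically
)

eval_data = Dataset.from_pandas(df, data_definition=definition)
```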
| Column role | Description | Automated mapping |
|---|---|---|
| `id_column` | A column with unique identifiers for each row or entity. | Column named “id”. |
| `timestamp` | A column with the timestamp of each entry, used to order the data in time. | Column named “timestamp”. |
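If your identifier and timestamp columns are named differently, map the roles explicitly. The sketch below assumes `id_column` and `timestamp` are accepted directly as `DataDefinition` parameters, mirroring the role names above; verify against the API reference.

```python
import pandas as pd
from evidently import Dataset, DataDefinition

df = pd.DataFrame({
    "request_id": ["a1", "a2", "a3"],
    "logged_at": pd.to_datetime(["2024-05-01 10:00", "2024-05-01 10:05", "2024-05-01 10:09"]),
    "latency_ms": [120, 95, 210],
})

definition = DataDefinition(
    id_column="request_id",           # unique identifier for each row
    timestamp="logged_at",            # orders the dataset in time
    numerical_columns=["latency_ms"],
)

eval_data = Dataset.from_pandas(df, data_definition=definition)
```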
How is `timestamp` different from `datetime_columns`? The `timestamp` role marks the single column that orders the dataset in time (for example, when each record was logged), while `datetime_columns` is a column type: such columns are treated as regular features that happen to contain datetime values.

If you add descriptors when creating the `Dataset`, they are automatically registered as descriptors in the Data Definition. This means they will be included in the `TextEvals` preset or treated as descriptors when you plot them on the dashboard. However, if you computed some scores or metadata externally and want to treat them as descriptors, you can map them explicitly:
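For example, here is a sketch where toxicity scores and refusal labels were produced outside Evidently and are mapped as descriptors. The parameter names `numerical_descriptors` and `categorical_descriptors` are assumptions; consult the `DataDefinition` reference for your version.

```python
import pandas as pd
from evidently import Dataset, DataDefinition

df = pd.DataFrame({
    "response": ["Sure, here is the answer...", "I cannot help with that."],
    "toxicity_score": [0.02, 0.01],  # computed by an external model
    "is_refusal": ["no", "yes"],     # labeled outside Evidently
})

definition = DataDefinition(
    text_columns=["response"],
    # Assumed parameter names -- verify against your Evidently version:
    numerical_descriptors=["toxicity_score"],
    categorical_descriptors=["is_refusal"],
)

eval_data = Dataset.from_pandas(df, data_definition=definition)
```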
For classification, the names of the columns you list in `prediction_probas` must exactly match the class labels. For example, if your classes are 0, 1, and 2, your probability columns must be named “0”, “1”, and “2”. Values in the `target` and `prediction` columns should be strings.
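For instance, a three-class dataset laid out to satisfy this rule (plain pandas, probability columns named after the class labels):

```python
import pandas as pd

# Classes are 0, 1 and 2: the probability columns are named "0", "1", "2",
# and the target/prediction values are stored as strings.
df = pd.DataFrame({
    "target":     ["0", "2", "1"],
    "prediction": ["0", "1", "1"],
    "0": [0.80, 0.10, 0.20],
    "1": [0.15, 0.30, 0.60],
    "2": [0.05, 0.60, 0.20],
})
```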
For ranking and recommendations, the `prediction` column can contain either a score or a rank, and the `target` column contains relevance labels (where 1 is a positive outcome).

Prediction as a score:

| user_id | item_id | prediction (score) | target (relevance) |
|---|---|---|---|
| user_1 | item_1 | 1.95 | 0 |
| user_1 | item_2 | 0.8 | 1 |
| user_1 | item_3 | 0.05 | 0 |
Prediction as a rank:

| user_id | item_id | prediction (rank) | target (relevance) |
|---|---|---|---|
| user_1 | item_1 | 1 | 0 |
| user_1 | item_2 | 2 | 1 |
| user_1 | item_3 | 3 | 0 |
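The same layouts as pandas DataFrames, deriving the rank-based version from the score-based one (column names follow the tables above):

```python
import pandas as pd

# Prediction as a score: higher means more relevant.
df = pd.DataFrame({
    "user_id":    ["user_1", "user_1", "user_1"],
    "item_id":    ["item_1", "item_2", "item_3"],
    "prediction": [1.95, 0.8, 0.05],  # model score
    "target":     [0, 1, 0],          # relevance label, 1 is a positive outcome
})

# Prediction as a rank: 1 is the top position for each user.
df_ranked = df.copy()
df_ranked["prediction"] = (
    df.groupby("user_id")["prediction"].rank(ascending=False).astype(int)
)
```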