All metrics

List of all the metrics and metric presets available in Evidently.

How to use this page

This is a reference page. It shows all the metrics and metric presets available in the library, and their parameters.

You can use the menu on the right to navigate the sections. We organize the metrics into logical groups. Note that these groups do not match the presets with similar names. For example, there are more Data Quality metrics below than in the DataQualityPreset.

You can use this reference page to discover additional metrics to include in your custom report.
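
For example, to build a custom report from the metrics listed on this page, pass them to a Report object. A minimal sketch, assuming the Report API and two pandas DataFrames (ref_data and cur_data are illustrative names; import paths may differ between Evidently versions):

from evidently.report import Report
from evidently.metrics import DatasetMissingValuesMetric, ColumnSummaryMetric, ColumnQuantileMetric

# Combine any individual metrics from this page into one custom report
report = Report(metrics=[
    DatasetMissingValuesMetric(),
    ColumnSummaryMetric(column_name="age"),
    ColumnQuantileMetric(column_name="age", quantile=0.75),
])

# ref_data and cur_data are pandas DataFrames you prepare yourself
report.run(reference_data=ref_data, current_data=cur_data)
report.show()  # or report.save_html("report.html")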

How to read the tables

  • Name: the name of the metric or preset.

  • Description: plain text explanation of the metric, or the contents of the preset. For metrics, we also specify whether the metric applies to the whole dataset or individual columns.

  • Parameters: description of the required and optional parameters you can pass to the corresponding metric or preset. For metrics, we also specify the default conditions that apply if you do not pass a custom parameter.

Metric visualizations. Each metric also includes a default render. If you want to see the visualization, navigate to the example notebooks and run the notebook with all metrics or with all metric presets.

We do our best to keep this page up to date. In case of discrepancies, consult the API reference or the current version of the "All metrics" example notebook in the Examples section. If you notice an error, please send us a pull request to update the documentation!

Metric Presets

Defaults: each Metric in a Preset uses the default parameters for this Metric. You can see them in the tables below.
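
For illustration, you can add a preset to a Report as-is, so that every metric inside runs with defaults, or pass parameters to override them. A minimal sketch (import paths may differ between Evidently versions):

from evidently.report import Report
from evidently.metric_preset import DataQualityPreset

# All metrics in the preset use their default parameters
report = Report(metrics=[DataQualityPreset()])

# Passing a parameter: limit the column-level metrics to the listed columns
report = Report(metrics=[DataQualityPreset(columns=["age", "education"])])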

Preset name and Description | Parameters

DataQualityPreset Evaluates the data quality and provides descriptive stats. Input features are required. Prediction and target are optional. Contents:

  • DatasetSummaryMetric()

  • ColumnSummaryMetric(column_name=column_name) for all columns, or only for the listed columns if provided

  • DatasetMissingValuesMetric()

  • DatasetCorrelationsMetric()

Optional: columns

DataDriftPreset Evaluates the data drift in the individual columns and the dataset. Input features are required. Contents:

  • DataDriftTable(columns=columns) or all if not listed

  • DatasetDriftMetric(columns=columns) or all if not listed

Optional:

  • columns

  • stattest

  • cat_stattest

  • num_stattest

  • per_column_stattest

  • text_stattest

  • stattest_threshold

  • cat_stattest_threshold

  • num_stattest_threshold

  • per_column_stattest_threshold

  • text_stattest_threshold

  • embeddings

  • embeddings_drift_method

  • drift_share

How to set data drift parameters, embeddings drift parameters.
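
For example, the optional parameters above can be passed directly to the preset. A sketch under the assumption that the statistical test names used below ("wasserstein", "psi", "ks") are available in your version; see the data drift parameters page for the full list:

from evidently.metric_preset import DataDriftPreset

preset = DataDriftPreset(
    columns=["age", "education-num"],     # restrict the check to these columns
    num_stattest="wasserstein",           # test for numerical columns
    cat_stattest="psi",                   # test for categorical columns
    per_column_stattest={"age": "ks"},    # per-column override
    drift_share=0.7,                      # dataset drift if >= 70% of columns drift
)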

TargetDriftPreset Evaluates the prediction or target drift. Target or prediction is required. Input features are optional. Contents:

  • ColumnDriftMetric(column_name=target, prediction)

  • ColumnCorrelationsMetric(column_name=target, prediction)

  • TargetByFeaturesTable(columns=columns) or all if not listed

If regression:

  • ColumnValuePlot(column_name=target, prediction)

Optional:

  • columns

  • stattest

  • cat_stattest

  • num_stattest

  • per_column_stattest

  • stattest_threshold

  • cat_stattest_threshold

  • num_stattest_threshold

  • per_column_stattest_threshold

How to set data drift parameters.

RegressionPreset Evaluates the quality of a regression model. Prediction and target are required. Input features are optional. Contents:

  • RegressionQualityMetric()

  • RegressionPredictedVsActualScatter()

  • RegressionPredictedVsActualPlot()

  • RegressionErrorPlot()

  • RegressionAbsPercentageErrorPlot()

  • RegressionErrorDistribution()

  • RegressionErrorNormality()

  • RegressionTopErrorMetric()

  • RegressionErrorBiasTable(columns=columns) or all if not listed

Optional: columns

ClassificationPreset Evaluates the quality of a classification model. Prediction and target are required. Input features are optional. Contents:

  • ClassificationQualityMetric()

  • ClassificationClassBalance()

  • ClassificationConfusionMatrix()

  • ClassificationQualityByClass()

If probabilistic classification, also:

  • ClassificationClassSeparationPlot()

  • ClassificationProbDistribution()

  • ClassificationRocCurve()

  • ClassificationPRCurve()

  • ClassificationPRTable()

  • ClassificationQualityByFeatureTable(columns=columns) or all if not listed

Optional:

  • columns

  • probas_threshold

  • k

TextOverviewPreset(column_name="text") Evaluates data drift and descriptive statistics for text data. Input features (text) are required. Contents:

  • ColumnSummaryMetric()

  • TextDescriptorsDistribution()

  • TextDescriptorsCorrelation()

If reference data is provided, also:

  • ColumnDriftMetric()

  • TextDescriptorsDriftMetric()

Required: column_name
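
A minimal usage sketch; it assumes the text column is declared as a text feature via ColumnMapping, and the DataFrame names are illustrative:

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import TextOverviewPreset

column_mapping = ColumnMapping(text_features=["text"])  # mark "text" as a text column

report = Report(metrics=[TextOverviewPreset(column_name="text")])
report.run(reference_data=ref_data, current_data=cur_data, column_mapping=column_mapping)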

RecsysPreset Evaluates the quality of the recommender system. Recommendations and true relevance scores are required. For some metrics, training data and item features are required. Contents:

  • PrecisionTopKMetric()

  • RecallTopKMetric()

  • FBetaTopKMetric()

  • MAPKMetric()

  • NDCGKMetric()

  • MRRKMetric()

  • HitRateKMetric()

  • PersonalizationMetric()

  • PopularityBias()

  • RecCasesTable()

  • ScoreDistribution()

  • DiversityMetric()

  • SerendipityMetric()

  • NoveltyMetric()

  • ItemBiasMetric() (pass column as a parameter)

  • UserBiasMetric() (pass column as a parameter)

Required: k Optional:

  • min_rel_score: Optional[int]

  • no_feedback_users: bool

  • normalize_arp: bool

  • user_ids: Optional[List[Union[int, str]]]

  • display_features: Optional[List[str]]

  • item_features: Optional[List[str]]

  • user_bias_columns: Optional[List[str]]

  • item_bias_columns: Optional[List[str]]
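
For example, the preset takes the required k and any of the optional parameters above. A sketch (the column mapping that points to recommendations and true relevance is not shown):

from evidently.metric_preset import RecsysPreset

preset = RecsysPreset(
    k=5,                      # evaluate the top-5 recommendations
    min_rel_score=4,          # treat relevance scores >= 4 as relevant
    no_feedback_users=True,   # include users without relevant items
    item_features=["genre"],  # item metadata used by diversity/serendipity metrics
)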

Data Integrity

Defaults for Missing Values. The metrics that calculate the number or share of missing values detect four types of values by default: Pandas nulls (None, NaN, etc.), "" (empty string), Numpy "-inf" value, Numpy "inf" value. You can also pass a custom list of missing values as a parameter and specify whether it should replace the default list. Example:

DatasetMissingValuesMetric(missing_values=["", 0, "n/a", -9999, None], replace=True)
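
If you set replace=False, the custom values should extend the default list rather than replace it (a sketch based on the description above):

# Keep the four default missing value types and add custom ones on top
DatasetMissingValuesMetric(missing_values=["n/a", -9999], replace=False)
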
Metric name | Description | Parameters

DatasetSummaryMetric()

Dataset-level. Calculates various descriptive statistics for the dataset, incl. the number of columns, rows, cat/num features, missing values, empty values, and duplicate values.

Required: n/a Optional:

  • missing_values = [], replace = True/False (default = four types of missing values, see above)

  • almost_constant_threshold (default = 0.95)

  • almost_duplicated_threshold (default = 0.95)

DatasetMissingValuesMetric()

Dataset-level. Calculates the number and share of missing values in the dataset. Displays the number of missing values per column.

Required: n/a Optional:

  • missing_values = [], replace = True/False (default = four types of missing values, see above)

ColumnSummaryMetric(column_name="age")

Column-level. Calculates various descriptive statistics for the column, incl. the number of missing, empty, duplicate values, etc. The stats depend on the column type: numerical, categorical, text or DateTime.

Required: column_name Optional: n/a

ColumnMissingValuesMetric(column_name="education")

Column-level. Calculates the number and share of missing values in the column.

Required: n/a Optional:

  • missing_values = [], replace = True/False (default = four types of missing values, see above)

ColumnRegExpMetric(column_name="relationship", reg_exp=r".*child.*")

Column-level. Calculates the number and share of the values that do not match a defined regular expression.

Required:

  • column_name

  • reg_exp

Optional:

  • top (the number of the most mismatched columns to return, default = 10)

Data Quality

Metric name | Description | Parameters

ConflictPredictionMetric()

Dataset-level. Calculates the number of instances where the model returns a different output for an identical input. Can be a signal of a low-quality model or data errors.

Required: n/a Optional: n/a

ConflictTargetMetric()

Dataset-level. Calculates the number of instances where there is a different target value or label for an identical input. Can be a signal of a labeling or data error.

Required: n/a Optional: n/a

DatasetCorrelationsMetric()

Dataset-level. Calculates the correlations between the columns in the dataset. Visualizes the heatmap.

Required: n/a Optional: n/a

ColumnDistributionMetric(column_name="education")

Column-level. Plots the distribution histogram and returns bin positions and values for the given column.

Required: column_name Optional: n/a

ColumnValuePlot(column_name="education")

Column-level. Plots the values over time.

Required: column_name Optional: n/a

ColumnQuantileMetric(column_name="education-num", quantile=0.75)

Column-level. Calculates the defined quantile value and plots the distribution for the given column.

Required:

  • column_name

  • quantile

Optional: n/a

ColumnCorrelationsMetric(column_name="education")

Column-level. Calculates the correlations between the defined column and all the other columns in the dataset.

Required: column_name Optional: n/a

ColumnValueListMetric(column_name="relationship", values=["Husband", "Unmarried"])

Column-level. Calculates the number of values in the list / out of the list / not found in a given column. The value list should be specified.

Required:

  • column_name

  • values

Optional: n/a

ColumnValueRangeMetric(column_name="age", left=10, right=20)

Column-level. Calculates the number and share of values in the specified range / out of range in a given column. Plots the distributions.

Required:

  • column_name

  • left

  • right

TextDescriptorsDistribution(column_name="text")

Column-level. Calculates and visualizes distributions for auto-generated text descriptors (text length, the share of out-of-vocabulary words, etc.)

Required:

  • column_name

TextDescriptorsCorrelationMetric(column_name="text")

Column-level. Calculates and visualizes correlations between auto-generated text descriptors and other columns in the dataset.

Required:

  • column_name

Data Drift

Defaults for Data Drift. By default, all data drift metrics use the Evidently drift detection logic that selects a statistical test or metric based on the feature type and volume. You always need a reference dataset.

To modify the logic or select a different test, you should set data drift parameters or embeddings drift parameters.
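
For example, to override the default test selection, pass the statistical test by name (a sketch; the test names below are common options, see the data drift parameters page for the full list in your version):

from evidently.metrics import ColumnDriftMetric, DataDriftTable

# Use PSI with a custom threshold instead of the default test for one column
ColumnDriftMetric(column_name="age", stattest="psi", stattest_threshold=0.2)

# Set tests per column type and per individual column for the drift table
DataDriftTable(
    num_stattest="wasserstein",
    cat_stattest="chisquare",
    per_column_stattest={"education": "jensenshannon"},
)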

Metric name | Description | Parameters

DatasetDriftMetric()

Dataset-level. Calculates the number and share of drifted features. Returns true/false for the dataset drift at a given threshold (defined by the share of drifting features). Each feature is tested for drift individually using the default algorithm, unless a custom approach is specified.

Required: n/a Optional:

  • columns (default=all)

  • drift_share (default for dataset drift = 0.5)

  • stattest

  • cat_stattest

  • num_stattest

  • per_column_stattest

  • stattest_threshold

  • cat_stattest_threshold

  • num_stattest_threshold

  • per_column_stattest_threshold

How to set data drift parameters.

DataDriftTable()

Dataset-level. Calculates data drift for all columns in the dataset, or for a defined list of columns. Returns drift detection results for each column and visualizes distributions in a table. Uses the default drift algorithm of test selection, unless a custom approach is specified.

Required: n/a Optional:

  • columns

  • stattest

  • cat_stattest

  • num_stattest

  • per_column_stattest

  • stattest_threshold

  • cat_stattest_threshold

  • num_stattest_threshold

  • per_column_stattest_threshold

How to set data drift parameters, embeddings drift parameters.

ColumnDriftMetric('age')

Column-level. Calculates data drift for a defined column (tabular or text). Visualizes distributions. Uses the default test selection unless a custom test is specified.

Required:

  • column_name

Optional:

  • stattest

  • stattest_threshold

How to set data drift parameters.

TextDescriptorsDriftMetric(column_name="text")

Column-level. Calculates data drift for auto-generated text descriptors and visualizes the distributions of text characteristics.

Required:

  • column_name

Optional:

  • stattest

  • stattest_threshold

EmbeddingsDriftMetric('small_subset')

Column-level. Calculates data drift for embeddings.

Required:

  • embeddings_name

Optional:

  • drift_method

How to set embeddings drift parameters.
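
A hedged sketch of configuring embeddings drift: the embeddings subset is declared in ColumnMapping, and a built-in method can be passed via drift_method. The import path of the built-in methods below is an assumption; check the embeddings drift parameters page for your version:

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metrics import EmbeddingsDriftMetric
from evidently.metrics.data_drift.embedding_drift_methods import model  # assumed import path

# Declare which columns form the "small_subset" embeddings
column_mapping = ColumnMapping(embeddings={"small_subset": ["emb_0", "emb_1", "emb_2"]})

report = Report(metrics=[EmbeddingsDriftMetric("small_subset", drift_method=model())])
report.run(reference_data=ref_data, current_data=cur_data, column_mapping=column_mapping)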

Classification

The metrics work both for probabilistic and non-probabilistic classification. All metrics are dataset-level.

Metric name | Description | Parameters

ClassificationDummyMetric()

Calculates the quality of the dummy model built on the same data. This can serve as a baseline.

Required: n/a Optional: n/a

ClassificationQualityMetric()

Calculates various classification performance metrics, incl. precision, accuracy, recall, F1-score, TPR, TNR, FPR, and FNR. For probabilistic classification, also: ROC AUC score, LogLoss.

Required: n/a Optional:

  • probas_threshold (default for classification = None; default for probabilistic classification = 0.5)

  • k (default = None)

ClassificationClassBalance()

Calculates the number of objects for each label. Plots the histogram.

Required: n/a Optional: n/a

ClassificationConfusionMatrix()

Calculates the TPR, TNR, FPR, FNR, and plots the confusion matrix.

Required: n/a Optional:

  • probas_threshold (default for classification = None; default for probabilistic classification = 0.5)

  • k (default = None)

ClassificationQualityByClass()

Calculates the classification quality metrics for each class. Plots the matrix.

Required: n/a Optional:

  • probas_threshold (default for classification = None; default for probabilistic classification = 0.5)

  • k (default = None)

ClassificationClassSeparationPlot()

Visualization of the predicted probabilities by class. Applicable for probabilistic classification only.

Required: n/a Optional: n/a

ClassificationProbDistribution()

Visualization of the probability distribution by class. Applicable for probabilistic classification only.

Required: n/a Optional: n/a

ClassificationRocCurve()

Plots ROC Curve. Applicable for probabilistic classification only.

Required: n/a Optional: n/a

ClassificationPRCurve()

Plots Precision-Recall Curve. Applicable for probabilistic classification only.

Required: n/a Optional: n/a

ClassificationPRTable()

Calculates the Precision-Recall table that shows model quality at different decision thresholds.

Required: n/a Optional: n/a

ClassificationQualityByFeatureTable()

Plots the relationship between feature values and model quality.

Required: n/a Optional:

  • columns (default = all categorical and numerical columns)

Regression

All metrics are dataset-level.

Metric name | Description | Parameters

RegressionDummyMetric()

Calculates the quality of the dummy model built on the same data. This can serve as a baseline.

Required: n/a Optional: n/a

RegressionQualityMetric()

Calculates various regression performance metrics, incl. Mean Error, MAE, MAPE, etc.

Required: n/a Optional: n/a

RegressionPredictedVsActualScatter()

Visualizes predicted vs actual values in a scatter plot.

Required: n/a Optional: n/a

RegressionPredictedVsActualPlot()

Visualizes predicted vs. actual values in a line plot.

Required: n/a Optional: n/a

RegressionErrorPlot()

Visualizes the model error (predicted - actual) in a line plot.

Required: n/a Optional: n/a

RegressionAbsPercentageErrorPlot()

Visualizes the absolute percentage error in a line plot.

Required: n/a Optional: n/a

RegressionErrorDistribution()

Visualizes the distribution of the model error in a histogram.

Required: n/a Optional: n/a

RegressionErrorNormality()

Visualizes the quantile-quantile plot (Q-Q plot) to estimate value normality.

Required: n/a Optional: n/a

RegressionTopErrorMetric()

Calculates the regression performance metrics for different groups: top-X% of predictions with overestimation, top-X% of predictions with underestimation, and the rest. Visualizes the group division on a scatter plot with predicted vs. actual values.

Required: n/a Optional:

  • top_error (default=0.05; the metrics are calculated for top-5% predictions with overestimation and underestimation).

RegressionErrorBiasTable()

Plots the relationship between feature values and model quality per group (for top-X% error groups, as above).

Required: n/a Optional:

  • columns (default = all categorical and numerical columns)

  • top_error (default=0.05; the metrics are calculated for top-5% predictions with overestimation and underestimation).

Ranking and Recommendations

All metrics are dataset-level. Check individual metric descriptions here.

Optional shared parameters for multiple metrics:

  • no_feedback_users: bool = False. Specifies whether to include users who did not select any of the items when computing the quality metric. Default: False.

  • min_rel_score: Optional[int] = None. Specifies the minimum relevance score to consider an item relevant when calculating the quality metrics for non-binary targets (e.g., if the target is a rating or a custom score).
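
These shared parameters are passed to each metric individually. A minimal sketch, assuming the ranking metrics are importable from evidently.metrics in your version:

from evidently.metrics import PrecisionTopKMetric, RecallTopKMetric

# Treat relevance scores of 4 and above as relevant; include users with no relevant items
PrecisionTopKMetric(k=10, min_rel_score=4, no_feedback_users=True)
RecallTopKMetric(k=10, min_rel_score=4, no_feedback_users=True)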

Metric name | Description | Parameters

RecallTopKMetric()

Calculates the recall at k.

Required:

  • k

Optional:

  • no_feedback_users

  • min_rel_score

PrecisionTopKMetric()

Calculates the precision at k.

Required:

  • k

Optional:

  • no_feedback_users

  • min_rel_score

FBetaTopKMetric()

Calculates the F-measure at k.

Required:

  • beta (default = 1)

  • k

Optional:

  • no_feedback_users

  • min_rel_score

MAPKMetric()

Calculates the Mean Average Precision (MAP) at k.

Required:

  • k

Optional:

  • no_feedback_users

  • min_rel_score

MARKMetric()

Calculates the Mean Average Recall (MAR) at k.

Required:

  • k

Optional:

  • no_feedback_users

  • min_rel_score

NDCGKMetric()

Calculates the Normalized Discounted Cumulative Gain at k.

Required:

  • k

Optional:

  • no_feedback_users

  • min_rel_score

MRRKMetric()

Calculates the Mean Reciprocal Rank (MRR) at k.

Required:

  • k

Optional:

  • min_rel_score

  • no_feedback_users

HitRateKMetric()

Calculates the hit rate at k: the share of users for whom at least one relevant item is included in the top-K.

Required:

  • k

Optional:

  • min_rel_score

  • no_feedback_users

DiversityMetric()

Calculates intra-list Diversity at k: the diversity of recommendations shown to each user in top-K recommendations, averaged over all users.

Required:

  • k

  • item_features: List

Optional:

  • -

NoveltyMetric()

Calculates novelty at k: the novelty of recommendations shown to each user in top-K recommendations, averaged over all users. Requires a training dataset.

Required:

  • k

Optional:

  • -

SerendipityMetric()

Calculates serendipity at k: how unusual the relevant recommendations are in top-K, averaged over all users. Requires a training dataset.

Required:

  • k

  • item_features: List

Optional:

  • min_rel_score

PersonalizationMetric()

Measures the average uniqueness of each user's top-K recommendations.

Required:

  • k

Optional:

  • -

PopularityBias()

Evaluates the popularity bias in recommendations by computing ARP (average recommendation popularity), Gini index, and coverage. Requires a training dataset.

Required:

  • k

  • normalize_arp (default: False) - whether to normalize ARP calculation by the most popular item in training

Optional:

  • -

ItemBiasMetric()

Visualizes the distribution of recommendations by a chosen dimension (column), compared to its distribution in the training set. Requires a training dataset.

Required:

  • k

  • column_name

Optional:

  • -

UserBiasMetric()

Visualizes the distribution of the chosen category (e.g. user characteristic), compared to its distribution in the training dataset. Requires a training dataset.

Required:

  • k

  • column_name

Optional:

  • -

ScoreDistribution()

Computes the predicted score entropy. Visualizes the distribution of the scores at k (and all scores, if available). Applies only when the recommendations_type is a score.

Required:

  • k

Optional:

  • -

RecCasesTable()

Shows the list of recommendations for specific user IDs (or 5 random if not specified).

Required:

  • -

Optional:

  • display_features: List

  • user_ids: List

  • train_item_num: int
