All metrics

List of Metrics, Descriptors and Metric Presets available in Evidently.

How to use this page

This is a reference page. It shows all the available Metrics, Descriptors and Presets.

You can use the menu on the right to navigate the sections. We organize the Metrics by logical groups. Note that these groups do not match the Presets with a similar name. For example, there are more Data Quality Metrics than included in the DataQualityPreset.

How to read the tables

  • Name: the name of the Metric.

  • Description: plain text explanation. For Metrics, we also specify whether it applies to the whole dataset or individual columns.

  • Parameters: required and optional parameters for the Metric or Preset. We also specify the defaults that apply if you do not pass a custom parameter.

Metric visualizations. Each Metric includes a default render. To see the visualization, navigate to the example notebooks and run the notebook with all Metrics or Metric Presets.

We do our best to keep this page up to date. In case of discrepancies, check the "All metrics" notebook in the examples. If you notice an error, please send us a pull request with an update!

Metric Presets

Defaults: Presets use the default parameters for each Metric. You can see them in the tables below.

Data Quality Preset

DataQualityPreset captures column and dataset summaries. Input columns are required. Prediction and target are optional.

Composition:

  • DatasetSummaryMetric()

  • ColumnSummaryMetric() for all or specified columns

  • DatasetMissingValuesMetric()

Optional parameters:

  • columns
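
For illustration, here is a minimal sketch of running this Preset in a Report. The DataFrames and column names are hypothetical:

```python
import pandas as pd

from evidently.metric_preset import DataQualityPreset
from evidently.report import Report

# Hypothetical reference and current datasets
reference = pd.DataFrame({"age": [25, 32, 47, 51], "education": ["BA", "MS", "BA", "PhD"]})
current = pd.DataFrame({"age": [29, 38, 44, 60], "education": ["MS", "BA", "BA", "MS"]})

# The optional `columns` parameter limits the Preset to the listed columns
report = Report(metrics=[DataQualityPreset(columns=["age", "education"])])
report.run(reference_data=reference, current_data=current)
report.save_html("data_quality_report.html")
```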

Data Drift Preset

DataDriftPreset evaluates data distribution drift in all individual columns and the share of drifting columns in the dataset. Input columns are required.

Composition:

  • DataDriftTable() for all or specified columns

  • DatasetDriftMetric() for all or specified columns

Optional parameters:

  • columns

  • stattest

  • cat_stattest

  • num_stattest

  • per_column_stattest

  • text_stattest

  • stattest_threshold

  • cat_stattest_threshold

  • num_stattest_threshold

  • per_column_stattest_threshold

  • text_stattest_threshold

  • embeddings

  • embeddings_drift_method

  • drift_share

See how to set data drift parameters and embeddings drift parameters.
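
As a sketch, the drift parameters can be combined like this. The "psi" and "ks" method names are among the supported drift detection methods; the column name is hypothetical:

```python
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

# Use PSI for all columns, but the KS test for one specific column;
# flag dataset drift if at least half of the columns drift
preset = DataDriftPreset(
    stattest="psi",
    stattest_threshold=0.2,
    per_column_stattest={"age": "ks"},  # hypothetical column name
    per_column_stattest_threshold={"age": 0.05},
    drift_share=0.5,
)

report = Report(metrics=[preset])
# report.run(reference_data=reference, current_data=current)
```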

Target Drift Preset

TargetDriftPreset evaluates the prediction or target drift. Target and/or prediction is required. Input features are optional.

Composition:

  • ColumnDriftMetric() for target and/or prediction columns

  • ColumnCorrelationsMetric() for target and/or prediction columns

  • TargetByFeaturesTable() for all or specified columns

  • ColumnValuePlot() for target and/or prediction columns - if the task is regression

Optional parameters:

  • columns

  • stattest

  • cat_stattest

  • num_stattest

  • per_column_stattest

  • stattest_threshold

  • cat_stattest_threshold

  • num_stattest_threshold

  • per_column_stattest_threshold

See how to set data drift parameters.
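
Since the Preset needs to know which columns hold the target and prediction, pass a ColumnMapping when running the Report. A sketch with hypothetical column names:

```python
from evidently import ColumnMapping
from evidently.metric_preset import TargetDriftPreset
from evidently.report import Report

# Map the columns that hold the target and the model output
column_mapping = ColumnMapping(target="target", prediction="prediction")

report = Report(metrics=[TargetDriftPreset(stattest="wasserstein")])
# report.run(reference_data=reference, current_data=current, column_mapping=column_mapping)
```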

Regression Preset

RegressionPreset evaluates the quality of a regression model. Prediction and target are required. Input features are optional.

Composition:

  • RegressionQualityMetric()

  • RegressionPredictedVsActualScatter()

  • RegressionPredictedVsActualPlot()

  • RegressionErrorPlot()

  • RegressionAbsPercentageErrorPlot()

  • RegressionErrorDistribution()

  • RegressionErrorNormality()

  • RegressionTopErrorMetric()

  • RegressionErrorBiasTable() for all or specified columns

Optional parameters:

  • columns
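
A sketch of running the Preset and reading the computed values programmatically. The DataFrames are hypothetical; the target and prediction columns follow the default naming:

```python
import pandas as pd

from evidently.metric_preset import RegressionPreset
from evidently.report import Report

# Hypothetical data with the default "target" and "prediction" column names
reference = pd.DataFrame({"target": [3.1, 4.0, 5.2, 2.8, 4.4, 3.7],
                          "prediction": [3.0, 4.2, 5.0, 3.1, 4.1, 3.9]})
current = pd.DataFrame({"target": [3.5, 4.4, 2.9, 5.1, 3.3, 4.8],
                        "prediction": [3.2, 4.0, 3.3, 4.6, 3.6, 4.5]})

report = Report(metrics=[RegressionPreset()])
report.run(reference_data=reference, current_data=current)

# Besides the visual render, you can export the computed values
as_dict = report.as_dict()  # Python dictionary
as_json = report.json()     # JSON string
```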

Classification Preset

ClassificationPreset evaluates the quality of a classification model. Prediction and target are required. Input features are optional.

Composition:

  • ClassificationQualityMetric()

  • ClassificationClassBalance()

  • ClassificationConfusionMatrix()

  • ClassificationQualityByClass()

  • ClassificationClassSeparationPlot() - if probabilistic classification

  • ClassificationProbDistribution() - if probabilistic classification

  • ClassificationRocCurve() - if probabilistic classification

  • ClassificationPRCurve() - if probabilistic classification

  • ClassificationPRTable() - if probabilistic classification

  • ClassificationQualityByFeatureTable() for all or specified columns

Optional parameters:

  • columns

  • probas_threshold
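
A sketch for binary probabilistic classification, where the prediction column contains the predicted probability of the positive class and probas_threshold sets the decision cutoff. Column names are hypothetical:

```python
from evidently import ColumnMapping
from evidently.metric_preset import ClassificationPreset
from evidently.report import Report

# `prediction` points to a column with predicted probabilities of the positive class
column_mapping = ColumnMapping(target="target", prediction="predicted_proba")

# Count predictions above 0.7 as the positive class
report = Report(metrics=[ClassificationPreset(probas_threshold=0.7)])
# report.run(reference_data=reference, current_data=current, column_mapping=column_mapping)
```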

Text Overview Preset

TextOverviewPreset() provides a summary for one or several text columns. Text columns are required.

Composition:

  • ColumnSummaryMetric() for text Descriptors of all or specified text columns. Descriptors included:

    • Sentiment()

    • SentenceCount()

    • OOV()

    • TextLength()

    • NonLetterCharacterPercentage()

  • SemanticSimilarity() between each pair of text columns, if there is more than one.

Required parameters:

  • column_name or columns list

Optional parameters:

  • descriptors list

Text Evals

TextEvals() provides a simplified interface to list Descriptors for a given text column. It returns a summary of the evaluation results.

Composition:

  • ColumnSummaryMetric() for text Descriptors of the specified text column:

    • Sentiment()

    • SentenceCount()

    • OOV()

    • TextLength()

    • NonLetterCharacterPercentage()

Required parameters:

  • column_name
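
A sketch of overriding the default Descriptor list; the text column name is hypothetical:

```python
from evidently import ColumnMapping
from evidently.descriptors import Sentiment, TextLength
from evidently.metric_preset import TextEvals
from evidently.report import Report

report = Report(metrics=[
    TextEvals(
        column_name="response",  # hypothetical text column
        descriptors=[Sentiment(), TextLength()],
    )
])

# The text column should be declared in the column mapping
column_mapping = ColumnMapping(text_features=["response"])
# report.run(reference_data=reference, current_data=current, column_mapping=column_mapping)
```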

RecSys (Recommender System) Preset

RecsysPreset evaluates the quality of a recommender system. Recommendations and true relevance scores are required. For some metrics, training data and item features are required.

Composition:

  • PrecisionTopKMetric()

  • RecallTopKMetric()

  • FBetaTopKMetric()

  • MAPKMetric()

  • NDCGKMetric()

  • MRRKMetric()

  • HitRateKMetric()

  • PersonalizationMetric()

  • PopularityBias()

  • RecCasesTable()

  • ScoreDistribution()

  • DiversityMetric()

  • SerendipityMetric()

  • NoveltyMetric()

  • ItemBiasMetric() (pass column as a parameter)

  • UserBiasMetric() (pass column as a parameter)

Required parameter:

  • k

Optional parameters:

  • min_rel_score: Optional[int]

  • no_feedback_users: bool

  • normalize_arp: bool

  • user_ids: Optional[List[Union[int, str]]]

  • display_features: Optional[List[str]]

  • item_features: Optional[List[str]]

  • user_bias_columns: Optional[List[str]]

  • item_bias_columns: Optional[List[str]]
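
A sketch of running the Preset. The column mapping below assumes a ranking setup where predictions are ranks and relevance comes from a rating column; all column names are hypothetical:

```python
from evidently import ColumnMapping
from evidently.metric_preset import RecsysPreset
from evidently.report import Report

# Hypothetical ranking setup: predicted ranks, ratings as relevance scores
column_mapping = ColumnMapping(
    recommendations_type="rank",
    user_id="user_id",
    item_id="item_id",
    target="rating",
    prediction="rank",
)

# Ratings of 4 and above count as relevant; evaluate the top 5 recommendations
report = Report(metrics=[RecsysPreset(k=5, min_rel_score=4)])
# report.run(reference_data=reference, current_data=current, column_mapping=column_mapping)
```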

Data Quality

Defaults for Missing Values. The Metrics that calculate the number or share of missing values detect four types of missing values by default: Pandas nulls (None, NaN, etc.), "" (empty string), and the NumPy -inf and inf values. You can also pass custom missing values as a parameter and specify whether to replace the default list. Example:

```python
from evidently.metrics import DatasetMissingValuesMetric

DatasetMissingValuesMetric(missing_values=["", 0, "n/a", -9999, None], replace=True)
```

Text Evals

Text Evals only apply to text columns. To compute a Descriptor for a single text column, use the TextEvals Preset. Read the docs.

You can also explicitly specify the Evidently Metric (e.g., ColumnSummaryMetric) to visualize the descriptor, or pick a Test (e.g., TestColumnValueMin) to run validations.
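
For example, a Descriptor can be attached to a text column through its .on() method, both in a Report and in a Test Suite. A sketch with a hypothetical column name:

```python
from evidently.descriptors import TextLength
from evidently.metrics import ColumnSummaryMetric
from evidently.report import Report
from evidently.test_suite import TestSuite
from evidently.tests import TestColumnValueMin

# Visualize the descriptor with an explicit Metric
report = Report(metrics=[
    ColumnSummaryMetric(column_name=TextLength().on("response")),
])

# Or validate it with a Test
suite = TestSuite(tests=[
    TestColumnValueMin(column_name=TextLength().on("response"), gt=0),
])
```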

Descriptors: Text Patterns

Check for regular expression matches.

Descriptors: Text stats

Computes descriptive text statistics.

Descriptors: LLM-based

Use external LLMs with an evaluation prompt to score text data (also known as the LLM-as-a-judge method).

Descriptors: Model-based

Use pre-trained machine learning models for evaluation.

Data Drift

Defaults for Data Drift. By default, all data drift metrics use the Evidently drift detection logic that selects a drift detection method based on feature type and volume. You always need a reference dataset.

To modify the logic or select a different test, you should set data drift parameters or embeddings drift parameters. You can choose from 20+ drift detection methods and optionally pass feature importances.

Classification

The metrics work for both probabilistic and non-probabilistic classification. All metrics are dataset-level. All metrics require column mapping of target and prediction.

Regression

All metrics are dataset-level. All metrics require column mapping of target and prediction.

Ranking and Recommendations

All metrics are dataset-level. Check the individual metric descriptions for details. All metrics require recommendations column mapping.

Optional shared parameters for multiple metrics:

  • no_feedback_users: bool = False. Specifies whether to include users who did not select any of the items when computing the quality metric.

  • min_rel_score: Optional[int] = None. Specifies the minimum relevance score to consider an item relevant when calculating quality metrics for non-binary targets (e.g., when the target is a rating or a custom score). See the sketch after this list.
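
A sketch of passing these shared parameters to an individual metric, assuming a ratings-based target where scores of 4 and above count as relevant:

```python
from evidently.metrics import PrecisionTopKMetric
from evidently.report import Report

# Evaluate precision at top 10, including users with no relevant selections
report = Report(metrics=[
    PrecisionTopKMetric(k=10, min_rel_score=4, no_feedback_users=True),
])
# report.run(reference_data=reference, current_data=current, column_mapping=column_mapping)
```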
