All tests

List of all tests and test presets available in Evidently.
This is a reference page. You can return here:
  • To discover available tests and choose which to include in a custom test suite.
  • To understand which parameters you can change for a specific test or preset.
  • To verify which tests are included in a test preset.
You can use the menu on the right to navigate the sections. We organize individual tests into groups, e.g. Data Quality, Data Integrity, Regression, etc. Note that these groups do not match the presets with similar names. For example, there are more Data Quality tests than in the DataQualityTestPreset.

How to read the tables

  • Name: the name of the test or test preset.
  • Description: plain text explanation of the test, or the content of the preset. For tests, we specify whether it applies to the whole dataset or individual columns.
  • Parameters: available configurations.
    • Required parameters are necessary for calculations, e.g. a column name for a column-level test.
    • Optional parameters modify how the underlying metric is calculated, e.g. which statistical test or correlation method is used.
    • Test condition parameters help set the conditions (e.g. equal, not equal, greater than, etc.) that define the expectations from the test output. If the condition is violated, the test returns a fail. Here you can see the complete list of the standard condition parameters. They apply to most of the tests and are optional.
  • Default test conditions: these apply if you do not set a custom condition.
    • With reference: the test conditions that apply when you pass a reference dataset and Evidently can derive expectations from it.
    • No reference: the test conditions that apply if you do not provide the reference. They are based on heuristics.
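To make the idea of test conditions concrete, here is a minimal sketch in plain Python. It is not Evidently's implementation: the `evaluate` helper and the `CHECKS` table are hypothetical, and the condition names (`eq`, `not_eq`, `gt`, `gte`, `lt`, `lte`, `is_in`, `not_in`) are assumed to mirror the standard condition parameters. Check the linked list for the exact names.

```python
import operator

# Hypothetical lookup table (not part of Evidently): condition name -> check.
CHECKS = {
    "eq": operator.eq,
    "not_eq": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "is_in": lambda value, allowed: value in allowed,
    "not_in": lambda value, banned: value not in banned,
}


def evaluate(value, **conditions):
    """Return "SUCCESS" if the value meets every condition, else "FAIL"."""
    for name, expected in conditions.items():
        if not CHECKS[name](value, expected):
            return "FAIL"
    return "SUCCESS"


# A computed metric value (here, a row count) checked against two conditions.
result = evaluate(120, gte=100, lte=150)
```

Several conditions can be combined, and the result is a fail as soon as any one of them is violated.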
Test visualizations. Each test also includes a default render. If you want to see the visualization, navigate to the example notebooks.
We are doing our best to keep this page up to date. In case of discrepancies, consult the API reference or the "All tests" notebook in the Examples section. If you notice an error, please send us a pull request to update the documentation!

Test Presets

Default conditions for each test in the preset match the test's defaults. You can see them in the following sections on this page.
Preset name and Description
Parameters
NoTargetPerformanceTestPreset
  • TestShareOfDriftedColumns()
  • TestColumnDrift(column_name=prediction)
  • TestColumnShareOfMissingValues(column_name=column_name) for all columns or among columns if provided
  • TestShareOfOutRangeValues(column_name=column_name) for all numerical_columns or among columns if provided
  • TestShareOfOutListValues(column_name=column_name) for all categorical_columns or among columns if provided
  • TestMeanInNSigmas(column_name=column_name, n=2) for all numerical_columns or among columns if provided
Optional:
  • columns
  • stattest
  • cat_stattest
  • num_stattest
  • per_column_stattest
  • stattest_threshold
  • cat_stattest_threshold
  • num_stattest_threshold
  • per_column_stattest_threshold
  • drift_share
DataStabilityTestPreset
  • TestNumberOfRows()
  • TestNumberOfColumns()
  • TestColumnsType()
  • TestColumnShareOfMissingValues()
  • TestShareOfOutRangeValues(column_name=column_name) for all numerical_columns or among columns if provided
  • TestShareOfOutListValues(column_name=column_name) for all categorical_columns or among columns if provided
  • TestMeanInNSigmas(column_name=column_name, n=2) for all numerical_columns or among columns if provided
Optional:
  • columns
DataQualityTestPreset
  • TestColumnShareOfMissingValues(column_name=column_name) for all columns or among columns if provided
  • TestMostCommonValueShare(column_name=column_name) for all columns or among columns if provided
  • TestNumberOfConstantColumns()
  • TestNumberOfDuplicatedColumns()
  • TestNumberOfDuplicatedRows()
  • TestHighlyCorrelatedColumns()
Optional:
  • columns
DataDriftTestPreset
  • TestShareOfDriftedColumns()
  • TestColumnDrift(column_name=column_name) for all columns or among columns if provided
Optional:
  • columns
  • stattest
  • cat_stattest
  • num_stattest
  • per_column_stattest
  • stattest_threshold
  • cat_stattest_threshold
  • num_stattest_threshold
  • per_column_stattest_threshold
RegressionTestPreset
  • TestValueMeanError()
  • TestValueMAE()
  • TestValueRMSE()
  • TestValueMAPE()
N/A
MulticlassClassificationTestPreset
  • TestAccuracyScore()
  • TestF1Score()
  • TestPrecisionByClass()
  • TestRecallByClass()
  • TestColumnDrift(column_name=target)
  • TestNumberOfRows()
If probabilistic classification, also:
  • TestLogLoss()
  • TestRocAuc()
Optional:
  • stattest
  • stattest_threshold
BinaryClassificationTopKTestPreset
  • TestAccuracyScore(k=k)
  • TestPrecisionScore(k=k)
  • TestRecallScore(k=k)
  • TestF1Score(k=k)
  • TestColumnDrift(column_name=target)
  • TestRocAuc()
  • TestLogLoss()
Required:
  • k
Optional:
  • stattest
  • stattest_threshold
  • probas_threshold
BinaryClassificationTestPreset
  • TestColumnDrift(column_name=target)
  • TestPrecisionScore()
  • TestRecallScore()
  • TestF1Score()
  • TestAccuracyScore()
If probabilistic classification, also:
  • TestRocAuc()
Optional:
  • stattest
  • stattest_threshold
  • probas_threshold

Data Integrity

Defaults for Data Integrity. If there is no reference data or defined conditions, data integrity will be checked against a set of heuristics. If you pass the reference data, Evidently will automatically derive all relevant statistics (e.g., number of columns, rows, share of missing values etc.) and apply default test conditions. You can also pass custom test conditions.
Defaults for Missing Values. The metrics that calculate the number or share of missing values detect four types of values by default: Pandas nulls (None, NaN, etc.), "" (empty string), the Numpy "-inf" value, and the Numpy "inf" value. You can also pass a custom list of missing values as a parameter and specify whether it should replace the default list. Example:
TestNumberOfMissingValues(missing_values=["", 0, "n/a", -9999, None], replace=True)
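The following plain-Python sketch illustrates the counting logic described above. It is an approximation for illustration, not Evidently's code: `is_default_missing` and `count_missing` are hypothetical helpers, and the semantics of `replace` (True replaces the default list, False extends it) are an assumption based on the description.

```python
import math


def is_default_missing(value):
    """The four default kinds described above: nulls, "", -inf, inf."""
    if value is None or value == "":
        return True
    if isinstance(value, float):
        return math.isnan(value) or math.isinf(value)
    return False


def count_missing(values, missing_values=None, replace=False):
    """Count missing values using the defaults and/or a custom list."""
    custom = list(missing_values or [])
    use_defaults = not replace  # assumption: replace=True drops the defaults
    total = 0
    for v in values:
        in_custom = any(v is m or v == m for m in custom)
        if in_custom or (use_defaults and is_default_missing(v)):
            total += 1
    return total


column = [1, None, "", float("inf"), 0, "n/a", -9999, float("nan")]
default_count = count_missing(column)
custom_count = count_missing(
    column, missing_values=["", 0, "n/a", -9999, None], replace=True
)
```

With `replace=True`, values such as `inf` and `NaN` are no longer counted in this sketch unless they appear in the custom list.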
Test name
Description
Parameters
Default test condition
TestNumberOfRows()
Dataset-level. Tests the number of rows against the reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects +/-10% or >30. With reference: the test fails if the number of rows differs by over 10% from the reference. No reference: the test fails if the number of rows is <= 30.
TestNumberOfColumns()
Dataset-level. Tests the number of columns against the reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects the same or non-zero. With reference: the test fails if the number of columns differs from the reference. No reference: the test fails if the number of columns is 0.
TestNumberOfMissingValues()
Dataset-level. Tests the number of missing values in the dataset against the reference or a defined condition.
Required: N/A Optional:
  • missing_values = [], replace = True/False (default = default list)
Test conditions:
  • standard parameters
Expects up to +10% or 0. With reference: the test fails if the share of missing values is over 10% higher than in reference. No reference: the test fails if the dataset contains missing values.
TestShareOfMissingValues()
Dataset-level. Tests the share of missing values in the dataset against the reference or a defined condition.
Required: N/A Optional:
  • missing_values = [], replace = True/False (default = default list)
Test conditions:
  • standard parameters
Expects up to +10% or 0. With reference: the test fails if the share of missing values is over 10% higher than in reference. No reference: the test fails if the dataset contains missing values.
TestNumberOfColumnsWithMissingValues()
Dataset-level. Tests the number of columns that contain missing values in the dataset against the reference or a defined condition.
Required: N/A Optional:
  • missing_values = [], replace = True/False (default = default list)
Test conditions:
  • standard parameters
Expects <= or 0. With reference: the test fails if the number of columns with missing values is higher than in reference. No reference: the test fails if the dataset contains columns with missing values.
TestShareOfColumnsWithMissingValues()
Dataset-level. Tests the share of columns that contain missing values in the dataset against the reference or a defined condition.
Required: N/A Optional:
  • missing_values = [], replace = True/False (default = default list)
Test conditions:
  • standard parameters
Expects <= or 0. With reference: the test fails if the share of columns with missing values is higher than in reference. No reference: the test fails if the dataset contains columns with missing values.
TestNumberOfRowsWithMissingValues()
Dataset-level. Tests the number of rows that contain missing values against the reference or a defined condition.
Required: N/A Optional:
  • missing_values = [], replace = True/False (default = default list)
Test conditions:
  • standard parameters
Expects up to +10% or 0. With reference: the test fails if the share of rows with missing values is over 10% higher than in reference. No reference: the test fails if the dataset contains rows with missing values.
TestShareOfRowsWithMissingValues()
Dataset-level. Tests the share of rows that contain missing values against the reference or a defined condition.
Required: N/A Optional:
  • missing_values = [], replace = True/False (default = default list)
Test conditions:
  • standard parameters
Expects up to +10% or 0. With reference: the test fails if the share of rows with missing values is over 10% higher than in reference. No reference: the test fails if the dataset contains rows with missing values.
TestNumberOfDifferentMissingValues()
Dataset-level. Tests the number of differently encoded missing values in the dataset against the reference or a defined condition. Detects 4 types of missing values by default and/or values from a user list.
Required: N/A Optional:
  • missing_values = [], replace = True/False (default = default list)
Test conditions:
  • standard parameters
Expects <= or none. With reference: the test fails if the current dataset has more types of missing values. No reference: the test fails if the current dataset contains missing values.
TestNumberOfConstantColumns()
Dataset-level. Tests the number of columns with all constant values against reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects <= or none. With reference: the test fails if the number of constant columns is higher than in the reference. No reference: the test fails if there is at least one constant column.
TestNumberOfEmptyRows()
Dataset-level. Tests the number of empty rows against reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects +/- 10% or none. With reference: the test fails if the share of empty rows is over 10% higher or lower than in the reference. No reference: the test fails if there is at least one empty row.
TestNumberOfEmptyColumns()
Dataset-level. Tests the number of empty columns against reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects <= or none. With reference: the test fails if the number of empty columns is higher than in the reference. No reference: the test fails if there is at least one empty column.
TestNumberOfDuplicatedRows()
Dataset-level. Tests the number of duplicate rows against reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects +/- 10% or none. With reference: the test fails if the share of duplicate rows is over 10% higher or lower than in the reference. No reference: the test fails if there is at least one duplicate row.
TestNumberOfDuplicatedColumns()
Dataset-level. Tests the number of duplicate columns against reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects <= or none. With reference: the test fails if the number of duplicate columns is higher than in the reference. No reference: the test fails if there is at least one duplicate column.
TestColumnsType()
Dataset-level. Tests the types of all columns against the reference.
Required: N/A Optional: columns_type: dict Test conditions: N/A
Expects types to match. With reference: the test fails if at least one column type does not match. No reference: N/A
TestColumnNumberOfMissingValues(column_name='name')
Column-level. Tests the number of missing values in a given column against the reference or a defined condition.
Required:
  • column_name
Optional:
  • missing_values = [], replace = True/False (default = default list)
Test conditions:
  • standard parameters
Expects up to 10% or none. With reference: the test fails if the share of missing values in a column is over 10% higher than in reference. No reference: the test fails if the column contains missing values.
TestColumnShareOfMissingValues(column_name='name')
Column-level. Tests the share of missing values in a given column against the reference or a defined condition.
Required:
  • column_name
Optional:
  • missing_values = [], replace = True/False (default = default list)
Test conditions:
  • standard parameters
Expects up to 10% or none. With reference: the test fails if the share of missing values in a column is over 10% higher than in reference. No reference: the test fails if the column contains missing values.
TestColumnNumberOfDifferentMissingValues(column_name='name')
Column-level. Tests the number of differently encoded missing values in the column against reference or a defined condition. Detects 4 types of missing values by default and/or values from a user list.
Required:
  • column_name
Optional:
  • missing_values = [], replace = True/False (default = default list)
Test conditions:
  • standard parameters
Expects <= or none. With reference: the test fails if the current column has more types of missing values. No reference: The test fails if the column contains missing values.
TestColumnAllConstantValues(column_name='name')
Column-level. Tests if all the values in a given column are constant.
Required:
  • column_name
Optional: N/A Test conditions: N/A
Expects non-constant. The test fails if all values in a given column are constant.
TestColumnAllUniqueValues(column_name='name')
Column-level. Tests if all the values in a given column are unique.
Required:
  • column_name
Optional: N/A Test conditions: N/A
Expects all unique (e.g., IDs). The test fails if at least one value in a given column is not unique.
TestColumnRegExp(column_name='name', reg_exp='^[0..9]')
Column-level. Tests the number of values in a column that do not match a defined regular expression, against reference or a defined condition.
Required:
  • column_name
  • reg_exp
Optional: N/A Test conditions:
  • standard parameters
With reference: the test fails if the share of values that match a regular expression is over 10% higher or lower than in the reference. No reference: the test fails if at least one of the values does not match a regular expression.
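To show how the "With reference" and "No reference" defaults differ in practice, here is a minimal sketch of the TestNumberOfRows() defaults listed in the table above. The function name `number_of_rows_passes` is hypothetical, and the code only restates the documented rule. It is not taken from the Evidently source.

```python
def number_of_rows_passes(current_rows, reference_rows=None):
    """Apply the documented default condition for the number of rows."""
    if reference_rows is not None:
        # With reference: fail if the count differs by over 10%.
        return abs(current_rows - reference_rows) <= 0.10 * reference_rows
    # No reference: heuristic, fail if the number of rows is <= 30.
    return current_rows > 30


with_reference = number_of_rows_passes(105, reference_rows=100)
without_reference = number_of_rows_passes(25)
```

The same pattern (a relative tolerance derived from the reference, or a fixed heuristic without it) applies to many of the tests in this group.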

Data Quality

Defaults for data quality. If there is no reference data or defined conditions, data quality will be checked against a set of heuristics. If you pass the reference data, Evidently will automatically derive all relevant statistics (e.g., min value, max value, value range, value list, etc.) and apply default test conditions. You can also pass custom test conditions.
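As a rough illustration of deriving expectations from the reference, the sketch below computes a min-max range from reference values and then measures the share of current values that fall outside it. The helper names are hypothetical. Only `left` and `right` correspond to parameters listed on this page, and the code is not Evidently's implementation.

```python
def derive_range(reference):
    """Derive the expected (left, right) bounds from reference values."""
    return min(reference), max(reference)


def share_out_of_range(current, left, right):
    """Share of current values that fall outside [left, right]."""
    outside = [v for v in current if v < left or v > right]
    return len(outside) / len(current)


reference = [1, 5, 10]
current = [0, 2, 11, 7]

left, right = derive_range(reference)
share = share_out_of_range(current, left, right)
```

Passing `left` and `right` explicitly instead of deriving them corresponds to setting the range by hand when no reference is available.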
Test name
Description
Parameters
Default test conditions
TestConflictTarget()
Dataset-level. Tests if there are conflicts in the target (instances where a different label is assigned for an identical input).
N/A
Expects no conflicts in the target (with or without reference).
TestConflictPrediction()
Dataset-level. Tests if there are conflicts in the prediction (instances where a different prediction is made for an identical input).
N/A
Expects no conflicts in the prediction (with or without reference).
TestTargetPredictionCorrelation()
Dataset-level. Tests the strength of correlation between the target and prediction.
Required: N/A Optional:
  • method (default = pearson, available = pearson, spearman, kendall, cramer_v)
Test conditions:
  • standard parameters
Expects +/- 0.25 in correlation strength, or > 0. With reference: the test fails if there is a 0.25+ change in the correlation strength between target and prediction. No reference: the test fails if the correlation between target and prediction is <= 0.
TestHighlyCorrelatedColumns()
Dataset-level. Tests the strongest correlation between a pair of features, against reference or a defined condition.
Required: N/A Optional:
  • method (default = pearson, available = pearson, spearman, kendall, cramer_v)
Test conditions:
  • standard parameters
Expects +/- 10% in max correlation strength, or < 0.9. With reference: the test fails if there is a 10%+ change in the correlation strength for the most correlated feature pair. No reference: the test fails if there is at least one pair of features with the correlation >= 0.9
TestTargetFeaturesCorrelations()
Dataset-level. Tests if any of the features is highly correlated with the target. Example use: to detect target leak.
Required: N/A Optional:
  • method (default = pearson, available = pearson, spearman, kendall, cramer_v)
Test conditions:
  • standard parameters
Expects +/- 10% in max correlation strength, or < 0.9. With reference: the test fails if there is a 10%+ change in the correlation strength for the feature most correlated with the target. No reference: the test fails if at least one feature is correlated with the target >= 0.9
TestPredictionFeaturesCorrelations()
Dataset-level. Tests if any of the features is highly correlated with the prediction. Example use: to detect when predictions rely on a single feature.
Required: N/A Optional:
  • method (default = pearson, available = pearson, spearman, kendall, cramer_v)
Test conditions:
  • standard parameters
Expects +/- 10% in max correlation strength, or < 0.9. With reference: the test fails if there is a 10%+ change in the correlation strength for the feature most correlated with the prediction. No reference: the test fails if at least one feature is correlated with the prediction >= 0.9
TestCorrelationChanges()
Dataset-level. Tests the number of correlation violations (significant change in the correlation strength between the two features).
Required: N/A Optional:
  • method (default = pearson, available = pearson, spearman, kendall, cramer_v)
  • corr_diff (default = 0.25)
Test conditions:
  • standard parameters
Expects none. With reference: the test fails if at least 1 correlation violation is detected. No reference: N/A
TestColumnValueMin(column_name='num-column')
Column-level. Tests the minimum value of a given numerical column against reference or a defined condition.
Required:
  • column_name
Optional: N/A Test conditions:
  • standard parameters
Expects not lower. With reference: the test fails if the minimum value is lower than in the reference. No reference: N/A
TestColumnValueMax(column_name='num-column')
Column-level. Tests the maximum value of a given numerical column against reference or a defined condition.
Required:
  • column_name
Optional: N/A Test conditions:
  • standard parameters
Expects not higher. With reference: the test fails if the maximum value is higher than in the reference. No reference: N/A
TestColumnValueMean(column_name='num-column')
Column-level. Tests the mean value of a given numerical column against reference or a defined condition.
Required:
  • column_name
Optional: N/A Test conditions:
  • standard parameters
Expects +/-10%. With reference: the test fails if the mean value is different by more than 10%. No reference: N/A
TestColumnValueMedian(column_name='num-column')
Column-level. Tests the median value of a given numerical column against reference or a defined condition.
Required:
  • column_name
Optional: N/A Test conditions:
  • standard parameters
Expects +/-10%. With reference: the test fails if the median value is different by more than 10%. No reference: N/A
TestColumnValueStd(column_name='num-column')
Column-level. Tests the standard deviation of a given numerical column against reference or a defined condition.
Required:
  • column_name
Optional: N/A Test conditions:
  • standard parameters
Expects +/-10%. With reference: the test fails if the standard deviation is different by more than 10%. No reference: N/A
TestNumberOfUniqueValues(column_name='name')
Column-level. Tests the number of unique values in a given column against reference or a defined condition.
Required:
  • column_name
Optional: N/A Test conditions:
  • standard parameters
Expects +/-10%. With reference: the test fails if the number of unique values is different by more than 10%. No reference: N/A
TestUniqueValuesShare(column_name='name')
Column-level. Tests the share of unique values in a given column against reference or a defined condition.
Required:
  • column_name
Optional: N/A Test conditions:
  • standard parameters
Expects +/-10%. With reference: the test fails if the share of unique values is different by more than 10%. No reference: N/A
TestMostCommonValueShare(column_name='name')
Column-level. Tests the share of the most common value in a given column against reference or a defined condition.
Required:
  • column_name
Optional: N/A Test conditions:
  • standard parameters
Expects +/-10%. With reference: the test fails if the share of the most common value is different by more than 10% from the reference. No reference: the test fails if the share of the most common value is >= 80%.
TestMeanInNSigmas(column_name='num-column')
Column-level. Tests if the mean value in a given numerical column is within the expected range, defined in standard deviations. This test requires reference.
Required:
  • column_name
Optional:
  • n_sigmas
Expects +/- 2 std dev. With reference: the test fails if the current mean value is out of the +/- 2 std dev interval from the reference mean value. No reference: N/A
TestValueRange(column_name='num_column')
Column-level. Tests if a numerical column contains values out of the min-max range.
Required:
  • column_name
Optional:
  • left
  • right
Test conditions: N/A
Expects all values to be in range. With reference: the test fails if the column contains values out of the min-max range as seen in the reference. No reference: N/A
TestShareOfOutRangeValues(column_name='num_column')
Column-level. Tests the share of values out of the min-max range against reference or a defined condition.
Required:
  • column_name
Optional:
  • left
  • right
Test conditions:
  • standard parameters
Expects all values to be in range.
TestNumberOfOutRangeValues(column_name='num_column')
Column-level. Tests the number of values out of the min-max range against reference or a defined condition.
Required:
  • column_name
Optional:
  • left
  • right
Test conditions:
  • standard parameters
Expects all values to be in range. With reference: the test fails if at least 1 value is out of range (as seen in reference). No reference: N/A
TestValueList(column_name='cat_column')
Column-level. Tests if a categorical column contains values out of the list.
Required:
  • column_name
Optional:
  • values: List[str]
Test conditions: N/A
Expects all values to be in the list. With reference: the test fails if the column contains values out of the list (as seen in reference). No reference: N/A
TestNumberOfOutListValues(column_name='cat_column')
Column-level. Tests the number of values in a given column that are out of list, against reference or a defined condition.
Required:
  • column_name
Optional:
  • values: List[str]
Test conditions:
  • standard parameters
Expects all values to be in the list. With reference: the test fails if the column contains values out of the list (as seen in reference). No reference: N/A
TestShareOfOutListValues(column_name='cat_column')
Column-level. Tests the share of values in a given column that are out of list against reference or a defined condition.
Required:
  • column_name
Optional:
  • values: List[str]
Test conditions:
  • standard parameters
Expects all values to be in the list. With reference: the test fails if the column contains values out of the list (as seen in reference). No reference: N/A
TestColumnQuantile(column_name='num_column', quantile=0.25)
Column-level. Computes a quantile value and compares it to the reference or against a defined condition.
Required:
  • column_name
  • quantile
Optional: N/A Test conditions:
  • standard parameters
Expects +/-10%. With reference: the test fails if the quantile value is over 10% higher or lower. No reference: N/A
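The TestMeanInNSigmas() default from the table above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name is hypothetical, and the sample standard deviation (`statistics.stdev`) is assumed, since the page does not say which estimator is used.

```python
import statistics


def mean_in_n_sigmas(current, reference, n_sigmas=2):
    """Check that the current mean lies within n_sigmas reference std devs."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)  # assumption: sample std dev
    low = ref_mean - n_sigmas * ref_std
    high = ref_mean + n_sigmas * ref_std
    return low <= statistics.mean(current) <= high


reference = [10, 12, 11, 9, 13]
stable = mean_in_n_sigmas([11, 12, 10], reference)
shifted = mean_in_n_sigmas([20, 21, 22], reference)
```

Because the interval is built from the reference mean and standard deviation, the check cannot run without reference data.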

Data Drift

Defaults for Data Drift. By default, all data drift tests use the Evidently drift detection logic that selects a different statistical test or metric based on feature type and volume. You always need a reference dataset.
To modify the logic or select a different test, you should set data drift parameters.
Test name
Description
Parameters
Default test conditions
TestNumberOfDriftedColumns()
Dataset-level. Compares the distribution of each column in the current dataset to the reference and tests the number of drifting features against a defined condition.
Required: N/A Optional:
  • columns
  • stattest(default=automated selection)
  • cat_stattest
  • num_stattest
  • per_column_stattest
  • stattest_threshold(default=test default)
  • cat_stattest_threshold
  • num_stattest_threshold
  • per_column_stattest_threshold
Test conditions:
  • standard parameters
Expects <= 1/3 of features to drift. With reference: if > 1/3 of features drifted, the test fails. No reference: N/A
TestShareOfDriftedColumns()
Dataset-level. Compares the distribution of each column in the current dataset to the reference and tests the share of drifting features against a defined condition.
Required: N/A Optional:
  • columns
  • stattest(default=automated selection)
  • cat_stattest
  • num_stattest
  • per_column_stattest
  • stattest_threshold(default=test default)
  • cat_stattest_threshold
  • num_stattest_threshold
  • per_column_stattest_threshold
Test conditions:
  • standard parameters
Expects <= 1/3 of features to drift. With reference: if > 1/3 of features drifted, the test fails. No reference: N/A
TestColumnDrift(column_name='name')
Column-level. Tests if there is a distribution shift in a given column compared to the reference.
Required:
  • column_name
Optional:
  • stattest(default=automated selection)
  • stattest_threshold(default=test default)
Expects no drift. With reference: the test fails if the distribution drift is detected in a given column. No reference: N/A
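The dataset-level default (fail if more than 1/3 of the columns drifted) can be sketched in a few lines. The per-column drift flags are taken as given here. How each flag is produced depends on the statistical test and threshold, which this sketch does not model. The function name and the input dictionary are hypothetical.

```python
def share_of_drifted_columns_passes(drift_by_column):
    """Pass if at most one third of the columns are flagged as drifted."""
    drifted = sum(1 for is_drifted in drift_by_column.values() if is_drifted)
    total = len(drift_by_column)
    # Integer comparison avoids floating-point issues around exactly 1/3.
    return 3 * drifted <= total


flags = {"age": True, "income": False, "city": False}
passes = share_of_drifted_columns_passes(flags)
```

A custom test condition replaces the 1/3 default with whatever threshold you set.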

Regression

Defaults for Regression tests: if there is no reference data or defined conditions, Evidently will compare the model performance to a dummy model that predicts the optimal constant (varies by the metric). You can also pass the reference dataset and run the test with default conditions, or define custom test conditions.
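To illustrate the comparison against a dummy model, the sketch below uses the optimal constants named in the table that follows: the median of the target for MAE and the mean of the target for RMSE. The helper names are hypothetical, and computing the constant from the current target values is an assumption. This is not Evidently's implementation.

```python
import math
import statistics


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def beats_dummy(y_true, y_pred, metric, constant):
    """Pass if the model error is not higher than the dummy model error."""
    dummy_pred = [constant(y_true)] * len(y_true)
    return metric(y_true, y_pred) <= metric(y_true, dummy_pred)


y_true = [1, 2, 3, 4, 100]
y_pred = [1.5, 2.5, 2.5, 4.5, 90]

mae_ok = beats_dummy(y_true, y_pred, mae, statistics.median)
rmse_ok = beats_dummy(y_true, y_pred, rmse, statistics.mean)
```

The constant differs by metric because each error measure is minimized by a different summary of the target.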
Test name
Description
Parameters
Default test conditions
TestValueMAE()
Dataset-level. Computes the Mean Absolute Error (MAE) and compares it to the reference or against a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects +/-10% or better than a dummy model. With reference: if MAE is higher or lower by over 10%, the test fails. No reference: the test fails if the MAE value is higher than the MAE of the dummy model that predicts the optimal constant (median of the target value).
TestValueRMSE()
Dataset-level. Computes the Root Mean Square Error (RMSE) and compares it to the reference or against a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects +/-10% or better than a dummy model. With reference: if RMSE is higher or lower by over 10%, the test fails. No reference: the test fails if the RMSE value is higher than the RMSE of the dummy model that predicts the optimal constant (mean of the target value).
TestValueMeanError()
Dataset-level. Computes the Mean Error (ME) and tests if it is near zero or compares it against a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects the Mean Error to be near zero. With/without reference: the test fails if the Mean Error is skewed and the condition is violated. Condition: eq = approx(absolute=0.1*error_std) error_std = (curr_true - curr_preds).std()
TestValueMAPE()
Dataset-level. Computes the Mean Absolute Percentage Error (MAPE) and compares it to the reference or against a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects +/-10% or better than a dummy model. With reference: if MAPE is higher or lower by over 10%, the test fails. No reference: the test fails if the MAPE value is higher than the MAPE of the dummy model that predicts the optimal constant (weighted median of the target value).
TestValueAbsMaxError()
Dataset-level. Computes the absolute maximum error and compares it to the reference or against a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects +/-10% or better than a dummy model. With reference: if the absolute maximum error is higher or lower by over 10%, the test fails. No reference: the test fails if the absolute maximum error is higher than the absolute maximum error of the dummy model that predicts the optimal constant (median of the target value).
TestValueR2Score()
Dataset-level. Computes the R2 Score (coefficient of determination) and compares it to the reference or against a defined condition.
Required: N/A Optional: N/A Test conditions:
  • standard parameters
Expects +/-10% or > 0. With reference: if R2 is higher or lower by over 10%, the test fails. No reference: the test fails if the R2 value is <= 0.
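The TestValueMeanError() condition above, eq = approx(absolute=0.1*error_std), can be read as "the Mean Error must be within 0.1 standard deviations of the error from zero". A minimal sketch, assuming the sample standard deviation and using a hypothetical function name:

```python
import statistics


def mean_error_passes(y_true, y_pred):
    """Pass if the Mean Error is approximately zero (within 0.1 * error std)."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    error_std = statistics.stdev(errors)  # assumption: sample std dev
    return abs(statistics.mean(errors)) <= 0.1 * error_std


y_true = [10, 20, 30, 40]
unbiased = mean_error_passes(y_true, [11, 19, 31, 39])
biased = mean_error_passes(y_true, [5, 15, 25, 36])
```

In the second call the model underestimates every value, so the Mean Error is far from zero relative to the spread of the errors.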

Classification

You can apply the tests for non-probabilistic classification, probabilistic classification, and ranking. The underlying metrics will be calculated slightly differently depending on the provided inputs: only labels, probabilities, decision threshold, and/or K (to compute, e.g., precision@K).
Defaults for Classification tests. If there is no reference data or defined conditions, Evidently will compare the model performance to a dummy model. It is based on a set of heuristics to verify that the quality is better than random. You can also pass the reference dataset and run the test with default conditions, or define custom test conditions.
Test name
Description
Parameters
Default test conditions
TestAccuracyScore()
Dataset-level. Computes the Accuracy and compares it to the reference or against a defined condition.
Required: N/A Optional:
  • threshold_probas(default for classification = None; default for probabilistic classification = 0.5)
  • k
Test conditions:
  • standard parameters
Expects +/-20% or better than a dummy model. With reference: if the Accuracy is over 20% higher or lower, the test fails. No reference: if the Accuracy is lower than the Accuracy of the dummy model, the test fails.
TestPrecisionScore()
Dataset-level. Computes the Precision and compares it to the reference or against a defined condition.
Required: N/A Optional:
  • threshold_probas(default for classification = None; default for probabilistic classification = 0.5)
  • k
Test conditions:
  • standard parameters
Expects +/-20% or better than a dummy model. With reference: if the Precision is over 20% higher or lower, the test fails. No reference: if the Precision is lower than the Precision of the dummy model, the test fails.
TestRecallScore()
Dataset-level. Computes the Recall and compares it to the reference or against a defined condition.
Required: N/A Optional:
  • threshold_probas(default for classification = None; default for probabilistic classification = 0.5)
  • k
Test conditions:
  • standard parameters
Expects +/-20% or better than a dummy model. With reference: if the Recall is over 20% higher or lower, the test fails. No reference: if the Recall is lower than the Recall of the dummy model, the test fails.
TestF1Score()
Dataset-level. Computes the F1 score and compares it to the reference or against a defined condition.
Required: N/A Optional:
  • threshold_probas(default for classification = None; default for probabilistic classification = 0.5)
  • k
Test conditions:
  • standard parameters
Expects +/-20% or better than a dummy model. With reference: if the F1 is over 20% higher or lower, the test fails. No reference: if the F1 is lower than the F1 of the dummy model, the test fails.
TestPrecisionByClass(label='classN')
Dataset-level. Computes the Precision for the specified class and compares it to the reference or against a defined condition.
Required:
  • label
Optional:
  • probas_threshold(default for classification = None; default for probabilistic classification = 0.5)