All tests
List of all Tests and Test Presets available in Evidently.
We are doing our best to keep this page up to date. In case of discrepancies, consult the API reference or the "All tests" notebook in the Examples section. If you notice an error, please send us a pull request to update the documentation!
Test Presets
Default conditions for each Test in the Preset match the Test's defaults. You can see them in the tables below. The listed Preset parameters apply to the relevant individual Tests inside the Preset.
How to set custom Test conditions? Use parameters (e.g. equal, not equal, greater than, etc.) to set Test Conditions.
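As a rough illustration of how such condition parameters behave, the sketch below checks a computed value against a set of comparison keywords. The helper `check_condition` is hypothetical and is not part of Evidently; the keyword names (`eq`, `not_eq`, `gt`, `gte`, `lt`, `lte`) are assumed here to stand for the comparisons listed above.

```python
import operator

# Hypothetical helper (not part of Evidently): maps condition keywords
# to the comparison each one stands for.
_COMPARATORS = {
    "eq": operator.eq,
    "not_eq": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def check_condition(value, **conditions):
    """Return True if `value` satisfies every supplied condition."""
    for name, threshold in conditions.items():
        if name not in _COMPARATORS:
            raise ValueError(f"Unknown condition: {name}")
        if not _COMPARATORS[name](value, threshold):
            return False
    return True


print(check_condition(0.05, lt=0.1))           # True
print(check_condition(120, gte=100, lte=110))  # False
```

Passing several keywords at once combines them with a logical AND, which is one plausible way to express a range.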
Data Quality
Data Integrity
TestNumberOfRows()
Dataset-level. Tests the number of rows against the reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects +/-10% or >30. With reference: the test fails if the number of rows differs by over 10% from the reference. No reference: the test fails if the number of rows is <= 30.
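The default described above can be sketched in a few lines. This is only an illustration of the stated rule, not Evidently's implementation; the function name and the inclusive handling of the exact 10% boundary are assumptions.

```python
def default_row_count_passes(current_rows, reference_rows=None):
    """Sketch of the default rule: +/-10% of reference, or more than 30 rows."""
    if reference_rows is None:
        # No reference: fail when the dataset has 30 rows or fewer.
        return current_rows > 30
    # With reference: fail when the row count differs by over 10%.
    return abs(current_rows - reference_rows) <= 0.1 * reference_rows


print(default_row_count_passes(31))        # True
print(default_row_count_passes(30))        # False
print(default_row_count_passes(105, 100))  # True
print(default_row_count_passes(111, 100))  # False
```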
TestNumberOfColumns()
Dataset-level. Tests the number of columns against the reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects the same or non-zero. With reference: the test fails if the number of columns differs from the reference. No reference: the test fails if the number of columns is 0.
TestNumberOfConstantColumns()
Dataset-level. Tests the number of columns with all constant values against reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects =< or none. With reference: the test fails if the number of constant columns is higher than in the reference. No reference: the test fails if there is at least one constant column.
TestNumberOfEmptyRows()
Dataset-level. Tests the number of empty rows against reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects +/- 10% or none. With reference: the test fails if the share of empty rows is over 10% higher or lower than in the reference. No reference: the test fails if there is at least one empty row.
TestNumberOfEmptyColumns()
Dataset-level. Tests the number of empty columns against reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects =< or none. With reference: the test fails if the number of empty columns is higher than in the reference. No reference: the test fails if there is at least one empty column.
TestNumberOfDuplicatedRows()
Dataset-level. Tests the number of duplicate rows against reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects +/- 10% or none. With reference: the test fails if the share of duplicate rows is over 10% higher or lower than in the reference. No reference: the test fails if there is at least one duplicate row.
TestNumberOfDuplicatedColumns()
Dataset-level. Tests the number of duplicate columns against reference or a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects =< or none. With reference: the test fails if the number of duplicate columns is higher than in the reference. No reference: the test fails if there is at least one duplicate column.
TestConflictTarget()
Dataset-level. Tests if there are conflicts in the target (instances where a different label is assigned for an identical input).
N/A
Expects no conflicts in the target (with or without reference).
TestConflictPrediction()
Dataset-level. Tests if there are conflicts in the prediction (instances where a different prediction is made for an identical input).
N/A
Expects no conflicts in the prediction (with or without reference).
TestColumnsType()
Dataset-level. Tests the types of all columns against the reference.
Required:
N/A
Optional:
columns_type: dict
Test conditions:
N/A
Expects types to match. With reference: the test fails if at least one column type does not match. No reference: N/A
TestColumnAllConstantValues(column_name='name')
Column-level. Tests if all the values in a given column are constant.
Required:
column_name
Optional: N/A Test conditions: N/A
Expects non-constant. The test fails if all values in a given column are constant.
TestColumnAllUniqueValues(column_name='name')
Column-level. Tests if all the values in a given column are unique.
Required:
column_name
Optional: N/A Test conditions: N/A
Expects all unique (e.g., IDs). The test fails if at least one value in a given column is not unique.
TestNumberOfUniqueValues(column_name='name')
Column-level. Tests the number of unique values in a given column against reference or a defined condition.
Required:
column_name
Optional: N/A Test conditions:
standard parameters
Expects +/-10%. With reference: the test fails if the number of unique values is different by more than 10%. No reference: N/A
TestUniqueValuesShare(column_name='name')
Column-level. Tests the share of unique values in a given column against reference or a defined condition.
Required:
column_name
Optional: N/A Test conditions:
standard parameters
Expects +/-10%. With reference: the test fails if the share of unique values is different by more than 10%. No reference: N/A
TestMostCommonValueShare(column_name='name')
Column-level. Tests the share of the most common value in a given column against reference or a defined condition.
Required:
column_name
Optional: N/A Test conditions:
standard parameters
Expects +/-10%. With reference: the test fails if the share of the most common value is different by more than 10% from the reference. No reference: the test fails if the share of the most common value is >= 80%.
TestColumnRegExp(column_name='name', reg_exp='^[0..9]')
Column-level. Tests the number of values in a column that do not match a defined regular expression, against reference or a defined condition.
Required:
column_name
reg_exp
Optional: N/A Test conditions:
standard parameters
With reference: the test fails if the share of values that match a regular expression is over 10% higher or lower than in the reference. No reference: the test fails if at least one of the values does not match a regular expression.
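To make the no-reference rule concrete, here is a minimal sketch that counts the values not matching a pattern. The function name and the sample pattern are illustrative assumptions, and the sketch assumes matching from the start of the string (`re.match`), which may differ from how Evidently applies the expression.

```python
import re


def count_non_matching(values, reg_exp):
    """Count values whose string form does not match the regular expression."""
    pattern = re.compile(reg_exp)
    return sum(1 for v in values if pattern.match(str(v)) is None)


values = ["123", "42a", "abc", "7"]
# Illustrative pattern: the whole value must consist of digits.
print(count_non_matching(values, r"^[0-9]+$"))  # 2
```

Under the no-reference default, any result above zero would fail the test.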
Missing Values
Defaults for Missing Values. The metrics that calculate the number or share of missing values detect four types of values by default: Pandas nulls (None, NaN, etc.), "" (empty string), the Numpy "-inf" value, and the Numpy "inf" value. You can also pass custom missing values as a parameter and specify whether they should replace the default list.
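The following sketch shows one way the default list and the `missing_values` / `replace` parameters could interact. It is a plain-Python approximation rather than Evidently's code: the helper names are hypothetical, and it assumes that `replace=True` means only the custom values count as missing.

```python
import math


def _is_default_missing(value):
    """Approximates the four default types: nulls, empty string, inf, -inf."""
    if value is None or value == "":
        return True
    if isinstance(value, float):
        return math.isnan(value) or math.isinf(value)
    return False


def is_missing(value, missing_values=None, replace=False):
    """Hypothetical helper: decide whether a single value counts as missing."""
    custom = list(missing_values or [])
    if value in custom:
        return True
    if replace and custom:
        # Assumption: the custom list fully replaces the defaults.
        return False
    return _is_default_missing(value)


column = [1, None, "", "n/a", float("nan"), -9999, float("inf")]
print(sum(is_missing(v) for v in column))                                  # 4
print(sum(is_missing(v, ["n/a", -9999]) for v in column))                  # 6
print(sum(is_missing(v, ["n/a", -9999], replace=True) for v in column))    # 2
```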
TestNumberOfMissingValues()
Dataset-level. Tests the number of missing values in the dataset against the reference or a defined condition.
Required: N/A Optional:
missing_values = [], replace = True/False
(default = default list)
Test conditions:
standard parameters
Expects up to +10% or 0. With reference: the test fails if the share of missing values is over 10% higher than in reference. No reference: the test fails if the dataset contains missing values.
TestShareOfMissingValues()
Dataset-level. Tests the share of missing values in the dataset against the reference or a defined condition.
Required: N/A Optional:
missing_values = [], replace = True/False
(default = default list)
Test conditions:
standard parameters
Expects up to +10% or 0. With reference: the test fails if the share of missing values is over 10% higher than in reference. No reference: the test fails if the dataset contains missing values.
TestNumberOfColumnsWithMissingValues()
Dataset-level. Tests the number of columns that contain missing values in the dataset against the reference or a defined condition.
Required: N/A Optional:
missing_values = [], replace = True/False
(default = default list)
Test conditions:
standard parameters
Expects <= or 0. With reference: the test fails if the number of columns with missing values is higher than in reference. No reference: the test fails if the dataset contains columns with missing values.
TestShareOfColumnsWithMissingValues()
Dataset-level. Tests the share of columns that contain missing values in the dataset against the reference or a defined condition.
Required: N/A Optional:
missing_values = [], replace = True/False
(default = default list)
Test conditions:
standard parameters
Expects <= or 0. With reference: the test fails if the share of columns with missing values is higher than in reference. No reference: the test fails if the dataset contains columns with missing values.
TestNumberOfRowsWithMissingValues()
Dataset-level. Tests the number of rows that contain missing values against the reference or a defined condition.
Required: N/A Optional:
missing_values = [], replace = True/False
(default = default list)
Test conditions:
standard parameters
Expects up to +10% or 0. With reference: the test fails if the share of rows with missing values is over 10% higher than in reference. No reference: the test fails if the dataset contains rows with missing values.
TestShareOfRowsWithMissingValues()
Dataset-level. Tests the share of rows that contain missing values against the reference or a defined condition.
Required: N/A Optional:
missing_values = [], replace = True/False
(default = default list)
Test conditions:
standard parameters
Expects up to +10% or 0. With reference: the test fails if the share of rows with missing values is over 10% higher than in reference. No reference: the test fails if the dataset contains rows with missing values.
TestNumberOfDifferentMissingValues()
Dataset-level. Tests the number of differently encoded missing values in the dataset against the reference or a defined condition. Detects 4 types of missing values by default and/or values from a user list.
Required: N/A Optional:
missing_values: list, replace: bool = True
(default = default list)
Test conditions:
standard parameters
Expects <= or none. With reference: the test fails if the current dataset has more types of missing values. No reference: the test fails if the current dataset contains missing values.
TestColumnNumberOfMissingValues(column_name='name')
Column-level. Tests the number of missing values in a given column against the reference or a defined condition.
Required:
column_name
Optional:
missing_values = [], replace = True/False
(default = default list)
Test conditions:
standard parameters
Expects up to 10% or none. With reference: the test fails if the share of missing values in a column is over 10% higher than in reference. No reference: the test fails if the column contains missing values.
TestColumnShareOfMissingValues(column_name='name')
Column-level. Tests the share of missing values in a given column against the reference or a defined condition.
Required:
column_name
Optional:
missing_values = [], replace = True/False
(default = default list)
Test conditions:
standard parameters
Expects up to 10% or none. With reference: the test fails if the share of missing values in a column is over 10% higher than in reference. No reference: the test fails if the column contains missing values.
TestColumnNumberOfDifferentMissingValues(column_name='name')
Column-level. Tests the number of differently encoded missing values in the column against reference or a defined condition. Detects 4 types of missing values by default and/or values from a user list.
Required:
column_name
Optional:
missing_values = [], replace = True/False
(default = default list)
Test conditions:
standard parameters
Expects <= or none. With reference: the test fails if the current column has more types of missing values. No reference: the test fails if the column contains missing values.
Correlations
TestTargetPredictionCorrelation()
Dataset-level. Tests the strength of correlation between the target and prediction.
Required: N/A Optional:
method (default = pearson; available = pearson, spearman, kendall, cramer_v)
Test conditions:
standard parameters
Expects +/- 0.25 in correlation strength, or > 0. With reference: the test fails if there is a 0.25+ change in the correlation strength between target and prediction. No reference: the test fails if the correlation between target and prediction <=0
TestHighlyCorrelatedColumns()
Dataset-level. Tests the strongest correlation between a pair of features, against reference or a defined condition.
Required: N/A Optional:
method (default = pearson; available = pearson, spearman, kendall, cramer_v)
Test conditions:
standard parameters
Expects +/- 10% in max correlation strength, or < 0.9. With reference: the test fails if there is a 10%+ change in the correlation strength for the most correlated feature pair. No reference: the test fails if there is at least one pair of features with the correlation >= 0.9
TestTargetFeaturesCorrelations()
Dataset-level. Tests if any of the features is highly correlated with the target. Example use: to detect target leak.
Required: N/A Optional:
method (default = pearson; available = pearson, spearman, kendall, cramer_v)
Test conditions:
standard parameters
Expects +/- 10% in max correlation strength, or < 0.9. With reference: the test fails if there is a 10%+ change in the correlation strength for the feature most correlated with the target. No reference: the test fails if at least one feature is correlated with the target >= 0.9
TestPredictionFeaturesCorrelations()
Dataset-level. Tests if any of the features is highly correlated with the prediction. Example use: to detect when predictions rely on a single feature.
Required: N/A Optional:
method (default = pearson; available = pearson, spearman, kendall, cramer_v)
Test conditions:
standard parameters
Expects +/- 10% in max correlation strength, or < 0.9. With reference: the test fails if there is a 10%+ change in the correlation strength for the feature most correlated with the prediction. No reference: the test fails if at least one feature is correlated with the prediction >= 0.9
TestCorrelationChanges()
Dataset-level. Tests the number of correlation violations (significant change in the correlation strength between any two columns).
Required: N/A Optional:
method (default = pearson; available = pearson, spearman, kendall, cramer_v)
corr_diff (default = 0.25)
column_name (checks for correlation changes only between a chosen column and other columns in the dataset)
Test conditions:
standard parameters
Expects none. With reference: the test fails if at least 1 correlation violation is detected. No reference: N/A
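A correlation violation, as described above, can be counted with a short loop. The sketch below assumes correlations are already available as dictionaries keyed by column pair and that a change strictly greater than `corr_diff` counts as a violation; both the data layout and the boundary handling are assumptions, not Evidently internals.

```python
def count_correlation_violations(current_corr, reference_corr, corr_diff=0.25):
    """Count column pairs whose correlation changed by more than corr_diff."""
    violations = 0
    for pair, ref_value in reference_corr.items():
        cur_value = current_corr.get(pair)
        if cur_value is None:
            continue  # pair not present in the current data
        if abs(cur_value - ref_value) > corr_diff:
            violations += 1
    return violations


reference = {("a", "b"): 0.8, ("a", "c"): 0.1, ("b", "c"): -0.3}
current = {("a", "b"): 0.4, ("a", "c"): 0.15, ("b", "c"): -0.35}
print(count_correlation_violations(current, reference))  # 1
```

With the default expectation of no violations, this example would fail because the correlation between `a` and `b` dropped by 0.4.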
Column Values
TestColumnValueMin(column_name='num-column')
Column-level. Tests the minimum value of a given numerical column against reference or a defined condition.
Required:
column_name
Optional: N/A Test conditions:
standard parameters
Expects not lower. With reference: the test fails if the minimum value is lower than in the reference. No reference: N/A
TestColumnValueMax(column_name='num-column')
Column-level. Tests the maximum value of a given numerical column against reference or a defined condition.
Required:
column_name
Optional: N/A Test conditions:
standard parameters
Expects not higher. With reference: the test fails if the maximum value is higher than in the reference. No reference: N/A
TestColumnValueMean(column_name='num-column')
Column-level. Tests the mean value of a given numerical column against reference or a defined condition.
Required:
column_name
Optional: N/A Test conditions:
standard parameters
Expects +/-10%. With reference: the test fails if the mean value is different by more than 10%. No reference: N/A
TestColumnValueMedian(column_name='num-column')
Column-level. Tests the median value of a given numerical column against reference or a defined condition.
Required:
column_name
Optional: N/A Test conditions:
standard parameters
Expects +/-10%. With reference: the test fails if the median value is different by more than 10%. No reference: N/A
TestColumnValueStd(column_name='num-column')
Column-level. Tests the standard deviation of a given numerical column against reference or a defined condition.
Required:
column_name
Optional: N/A Test conditions:
standard parameters
Expects +/-10%. With reference: the test fails if the standard deviation is different by more than 10%. No reference: N/A
TestColumnQuantile(column_name='num_column', quantile=0.25)
Column-level. Computes a quantile value and compares it to the reference or against a defined condition.
Required:
column_name
quantile
Optional: N/A Test conditions:
standard parameters
Expects +/-10%. With reference: the test fails if the quantile value is over 10% higher or lower. No reference: N/A
TestMeanInNSigmas(column_name='num-column')
Column-level. Tests if the mean value in a given numerical column is within the expected range, defined in standard deviations. This test requires reference.
Required:
column_name
Optional:
n_sigmas
Expects +/- 2 std dev. With reference: the test fails if the current mean value is out of the +/- 2 std dev interval from the reference mean value. No reference: N/A
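The check above can be sketched with the standard library. This is an approximation under stated assumptions: the interval is built from the reference standard deviation, the sample (n-1) estimator is used, and the function name is hypothetical.

```python
import statistics


def mean_in_n_sigmas(current, reference, n_sigmas=2):
    """True if the current mean lies within n_sigmas reference std devs
    of the reference mean."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)  # sample standard deviation
    cur_mean = statistics.mean(current)
    lower = ref_mean - n_sigmas * ref_std
    upper = ref_mean + n_sigmas * ref_std
    return lower <= cur_mean <= upper


reference = [10, 12, 11, 9, 13]
print(mean_in_n_sigmas([11, 12, 13], reference))  # True
print(mean_in_n_sigmas([20, 21, 22], reference))  # False
```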
TestValueRange(column_name='num_column')
Column-level. Tests if a numerical column contains values out of the min-max range.
Required:
column_name
Optional:
left
right
Test conditions: N/A
Expects all values to be in range. With reference: the test fails if the column contains values out of the min-max range as seen in the reference. No reference: N/A
TestShareOfOutRangeValues(column_name='num_column')
Column-level. Tests the share of values out of the min-max range against reference or a defined condition.
Required:
column_name
Optional:
left
right
Test conditions:
standard parameters
Expects all values to be in range.
TestNumberOfOutRangeValues(column_name='num_column')
Column-level. Tests the number of values out of the min-max range against reference or a defined condition.
Required:
column_name
Optional:
left
right
Test conditions:
standard parameters
Expects all values to be in range. With reference: the test fails if at least 1 value is out of range (as seen in reference). No reference: N/A
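The range tests above share one underlying computation: find the values outside the `left` to `right` interval, then report them as a count or a share. The sketch below illustrates this, assuming the bounds are inclusive and that missing bounds fall back to the reference minimum and maximum; the function name is hypothetical.

```python
def out_of_range_stats(current, reference=None, left=None, right=None):
    """Return (count, share) of current values outside [left, right]."""
    if (left is None or right is None) and reference is None:
        raise ValueError("Provide both bounds or a reference sample.")
    if left is None:
        left = min(reference)
    if right is None:
        right = max(reference)
    out = [v for v in current if v < left or v > right]
    return len(out), len(out) / len(current)


reference = [18, 25, 40, 65]
current = [17, 30, 70, 50]
print(out_of_range_stats(current, reference))          # (2, 0.5)
print(out_of_range_stats(current, left=0, right=100))  # (0, 0.0)
```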
TestCategoryShare(column_name='education', category='Some-college', lt=0.5)
Column-level. Tests if the share of objects belonging to a defined category (or having a defined numerical value) is within the threshold.
Required:
column_name
category
Optional: N/A Test conditions:
standard parameters
Expects the category to be present. The test fails if the category is not present.
TestCategoryCount(column_name='education', category='Some-college', lt=0.5)
Column-level. Tests if the number of objects belonging to a defined category (or having a defined numerical value) is within the threshold.
Required:
column_name
category
Optional: N/A Test conditions:
standard parameters
Expects the category to be present. The test fails if the category is not present.
TestValueList(column_name='cat_column')
Column-level. Tests if a categorical column contains values out of the list.
Required:
column_name
Optional:
values: List[str]
Test conditions: N/A
Expects all values to be in the list. With reference: the test fails if the column contains values out of the list (as seen in reference). No reference: N/A
TestNumberOfOutListValues(column_name='cat_column')
Column-level. Tests the number of values in a given column that are out of list, against reference or a defined condition.
Required:
column_name
Optional:
values: List[str]
Test conditions:
standard parameters
Expects all values to be in the list. With reference: the test fails if the column contains values out of the list (as seen in reference). No reference: N/A
TestShareOfOutListValues(column_name='cat_column')
Column-level. Tests the share of values in a given column that are out of list against reference or a defined condition.
Required:
column_name
Optional:
values: List[str]
Test conditions:
standard parameters
Expects all values to be in the list. With reference: the test fails if the column contains values out of the list (as seen in reference). No reference: N/A
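For the list-based tests above, a comparable sketch uses set membership. It assumes that when no `values` list is passed, the allowed values are the ones seen in the reference; the function name is hypothetical.

```python
def out_of_list_stats(current, values=None, reference=None):
    """Return (count, share) of current values not in the allowed list."""
    if values is None and reference is None:
        raise ValueError("Provide a list of values or a reference sample.")
    allowed = set(values) if values is not None else set(reference)
    out = [v for v in current if v not in allowed]
    return len(out), len(out) / len(current)


print(out_of_list_stats(["a", "d", "b", "e"], reference=["a", "b", "c"]))  # (2, 0.5)
print(out_of_list_stats(["a", "b"], values=["a", "b", "c"]))               # (0, 0.0)
```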
Data Drift
By default, all data drift tests use the Evidently drift detection logic that selects a different statistical test or metric based on feature type and volume. You always need a reference dataset. To modify the logic or select a different test, you should set data drift parameters.
TestNumberOfDriftedColumns()
Dataset-level. Compares the distribution of each column in the current dataset to the reference and tests the number of drifting features against a defined condition.
Required: N/A Optional:
columns
stattest (default = automated selection)
cat_stattest
num_stattest
per_column_stattest
stattest_threshold (default = test default)
cat_stattest_threshold
num_stattest_threshold
per_column_stattest_threshold
Test conditions:
standard parameters
Expects =< ⅓ features to drift. With reference: If > 1/3 of features drifted, the test fails. No reference: N/A
TestShareOfDriftedColumns()
Dataset-level. Compares the distribution of each column in the current dataset to the reference and tests the share of drifting features against a defined condition.
Required: N/A Optional:
columns
stattest (default = automated selection)
cat_stattest
num_stattest
per_column_stattest
stattest_threshold (default = test default)
cat_stattest_threshold
num_stattest_threshold
per_column_stattest_threshold
Test conditions:
standard parameters
Expects =< ⅓ features to drift. With reference: If > 1/3 of features drifted, the test fails. No reference: N/A
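Once each column has a drift verdict, the dataset-level default reduces to comparing the drifted share with one third. The sketch below takes the per-column verdicts as given (it does not run any statistical test) and uses a hypothetical function name; it follows the rule as worded above, failing only when more than a third of the columns drifted.

```python
def drifted_columns_summary(drift_by_column, max_share=1 / 3):
    """Return (number drifted, share drifted, passed) from per-column flags."""
    n_drifted = sum(1 for drifted in drift_by_column.values() if drifted)
    share = n_drifted / len(drift_by_column)
    passed = share <= max_share  # fails only if more than 1/3 drifted
    return n_drifted, share, passed


flags = {"age": False, "income": True, "city": False,
         "tenure": False, "plan": False, "score": False}
n, share, passed = drifted_columns_summary(flags)
print(n, passed)  # 1 True
```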
TestColumnDrift(column_name='name')
Column-level. Tests if there is a distribution shift in a given column compared to the reference.
Required:
column_name
Optional:
stattest (default = automated selection)
stattest_threshold (default = test default)
Expects no drift. With reference: the test fails if the distribution drift is detected in a given column. No reference: N/A
TestEmbeddingsDrift(embeddings_name='small_subset')
Column-level. Tests if there is drift in embeddings compared to reference.
Required:
embeddings_name
Optional:
drift_method (default = model)
Expects no drift. With reference: the test fails if the drift is detected in a given subset of columns. No reference: N/A
Regression
Defaults for Regression tests: if there is no reference data or defined conditions, Evidently will compare the model performance to a dummy model that predicts the optimal constant (varies by the metric). You can also pass the reference dataset and run the test with default conditions, or define custom test conditions.
TestValueMAE()
Dataset-level. Computes the Mean Absolute Error (MAE) and compares it to the reference or against a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects +/-10% or better than a dummy model. With reference: if MAE is higher or lower by over 10%, the test fails. No reference: the test fails if the MAE value is higher than the MAE of the dummy model that predicts the optimal constant (median of the target value).
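The no-reference default can be illustrated by building the dummy baseline explicitly. This sketch assumes the dummy constant is the median of the current target values and that equality with the dummy MAE still passes; the function names are hypothetical.

```python
import statistics


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def passes_dummy_mae(y_true, y_pred):
    """True if the model MAE is not higher than the constant-median MAE."""
    dummy_value = statistics.median(y_true)
    dummy_mae = mae(y_true, [dummy_value] * len(y_true))
    return mae(y_true, y_pred) <= dummy_mae


y_true = [10, 20, 30, 40]
print(mae(y_true, [12, 18, 33, 39]))               # 2.0
print(passes_dummy_mae(y_true, [12, 18, 33, 39]))  # True
print(passes_dummy_mae(y_true, [40, 30, 20, 10]))  # False
```

In this example the dummy model predicts 25 for every row and has an MAE of 10.0, so the first model passes and the second, with an MAE of 20.0, fails.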
TestValueRMSE()
Dataset-level. Computes the Root Mean Square Error (RMSE) and compares it to the reference or against a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects +/-10% or better than a dummy model. With reference: if RMSE is higher or lower by over 10%, the test fails. No reference: the test fails if the RMSE value is higher than the RMSE of the dummy model that predicts the optimal constant (mean of the target value).
TestValueMeanError()
Dataset-level. Computes the Mean Error (ME) and tests if it is near zero or compares it against a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects the Mean Error to be near zero. With/without reference: the test fails if the Mean Error is skewed and the condition is violated. Condition: eq = approx(absolute=0.1*error_std), where error_std = (curr_true - curr_preds).std()
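The condition above amounts to requiring that the absolute Mean Error stays within 10% of the error standard deviation. A minimal sketch, assuming the sample (n-1) standard deviation and a hypothetical function name:

```python
import statistics


def mean_error_near_zero(y_true, y_pred):
    """True if |mean error| <= 0.1 * standard deviation of the errors."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    error_std = statistics.stdev(errors)  # sample standard deviation
    mean_error = statistics.mean(errors)
    return abs(mean_error) <= 0.1 * error_std


y_true = [10, 20, 30, 40]
print(mean_error_near_zero(y_true, [11, 19, 31, 39]))  # True
print(mean_error_near_zero(y_true, [15, 25, 35, 46]))  # False
```

The second call fails because every prediction overshoots the target, so the errors share a consistent bias that is large relative to their spread.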
TestValueMAPE()
Dataset-level. Computes the Mean Absolute Percentage Error (MAPE) and compares it to the reference or against a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects +/-10% or better than a dummy model. With reference: if MAPE is higher or lower by over 10%, the test fails. No reference: the test fails if the MAPE value is higher than the MAPE of the dummy model that predicts the optimal constant (weighted median of the target value).
TestValueAbsMaxError()
Dataset-level. Computes the absolute maximum error and compares it to the reference or against a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects +/-10% or better than a dummy model. With reference: if the absolute maximum error is higher or lower by over 10%, the test fails. No reference: the test fails if the absolute maximum error is higher than the absolute maximum error of the dummy model that predicts the optimal constant (median of the target value).
TestValueR2Score()
Dataset-level. Computes the R2 Score (coefficient of determination) and compares it to the reference or against a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects +/-10% or > 0. With reference: if R2 is higher or lower by over 10%, the test fails. No reference: the test fails if the R2 value is =< 0.
Classification
You can apply the tests for non-probabilistic, probabilistic classification, and ranking. The underlying metrics will be calculated slightly differently depending on the provided inputs: only labels, probabilities, decision threshold, and/or K (to compute, e.g., precision@K).
Defaults for Classification tests. If there is no reference data or defined conditions, Evidently will compare the model performance to a dummy model. It is based on a set of heuristics to verify that the quality is better than random. You can also pass the reference dataset and run the test with default conditions, or define custom test conditions.
TestAccuracyScore()
Dataset-level. Computes the Accuracy and compares it to the reference or against a defined condition.
Required: N/A Optional:
probas_threshold (default for classification = None; default for probabilistic classification = 0.5)
k
Test conditions:
standard parameters
Expects +/-20% or better than a dummy model. With reference: if the Accuracy is over 20% higher or lower, the test fails. No reference: if the Accuracy is lower than the Accuracy of the dummy model, the test fails.
TestPrecisionScore()
Dataset-level. Computes the Precision and compares it to the reference or against a defined condition.
Required: N/A Optional:
probas_threshold (default for classification = None; default for probabilistic classification = 0.5)
k
Test conditions:
standard parameters
Expects +/-20% or better than a dummy model. With reference: if the Precision is over 20% higher or lower, the test fails. No reference: if the Precision is lower than the Precision of the dummy model, the test fails.
TestRecallScore()
Dataset-level. Computes the Recall and compares it to the reference or against a defined condition.
Required: N/A Optional:
probas_threshold (default for classification = None; default for probabilistic classification = 0.5)
k
Test conditions:
standard parameters
Expects +/-20% or better than a dummy model. With reference: if the Recall is over 20% higher or lower, the test fails. No reference: if the Recall is lower than the Recall of the dummy model, the test fails.
TestF1Score()
Dataset-level. Computes the F1 score and compares it to the reference or against a defined condition.
Required: N/A Optional:
probas_threshold (default for classification = None; default for probabilistic classification = 0.5)
k
Test conditions:
standard parameters
Expects +/-20% or better than a dummy model. With reference: if the F1 is over 20% higher or lower, the test fails. No reference: if the F1 is lower than the F1 of the dummy model, the test fails.
TestPrecisionByClass(label='classN')
Dataset-level. Computes the Precision for the specified class and compares it to the reference or against a defined condition.
Required:
label
Optional:
probas_threshold (default for classification = None; default for probabilistic classification = 0.5)
k (default = None)
Test conditions:
standard parameters
Expects +/-20% or better than a dummy model. With reference: if the Precision is over 20% higher or lower, the test fails. No reference: if the Precision is lower than the Precision of the dummy model, the test fails.
TestRecallByClass(label='classN')
Dataset-level. Computes the Recall for the specified class and compares it to the reference or against a defined condition.
Required:
label
Optional:
probas_threshold (default for classification = None; default for probabilistic classification = 0.5)
k (default = None)
Test conditions:
standard parameters
Expects +/-20% or better than a dummy model. With reference: if the Recall is over 20% higher or lower, the test fails. No reference: if the Recall is lower than the Recall of the dummy model, the test fails.
TestF1ByClass(label='classN')
Dataset-level. Computes the F1 for the specified class and compares it to the reference or against a defined condition.
Required:
label
Optional:
probas_threshold (default for classification = None; default for probabilistic classification = 0.5)
k (default = None)
Test conditions:
standard parameters
Expects +/-20% or better than a dummy model. With reference: the test fails if the F1 is over 20% higher or lower. No reference: the test fails if the F1 is lower than the F1 of the dummy model.
TestTPR()
Dataset-level. Computes the True Positive Rate and compares it to the reference or against a defined condition.
Required: N/A Optional:
probas_threshold (default for classification = None; default for probabilistic classification = 0.5)
k (default = None)
Test conditions:
standard parameters
Expects +/-20% or better than a dummy model. With reference: the test fails if the TPR is over 20% higher or lower. No reference: the test fails if the TPR is lower than the TPR of the dummy model.
TestTNR()
Dataset-level. Computes the True Negative Rate and compares it to the reference or against a defined condition.
Required: N/A Optional:
probas_threshold (default for classification = None; default for probabilistic classification = 0.5)
k (default = None)
Test conditions:
standard parameters
Expects +/-20% or better than a dummy model. With reference: the test fails if the TNR is over 20% higher or lower. No reference: the test fails if the TNR is lower than the TNR of the dummy model.
TestFPR()
Dataset-level. Computes the False Positive Rate and compares it to the reference or against a defined condition.
Required: N/A Optional:
probas_threshold (default for classification = None; default for probabilistic classification = 0.5)
k (default = None)
Test conditions:
standard parameters
Expects +/-20% or better than a dummy model. With reference: the test fails if the FPR is over 20% higher or lower. No reference: the test fails if the FPR is higher than the FPR of the dummy model.
TestFNR()
Dataset-level. Computes the False Negative Rate and compares it to the reference or against a defined condition.
Required: N/A Optional:
probas_threshold (default for classification = None; default for probabilistic classification = 0.5)
k (default = None)
Test conditions:
standard parameters
Expects +/-20% or better than a dummy model. With reference: the test fails if the FNR is over 20% higher or lower. No reference: the test fails if the FNR is higher than the FNR of the dummy model.
TestRocAuc()
Dataset-level. Applies to probabilistic classification. Computes the ROC AUC and compares it to the reference or against a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects +/-20% or > 0.5. With reference: the test fails if the ROC AUC is over 20% higher or lower than in the reference. No reference: the test fails if ROC AUC is <= 0.5.
TestLogLoss()
Dataset-level. Applies to probabilistic classification. Computes the LogLoss and compares it to the reference or against a defined condition.
Required: N/A Optional: N/A Test conditions:
standard parameters
Expects +/-20% or better than a dummy model. With reference: the test fails if the LogLoss is over 20% higher or lower than in the reference. No reference: the test fails if LogLoss is higher than the LogLoss of the dummy model (equals 0.5 for a constant model).
Ranking and Recommendations
Check individual metric descriptions here.
Optional shared parameters:
no_feedback_users: bool = False
Specifies whether to include the users who did not select any of the items when computing the quality metrics. Default: False.
min_rel_score: Optional[int] = None
Specifies the minimum relevance score to consider relevant when calculating the quality metrics for non-binary targets (e.g., if a target is a rating or a custom score).
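The effect of the two shared parameters can be sketched as a preprocessing step. The code below is an illustration only: the helper names are hypothetical, and the exact rules (for example, treating a score equal to `min_rel_score` as relevant, or treating any positive value as relevant when no threshold is set) are assumptions rather than confirmed Evidently behavior.

```python
from typing import Dict, List, Optional, Set


def relevant_items(scores: Dict[str, float], min_rel_score: Optional[int] = None) -> Set[str]:
    """Binarize per-item targets into a set of relevant items (assumed rule)."""
    if min_rel_score is None:
        # Assumption: with a binary target, any positive value counts as relevant.
        return {item for item, s in scores.items() if s > 0}
    # Assumption: scores at or above the threshold count as relevant.
    return {item for item, s in scores.items() if s >= min_rel_score}


def users_to_score(feedback: Dict[str, Set[str]], no_feedback_users: bool = False) -> List[str]:
    """Select users for the metric; drop users without feedback unless asked to keep them."""
    if no_feedback_users:
        return list(feedback)
    return [user for user, items in feedback.items() if items]


ratings = {"a": 5, "b": 2, "c": 4}
print(sorted(relevant_items(ratings, min_rel_score=4)))  # ['a', 'c']

feedback = {"u1": {"a"}, "u2": set()}
print(users_to_score(feedback))                          # ['u1']
print(users_to_score(feedback, no_feedback_users=True))  # ['u1', 'u2']
```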
TestPrecisionTopK(k=k)
Dataset-level. Computes the Precision at the top K and compares it to the reference or against a defined condition.
Required:
k
Optional:
no_feedback_users
min_rel_score
Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the Precision at the top K is over 10% higher or lower, the test fails. No reference: Tests if precision > 0.
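As a reminder of what is being tested, here is a minimal sketch of Precision at K for a single user: the share of the top K recommended items that are relevant. The function name and data are illustrative and not taken from Evidently. Per-user values would then typically be averaged across users.

```python
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
print(precision_at_k(recommended, relevant, k=3))  # 2 of the top 3 items are relevant
```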
TestRecallTopK(k=k)
Dataset-level. Computes the Recall at the top K and compares it to the reference or against a defined condition.
Required:
k
Optional:
no_feedback_users
min_rel_score
Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the Recall at the top K is over 10% higher or lower, the test fails. No reference: Tests if recall > 0.
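Recall at K uses the same hits but divides by the number of relevant items rather than by K. The sketch below is an illustrative single-user version with a hypothetical function name. Returning 0.0 for a user with no relevant items is an assumption of this example.

```python
from typing import Sequence, Set


def recall_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0  # assumption: no relevant items means nothing to recall
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
print(recall_at_k(recommended, relevant, k=3))  # 2 of the 3 relevant items are in the top 3
```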
TestFBetaTopK(k=k)
Dataset-level. Computes the F-beta score at the top K and compares it to the reference or against a defined condition.
Required:
k
Optional:
no_feedback_users
min_rel_score
Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the F-beta score at the top K is over 10% higher or lower, the test fails. No reference: Tests if F-beta > 0.
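The F-beta score combines precision and recall at K, with beta controlling how much more weight recall receives. The sketch below uses the standard formula (1 + beta^2) * P * R / (beta^2 * P + R). The function name is hypothetical, and this section does not state which beta Evidently uses.

```python
def fbeta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score from precision and recall."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator


print(fbeta(0.5, 1.0, beta=1.0))  # harmonic mean of 0.5 and 1.0
print(fbeta(0.5, 1.0, beta=2.0))  # recall weighted more heavily, so the score is higher
```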
TestHitRateK(k=k)
Dataset-level. Computes the Hit Rate at the top K recommendations and compares it to the reference or against a defined condition.
Required:
k
Optional:
no_feedback_users
min_rel_score
Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the Hit Rate at the top K is over 10% higher or lower, the test fails. No reference: Tests if Hit Rate > 0.
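Hit Rate at K can be read as the share of users who have at least one relevant item among their top K recommendations. The following sketch illustrates that definition with made-up data. The function name and input layout (dictionaries keyed by user) are assumptions of this example.

```python
from typing import Dict, Sequence, Set


def hit_rate_at_k(recs: Dict[str, Sequence[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Share of users with at least one relevant item in their top-k list."""
    if not recs:
        return 0.0
    hits = sum(
        1 for user, items in recs.items()
        if any(item in relevant.get(user, set()) for item in items[:k])
    )
    return hits / len(recs)


recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"], "u3": ["g", "h", "i"]}
relevant = {"u1": {"c"}, "u2": {"x"}, "u3": {"g", "z"}}
print(hit_rate_at_k(recs, relevant, k=2))  # only u3 has a hit in the top 2
print(hit_rate_at_k(recs, relevant, k=3))  # u1 and u3 have a hit in the top 3
```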
TestMAPK(k=k)
Dataset-level. Computes the Mean Average Precision at the top K and compares it to the reference or against a defined condition.
Required:
k
Optional:
no_feedback_users
min_rel_score
Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the MAP at the top K is over 10% higher or lower, the test fails. No reference: Tests if MAP > 0.
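MAP at K averages, across users, the Average Precision of each top K list. Several normalizations of Average Precision exist. The sketch below divides by the number of relevant items found in the top K, which is one common variant and an assumption here, not a statement of how Evidently normalizes. Function names are hypothetical.

```python
from typing import Dict, Sequence, Set


def average_precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Mean of precision@rank over the ranks (<= k) that hold a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def map_at_k(recs: Dict[str, Sequence[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Average of per-user Average Precision at k."""
    if not recs:
        return 0.0
    total = sum(average_precision_at_k(items, relevant.get(u, set()), k) for u, items in recs.items())
    return total / len(recs)


recs = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z"]}
relevant = {"u1": {"a", "c"}, "u2": {"y"}}
print(map_at_k(recs, relevant, k=4))  # u1: (1/1 + 2/3) / 2, u2: 1/2, then averaged
```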
TestMRRK(k=k)
Dataset-level. Computes the Mean Reciprocal Rank at the top K and compares it to the reference or against a defined condition.
Required:
k
Optional:
no_feedback_users
min_rel_score
Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the MRR at the top K is over 10% higher or lower, the test fails. No reference: Tests if MRR > 0.
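MRR at K looks only at the position of the first relevant item in each user's top K list and averages the reciprocal of that rank. A minimal illustrative sketch follows, with hypothetical names. Users without a relevant item in the top K contribute 0 in this example.

```python
from typing import Dict, Sequence, Set


def mrr_at_k(recs: Dict[str, Sequence[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Mean reciprocal rank of the first relevant item within the top k."""
    if not recs:
        return 0.0
    total = 0.0
    for user, items in recs.items():
        for rank, item in enumerate(items[:k], start=1):
            if item in relevant.get(user, set()):
                total += 1.0 / rank
                break  # only the first relevant item counts
    return total / len(recs)


recs = {"u1": ["a", "b"], "u2": ["c", "d"], "u3": ["e", "f"]}
relevant = {"u1": {"a"}, "u2": {"d"}, "u3": {"z"}}
print(mrr_at_k(recs, relevant, k=2))  # 0.5, from ranks 1, 2, and no hit
```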
TestNDCGK(k=k)
Dataset-level. Computes the Normalized Discounted Cumulative Gain at the top K and compares it to the reference or against a defined condition.
Required:
k
Optional:
no_feedback_users
min_rel_score
Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the Normalized Discounted Cumulative Gain at the top K is over 10% higher or lower, the test fails. No reference: Tests if NDCG > 0.
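NDCG at K discounts each item's relevance by the logarithm of its rank and normalizes by the best achievable ordering. The sketch below uses the common rel / log2(rank + 1) discount for a single ranked list. The function names are hypothetical, and building the ideal ordering from the same list of relevance scores is an assumption of this example.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0


print(ndcg_at_k([1, 0], k=2))  # 1.0: the relevant item is already ranked first
print(ndcg_at_k([0, 1], k=2))  # lower: the relevant item sits at rank 2
```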
TestNovelty(k=k)
Dataset-level. Computes the Novelty at the top K recommendations and compares it to the reference or against a defined condition. Requires a training dataset.
Required:
k
Optional: N/A Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the Novelty at the top K is over 10% higher or lower, the test fails. No reference: Tests if novelty > 0.
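Novelty is commonly defined through item popularity in the training data: the rarer a recommended item, the higher its novelty. The sketch below assumes the usual -log2(p) form, where p is the share of training users who interacted with the item. That formula, the function name, and the choice to skip items unseen in training are assumptions of this illustration.

```python
import math
from typing import Dict, Sequence


def novelty_at_k(recs: Dict[str, Sequence[str]], train: Dict[str, Sequence[str]], k: int) -> float:
    """Mean over users of the average -log2(item popularity share) in the top k."""
    n_users = len(train)
    users_per_item: Dict[str, int] = {}
    for items in train.values():
        for item in set(items):
            users_per_item[item] = users_per_item.get(item, 0) + 1

    per_user = []
    for items in recs.values():
        # Assumption: items never seen in training are skipped.
        scores = [-math.log2(users_per_item[i] / n_users) for i in items[:k] if i in users_per_item]
        if scores:
            per_user.append(sum(scores) / len(scores))
    return sum(per_user) / len(per_user) if per_user else 0.0


train = {"u1": ["a", "b"], "u2": ["a"], "u3": ["a", "c"], "u4": ["a"]}
recs = {"u1": ["a", "c"], "u2": ["b", "c"]}
print(novelty_at_k(recs, train, k=2))  # "a" is seen by everyone (novelty 0), "b" and "c" by 1 in 4 (novelty 2)
```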
TestPersonalization(k=k)
Dataset-level. Computes the Personalization at the top K recommendations and compares it to the reference or against a defined condition.
Required:
k
Optional: N/A Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the Personalization at the top K is over 10% higher or lower, the test fails. No reference: Tests if personalization > 0.
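Personalization measures how different the recommendation lists are across users. One simple way to sketch it is 1 minus the average pairwise overlap of top K lists. The overlap measure used below (shared items divided by K) and the function name are assumptions of this example, and Evidently may use a different similarity.

```python
from itertools import combinations
from typing import Dict, Sequence


def personalization_at_k(recs: Dict[str, Sequence[str]], k: int) -> float:
    """1 - mean pairwise overlap (shared items / k) between users' top-k lists."""
    top_k = [set(items[:k]) for items in recs.values()]
    pairs = list(combinations(top_k, 2))
    if not pairs:
        return 0.0  # fewer than two users: nothing to compare
    mean_overlap = sum(len(a & b) / k for a, b in pairs) / len(pairs)
    return 1.0 - mean_overlap


recs = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["d", "e"]}
print(personalization_at_k(recs, k=2))  # only u1 and u2 share an item, so the lists are mostly distinct
```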
TestSerendipity(k=k, item_features=item_features)
Dataset-level. Computes the Serendipity at the top K recommendations considering item features and compares it to the reference or against a defined condition. Requires a training dataset.
Required:
k
item_features
Optional:
min_rel_score
Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the Serendipity at the top K is over 10% higher or lower, the test fails. No reference: Tests if serendipity > 0.
TestDiversity(k=k, item_features=item_features)
Dataset-level. Computes the Diversity at the top K recommendations considering item features and compares it to the reference or against a defined condition.
Required:
k
item_features
Optional: N/A Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the Diversity at the top K is over 10% higher or lower, the test fails. No reference: Tests if diversity > 0.
TestARP(k=k)
Dataset-level. Computes the Average Recommendation Popularity at the top K recommendations and compares it to the reference or against a defined condition. Requires a training dataset.
Required:
k
Optional:
normalize_arp
(default: False)
Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the ARP at the top K is over 10% higher or lower, the test fails. No reference: Tests if ARP > 0.
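ARP can be sketched as the average training-set popularity of the items in each user's top K list, averaged over users. In the example below, popularity is an interaction count per item, and `normalize_arp=True` divides by the count of the most popular item. Both choices are assumptions made for illustration, as is the function name.

```python
from typing import Dict, Sequence


def arp_at_k(
    recs: Dict[str, Sequence[str]],
    popularity: Dict[str, int],
    k: int,
    normalize_arp: bool = False,
) -> float:
    """Average popularity of recommended items, averaged over users."""
    per_user = []
    for items in recs.values():
        top_k = items[:k]
        if top_k:
            per_user.append(sum(popularity.get(i, 0) for i in top_k) / len(top_k))
    if not per_user:
        return 0.0
    arp = sum(per_user) / len(per_user)
    if normalize_arp and popularity:
        arp /= max(popularity.values())  # assumed normalization by the most popular item
    return arp


popularity = {"a": 10, "b": 5, "c": 1}  # interaction counts in the training data
recs = {"u1": ["a", "b"], "u2": ["b", "c"]}
print(arp_at_k(recs, popularity, k=2))                      # 5.25
print(arp_at_k(recs, popularity, k=2, normalize_arp=True))  # 0.525
```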
TestGiniIndex(k=k)
Dataset-level. Computes the Gini Index at the top K recommendations and compares it to the reference or against a defined condition. Requires a training dataset.
Required:
k
Optional: N/A Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the Gini Index at the top K is over 10% higher or lower, the test fails. No reference: Tests if Gini Index < 1.
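The Gini Index describes how unevenly recommendations are spread over items: 0 means every item is recommended equally often, and values near 1 mean a few items take almost all recommendations. The sketch below applies the standard Gini formula to per-item recommendation counts. Counting catalog items with zero recommendations, and the function names, are assumptions of this example.

```python
from collections import Counter
from typing import Dict, List, Sequence


def recommendation_counts(recs: Dict[str, Sequence[str]], catalog: Sequence[str], k: int) -> List[int]:
    """How often each catalog item appears in users' top-k lists (zeros included)."""
    counts = Counter(item for items in recs.values() for item in items[:k])
    return [counts.get(item, 0) for item in catalog]


def gini_index(counts: Sequence[int]) -> float:
    """Standard Gini coefficient of a list of non-negative counts."""
    values = sorted(counts)
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


catalog = ["a", "b", "c", "d"]
recs = {"u1": ["a", "b"], "u2": ["a", "c"]}
print(gini_index(recommendation_counts(recs, catalog, k=2)))  # 0.375
print(gini_index([1, 1, 1, 1]))                               # 0.0: perfectly even
```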
TestCoverage(k=k)
Dataset-level. Computes the Coverage at the top K recommendations and compares it to the reference or against a defined condition. Requires a training dataset.
Required:
k
Optional: N/A Test conditions:
standard parameters
Expects +/-10% from reference. With reference: if the Coverage at the top K is over 10% higher or lower, the test fails. No reference: Tests if Coverage > 0.
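Coverage can be illustrated as the share of catalog items (taken here from the training data) that appear in at least one user's top K list. The function name and input layout are hypothetical, and ignoring recommended items that are not in the catalog is an assumption of this sketch.

```python
from typing import Dict, Sequence, Set


def coverage_at_k(recs: Dict[str, Sequence[str]], catalog: Set[str], k: int) -> float:
    """Share of catalog items recommended to at least one user within the top k."""
    if not catalog:
        return 0.0
    recommended = {item for items in recs.values() for item in items[:k]}
    return len(recommended & catalog) / len(catalog)


catalog = {"a", "b", "c", "d"}
recs = {"u1": ["a", "b"], "u2": ["a", "c"]}
print(coverage_at_k(recs, catalog, k=2))  # 0.75: "d" is never recommended
```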