All Metrics
Reference page for all dataset-level evals.
For an introduction, read Core Concepts and check the quickstarts for LLMs or ML.
Text Evals
Summarizes the results of text or LLM evals. To score individual inputs, first compute descriptors.
Data definition. You may need to map text columns.
Metric | Description | Parameters | Test Defaults |
---|---|---|---|
TextEvals() | | Optional: | As in Metrics included in ValueStats. |
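The flow has two steps: row-level descriptors come first, then a dataset-level summary. The minimal plain-Python sketch below illustrates it. The descriptor (text length) and the helpers text_length and summarize are hypothetical and do not reproduce Evidently's API.

```python
from statistics import mean


def text_length(text: str) -> int:
    """Hypothetical row-level descriptor: the number of characters in a text."""
    return len(text)


def summarize(scores: list) -> dict:
    """Aggregate row-level descriptor scores into dataset-level statistics."""
    return {
        "count": len(scores),
        "min": min(scores),
        "max": max(scores),
        "mean": mean(scores),
    }


responses = ["Yes.", "I am not sure.", "The capital of France is Paris."]

# Step 1: score each input row with the descriptor.
scores = [text_length(r) for r in responses]

# Step 2: summarize the resulting descriptor column at the dataset level.
summary = summarize(scores)
print(summary)
```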
Columns
Use to aggregate descriptor results or check data quality at the column level.
You may need to map column types using Data definition.
Value stats
Descriptive statistics.
Metric | Description | Parameters | Test Defaults |
---|---|---|---|
ValueStats() | | Required: | |
MinValue() | | Required: | |
StdValue() | | Required: | |
MeanValue() | | Required: | |
MaxValue() | | Required: | |
MedianValue() | | Required: | |
QuantileValue() | | Required: | |
CategoryCount() Example: CategoryCount(column="city", category="NY") | | Required: column, category | |
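The statistics in this table follow standard definitions. The minimal sketch below computes them with the Python standard library for a single column. It assumes population standard deviation and linear-interpolation quantiles (NumPy's default method). Evidently may use different conventions, and the helper names are hypothetical.

```python
import math
from statistics import mean, median, pstdev


def quantile(values: list, q: float) -> float:
    """Quantile with linear interpolation between closest ranks (NumPy's default method)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def value_stats(values: list) -> dict:
    """Descriptive statistics for one numerical column with missing values removed."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),  # population std; statistics.stdev gives the sample std
        "q75": quantile(values, 0.75),
    }


def category_count(values: list, category: str) -> int:
    """Number of rows equal to the given category."""
    return sum(1 for v in values if v == category)


ages = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
cities = ["NY", "Lon", "NY", "Paris"]
print(value_stats(ages))
print(category_count(cities, "NY"))
```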
Column data quality
Column-level data quality metrics.
Data definition. You may need to map column types.
Metric | Description | Parameters | Test Defaults |
---|---|---|---|
MissingValueCount() | | Required: | |
NewCategoriesCount() (Coming soon) | | Required: | Expect 0. |
MissingCategoriesCount() (Coming soon) | | Required: | Expect 0. |
InRangeValueCount() Example: InRangeValueCount(column="age", left=1, right=18) | | Required: column, left, right | |
OutRangeValueCount() | | Required: | |
InListValueCount() | | Required: | |
OutListValueCount() Example: OutListValueCount(column="city", values=["Lon", "NY"]) | | Required: column, values | |
UniqueValueCount() | | Required: | |
MostCommonValueCount() (Coming soon) | | Required: | |
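Each check above reduces to a simple count over one column. The minimal sketch below makes three assumptions: None or NaN marks a missing value, missing values are excluded from the range and list checks, and range boundaries are inclusive. Evidently's handling may differ, and all function names are hypothetical.

```python
import math


def is_missing(value) -> bool:
    """Treat None and float NaN as missing values (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_value_count(values: list) -> int:
    return sum(1 for v in values if is_missing(v))


def in_range_value_count(values: list, left: float, right: float) -> int:
    # Boundaries are treated as inclusive here; this is an assumption.
    return sum(1 for v in values if not is_missing(v) and left <= v <= right)


def out_range_value_count(values: list, left: float, right: float) -> int:
    return sum(1 for v in values if not is_missing(v) and not (left <= v <= right))


def in_list_value_count(values: list, allowed: list) -> int:
    allowed_set = set(allowed)
    return sum(1 for v in values if not is_missing(v) and v in allowed_set)


def out_list_value_count(values: list, allowed: list) -> int:
    allowed_set = set(allowed)
    return sum(1 for v in values if not is_missing(v) and v not in allowed_set)


def unique_value_count(values: list) -> int:
    return len({v for v in values if not is_missing(v)})


ages = [0, 5, 17, 18, 30, None, float("nan")]
cities = ["NY", "Lon", "Paris", "NY", None, "Berlin"]

print(missing_value_count(ages))                    # missing ages
print(in_range_value_count(ages, 1, 18))            # ages within [1, 18]
print(out_list_value_count(cities, ["Lon", "NY"]))  # cities outside the list
```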
Dataset
Use for exploratory data analysis and data quality checks.
Data definition. You may need to map column types, ID and timestamp.
Dataset stats
Descriptive statistics.
Metric | Description | Parameters | Test Defaults |
---|---|---|---|
DataSummaryPreset() | | Optional: | As in individual Metrics. |
DatasetStats() | | None | |
RowCount() | | Optional: | |
ColumnCount() | | Optional: | |
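Row and column counts are straightforward to compute on a column-oriented dataset, as in the sketch below. The numerical/categorical breakdown is a hypothetical illustration based on Python value types. In practice, column types are mapped through Data definition rather than guessed.

```python
def row_count(dataset: dict) -> int:
    """Number of rows in a column-oriented dataset; all columns must have equal length."""
    lengths = {len(column) for column in dataset.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    return lengths.pop() if lengths else 0


def column_count(dataset: dict) -> int:
    """Number of columns in the dataset."""
    return len(dataset)


def column_type_counts(dataset: dict) -> dict:
    """Hypothetical breakdown: a column is numerical if it has at least one non-missing
    value and every non-missing value is an int or float (bools excluded)."""
    counts = {"numerical": 0, "categorical": 0}
    for column in dataset.values():
        present = [v for v in column if v is not None]
        numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        counts["numerical" if numeric else "categorical"] += 1
    return counts


data = {"age": [25, 31, None], "city": ["NY", "Lon", "NY"], "score": [0.5, 0.9, 0.1]}
print(row_count(data), column_count(data), column_type_counts(data))
```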
Dataset data quality
Dataset-level data quality metrics.
Data definition. You may need to map column types, ID and timestamp.
Metric | Description | Parameters | Test Defaults |
---|---|---|---|
ConstantColumnsCount() | | Optional: | |
EmptyRowsCount() | | Optional: | |
EmptyColumnsCount() | | Optional: | |
DuplicatedRowCount() | | Optional: | |
DuplicatedColumnsCount() | | Optional: | |
DatasetMissingValueCount() | | Required: | |
AlmostEmptyColumnCount() (Coming soon) | | Optional: | |
AlmostConstantColumnsCount() | | Optional: | |
RowsWithMissingValuesCount() (Coming soon) | | Optional: | |
ColumnsWithMissingValuesCount() | | Optional: | |
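Most dataset-level checks compare whole rows or whole columns. The sketch below is a plain-Python illustration on a dict of columns, assuming None or NaN marks a missing value. The 0.95 threshold for almost constant columns and all function names are hypothetical, not Evidently defaults.

```python
import math
from collections import Counter


def is_missing(value) -> bool:
    """Treat None and float NaN as missing values (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def present(column: list) -> list:
    """Non-missing values of a column."""
    return [v for v in column if not is_missing(v)]


def constant_columns_count(data: dict) -> int:
    # A column is constant if it has exactly one distinct non-missing value.
    return sum(1 for col in data.values() if len(set(present(col))) == 1)


def almost_constant_columns_count(data: dict, threshold: float = 0.95) -> int:
    # Hypothetical rule: the most common value covers >= threshold of non-missing values.
    count = 0
    for col in data.values():
        values = present(col)
        if values and Counter(values).most_common(1)[0][1] / len(values) >= threshold:
            count += 1
    return count


def empty_columns_count(data: dict) -> int:
    return sum(1 for col in data.values() if not present(col))


def empty_rows_count(data: dict) -> int:
    return sum(1 for row in zip(*data.values()) if all(is_missing(v) for v in row))


def duplicated_row_count(data: dict) -> int:
    # Rows identical to an earlier row; the first occurrence is not counted.
    seen, duplicates = set(), 0
    for row in zip(*data.values()):
        if row in seen:
            duplicates += 1
        seen.add(row)
    return duplicates


def duplicated_columns_count(data: dict) -> int:
    # Columns whose values are identical to an earlier column.
    seen, duplicates = set(), 0
    for col in data.values():
        key = tuple(col)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return duplicates


def dataset_missing_value_count(data: dict) -> int:
    return sum(1 for col in data.values() for v in col if is_missing(v))


def columns_with_missing_values_count(data: dict) -> int:
    return sum(1 for col in data.values() if any(is_missing(v) for v in col))


data = {
    "a": [1, 1, 1, None],
    "b": [5, 7, 5, None],
    "c": [None, None, None, None],
    "d": [5, 7, 5, None],  # exact copy of column "b"
}
print(constant_columns_count(data), empty_rows_count(data), duplicated_row_count(data))
```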
Data Drift
Use to detect distribution drift in text, tabular, or embedding data, or in computed text descriptors. 20+ drift detection methods are listed separately: text and tabular, embeddings.
Data definition. You may need to map column types, ID and timestamp.
Metric explainers. Understand how data drift detection works.
Metric | Description | Parameters | Test Defaults |
---|---|---|---|
DataDriftPreset() | | Optional: | |
DriftedColumnsCount() | | Optional: | |
ValueDrift() | | Required: | |
MultivariateDrift() (Coming soon) | | Optional: | |
EmbeddingDrift() (Coming soon) | | Required: | |
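Evidently offers many drift methods, and this sketch does not reproduce any of its defaults. As an illustration only, it scores per-column drift with the Population Stability Index (PSI) on equal-width bins. It then counts columns whose PSI exceeds 0.1, a common rule of thumb, to mimic the idea behind DriftedColumnsCount(). The function names, bin count, and threshold are assumptions.

```python
import math


def psi(reference: list, current: list, bins: int = 10, eps: float = 1e-4) -> float:
    """Population Stability Index between a reference and a current numerical sample.

    Bins are equal-width over the combined range of both samples; eps replaces
    empty-bin shares so that the logarithm stays finite.
    """
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    width = (hi - lo) / bins or 1.0  # guard against a constant column

    def shares(values: list) -> list:
        counts = [0] * bins
        for v in values:
            # The maximum value would land at index `bins`, so clip it into the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drifted_columns_count(reference: dict, current: dict, threshold: float = 0.1) -> int:
    """Count columns whose PSI exceeds the threshold (0.1 is a common rule of thumb)."""
    return sum(1 for name in reference if psi(reference[name], current[name]) > threshold)


reference = {"x": [float(i) for i in range(100)], "y": [float(i) for i in range(100)]}
current = {"x": [float(i) for i in range(100)], "y": [float(i) for i in range(50, 150)]}
print(psi(reference["x"], current["x"]), psi(reference["y"], current["y"]))
print(drifted_columns_count(reference, current))
```

Here column "x" is unchanged, so its PSI is zero, while column "y" is shifted and exceeds the threshold.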
Correlations
Use for exploratory data analysis, drift monitoring (correlation changes) or to check alignment between scores (e.g. LLM-based descriptors against human labels).
Data definition. You may need to map column types.
Metric | Description | Parameters | Test Defaults |
---|---|---|---|
DatasetCorrelations() (Coming soon) | | Optional: | N/A |
Correlation() (Coming soon) | | Required: | N/A |
CorrelationChanges() (Coming soon) | | Optional: | |
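The metrics in this table are not yet available, so their exact method is not specified here. As a hypothetical illustration of the alignment use case, the sketch computes a Pearson correlation between LLM-judge scores and human labels. It also computes the absolute change in correlation between a reference and a current dataset. The data and function names are made up.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sd_x * sd_y)


def correlation_change(ref_x: list, ref_y: list, cur_x: list, cur_y: list) -> float:
    """Absolute change in correlation between a reference and a current dataset."""
    return abs(pearson(cur_x, cur_y) - pearson(ref_x, ref_y))


# Hypothetical LLM-judge scores compared with binary human labels.
human = [1, 0, 1, 0]
judge_ref = [0.9, 0.2, 0.8, 0.4]
judge_cur = [0.1, 0.9, 0.2, 0.8]
print(pearson(judge_ref, human))
print(correlation_change(judge_ref, human, judge_cur, human))
```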
Classification
Use to evaluate quality on a classification task (probabilistic, non-probabilistic, binary and multi-class).
Data definition. You may need to map prediction, target columns and classification type.
General
Use for binary classification and aggregated results for multi-class.
Metric | Description | Parameters | Test Defaults |
---|---|---|---|
ClassificationPreset() | | Optional: probas_threshold. | As in individual Metrics. |
ClassificationQuality() | | Optional: probas_threshold. | As in individual Metrics. |
LabelCount() (Coming soon) | | Required: | N/A |
Accuracy() | | Optional: | |
Precision() | | Required: | |
Recall() | | Required: | |
F1Score() | | Required: | |
TPR() | | Required: | |
TNR() | | Required: | |
FPR() | | Required: | |
FNR() | | Required: | |
LogLoss() | | Required: | |
RocAUC() | | Required: | |
Lift() (Coming soon) | | Required: | N/A |
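Most metrics in this table derive from the confusion matrix computed after applying probas_threshold to predicted probabilities. The minimal sketch below handles binary labels, with 1 as the positive class. It assumes that a probability equal to the threshold counts as positive and that a ratio with a zero denominator is 0.0. These conventions and the function names are assumptions, not Evidently's implementation.

```python
import math


def confusion_counts(y_true: list, y_pred: list) -> tuple:
    """Binary confusion counts (tp, fp, tn, fn), with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def ratio(num: float, den: float) -> float:
    """Division that returns 0.0 for a zero denominator (an assumption of this sketch)."""
    return num / den if den else 0.0


def binary_quality(y_true: list, probas: list, probas_threshold: float = 0.5) -> dict:
    # Assumption: a probability >= probas_threshold is predicted as the positive class.
    y_pred = [1 if p >= probas_threshold else 0 for p in probas]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision, recall = ratio(tp, tp + fp), ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,  # identical to TPR
        "f1": ratio(2 * precision * recall, precision + recall),
        "tpr": recall,
        "tnr": ratio(tn, tn + fp),
        "fpr": ratio(fp, fp + tn),
        "fnr": ratio(fn, fn + tp),
    }


def log_loss(y_true: list, probas: list, eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, probas):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def roc_auc(y_true: list, probas: list) -> float:
    """Probability that a random positive scores higher than a random negative (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, probas) if t == 1]
    neg = [p for t, p in zip(y_true, probas) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
probas = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(binary_quality(y_true, probas, probas_threshold=0.5))
print(log_loss(y_true, probas), roc_auc(y_true, probas))
```

The pairwise ROC AUC is quadratic in the number of rows; it shows the definition rather than an efficient computation.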
Dummy metrics:
By label
Use when you have multiple classes and want to evaluate quality for each class separately.
Metric | Description | Parameters | Test Defaults |
---|---|---|---|
ClassificationQualityByLabel() | | None | As in individual Metrics. |
PrecisionByLabel() | | Optional: | |
F1ByLabel() | | Optional: | |
RecallByLabel() | | Optional: | |
RocAUCByLabel() | | Optional: | |
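By-label metrics treat each class in turn as the positive class (one-vs-rest). A minimal sketch for precision, recall, and F1 per label follows. The function name and the zero-denominator convention are assumptions. RocAUCByLabel() would additionally need per-class probabilities and is not covered here.

```python
def quality_by_label(y_true: list, y_pred: list) -> dict:
    """One-vs-rest precision, recall and F1 for each label.

    A ratio with a zero denominator is reported as 0.0 (an assumption of this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        result[label] = {"precision": precision, "recall": recall, "f1": f1}
    return result


y_true = ["cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
for label, scores in quality_by_label(y_true, y_pred).items():
    print(label, scores)
```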
Regression
Use to evaluate the quality of a regression model.
Data definition. You may need to map prediction and target columns.
Metric | Description | Parameters | Test Defaults |
---|---|---|---|
RegressionPreset() | | None. | As in individual Metrics. |
RegressionQuality() | | None. | As in individual Metrics. |
MeanError() | | Required: | |
MAE() | | Required: | |
RMSE() | | Optional: | |
MAPE() | | Required: | |
R2Score() | | Optional: | |
AbsMaxError() | | Optional: | |
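The regression metrics follow standard formulas. The sketch below assumes error = prediction - target. If Evidently uses the opposite sign, only the mean error flips. It reports MAPE in percent and requires non-zero targets. Function and key names are hypothetical.

```python
import math


def regression_quality(y_true: list, y_pred: list) -> dict:
    """Common regression metrics.

    Assumption: error = prediction - target. The sign convention affects only
    mean_error; the other metrics use absolute or squared errors.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mean_error": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # MAPE in percent; undefined (ZeroDivisionError here) if any target is 0.
        "mape": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1 - ss_res / ss_tot,
        "abs_max_error": max(abs(e) for e in errors),
    }


y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 40.0]
print(regression_quality(y_true, y_pred))
```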
Dummy metrics:
Ranking
Use to evaluate ranking, search / retrieval or recommendations.
Data definition. You may need to map prediction and target columns and ranking type. Some metrics require additional training data.
Metric explainers. Check ranking metrics explainers.
Metric | Description | Parameters | Test Defaults |
---|---|---|---|
RecSysPreset() | | None. | As in individual Metrics. |
RecallTopK() | | Required: | |
FBetaTopK() | | Required: | |
PrecisionTopK() | | Required: | |
MAP() | | Required: | |
NDCG() | | Required: | |
MRR() | | Required: | |
HitRate() | | Required: | |
ScoreDistribution() | | Required: | |
Personalization() (Coming soon) | | Required: | |
ARP() (Coming soon) | | Required: | |
Coverage() (Coming soon) | | Required: | |
GiniIndex() (Coming soon) | | Required: | |
Diversity() (Coming soon) | | Required: | |
Serendipity() (Coming soon) | | Required: | |
Novelty() (Coming soon) | | Required: | |
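The top-k metrics in this table have common textbook definitions for a single user's ranked list. The sketch below uses binary relevance and normalizes average precision by min(number of relevant items, k), one of several AP variants. Evidently's exact formulas may differ, and all function names are hypothetical.

```python
import math


def hits_at_k(ranked: list, relevant: set, k: int) -> int:
    """Number of relevant items among the top-k recommendations."""
    return sum(1 for item in ranked[:k] if item in relevant)


def precision_at_k(ranked: list, relevant: set, k: int) -> float:
    return hits_at_k(ranked, relevant, k) / k


def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    return hits_at_k(ranked, relevant, k) / len(relevant) if relevant else 0.0


def fbeta_at_k(ranked: list, relevant: set, k: int, beta: float = 1.0) -> float:
    p, r = precision_at_k(ranked, relevant, k), recall_at_k(ranked, relevant, k)
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)


def hit_rate_at_k(ranked: list, relevant: set, k: int) -> float:
    return 1.0 if hits_at_k(ranked, relevant, k) > 0 else 0.0


def reciprocal_rank_at_k(ranked: list, relevant: set, k: int) -> float:
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def average_precision_at_k(ranked: list, relevant: set, k: int) -> float:
    # Normalized by min(|relevant|, k); other AP variants normalize differently.
    hits, total = 0, 0.0
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / position
    denominator = min(len(relevant), k)
    return total / denominator if denominator else 0.0


def ndcg_at_k(ranked: list, relevant: set, k: int) -> float:
    # Binary relevance: each relevant item contributes 1 / log2(position + 1).
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg else 0.0


ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "d", "x"}  # "x" is relevant but was never recommended
k = 3
print(precision_at_k(ranked, relevant, k), recall_at_k(ranked, relevant, k))
print(reciprocal_rank_at_k(ranked, relevant, k), ndcg_at_k(ranked, relevant, k))
```

MAP and MRR are then the means of average_precision_at_k and reciprocal_rank_at_k over users.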
Relevant for RecSys metrics:
- no_feedback_user: bool = False. Specifies whether to include users who did not select any of the items when computing the quality metric. Default: False.
- min_rel_score: Optional[int] = None. Specifies the minimum relevance score for an item to count as relevant when calculating quality metrics for non-binary targets (e.g., if the target is a rating or a custom score).
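The two parameters above can be illustrated with a per-user average of recall@k. In this hypothetical sketch, min_rel_score turns graded targets (such as ratings) into binary relevance. no_feedback_user decides whether users with no relevant items enter the average with a score of 0 or are skipped. It mirrors the parameter descriptions but is not Evidently's implementation, and the data layout and function names are assumptions.

```python
from typing import Optional


def relevant_items(scores: dict, min_rel_score: Optional[int]) -> set:
    """Binarize graded relevance. With min_rel_score=None, targets are assumed
    binary and any positive score counts as relevant (an assumption)."""
    if min_rel_score is None:
        return {item for item, score in scores.items() if score > 0}
    return {item for item, score in scores.items() if score >= min_rel_score}


def mean_recall_at_k(
    recommendations: dict,
    targets: dict,
    k: int,
    no_feedback_user: bool = False,
    min_rel_score: Optional[int] = None,
) -> float:
    per_user = []
    for user, ranked in recommendations.items():
        relevant = relevant_items(targets.get(user, {}), min_rel_score)
        if not relevant:
            if no_feedback_user:
                per_user.append(0.0)  # included user with nothing relevant scores 0
            continue  # otherwise the user is skipped
        hits = sum(1 for item in ranked[:k] if item in relevant)
        per_user.append(hits / len(relevant))
    return sum(per_user) / len(per_user) if per_user else 0.0


recommendations = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"], "u3": ["a", "d", "g"]}
targets = {"u1": {"a": 5, "c": 2}, "u2": {"e": 4}, "u3": {}}  # u3 selected nothing

print(mean_recall_at_k(recommendations, targets, k=2))
print(mean_recall_at_k(recommendations, targets, k=2, no_feedback_user=True))
print(mean_recall_at_k(recommendations, targets, k=2, min_rel_score=4))
```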