For an intro, read Core Concepts and check quickstarts for LLMs or ML. For a reference code example, see this Metric cookbook.
How to read the tables
- Metric: the name of the Metric or Preset you can pass to the Report.
- Description: what it does. Complex Metrics link to explainer pages.
- Parameters: available options. You can also add conditional tests to any Metric with standard operators like eq (equal), gt (greater than), etc. How Tests work. See the sketch after this list.
- Test defaults: conditions that apply when you invoke Tests but do not set a pass/fail condition yourself.
  - With reference: if you provide a reference dataset during the Report run, the conditions are set relative to the reference.
  - No reference: if you do not provide a reference, Tests use fixed heuristics (like "expect no missing values").
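For example, here is a minimal sketch of attaching a condition to a Metric. It assumes the current Evidently import paths, which may differ across versions; the DataFrames are placeholders.

```python
import pandas as pd

from evidently import Report
from evidently.metrics import MissingValueCount
from evidently.tests import eq

# Placeholder data: substitute your own current and reference DataFrames.
current = pd.DataFrame({"age": [21, 34, None, 42]})
reference = pd.DataFrame({"age": [25, 30, 41, 38]})

# Attach a pass/fail condition to the Metric: expect exactly 0 missing values.
report = Report([MissingValueCount(column="age", tests=[eq(0)])])
my_eval = report.run(current, reference)
```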
Text Evals
Summarizes results of text or LLM evals. To score individual inputs, first use descriptors. Data definition. You may need to map text columns.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| TextEvals() | Summarizes the results of the text descriptors computed for the dataset. | Optional: | As in Metrics included in ValueStats. |
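A minimal sketch of the descriptor-then-summarize flow. The descriptor names (Sentiment, TextLength) and the text_columns mapping follow the Evidently docs but may vary by version:

```python
import pandas as pd

from evidently import Dataset, DataDefinition, Report
from evidently.descriptors import Sentiment, TextLength
from evidently.presets import TextEvals

# Placeholder data with a single text column.
df = pd.DataFrame({"answer": ["Happy to help!", "I cannot answer that."]})

# Score each input with descriptors first, then summarize with TextEvals().
dataset = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(text_columns=["answer"]),
    descriptors=[Sentiment("answer"), TextLength("answer")],
)
report = Report([TextEvals()])
my_eval = report.run(dataset)
```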
Columns
Use to aggregate descriptor results or to check data quality on the column level. You may need to map column types using Data definition.
Value stats
Descriptive statistics.

| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| ValueStats() | Descriptive statistics for the column. | Required: column | |
| MinValue() | Minimum value in the column. | Required: column | |
| StdValue() | Standard deviation of the column values. | Required: column | |
| MeanValue() | Mean of the column values. | Required: column | |
| MaxValue() | Maximum value in the column. | Required: column | |
| MedianValue() | Median of the column values. | Required: column | |
| QuantileValue() | Value at a given quantile of the column. | Required: column | |
| CategoryCount() Example: CategoryCount(column="city", category="NY") | Count of rows with the given category value. | Required: column, category | |
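A minimal sketch of running column statistics with a conditional test, assuming the current Evidently API. The quantile parameter name is an assumption; check your version.

```python
import pandas as pd

from evidently import Report
from evidently.metrics import MeanValue, MinValue, QuantileValue
from evidently.tests import gt

# Placeholder data.
current = pd.DataFrame({"age": [18, 25, 34, 61]})

report = Report([
    MinValue(column="age", tests=[gt(0)]),  # fail if the minimum is not positive
    MeanValue(column="age"),
    QuantileValue(column="age", quantile=0.75),  # `quantile` is an assumed parameter name
])
my_eval = report.run(current, None)
```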
Column data quality
Column-level data quality metrics. Data definition. You may need to map column types.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| MissingValueCount() | Count and share of missing values in the column. | Required: column | Expect no missing values. |
| InRangeValueCount() Example: InRangeValueCount(column="age", left="1", right="18") | Count of column values inside the given range. | Required: column, left, right | |
| OutRangeValueCount() | Count of column values outside the given range. | Required: column, left, right | |
| InListValueCount() | Count of column values that appear in the given list. | Required: column, values | |
| OutListValueCount() Example: OutListValueCount(column="city", values=["Lon", "NY"]) | Count of column values that do not appear in the given list. | Required: column, values | |
| UniqueValueCount() | Count of unique values in the column. | Required: column | |
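A minimal sketch running the range and list checks from the table above, assuming the current Evidently import paths:

```python
import pandas as pd

from evidently import Report
from evidently.metrics import InRangeValueCount, OutListValueCount, UniqueValueCount

# Placeholder data.
current = pd.DataFrame({"age": [7, 12, 16, 44], "city": ["NY", "Lon", "Oslo", "NY"]})

report = Report([
    InRangeValueCount(column="age", left=1, right=18),  # as in the table's example
    OutListValueCount(column="city", values=["Lon", "NY"]),
    UniqueValueCount(column="city"),
])
my_eval = report.run(current, None)
```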
Dataset
Use for exploratory data analysis and data quality checks. Data definition. You may need to map column types, ID and timestamp.
Dataset stats
Descriptive statistics.

| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| DataSummaryPreset() | Summary statistics for the dataset and each column. | Optional: | As in individual Metrics. |
| DatasetStats() | Dataset-level descriptive statistics. | None | |
| RowCount() | Number of rows in the dataset. | Optional: | |
| ColumnCount() | Number of columns in the dataset. | Optional: | |
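A minimal sketch combining the Preset with an individual Metric and a test, assuming the current Evidently API:

```python
import pandas as pd

from evidently import Report
from evidently.metrics import RowCount
from evidently.presets import DataSummaryPreset
from evidently.tests import gt

# Placeholder data.
current = pd.DataFrame({"age": [21, 34, 42], "city": ["NY", "Lon", "NY"]})

report = Report([
    DataSummaryPreset(),      # summary stats for the dataset and each column
    RowCount(tests=[gt(0)]),  # fail on an empty dataset
])
my_eval = report.run(current, None)
```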
Dataset data quality
Dataset-level data quality metrics. Data definition. You may need to map column types, ID and timestamp.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| ConstantColumnsCount() | Number of columns with constant values. | Optional: | |
| EmptyRowsCount() | Number of fully empty rows. | Optional: | |
| EmptyColumnsCount() | Number of fully empty columns. | Optional: | |
| DuplicatedRowCount() | Number of duplicated rows. | Optional: | |
| DuplicatedColumnsCount() | Number of duplicated columns. | Optional: | |
| DatasetMissingValueCount() | Count and share of missing values in the dataset. | Required: | |
| AlmostConstantColumnsCount() | Number of columns that are nearly constant. | Optional: | |
| ColumnsWithMissingValuesCount() | Number of columns that contain missing values. | Optional: | |
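A minimal sketch of combining dataset-level quality checks, with one conditional test attached (import paths assumed, as above):

```python
import pandas as pd

from evidently import Report
from evidently.metrics import (
    ConstantColumnsCount,
    DatasetMissingValueCount,
    DuplicatedRowCount,
)
from evidently.tests import eq

# Placeholder data.
current = pd.DataFrame({"a": [1, 1, 1], "b": [None, 2, 2]})

report = Report([
    DuplicatedRowCount(tests=[eq(0)]),  # fail if any rows are duplicated
    DatasetMissingValueCount(),
    ConstantColumnsCount(),
])
my_eval = report.run(current, None)
```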
Data Drift
Use to detect distribution drift for text and tabular data or over computed text descriptors. Supports 20+ drift methods, listed separately for text and tabular data. Data definition. You may need to map column types, ID and timestamp.
Metrics explainers. Understand how data drift works.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| DataDriftPreset() | Evaluates distribution drift for all or selected columns. | Optional: | |
| DriftedColumnsCount() | Count and share of drifted columns. | Optional: | |
| ValueDrift() | Drift score for a single column. | Required: column | |
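A minimal sketch, assuming the current Evidently API. Drift is always computed against a reference dataset, so both datasets are required here:

```python
import pandas as pd

from evidently import Report
from evidently.metrics import ValueDrift
from evidently.presets import DataDriftPreset

# Placeholder data.
current = pd.DataFrame({"age": [18, 22, 31, 64]})
reference = pd.DataFrame({"age": [25, 30, 41, 38]})

report = Report([
    DataDriftPreset(),         # checks drift across all columns
    ValueDrift(column="age"),  # drift score for a single column
])
my_eval = report.run(current, reference)
```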
Classification
Use to evaluate quality on a classification task (probabilistic, non-probabilistic, binary and multi-class). Data definition. You may need to map prediction and target columns and the classification type.
General
Use for binary classification and for aggregated results in multi-class tasks.

| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| ClassificationPreset() | Combines multiple classification quality Metrics. | Optional: probas_threshold | As in individual Metrics. |
| ClassificationQuality() | Overall classification quality. | Optional: probas_threshold | As in individual Metrics. |
| Accuracy() | Share of correct predictions. | Optional: | |
| Precision() | Share of true positives among predicted positives. | Required: | |
| Recall() | Share of true positives among actual positives. | Required: | |
| F1Score() | Harmonic mean of precision and recall. | Required: | |
| TPR() | True positive rate. | Required: | |
| TNR() | True negative rate. | Required: | |
| FPR() | False positive rate. | Required: | |
| FNR() | False negative rate. | Required: | |
| LogLoss() | Logarithmic loss. | Required: | |
| RocAUC() | Area under the ROC curve. | Required: | |
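A minimal sketch of a binary classification evaluation. The BinaryClassification mapping and its field names follow the Data definition docs but are assumptions to verify against your version:

```python
import pandas as pd

from evidently import BinaryClassification, DataDefinition, Dataset, Report
from evidently.presets import ClassificationPreset

# Placeholder data with true labels and predicted labels.
df = pd.DataFrame({
    "target": [0, 1, 1, 0],
    "prediction": [0, 1, 0, 0],
})

# Map the target and prediction columns (assumed field names).
definition = DataDefinition(
    classification=[BinaryClassification(target="target", prediction_labels="prediction")]
)
dataset = Dataset.from_pandas(df, data_definition=definition)

report = Report([ClassificationPreset()])
my_eval = report.run(dataset)
```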
Dummy model quality

Use these Metrics to get the quality of a dummy model created on the same data (based on heuristics). You can compare your model quality against it to verify that your model performs better than random. These Metrics serve as a baseline in automated testing.

| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| ClassificationDummyQuality() | Quality of a dummy classification model. | N/A | N/A |
| DummyPrecision() | Precision of a dummy model. | N/A | N/A |
| DummyRecall() | Recall of a dummy model. | N/A | N/A |
| DummyF1() | F1 score of a dummy model. | N/A | N/A |
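A minimal sketch of putting the dummy baseline next to the real quality Metric (import paths assumed, as above):

```python
from evidently import Report
from evidently.metrics import DummyF1, F1Score

# `dataset` stands for a Dataset with a classification mapping,
# built as in the earlier sketch.
report = Report([F1Score(), DummyF1()])  # model score vs. dummy baseline
my_eval = report.run(dataset)
```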
By label

Use when you have multiple classes and want to evaluate quality for each one separately.

| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| ClassificationQualityByLabel() | Classification quality Metrics computed per class. | None | As in individual Metrics. |
| PrecisionByLabel() | Precision per class. | Optional: | |
| F1ByLabel() | F1 score per class. | Optional: | |
| RecallByLabel() | Recall per class. | Optional: | |
| RocAUCByLabel() | ROC AUC per class. | Optional: | |
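A minimal sketch of per-class evaluation (import paths assumed, as above):

```python
from evidently import Report
from evidently.metrics import ClassificationQualityByLabel, RecallByLabel

# `dataset` stands for a Dataset with a multi-class classification mapping.
report = Report([ClassificationQualityByLabel(), RecallByLabel()])
my_eval = report.run(dataset)
```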
Regression
Use to evaluate the quality of a regression model. Data definition. You may need to map prediction and target columns.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| RegressionPreset() | Combines multiple regression quality Metrics. | None | As in individual Metrics. |
| RegressionQuality() | Overall regression quality. | None | As in individual Metrics. |
| MeanError() | Mean error. | Required: | |
| MAE() | Mean absolute error. | Required: | |
| RMSE() | Root mean squared error. | Optional: | |
| MAPE() | Mean absolute percentage error. | Required: | |
| R2Score() | R² score (coefficient of determination). | Optional: | |
| AbsMaxError() | Maximum absolute error. | Optional: | |
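A minimal sketch with a conditional test on the error, assuming the current Evidently API. The lt operator and the 5.0 threshold are illustrative assumptions:

```python
from evidently import Report
from evidently.metrics import MAE, RMSE, MeanError
from evidently.tests import lt

# `dataset` stands for a Dataset with regression target and prediction mapped.
report = Report([
    MeanError(),
    MAE(tests=[lt(5.0)]),  # illustrative threshold: fail if MAE is not below 5
    RMSE(),
])
my_eval = report.run(dataset)
```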
Dummy model quality

Use these Metrics to get the baseline quality for regression: they use optimal constants (the constant varies by Metric). These Metrics serve as a baseline in automated testing.

| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| RegressionDummyQuality() | Quality of a dummy regression model. | N/A | N/A |
| DummyMeanError() | Mean error of a dummy model. | N/A | N/A |
| DummyMAE() | MAE of a dummy model. | N/A | N/A |
| DummyMAPE() | MAPE of a dummy model. | N/A | N/A |
| DummyRMSE() | RMSE of a dummy model. | N/A | N/A |
| DummyR2() | R² score of a dummy model. | N/A | N/A |
Ranking
Use to evaluate ranking, search / retrieval, or recommendations. Data definition. You may need to map prediction and target columns and the ranking type.
Metric explainers. Check ranking metrics explainers.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| RecallTopK() | Recall at top K. | Required: k | |
| FBetaTopK() | F-beta score at top K. | Required: k | |
| PrecisionTopK() | Precision at top K. | Required: k | |
| MAP() | Mean average precision at top K. | Required: k | |
| NDCG() | Normalized discounted cumulative gain at top K. | Required: k | |
| MRR() | Mean reciprocal rank. | Required: k | |
| HitRate() | Hit rate at top K. | Required: k | |
| ScoreDistribution() | Distribution of predicted scores at top K. | Required: k | |
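A minimal sketch of a ranking evaluation. Passing k as the cutoff for each Metric follows the table above; the import paths and the ranking mapping are assumptions to verify against your version:

```python
from evidently import Report
from evidently.metrics import MRR, NDCG, PrecisionTopK, RecallTopK

# `dataset` stands for a Dataset with a ranking mapping; k is the cutoff depth.
report = Report([
    RecallTopK(k=5),
    PrecisionTopK(k=5),
    NDCG(k=5),
    MRR(k=5),
])
my_eval = report.run(dataset)
```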