For an intro, read Core Concepts and check quickstarts for LLMs or ML. For a reference code example, see this Metric cookbook.
How to read the tables
- Metric: the name of the Metric or Preset you can pass to a Report.
- Description: what it does. Complex Metrics link to explainer pages.
- Parameters: available options. You can also add conditional tests to any Metric with standard operators like `eq` (equal), `gt` (greater than), etc. How Tests work. (See the sketch after this list.)
- Test defaults: conditions that apply when you invoke Tests but do not set a pass/fail condition yourself.
  - With reference: if you provide a reference dataset during the Report run, the conditions are set relative to the reference.
  - No reference: if you do not provide a reference, Tests use fixed heuristics (like expecting no missing values).
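For example, here is a minimal sketch of attaching a conditional test to a Metric, assuming the current `evidently` Python API (the `tests` argument and test functions like `eq`; `run` is assumed to accept pandas DataFrames directly):

```python
import pandas as pd
from evidently import Report
from evidently.metrics import MissingValueCount
from evidently.tests import eq

data = pd.DataFrame({"age": [21, 35, None, 42]})

# Attach a pass/fail condition: the check fails unless
# the count of missing values in "age" is exactly 0.
report = Report([
    MissingValueCount(column="age", tests=[eq(0)]),
])

my_eval = report.run(data, None)  # no reference dataset: second argument is None
```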
Text Evals
Summarizes the results of text or LLM evals. To score individual inputs, first use descriptors. Data definition: you may need to map text columns.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| `TextEvals()` | Summarizes the scores of all text descriptors in the dataset. | Optional: `columns` | As in Metrics included in `ValueStats` |
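A hedged sketch of the typical flow, assuming the evidently `Dataset`/`DataDefinition` API and the built-in `Sentiment` and `TextLength` descriptors: score each input with descriptors first, then summarize with `TextEvals()`.

```python
import pandas as pd
from evidently import Dataset, DataDefinition, Report
from evidently.descriptors import Sentiment, TextLength
from evidently.presets import TextEvals

df = pd.DataFrame({"response": [
    "Thanks, this was really helpful!",
    "I am not sure this answers my question.",
]})

# Score each row with descriptors, then summarize their distributions.
eval_data = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(text_columns=["response"]),
    descriptors=[Sentiment("response"), TextLength("response")],
)

report = Report([TextEvals()])
my_eval = report.run(eval_data, None)
```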
Columns
Use to aggregate descriptor results or check data quality at the column level. Data definition: you may need to map column types.
Value stats
Descriptive statistics.

| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| `ValueStats()` | Computes a set of descriptive statistics for the column. | Required: `column` | |
| `MinValue()` | Calculates the minimum value in the column. | Required: `column` | |
| `StdValue()` | Calculates the standard deviation of the column. | Required: `column` | |
| `MeanValue()` | Calculates the mean value of the column. | Required: `column` | |
| `MaxValue()` | Calculates the maximum value in the column. | Required: `column` | |
| `MedianValue()` | Calculates the median value of the column. | Required: `column` | |
| `QuantileValue()` | Calculates a specified quantile of the column. | Required: `column` | |
| `CategoryCount()` Example: `CategoryCount(column="city", category="NY")` | Counts the rows where the column value matches a given category. | Required: `column`, `category` | |
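A minimal sketch of column-level stats, assuming the metric signatures shown in the table (the `quantile` parameter of `QuantileValue` is an assumption):

```python
import pandas as pd
from evidently import Report
from evidently.metrics import CategoryCount, QuantileValue, ValueStats

df = pd.DataFrame({
    "age": [21, 35, 29, 42],
    "city": ["NY", "Lon", "NY", "NY"],
})

report = Report([
    ValueStats(column="age"),                     # full descriptive summary
    QuantileValue(column="age", quantile=0.75),   # assumed: `quantile` parameter
    CategoryCount(column="city", category="NY"),  # count rows where city == "NY"
])
my_eval = report.run(df, None)
```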
Column data quality
Column-level data quality metrics. Data definition: you may need to map column types.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| `MissingValueCount()` | Counts missing values in the column. | Required: `column` | |
| `InRangeValueCount()` Example: `InRangeValueCount(column="age", left=1, right=18)` | Counts values inside a given range. | Required: `column`, `left`, `right` | |
| `OutRangeValueCount()` | Counts values outside a given range. | Required: `column`, `left`, `right` | |
| `InListValueCount()` | Counts values that appear in a given list. | Required: `column`, `values` | |
| `OutListValueCount()` Example: `OutListValueCount(column="city", values=["Lon", "NY"])` | Counts values that do not appear in a given list. | Required: `column`, `values` | |
| `UniqueValueCount()` | Counts unique values in the column. | Required: `column` | |
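A minimal sketch combining column-level quality checks with conditional tests, under the same API assumptions as above:

```python
import pandas as pd
from evidently import Report
from evidently.metrics import InRangeValueCount, MissingValueCount
from evidently.tests import eq

df = pd.DataFrame({"age": [3, 7, 12, 16]})

report = Report([
    MissingValueCount(column="age", tests=[eq(0)]),  # expect no missing values
    # Expect all 4 values to fall inside the 1-18 range
    InRangeValueCount(column="age", left=1, right=18, tests=[eq(4)]),
])
my_eval = report.run(df, None)
```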
Dataset
Use for exploratory data analysis and data quality checks. Data definition: you may need to map column types, ID and timestamp.
Dataset stats
Descriptive statistics.

| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| `DataSummaryPreset()` | Generates a summary of the dataset together with descriptive statistics for each column. | Optional: `columns` | As in individual Metrics. |
| `DatasetStats()` | Computes dataset-level statistics (rows, columns, missing values, etc.). | None | |
| `RowCount()` | Counts the number of rows in the dataset. | Optional: | |
| `ColumnCount()` | Counts the number of columns in the dataset. | Optional: | |
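A minimal sketch of a dataset overview with a sanity-check test, assuming `DataSummaryPreset` and `RowCount` are importable as shown:

```python
import pandas as pd
from evidently import Report
from evidently.metrics import RowCount
from evidently.presets import DataSummaryPreset
from evidently.tests import gt

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

report = Report([
    DataSummaryPreset(),      # dataset- and column-level overview
    RowCount(tests=[gt(0)]),  # sanity check: the dataset is not empty
])
my_eval = report.run(df, None)
```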
Dataset data quality
Dataset-level data quality metrics. Data definition: you may need to map column types, ID and timestamp.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| `ConstantColumnsCount()` | Counts columns that contain a single constant value. | Optional: | |
| `EmptyRowsCount()` | Counts entirely empty rows. | Optional: | |
| `EmptyColumnsCount()` | Counts entirely empty columns. | Optional: | |
| `DuplicatedRowCount()` | Counts duplicated rows. | Optional: | |
| `DuplicatedColumnsCount()` | Counts duplicated columns. | Optional: | |
| `DatasetMissingValueCount()` | Counts missing values across the whole dataset. | Required: | |
| `AlmostConstantColumnsCount()` | Counts columns where nearly all values are constant. | Optional: | |
| `ColumnsWithMissingValuesCount()` | Counts columns that contain at least one missing value. | Optional: | |
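A minimal sketch of dataset-level quality checks, under the same API assumptions:

```python
import pandas as pd
from evidently import Report
from evidently.metrics import ConstantColumnsCount, DuplicatedRowCount
from evidently.tests import eq

# Rows 0 and 1 are duplicates, and column "b" is constant,
# so both tests below fail on this toy data.
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "x"]})

report = Report([
    DuplicatedRowCount(tests=[eq(0)]),    # expect no duplicated rows
    ConstantColumnsCount(tests=[eq(0)]),  # expect no constant columns
])
my_eval = report.run(df, None)
```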
Data Drift
Use to detect distribution drift for text and tabular data, or over computed text descriptors. Checks 20+ drift methods, listed separately for text and tabular data. Data definition: you may need to map column types, ID and timestamp.
Metric explainers: understand how data drift works.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| `DataDriftPreset()` | Evaluates drift for all or selected columns. | Optional: | |
| `DriftedColumnsCount()` | Counts the number and share of drifted columns. | Optional: | |
| `ValueDrift()` | Calculates drift for a single defined column. | Required: `column` | |
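A minimal sketch of a drift check. Drift needs a reference dataset, passed as the second argument to `run` (signature assumed):

```python
import pandas as pd
from evidently import Report
from evidently.metrics import ValueDrift
from evidently.presets import DataDriftPreset

ref = pd.DataFrame({"age": [20, 25, 30, 35, 40] * 20})
cur = pd.DataFrame({"age": [50, 55, 60, 65, 70] * 20})  # shifted distribution

report = Report([
    DataDriftPreset(),         # drift checks for all columns
    ValueDrift(column="age"),  # drift score for a single column
])
my_eval = report.run(cur, ref)  # current data first, reference second
```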
Classification
Use to evaluate quality on a classification task (probabilistic, non-probabilistic, binary and multi-class). Data definition: you may need to map prediction and target columns and the classification type.
General
Use for binary classification and aggregated results for multi-class.

| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| `ClassificationPreset()` | Runs a set of classification quality Metrics. | Optional: `probas_threshold` | As in individual Metrics. |
| `ClassificationQuality()` | Calculates a set of classification quality metrics. | Optional: `probas_threshold` | As in individual Metrics. |
| `Accuracy()` | Calculates accuracy. | Optional: | |
| `Precision()` | Calculates precision. | Required: | |
| `Recall()` | Calculates recall. | Required: | |
| `F1Score()` | Calculates the F1 score. | Required: | |
| `TPR()` | Calculates the true positive rate. | Required: | |
| `TNR()` | Calculates the true negative rate. | Required: | |
| `FPR()` | Calculates the false positive rate. | Required: | |
| `FNR()` | Calculates the false negative rate. | Required: | |
| `LogLoss()` | Calculates logarithmic loss. | Required: | |
| `RocAUC()` | Calculates ROC AUC. | Required: | |
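A minimal sketch of a binary classification evaluation. The `BinaryClassification` mapping and its top-level import are assumptions about the evidently API:

```python
import pandas as pd
from evidently import BinaryClassification, DataDefinition, Dataset, Report
from evidently.presets import ClassificationPreset

df = pd.DataFrame({
    "target":     [1, 0, 1, 1, 0],
    "prediction": [1, 0, 0, 1, 0],
})

# Map which columns hold the true labels and the predicted labels.
definition = DataDefinition(
    classification=[BinaryClassification(target="target", prediction_labels="prediction")],
)
eval_data = Dataset.from_pandas(df, data_definition=definition)

report = Report([ClassificationPreset()])
my_eval = report.run(eval_data, None)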
Dummy model quality
Use these Metrics to get the quality of a dummy model created on the same data (based on heuristics). You can compare your model quality against it to verify that your model is better than random. These Metrics serve as a baseline in automated testing.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| `ClassificationDummyQuality()` | Calculates the quality of a dummy classification model. | N/A | N/A |
| `DummyPrecision()` | Precision of a dummy classification model. | N/A | N/A |
| `DummyRecall()` | Recall of a dummy classification model. | N/A | N/A |
| `DummyF1()` | F1 score of a dummy classification model. | N/A | N/A |
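A sketch of the intended comparison, assuming both Metrics accept a Dataset with mapped target and prediction columns (`eval_data` as in the example above):

```python
from evidently import Report
from evidently.metrics import ClassificationDummyQuality
from evidently.presets import ClassificationPreset

# Put real and dummy quality side by side to verify the model beats the baseline.
report = Report([ClassificationPreset(), ClassificationDummyQuality()])
my_eval = report.run(eval_data, None)  # eval_data: Dataset with mapped columns
```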
By label
Use when you have multiple classes and want to evaluate quality for each class separately.

| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| `ClassificationQualityByLabel()` | Calculates classification quality metrics for each label. | None | As in individual Metrics. |
| `PrecisionByLabel()` | Calculates precision for each label. | Optional: | |
| `F1ByLabel()` | Calculates the F1 score for each label. | Optional: | |
| `RecallByLabel()` | Calculates recall for each label. | Optional: | |
| `RocAUCByLabel()` | Calculates ROC AUC for each label. | Optional: | |
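A sketch for per-label evaluation on a multi-class Dataset (again assuming mapped target and prediction columns):

```python
from evidently import Report
from evidently.metrics import ClassificationQualityByLabel, PrecisionByLabel

report = Report([
    ClassificationQualityByLabel(),  # all quality metrics, broken down per class
    PrecisionByLabel(),              # precision for each class separately
])
my_eval = report.run(eval_data, None)  # eval_data: multi-class Dataset with mapped columns
```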
Regression
Use to evaluate the quality of a regression model. Data definition: you may need to map prediction and target columns.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| `RegressionPreset()` | Runs a set of regression quality Metrics. | None. | As in individual Metrics. |
| `RegressionQuality()` | Calculates a set of regression quality metrics. | None. | As in individual Metrics. |
| `MeanError()` | Calculates the mean error (ME). | Required: | |
| `MAE()` | Calculates the mean absolute error (MAE). | Required: | |
| `RMSE()` | Calculates the root mean squared error (RMSE). | Optional: | |
| `MAPE()` | Calculates the mean absolute percentage error (MAPE). | Required: | |
| `R2Score()` | Calculates the R² score (coefficient of determination). | Optional: | |
| `AbsMaxError()` | Calculates the absolute maximum error. | Optional: | |
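A minimal sketch of a regression evaluation. The `Regression` mapping and its import are assumptions:

```python
import pandas as pd
from evidently import DataDefinition, Dataset, Regression, Report
from evidently.presets import RegressionPreset

df = pd.DataFrame({
    "target":     [10.0, 12.5, 14.0, 9.5],
    "prediction": [11.0, 12.0, 13.5, 10.0],
})

# Map the true values and the model predictions for regression metrics.
definition = DataDefinition(regression=[Regression(target="target", prediction="prediction")])
eval_data = Dataset.from_pandas(df, data_definition=definition)

report = Report([RegressionPreset()])
my_eval = report.run(eval_data, None)
```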
Dummy model quality
Use these Metrics to get the baseline quality for regression: they use optimal constants (which vary by Metric). These Metrics serve as a baseline in automated testing.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| `RegressionDummyQuality()` | Calculates the quality of a dummy regression model. | N/A | N/A |
| `DummyMeanError()` | Mean error of a dummy regression model. | N/A | N/A |
| `DummyMAE()` | MAE of a dummy regression model. | N/A | N/A |
| `DummyMAPE()` | MAPE of a dummy regression model. | N/A | N/A |
| `DummyRMSE()` | RMSE of a dummy regression model. | N/A | N/A |
| `DummyR2()` | R² score of a dummy regression model. | N/A | N/A |
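As with classification, a sketch of putting a real metric next to its dummy baseline (`eval_data` as in the regression example above):

```python
from evidently import Report
from evidently.metrics import MAE, DummyMAE

# The model should do noticeably better than the optimal-constant baseline.
report = Report([MAE(), DummyMAE()])
my_eval = report.run(eval_data, None)  # eval_data: Dataset with mapped regression columns
```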
Ranking
Use to evaluate ranking, search/retrieval or recommendations. Data definition: you may need to map prediction and target columns and the ranking type.
Metric explainers: check the ranking metrics explainers.
| Metric | Description | Parameters | Test Defaults |
|---|---|---|---|
| `RecallTopK()` | Calculates recall at the top K. | Required: `k` | |
| `FBetaTopK()` | Calculates the F-beta score at the top K. | Required: `k` | |
| `PrecisionTopK()` | Calculates precision at the top K. | Required: `k` | |
| `MAP()` | Calculates mean average precision (MAP) at the top K. | Required: `k` | |
| `NDCG()` | Calculates normalized discounted cumulative gain (NDCG) at the top K. | Required: `k` | |
| `MRR()` | Calculates the mean reciprocal rank (MRR). | Required: `k` | |
| `HitRate()` | Calculates the hit rate at the top K. | Required: `k` | |
| `ScoreDistribution()` | Shows the distribution of predicted scores at the top K. | Required: `k` | |
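A sketch of a top-K ranking evaluation, assuming the `k` parameter shown in the table and a Dataset with a mapped ranking data definition:

```python
from evidently import Report
from evidently.metrics import NDCG, PrecisionTopK, RecallTopK

report = Report([
    PrecisionTopK(k=10),  # precision among the top 10 ranked items
    RecallTopK(k=10),     # recall among the top 10 ranked items
    NDCG(k=10),           # ranking quality with position discounting
])
my_eval = report.run(eval_data, None)  # eval_data: Dataset with mapped ranking columns
```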