Data drift parameters
How to set custom data drift detection conditions and thresholds.
Pre-requisites:
- You know how to generate reports or test suites with default parameters.
- You know how to pass custom parameters for reports or test suites.
All presets, tests, and metrics that include data or prediction drift evaluation use the default Data Drift algorithm. It automatically selects an appropriate statistical test based on the feature type and volume.
You can override the defaults by passing a custom parameter to the chosen test, metric, or preset. You can define the drift method, the threshold, or both.
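As a rough illustration of the default selection, the sketch below maps a feature type to the default method names listed in the method reference further down this page. The function name default_stattest and the "num"/"cat"/"text" type labels are hypothetical, and the sketch ignores the data volume, which the real algorithm also takes into account.

```python
from typing import Optional


def default_stattest(feature_type: str, n_labels: Optional[int] = None) -> str:
    """Return the default drift method name for a feature type (volume is ignored)."""
    if feature_type == "num":
        # Numerical features default to the Kolmogorov-Smirnov test.
        return "ks"
    if feature_type == "cat":
        if n_labels is None:
            raise ValueError("n_labels is required for categorical features")
        # More than two labels: Chi-Square test; otherwise: Z-test.
        return "chisquare" if n_labels > 2 else "z"
    if feature_type == "text":
        return "text_content_drift"
    raise ValueError(f"unknown feature type: {feature_type!r}")


print(default_stattest("num"))               # ks
print(default_stattest("cat", n_labels=5))   # chisquare
print(default_stattest("cat", n_labels=2))   # z
```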
To set a custom drift method and threshold on the column level:
ColumnDriftMetric(column_name="feature1", stattest="wasserstein", stattest_threshold=0.2)
If you have a preset, test or metric that checks for drift in multiple columns at the same time, you can set a custom drift method for all columns, all numerical/categorical columns, or for each column individually.
Here is how you set the drift detection method for all categorical columns:
DataDriftPreset(cat_stattest="chisquare", cat_stattest_threshold=0.05)
To set a custom condition for the dataset drift (share of drifting features) in the relevant metrics or presets:
DatasetDriftMetric(drift_share=0.7)
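To make the drift_share condition concrete, here is a minimal sketch of the idea in plain Python. It assumes that dataset drift is flagged when the share of drifted columns is greater than or equal to drift_share; the function name dataset_drift and the column names are hypothetical, and the exact comparison used by the library is not stated on this page.

```python
from typing import Dict, Tuple


def dataset_drift(column_drifted: Dict[str, bool], drift_share: float) -> Tuple[float, bool]:
    """Return (share of drifted columns, whether dataset drift is flagged)."""
    if not column_drifted:
        raise ValueError("at least one column is required")
    share = sum(column_drifted.values()) / len(column_drifted)
    # Assumption: drift is flagged when the share reaches the configured value.
    return share, share >= drift_share


# Hypothetical per-column drift results: 2 of 4 columns drifted.
flags = {"age": True, "income": True, "city": False, "plan": False}
share, detected = dataset_drift(flags, drift_share=0.7)
print(share, detected)  # 0.5 False
```

With drift_share=0.7, half of the columns drifting is not enough to flag the dataset as a whole.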
Setting the dataset drift condition works slightly differently for individual tests, because a test expects a condition on its output. Use the standard test parameters, such as lt and gt, to set a condition on the share of drifted features:
TestShareOfDriftedColumns(lt=0.5)
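The sketch below shows how an lt/gt condition can be read: the test passes only if the observed value satisfies every bound that was set. The helper check_condition is hypothetical and only mirrors the "less than" and "greater than" meaning of the two parameters.

```python
from typing import Optional


def check_condition(value: float, lt: Optional[float] = None, gt: Optional[float] = None) -> bool:
    """Return True if value satisfies every bound that is set."""
    if lt is not None and not value < lt:
        return False
    if gt is not None and not value > gt:
        return False
    return True


# With lt=0.5, the test passes while fewer than half of the columns drift.
print(check_condition(0.3, lt=0.5))  # True
print(check_condition(0.5, lt=0.5))  # False: 0.5 is not strictly less than 0.5
```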
Parameter | Description |
---|---|
stattest | Defines the drift detection method for a given column (if a single column is tested), or all columns in the dataset (if multiple columns are tested). |
stattest_threshold | Sets the drift threshold in a given column or all columns. The threshold meaning varies based on the drift detection method, e.g., it can be the value of a distance metric or a p-value of a statistical test. |
drift_share | Defines the share of drifting columns as a condition for Dataset Drift metric or inside a preset. |
cat_stattest, cat_stattest_threshold | Sets the drift method and/or threshold for all categorical columns in the dataset. |
num_stattest, num_stattest_threshold | Sets the drift method and/or threshold for all numerical columns in the dataset. |
per_column_stattest, per_column_stattest_threshold | Sets the drift method and/or threshold for the listed columns (accepts a dictionary). |
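When several of these parameters are combined, it helps to think about which one applies to a given column. The sketch below assumes the most specific setting wins (per-column, then per-type, then the general stattest). That precedence is an assumption, not something this page states, and the helper resolve_stattest and the column names are hypothetical.

```python
from typing import Dict, Optional


def resolve_stattest(
    column: str,
    column_type: str,  # "num" or "cat"
    stattest: Optional[str] = None,
    cat_stattest: Optional[str] = None,
    num_stattest: Optional[str] = None,
    per_column_stattest: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Return the method name for a column, or None to use the default."""
    chosen = stattest
    # A type-level setting overrides the general one (assumed precedence).
    by_type = {"cat": cat_stattest, "num": num_stattest}.get(column_type)
    if by_type is not None:
        chosen = by_type
    # A per-column entry overrides everything else (assumed precedence).
    if per_column_stattest and column in per_column_stattest:
        chosen = per_column_stattest[column]
    return chosen


per_column = {"income": "psi"}
print(resolve_stattest("age", "num", num_stattest="wasserstein", per_column_stattest=per_column))     # wasserstein
print(resolve_stattest("income", "num", num_stattest="wasserstein", per_column_stattest=per_column))  # psi
print(resolve_stattest("city", "cat", num_stattest="wasserstein", per_column_stattest=per_column))    # None
```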
How to check available parameters. You can verify which parameters are available for a specific test, metric, or preset in the All tests or All metrics tables, or consult the API reference.
StatTest | Method | Applicable to | Drift score |
---|---|---|---|
ks | Kolmogorov–Smirnov (K-S) test | only numerical features; default for numerical features | returns p_value; drift detected when p_value < threshold |
chisquare | Chi-Square test | only categorical features; default for categorical features if the number of labels > 2 | returns p_value; drift detected when p_value < threshold |
z | Z-test | only categorical features; default for categorical features if the number of labels <= 2 | returns p_value; drift detected when p_value < threshold |
wasserstein | Wasserstein distance (normed) | only numerical features | returns distance; drift detected when distance >= threshold |
kl_div | Kullback-Leibler divergence | numerical and categorical features | returns divergence; drift detected when divergence >= threshold |
psi | Population Stability Index (PSI) | numerical and categorical features | returns psi_value; drift detected when psi_value >= threshold |
jensenshannon | Jensen-Shannon distance | numerical and categorical features | returns distance; drift detected when distance >= threshold |
anderson | Anderson-Darling test | only numerical features | returns p_value; drift detected when p_value < threshold |
fisher_exact | Fisher's Exact test | only categorical features | returns p_value; drift detected when p_value < threshold |
cramer_von_mises | Cramér-von Mises test | only numerical features | returns p_value; drift detected when p_value < threshold |
g-test | G-test | only categorical features | returns p_value; drift detected when p_value < threshold |
hellinger | Hellinger distance (normed) | numerical and categorical features | returns distance; drift detected when distance >= threshold |
mannw | Mann-Whitney U-rank test | only numerical features | returns p_value; drift detected when p_value < threshold |
ed | Energy distance | only numerical features | returns distance; drift detected when distance >= threshold |
es | Epps-Singleton test | only numerical features | returns p_value; drift detected when p_value < threshold |
t_test | T-test | only numerical features | returns p_value; drift detected when p_value < threshold |
emperical_mmd | Empirical MMD | only numerical features | returns p_value; drift detected when p_value < threshold |
TVD | Total Variation Distance | only categorical features | returns p_value; drift detected when p_value < threshold |
text_content_drift | Text content drift (domain classifier) | only text features; default for text features | returns roc_auc as drift_score; drift detected when roc_auc > roc_auc of the random classifier at a set percentile (threshold); default threshold: 0.05; roc_auc values can be 0 to 1 (typically 0.5 to 1); higher values mean more confident drift detection |
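The methods listed above follow one of two decision rules: tests that return a p-value flag drift when the score falls below the threshold, while distance-like methods flag drift when the score reaches the threshold. The sketch below encodes that rule using the method names from this page. The helper drift_detected is hypothetical, and text_content_drift is left out because it compares roc_auc against a random-classifier baseline instead.

```python
# Methods that return a p-value: drift when p_value < threshold.
P_VALUE_METHODS = {
    "ks", "chisquare", "z", "anderson", "fisher_exact", "cramer_von_mises",
    "g-test", "mannw", "es", "t_test", "emperical_mmd", "TVD",
}
# Methods that return a distance or divergence: drift when score >= threshold.
DISTANCE_METHODS = {"wasserstein", "kl_div", "psi", "jensenshannon", "hellinger", "ed"}


def drift_detected(method: str, score: float, threshold: float) -> bool:
    """Apply the decision rule that matches the method's score type."""
    if method in P_VALUE_METHODS:
        return score < threshold
    if method in DISTANCE_METHODS:
        return score >= threshold
    raise ValueError(f"no simple threshold rule for method: {method!r}")


print(drift_detected("ks", score=0.01, threshold=0.05))          # True
print(drift_detected("wasserstein", score=0.15, threshold=0.2))  # False
print(drift_detected("psi", score=0.2, threshold=0.2))           # True
```

This is why the same numeric threshold can mean opposite things for different methods: 0.05 is a strict setting for a p-value test but a very sensitive one for a distance.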