Options for Statistical Tests
You can modify the statistical tests used to calculate Data and Target Drift.
Available Options
feature_stattest_func (deprecated) (default: None): defines the Statistical Test for features in the DataDrift Dashboard or Profile:
  None - use the default Statistical Tests for all features (based on internal logic)
You can define a Statistical Test to be used for all the features in the dataset:
  str - the name of the StatTest to use across all features (see the available names below)
  Callable[[pd.Series, pd.Series, str, float], Tuple[float, bool]] - a custom StatTest function (see the requirements for the custom StatTest function below)
  StatTest - an instance of StatTest
You can define a Statistical Test to be used for individual features by passing a dict object where the key is a feature name and the value is one of the previous options (str, Callable or StatTest).
Deprecated: use the all_features_stattest or per_feature_stattest options instead.
all_features_stattest (default: None): defines a custom statistical test for all features in the DataDrift Dashboard or Profile.
cat_features_stattest (default: None): defines a custom statistical test for categorical features in the DataDrift Dashboard or Profile.
num_features_stattest (default: None): defines a custom statistical test for numerical features in the DataDrift Dashboard or Profile.
per_feature_stattest (default: None): defines a custom statistical test per feature in the DataDrift Dashboard or Profile as a dict object where the key is a feature name and the value is a statistical test.
cat_target_stattest_func (default: None): defines a custom statistical test to detect target drift in the Categorical Target Drift report. It follows the same logic as feature_stattest_func, but without the dict option.
num_target_stattest_func (default: None): defines a custom statistical test to detect target drift in the Numerical Target Drift report. It follows the same logic as feature_stattest_func, but without the dict option.
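The interplay between these options and the defaults can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper names default_stattest_name and resolve_stattest are hypothetical, the options are modelled as test names only, and the precedence shown (per-feature, then per-type, then all-features, then the default) is an assumption that this page does not state.

```python
from typing import Dict, Optional


def default_stattest_name(feature_type: str, n_labels: int = 0) -> str:
    """Default test name, following the defaults listed on this page."""
    if feature_type == "num":
        return "ks"
    if feature_type == "cat":
        # chisquare when the feature has more than 2 labels, z-test otherwise
        return "chisquare" if n_labels > 2 else "z"
    raise ValueError(f"unknown feature type: {feature_type!r}")


def resolve_stattest(
    feature_name: str,
    feature_type: str,
    n_labels: int = 0,
    *,
    all_features_stattest: Optional[str] = None,
    cat_features_stattest: Optional[str] = None,
    num_features_stattest: Optional[str] = None,
    per_feature_stattest: Optional[Dict[str, str]] = None,
) -> str:
    """Pick a test name for one feature.

    ASSUMPTION: the most specific option wins. The real resolution
    order inside the library may differ.
    """
    per_feature = per_feature_stattest or {}
    if feature_name in per_feature:
        return per_feature[feature_name]
    by_type = cat_features_stattest if feature_type == "cat" else num_features_stattest
    if by_type is not None:
        return by_type
    if all_features_stattest is not None:
        return all_features_stattest
    return default_stattest_name(feature_type, n_labels)
```

With no options set, a numerical feature falls back to ks; setting only all_features_stattest="psi" switches every feature to psi under the assumed precedence.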
Example:
Change the StatTest for all the features in the Data Drift report:
Change the StatTest for a single feature to a Custom (user-defined) function:
Change the StatTest for a single feature to Custom function (using a StatTest object):
Custom StatTest function requirements:
The StatTest function should match the signature (reference_data: pd.Series, current_data: pd.Series, feature_type: str, threshold: float) -> Tuple[float, bool]:
reference_data: pd.Series - reference data series
current_data: pd.Series - current data series to compare
feature_type: str - feature type
threshold: float - Stat Test threshold for drift detection
Returns:
score: float - Stat Test score (actual value)
drift_detected: bool - indicates whether drift is detected with the given threshold
Example:
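A minimal sketch of a function with the required four-argument shape. The test itself (a standardized shift of the mean) and the name mean_shift_stattest are illustrative assumptions, not one of the built-in tests. The function only iterates over its inputs, so it is written against plain sequences to stay dependency-free; a pd.Series can be passed in the same way.

```python
from statistics import mean, pstdev
from typing import Iterable, Tuple


def mean_shift_stattest(
    reference_data: Iterable[float],
    current_data: Iterable[float],
    feature_type: str,
    threshold: float,
) -> Tuple[float, bool]:
    """Hypothetical custom test: shift of the mean in reference std units."""
    ref = [float(x) for x in reference_data]
    cur = [float(x) for x in current_data]
    shift = abs(mean(cur) - mean(ref))
    spread = pstdev(ref)
    if spread == 0:
        # Constant reference: any shift at all counts as an infinite score.
        score = 0.0 if shift == 0 else float("inf")
    else:
        score = shift / spread
    # Distance-like score: drift is flagged when it reaches the threshold.
    return score, score >= threshold
```

Calling mean_shift_stattest(reference, current, "num", 0.5) returns the (score, drift_detected) pair described above.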
StatTest meta information (StatTest class):
To use the StatTest function, we recommend writing a specific instance of the StatTest class for that function.
To create an instance of the StatTest class, you need:
name: str - a short name used to reference the Stat Test from the options (the StatTest should be registered globally)
display_name: str - a long name displayed in the Dashboard and Profile
func: Callable - a StatTest function
allowed_feature_types: List[str] - the list of allowed feature types to which this function can be applied (available values: cat, num)
Example:
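The sketch below shows how the four fields fit together. StatTestSketch is a hypothetical stand-in defined here so the snippet runs on its own; it is not the library's StatTest class, and checking the feature type at call time is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StatTestSketch:
    """Hypothetical stand-in that mirrors the four fields listed above."""

    name: str
    display_name: str
    func: Callable[..., Tuple[float, bool]]
    allowed_feature_types: List[str]

    def __call__(self, reference_data, current_data, feature_type, threshold):
        # ASSUMPTION: reject feature types the test was not declared for.
        if feature_type not in self.allowed_feature_types:
            raise ValueError(
                f"{self.name} does not support feature type {feature_type!r}"
            )
        return self.func(reference_data, current_data, feature_type, threshold)


def _always_stable(reference_data, current_data, feature_type, threshold):
    """Placeholder test function that never reports drift."""
    return 0.0, False


my_stat_test = StatTestSketch(
    name="my_stat_test",
    display_name="My Stat Test",
    func=_always_stable,
    allowed_feature_types=["cat"],
)
```

The name field is what the options would reference, while display_name is what a reader of the Dashboard or Profile would see.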
Available StatTest Functions:
ks - Kolmogorov–Smirnov (K-S) test
- default for numerical features
- only for numerical features
- returns p_value
- drift detected when p_value < threshold
chisquare - Chi-Square test
- default for categorical features if the number of labels for feature > 2
- only for categorical features
- returns p_value
- drift detected when p_value < threshold
z - Z-test
- default for categorical features if the number of labels for feature <= 2
- only for categorical features
- returns p_value
- drift detected when p_value < threshold
wasserstein - Wasserstein distance (normed)
- only for numerical features
- returns distance
- drift detected when distance >= threshold
kl_div - Kullback-Leibler divergence
- for numerical and categorical features
- returns divergence
- drift detected when divergence >= threshold
psi - Population Stability Index (PSI)
- for numerical and categorical features
- returns psi_value
- drift detected when psi_value >= threshold
jensenshannon - Jensen-Shannon distance
- for numerical and categorical features
- returns distance
- drift detected when distance >= threshold
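The tests above compare their score to the threshold in two different directions: p-value tests flag drift when the score falls below the threshold, while distance and divergence tests flag drift when the score reaches it. The sketch below encodes that rule and adds a textbook PSI calculation for a categorical feature. The function names are hypothetical, and the eps floor for empty categories is an assumption; the library's own binning and smoothing may differ.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

# Tests on this page that return a p-value rather than a distance.
P_VALUE_TESTS = {"ks", "chisquare", "z"}


def is_drift(test_name: str, score: float, threshold: float) -> bool:
    """Apply the comparison direction listed for each test."""
    if test_name in P_VALUE_TESTS:
        return score < threshold
    return score >= threshold


def psi_categorical(
    reference: Sequence[str],
    current: Sequence[str],
    threshold: float,
    eps: float = 1e-4,
) -> Tuple[float, bool]:
    """Textbook PSI over category shares; eps guards against log(0)."""
    ref_counts = Counter(reference)
    cur_counts = Counter(current)
    psi_value = 0.0
    for label in set(ref_counts) | set(cur_counts):
        ref_share = max(ref_counts[label] / len(reference), eps)
        cur_share = max(cur_counts[label] / len(current), eps)
        psi_value += (cur_share - ref_share) * math.log(cur_share / ref_share)
    return psi_value, is_drift("psi", psi_value, threshold)
```

Because the directions differ, a threshold chosen for a p-value test cannot be reused as is for a distance-based test.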