> ## Documentation Index
> Fetch the complete documentation index at: https://docs.evidentlyai.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Customize Data Drift

> How to change data drift detection methods and conditions.

All Metrics and Presets that evaluate shift in data distributions use the default [Data Drift algorithm](/metrics/explainer_drift). It automatically selects the drift detection method based on the column type (text, categorical, numerical) and volume.

You can override the defaults by passing a custom parameter to the chosen Metric or Preset. You can modify the drift detection method (choose from 20+ available), thresholds, or both.&#x20;

You can also implement fully custom drift detection methods.

**Pre-requisites**:

* You know how to use [Data Definition ](/docs/library/data_definition)to map column types.

* You know how to create [Reports](/docs/library/report) and run [Tests](/docs/library/tests).

## Data drift parameters

<Note>
  Setting conditions for data drift works differently from the usual Test  API (with `gt`, `lt`, etc.) This accounts for nuances like varying role of thresholds across drift detection methods, where "greater"  can be better or worse depending on the method.&#x20;
</Note>

### Dataset-level

**Dataset drift share**. You can set the share of drifting columns that signals **dataset drift** (default: 0.5) in the relevant Metrics or Presets. For example, to set it at 70%:

```python theme={null}
report = Report([
    DataDriftPreset(drift_share=0.7)
]
```

This will detect dataset drift if over 70% columns are drifting, using defaults for each column.

**Drift methods**. You can also specify the drift detection methods used on the column level. For example, to use PSI (Population Stability Index) for all columns in the dataset:

```python theme={null}
report = Report([
    DataDriftPreset(drift_share=0.7, method="psi")
]
```

This will check if over 70% columns are drifting, using PSI method with default thresholds.

<Tip>
  See all available methods in the table below.
</Tip>

**Drift thresholds**. You can set thresholds for each method. For example, use PSI with a threshold of 0.3 for categorical columns.

```python theme={null}
report = Report([
    DataDriftPreset(cat_method="psi", cat_threshold="0.3")
]
```

In this case, if PSI is ≥ 0.3 for any categorical column, drift will be detected for that column. The rest of the checks will use defaults: default methods for numerical and text columns (if present), and 50% as the `drift_share` threshold.

### Column-level

For column-level metrics, you can set the drift method/threshold directly for each column:

```python theme={null}
report = Report([
    ValueDrift(column="Salary", method="psi"),
]
```

### All parameters

Use the following parameters to pass chosen drift methods. See methods and their defaults below.

| Parameter                                       | Description                                                                                                                                                                                                                | Applies To                                                   |
| ----------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ |
| `method`                                        | Defines the drift detection method for a given column (if one column is tested), or all columns in the dataset (if multiple columns are tested and the method can apply to all columns).                                   | `ValueDrift()`, `DriftedColumnsCount()`, `DataDriftPreset()` |
| `threshold`                                     | Sets the drift threshold in a given column or all columns.<br /><br />The threshold meaning varies based on the drift detection method, e.g., it can be the value of a distance metric or a p-value of a statistical test. | `ValueDrift()`, `DriftedColumnsCount()`, `DataDriftPreset()` |
| `drift_share`                                   | Defines the share of drifting columns as a condition for Dataset Drift. Default: 0.5                                                                                                                                       | `DriftedColumnsCount()`, `DataDriftPreset()`                 |
| `cat_method` <br />`cat_threshold`              | Sets the drift method and/or threshold for all categorical columns.                                                                                                                                                        | `DriftedColumnsCount()`, `DataDriftPreset()`                 |
| `num_method` <br />`num_threshold`              | Sets the drift method and/or threshold for all numerical columns.                                                                                                                                                          | `DriftedColumnsCount()`, `DataDriftPreset()`                 |
| `per_column_method`<br />`per_column_threshold` | Sets the drift method and/or threshold for the listed columns (accepts a dictionary).                                                                                                                                      | `DriftedColumnsCount()`, `DataDriftPreset()`                 |
| `text_method` <br /> `text_threshold`           | Defines the drift detection method and threshold for all text columns.                                                                                                                                                     | `DriftedColumnsCount()`, `DataDriftPreset()`                 |

## Data drift detection methods

### Tabular data

The following methods apply to **tabular** data: numerical or categorical columns in data definition. Pass them using the `stattest` (or `num_stattest`, etc.) parameter.

| StatTest                                          | Applicable to                                                                                                       | Drift score                                                                                          |
| ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| `ks`<br />Kolmogorov–Smirnov (K-S) test           | tabular data<br />only numerical <br /><br />**Default method for numerical data, if ≤ 1000 objects**               | returns `p_value`<br />drift detected when `p_value < threshold`<br />default threshold: 0.05        |
| `chisquare`<br />Chi-Square test                  | tabular data<br />only categorical<br /><br />**Default method for categorical with > 2 labels, if ≤ 1000 objects** | returns `p_value`<br />drift detected when `p_value < threshold`<br />default threshold: 0.05        |
| `z`<br /> Z-test                                  | tabular data<br />only categorical<br /><br />**Default method for binary data, if ≤ 1000 objects**                 | returns `p_value`<br />drift detected when `p_value < threshold`<br />default threshold: 0.05        |
| `wasserstein`<br /> Wasserstein distance (normed) | tabular data<br />only numerical<br /><br />**Default method for numerical data, if > 1000 objects**                | returns `distance`<br />drift detected when `distance` ≥ `threshold`<br />default threshold: 0.1     |
| `kl_div`<br />Kullback-Leibler divergence         | tabular data<br />numerical and categorical                                                                         | returns `divergence`<br />drift detected when `divergence` ≥ `threshold`<br />default threshold: 0.1 |
| `psi`<br /> Population Stability Index (PSI)      | tabular data<br />numerical and categorical                                                                         | returns `psi_value`<br />drift detected when `psi_value` ≥ `threshold`<br />default threshold: 0.1   |
| `jensenshannon`<br /> Jensen-Shannon distance     | tabular data<br />numerical and categorical<br /><br />**Default method for categorical, if > 1000 objects**        | returns `distance`<br />drift detected when `distance` ≥ `threshold`<br />default threshold: 0.1     |
| `anderson`<br /> Anderson-Darling test            | tabular data<br />only numerical                                                                                    | returns `p_value`<br />drift detected when `p_value < threshold`<br />default threshold: 0.05        |
| `fisher_exact`<br /> Fisher's Exact test          | tabular data<br />only categorical                                                                                  | returns `p_value`<br />drift detected when `p_value < threshold`<br />default threshold: 0.05        |
| `cramer_von_mises`<br /> Cramer-Von-Mises test    | tabular data<br />only numerical                                                                                    | returns `p_value`<br />drift detected when `p_value < threshold`<br />default threshold: 0.05        |
| `g-test`<br /> G-test                             | tabular data<br />only categorical                                                                                  | returns `p_value`<br />drift detected when `p_value < threshold`<br />default threshold: 0.05        |
| `hellinger`<br /> Hellinger Distance (normed)     | tabular data<br />numerical and categorical                                                                         | returns `distance`<br />drift detected when `distance` >= `threshold`<br />default threshold: 0.1    |
| `mannw`<br /> Mann-Whitney U-rank test            | tabular data<br />only numerical                                                                                    | returns `p_value`<br />drift detected when `p_value < threshold`<br />default threshold: 0.05        |
| `ed`<br /> Energy distance                        | tabular data<br />only numerical                                                                                    | returns `distance`<br />drift detected when `distance >= threshold`<br />default threshold: 0.1      |
| `es`<br /> Epps-Singleton test                    | tabular data<br />only numerical                                                                                    | returns `p_value`<br />drift detected when `p_value < threshold`<br />default threshold: 0.05        |
| `t_test`<br /> T-Test                             | tabular data<br />only numerical                                                                                    | returns `p_value`<br />drift detected when `p_value < threshold`<br />default threshold: 0.05        |
| `empirical_mmd`<br /> Empirical-MMD               | tabular data<br />only numerical                                                                                    | returns `p_value`<br />drift detected when `p_value < threshold`<br />default threshold: 0.05        |
| `TVD`<br /> Total-Variation-Distance              | tabular data<br />only categorical                                                                                  | returns `p_value`<br />drift detected when `p_value` \< `threshold`<br />default threshold: 0.05     |

### Text data

Text drift detection applies to columns with **raw text data**, as specified in data definition. Pass them using the `stattest` (or `text_stattest`) parameter.

| StatTest                                                                                                    | Description                                                                                                                                                                       | Drift score                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| ----------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `perc_text_content_drift`<br /> Text content drift (domain classifier, with statistical hypothesis testing) | Applies only to text data. Trains a classifier model to distinguish between text in “current” and “reference” datasets.<br /><br />**Default for text data ≤ 1000 objects.**      | <ul><li>returns `roc_auc` of the classifier as a `drift_score`</li><li>drift detected when `roc_auc` > possible ROC AUC of the random classifier at a set percentile</li><li>`threshold` sets the percentile of the possible ROC AUC values of the random classifier to compare against</li><li>default threshold: 0.95 (95th percentile)</li><li> `roc_auc` values can be 0 to 1 (typically 0.5 to 1); a higher value means more confident drift detection</li></ul> |
| `abs_text_content_drift`<br /> Text content drift (domain classifier)                                       | Applies only to text data. Trains a classifier model to distinguish between text in “current” and “reference” datasets.<br /><br />**Default for text data when > 1000 objects.** | <ul><li>returns `roc_auc` of the classifier as a `drift_score`</li><li>drift detected when `roc_auc` > `threshold` </li><li>`threshold` sets the ROC AUC threshold</li><li>default threshold: 0.55</li><li> `roc_auc` values can be 0 to 1 (typically 0.5 to 1); a higher value means more confident drift detection</li></ul>                                                                                                                                        |

<Tip>
  **Text descriptors drift**. If you work with raw text data, you can also check for distribution drift in text descriptors (such as text length, etc.) To use this method, first compute the selected [text descriptors](/docs/library/descriptors). Then, use numerical / categorical drift detection methods as usual.
</Tip>

## Add a custom method

If you do not find a suitable drift detection method, you can implement a custom function:

```python theme={null}
import pandas as pd
from scipy.stats import anderson_ksamp

from evidently import Dataset
from evidently import DataDefinition
from evidently import Report
from evidently import ColumnType
from evidently.metrics import ValueDrift
from evidently.metrics import DriftedColumnsCount
from evidently.legacy.calculations.stattests import register_stattest
from evidently.legacy.calculations.stattests import StatTest

#toy data 
data = pd.DataFrame(data={
    "column_1": [1, 2, 3, 4, -1, 5],
    "target": [1, 1, 0, 0, 1, 1],
    "prediction": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

definition = DataDefinition(
    numerical_columns=["column_1", "target", "prediction"],
    )
dataset = Dataset.from_pandas(
    data,
    data_definition=definition,
)

#implement method
def _addd(
    reference_data: pd.Series,
    current_data: pd.Series,
    feature_type: ColumnType,
    threshold: float,
):
    p_value = anderson_ksamp([reference_data.values, current_data.values])[2]
    return p_value, p_value < threshold


adt = StatTest(
    name="adt",
    display_name="Anderson-Darling",
    allowed_feature_types=[ColumnType.Numerical],
    default_threshold=0.1,
)

register_stattest(adt, default_impl=_addd)


report = Report([
    # ValueDrift(column="column_1"),
    ValueDrift(column="column_1", method="adt"),
    DriftedColumnsCount(),
])

snapshot = report.run(dataset, dataset)
snapshot
```

We recommended writing a specific instance of the **StatTest class** for that function. You need:

| Parameter               | Type        | Description                                                                          |
| ----------------------- | ----------- | ------------------------------------------------------------------------------------ |
| `name`                  | `str`       | A short name used to reference the Stat Test from the options (registered globally). |
| `display_name`          | `str`       | A long name displayed in the Report.                                                 |
| `func`                  | `Callable`  | The StatTest function.                                                               |
| `allowed_feature_types` | `List[str]` | The list of allowed feature types for this function (`cat`, `num`).                  |

The **StatTest function** itself should match `(reference_data: pd.Series, current_data: pd.Series, threshold: float) -> Tuple[float, bool]` signature.

Accepts:

* `reference_data: pd.Series` - The reference data series.

* `current_data: pd.Series` - The current data series to compare.

* `feature_type: str` - The type of feature being analyzed.

* `threshold: float` - The test threshold for drift detection.

Returns:

* `score: float` - Stat Test score (actual value)

* `drift_detected: bool` - indicates is drift detected with given threshold
