Run Evidently on Spark
How to run calculations on Spark.
You can run distributed computation using Spark if you work with large datasets.
Supported metrics
Currently, the following Tests, Metrics and Presets are supported:
ColumnDriftMetric()
DataDriftTable()
DatasetDriftMetric()
DataDriftPreset()
TestColumnDrift()
TestShareOfDriftedColumns()
TestNumberOfDriftedColumns()
DataDriftTestPreset()
For drift calculation, the following methods are supported:
chisquare
jensen shannon
psi
wasserstein
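To give a sense of what one of these methods measures, here is a minimal pure-Python sketch of the Population Stability Index (psi) over pre-binned counts. It illustrates the general formula only and is not Evidently's implementation. The function name `psi`, the `eps` floor for empty bins, and the sample counts are assumptions made for this example.

```python
import math


def psi(reference_counts, current_counts, eps=1e-4):
    """Population Stability Index over matching bin (or category) counts."""
    ref_total = sum(reference_counts)
    cur_total = sum(current_counts)
    score = 0.0
    for ref_n, cur_n in zip(reference_counts, current_counts):
        # Convert counts to shares; eps (an assumed floor) avoids log(0)
        # and division by zero when a bin is empty.
        ref_share = max(ref_n / ref_total, eps)
        cur_share = max(cur_n / cur_total, eps)
        score += (cur_share - ref_share) * math.log(cur_share / ref_share)
    return score


# Hypothetical counts for a two-category column.
no_drift = psi([50, 50], [50, 50])
some_drift = psi([50, 50], [80, 20])
```

Identical distributions give a score of zero, and the score grows as the current shares move away from the reference shares.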
The following data types are supported:
numerical_features
categorical_features
Code example
You can refer to the example how-to notebook that shows how to use Evidently on Spark.
Run Evidently with Spark
To run Evidently on a Spark DataFrame, specify the corresponding engine in the run() method when you calculate the Report.
First, import SparkEngine from Evidently.
Then pass SparkEngine as the engine to the run() method of the Report.
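A minimal sketch of these two steps, assuming the Evidently 0.4.x API in which `SparkEngine` lives in `evidently.spark.engine` and `run()` accepts an `engine` argument. The helper name `run_spark_drift_report` and its parameters are hypothetical and exist only for illustration. `reference` and `current` are expected to be Spark DataFrames.

```python
def run_spark_drift_report(reference, current, numerical_features, categorical_features):
    """Compute a data drift Report on two Spark DataFrames."""
    # Imports are kept inside the function so this sketch can be loaded
    # even where Evidently or PySpark is not installed.
    from evidently import ColumnMapping
    from evidently.metric_preset import DataDriftPreset
    from evidently.report import Report
    from evidently.spark.engine import SparkEngine  # assumed import path (0.4.x)

    # Tell Evidently which columns are numerical and which are categorical.
    column_mapping = ColumnMapping(
        numerical_features=numerical_features,
        categorical_features=categorical_features,
    )

    report = Report(metrics=[DataDriftPreset()])
    # Passing the engine to run() makes the calculation happen on Spark.
    report.run(
        reference_data=reference,
        current_data=current,
        column_mapping=column_mapping,
        engine=SparkEngine,
    )
    return report
```

The helper returns the computed Report, so it can be inspected or exported in the same way as a Report calculated without Spark.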