Evidently + MLflow

TL;DR: You can use Evidently to calculate metrics, and MLflow Tracking to log and view the results. Here is a sample Jupyter notebook.


Many machine learning teams use MLflow for experiment management, deployment, and as a model registry. If you are already familiar with MLflow, you can integrate it with Evidently to track the performance of production models.

In this case, you use Evidently to calculate the metrics and MLflow to log the results. You can then access the metrics in the MLflow interface.

How it works

Evidently calculates a rich set of metrics and statistical tests. You can choose any of the pre-built reports to define the metrics you want to get.

You can then generate a JSON profile that will contain the defined metrics output. You can combine several profile sections (e.g., Data and Prediction Drift together).

You might not always need all metrics from the profile. You should explicitly define which parts of the output to send to MLflow Tracking.
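Since the profile is plain JSON, you can parse it with the standard library and keep only the values you plan to log. Here is a minimal sketch with a hand-made profile fragment (the real profile contains many more fields; the nesting shown matches the `data_drift` structure used in Step 4 below):

```python
import json

# A hand-made fragment of an Evidently JSON profile, for illustration only.
profile_json = json.dumps({
    "data_drift": {
        "data": {
            "metrics": {
                "temp": {"p_value": 0.012},
                "humidity": {"p_value": 0.54},
            }
        }
    }
})

report = json.loads(profile_json)

# Keep only the per-feature p-values instead of sending the whole profile.
p_values = {
    feature: stats["p_value"]
    for feature, stats in report["data_drift"]["data"]["metrics"].items()
}
print(p_values)  # {'temp': 0.012, 'humidity': 0.54}
```

You can then log each entry of `p_values` as a separate MLflow metric, which is exactly what the tutorial below does.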

Tutorial 1: Evaluating Data Drift with MLflow and Evidently

In this example, we will use Evidently to check input features for Data Drift and log and visualize the results with MLflow.

Here is a Jupyter notebook with the example: link.

Step 1. Install MLflow and Evidently

Evidently is available as a PyPI package:

$ pip install evidently

For more details, refer to the Evidently installation guide.

To install MLflow, run:

$ pip install mlflow

Or, to install MLflow together with extras (including scikit-learn), run:

$ pip install mlflow[extras]

For more details, refer to MLflow documentation.

Step 2. Load the data

Load the data from the UCI repository (link) and save it locally.

For demonstration purposes, we treat this data as the input data for a live model. To apply this approach to production models, you would use your model's prediction logs instead.
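The loading step can be sketched as follows, assuming the UCI Bike Sharing hourly dataset saved locally (the filename 'hour.csv' and the tiny stand-in frame below are assumptions for illustration). The dataset has a 'dteday' (date) and an 'hr' (hour) column, which you can combine into an hourly datetime index so the data can be sliced by timestamp later:

```python
import datetime
import pandas as pd

# Tiny stand-in frame mimicking the dataset's 'dteday' and 'hr' columns.
# For the real file you would instead run something like:
#   raw_data = pd.read_csv('hour.csv', parse_dates=['dteday'])
raw_data = pd.DataFrame({
    'dteday': pd.to_datetime(['2011-01-01', '2011-01-01']),
    'hr': [0, 1],
    'temp': [0.24, 0.22],
})

# Build an hourly datetime index from the date and hour columns.
raw_data.index = raw_data.apply(
    lambda row: datetime.datetime.combine(row['dteday'].date(),
                                          datetime.time(int(row['hr']))),
    axis=1)
print(raw_data.index[1])  # 2011-01-01 01:00:00
```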


Step 3. Define column mapping

We specify the categorical and numerical features so that Evidently performs the correct statistical test for each of them.

data_columns = {}
data_columns['numerical_features'] = ['weather', 'temp', 'atemp', 'humidity', 'windspeed']
data_columns['categorical_features'] = ['holiday', 'workingday']

Step 4. Define what to log

We specify which metrics we want to see. In this case, we want to get the p-value of the statistical test performed to evaluate the drift for each feature.

import json

# Import paths for the legacy Evidently profile API used in this tutorial
from evidently.model_profile import Profile
from evidently.profile_sections import DataDriftProfileSection

def eval_drift(reference, production, column_mapping):
    data_drift_profile = Profile(sections=[DataDriftProfileSection])
    data_drift_profile.calculate(reference, production, column_mapping=column_mapping)
    report = data_drift_profile.json()
    json_report = json.loads(report)
    drifts = []
    for feature in column_mapping['numerical_features'] + column_mapping['categorical_features']:
        drifts.append((feature, json_report['data_drift']['data']['metrics'][feature]['p_value']))
    return drifts

Step 5. Define the comparison windows

We specify the period to treat as the reference: it serves as the base for comparison. We then choose the periods to treat as experiment batches that emulate production model runs.

#set reference dates
reference_dates = ('2011-01-01 00:00:00','2011-01-28 23:00:00')
#set experiment batches dates
experiment_batches = [
    ('2011-01-01 00:00:00','2011-01-29 23:00:00'),
    ('2011-01-29 00:00:00','2011-02-07 23:00:00'),
    ('2011-02-07 00:00:00','2011-02-14 23:00:00'),
    ('2011-02-15 00:00:00','2011-02-21 23:00:00'),
]
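The windows are plain timestamp pairs: with an hourly datetime index, pandas `.loc` slices by label, inclusive of both ends. A minimal sketch of this windowing logic on synthetic hourly data:

```python
import pandas as pd

# Synthetic hourly data covering the tutorial's date range.
idx = pd.date_range('2011-01-01 00:00:00', '2011-02-21 23:00:00', freq='h')
data = pd.DataFrame({'temp': range(len(idx))}, index=idx)

reference_dates = ('2011-01-01 00:00:00', '2011-01-28 23:00:00')
batch = ('2011-01-29 00:00:00', '2011-02-07 23:00:00')

# Label-based slicing includes both endpoints.
reference = data.loc[reference_dates[0]:reference_dates[1]]   # 28 days = 672 rows
production = data.loc[batch[0]:batch[1]]                      # 10 days = 240 rows
print(len(reference), len(production))  # 672 240
```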

Step 6. Run and log experiments in MLflow

We initiate the experiments and log the metrics calculated with Evidently on each run.

import mlflow
from mlflow.tracking import MlflowClient

# log into MLflow
client = MlflowClient()

# set experiment
mlflow.set_experiment('Data Drift Evaluation with Evidently')

# start a new run for each batch
for date in experiment_batches:
    with mlflow.start_run() as run:  # optionally pass run_name='test'
        # Log parameters
        mlflow.log_param("begin", date[0])
        mlflow.log_param("end", date[1])

        # Log metrics
        metrics = eval_drift(raw_data.loc[reference_dates[0]:reference_dates[1]],
                             raw_data.loc[date[0]:date[1]],
                             column_mapping=data_columns)
        for feature in metrics:
            mlflow.log_metric(feature[0], round(feature[1], 3))

Step 7. View the results in MLflow web UI

You can then use the MLflow UI to see the results of the runs.

With a large number of metrics, you can use the expanded view.

Tutorial 2: Evaluating Historical Data Drift with Evidently, Plotly, and MLflow

See a tutorial here.