Evidently and Metaflow
Run model evaluation or data drift analysis as a Metaflow Flow, save the Evidently metrics in S3, and visualize them with the optional Metaflow UI.
This is a community-contributed integration. Author: Marcello Victorino.
Metaflow is an open-source framework that helps scientists and engineers build and manage real-life data science projects.
You can use this integration to generate Evidently HTML reports and test suites, execute them via a Metaflow Flow, and visualize the results as a Card using the metaflow-card-html plugin.
Overview
Many machine learning teams use Metaflow to orchestrate the stages of the ML lifecycle, such as data preparation, training, deployment, and serving predictions, and also use it as a model registry.
If you are already familiar with Metaflow, here is an example of how to integrate it with Evidently to track data quality and data drift.
In this case, Metaflow orchestrates the execution of the Flow. Evidently calculates the metrics and tests and generates the visual report, and Metaflow logs the HTML results as an artifact. You can then access the results in the Metaflow UI or retrieve them via the cards API.
How it works
With Metaflow, you can organize your batch processing into multiple Flows, such as:
TrainingFlow: retrieve data, split into train/test, train multiple models in parallel, identify the best and store it as an artifact
ServingFlow: from the latest successful TrainingFlow, retrieve the best model and use it to make predictions on the new data
MonitoringFlow: triggered by the ServingFlow, retrieve the data used in the last successful run of each Flow and calculate the desired metrics, such as data quality and data drift, where reference is the data used in the TrainingFlow and current comes from the ServingFlow
Note: Evidently calculates a rich set of metrics and statistical tests. You can choose any of the pre-built reports and test suites to define the type of analysis you want.
Within every Flow, it is possible to store artifacts that can be visualized with the card feature. This way, you can save the HTML content of the Evidently reports to be visualized with the metaflow-card-html plugin.
Tutorial: Evaluating Data Drift with Metaflow and Evidently
In this example, we will use Evidently to check input features for Data Drift and log and visualize the resulting report with Metaflow.
Step 1. Install Metaflow and Evidently
Evidently is available as a PyPI package and can be installed with pip install evidently.
For more details, refer to the Evidently installation guide.
To install Metaflow, run pip install metaflow.
For more details, refer to the Metaflow documentation.
Install the metaflow-card-html plugin with pip install metaflow-card-html.
Also install any other dependencies you need, such as scikit-learn.
Step 2. Define the helper function to obtain rendered HTML
We will use the following helper function to simplify obtaining the final fully rendered HTML content for the Evidently reports.
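A minimal sketch of such a helper is shown here. It assumes only that the object passed in exposes a save_html(path) method, as Evidently reports and test suites do; the function name get_evidently_html and the temporary file name are illustrative choices, not necessarily those of the original integration.

```python
import os
import tempfile


def get_evidently_html(evidently_object) -> str:
    """Return the fully rendered HTML of an Evidently report or test suite.

    Assumption: `evidently_object` provides `save_html(path)`.
    """
    # Write the report into a throwaway directory, then read it back as text.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "evidently_report.html")
        evidently_object.save_html(path)
        with open(path, encoding="utf-8") as fh:
            return fh.read()
```

The returned string can then be assigned to self.html inside a step decorated with @mf.card(type='html').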
Step 3. Define the Metaflow Flow
The start step is based on the Evidently getting started tutorial and prepares the data used in the following steps.
The monitoring_data_quality method behaves as a step because of the @mf.step decorator. The @mf.card(type='html') decorator ensures that the attribute self.html is stored and properly rendered as HTML in the Card.
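The Flow can be executed from the command line by passing the run command to the script that defines it, for example python your_flow.py run (replace your_flow.py with the name of your script).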
Step 4. Visualize the report in the respective Card
The card can be visualized in multiple ways, such as via the optional Metaflow UI, the API client, or simply the command line interface, for example python your_flow.py card view monitoring_data_quality (replace your_flow.py with the name of your script).