Get a Report

How to generate Reports using the Evidently Python library.

Code examples

Check the sample notebooks in Examples.

Imports

After installing Evidently, import the Report component and the Metric Presets or individual Metrics you plan to use:

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, DataQualityPreset
from evidently.metrics import *

How it works

Here is the general flow:

  • Input data. Prepare your data as a Pandas DataFrame. This is the current data you will evaluate. For some checks, you may need a second reference dataset. Check the input data requirements.

  • Schema mapping. Define your data schema using Column Mapping. This is optional, but highly recommended; see the sketch after this list.

  • Define the Report. Create a Report object and list the selected metrics.

  • Run the Report. Run the Report on your current_data. If applicable, pass the reference_data and column_mapping.

  • Get the results. View the Report in Jupyter notebook, export the metrics, or upload to Evidently Platform.
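
Putting these steps together, here is a minimal end-to-end sketch. The dataset variables and column names (my_ref_dataset, my_cur_dataset, "age", "salary", "position") are placeholders for your own data.

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Optional schema mapping: all column names below are placeholders
column_mapping = ColumnMapping(
    target="target",
    prediction="prediction",
    numerical_features=["age", "salary"],
    categorical_features=["position"],
)

report = Report(metrics=[DataDriftPreset()])
report.run(
    reference_data=my_ref_dataset,   # pandas DataFrames prepared in the first step
    current_data=my_cur_dataset,
    column_mapping=column_mapping,
)
report  # displays the Report inline in Jupyter notebook or Colab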

You can use Metric Presets, which are pre-built Reports that work out of the box, or create a custom Report selecting Metrics one by one.

Metric Presets

To generate a Report using a Metric Preset, include the selected Preset in the metrics list.

Example 1. To generate the Data Quality Report for a single dataset and get the visual output in Jupyter notebook or Colab:

data_quality_report = Report(metrics=[
    DataQualityPreset()
])

data_quality_report.run(current_data=my_dataset,
                        reference_data=None,
                        column_mapping=None)
data_quality_report

If nothing else is specified, the Report will run with the default parameters for all columns in the dataset.

Available Presets. There are other Presets: for example, DataDriftPreset, RegressionPreset and ClassificationPreset. Check the list of Presets to see which Metrics each one includes.
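
All these Presets are imported from the same evidently.metric_preset module, for example:

from evidently.metric_preset import (
    ClassificationPreset,
    DataDriftPreset,
    DataQualityPreset,
    RegressionPreset,
)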

Example 2. You can include multiple Presets in a Report. To combine Data Drift and Data Quality checks and run them over two datasets (a reference dataset is required for data drift evaluation):

drift_report = Report(metrics=[
     DataDriftPreset(),
     DataQualityPreset()
])
 
drift_report.run(reference_data=my_ref_dataset,
                 current_data=my_cur_dataset)
drift_report

It will display the combined Report in Jupyter notebook or Colab.

Raw data in visuals. Visuals in the Reports are aggregated by default. This keeps load time and Report size manageable even for datasets with millions of rows. If you work with small datasets or samples, you can generate plots with raw data instead.
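
For example, here is a minimal sketch of switching to raw-data plots, assuming the raw_data render option available in recent Evidently versions:

raw_data_report = Report(
    metrics=[DataDriftPreset()],
    options={"render": {"raw_data": True}},  # plot individual data points instead of aggregates
)
raw_data_report.run(reference_data=my_ref_dataset, current_data=my_cur_dataset)
raw_data_report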

Example 3. To access the values computed inside the Report, export it as a Python dictionary.

drift_report.as_dict()

There are more output formats! You can also export Report results as HTML, JSON, a dataframe, and more. Refer to the Output Formats section for details.
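
For instance, to save the same Report as an HTML file or get the results as a JSON string:

drift_report.save_html("drift_report.html")  # standalone HTML file with visuals
drift_report.json()                          # results as a JSON string
drift_report.save_json("drift_report.json")  # save the JSON output to a file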

Example 4. You can customize some of the Metrics inside the Preset. For example, set a custom decision threshold (instead of the default 0.5) when computing classification quality metrics:

dataset_report = Report(metrics=[
    ClassificationPreset(probas_threshold=0.7),
])

Example 5. You can pass a list of columns to the Preset, so column-specific Metrics are generated only for those columns, not the entire dataset.

drift_report = Report(metrics=[
    DataDriftPreset(columns=["age", "position"]),
])

Refer to the All metrics table to see defaults and available parameters that you can pass for each Preset.

Get a custom Report

While Presets are a great starting point, you may want to customize the Report further by choosing Metrics or adjusting their parameters. To do this, create a custom Report.

1. Choose metrics

First, define which Metrics you want to include in your custom Report. Metrics can be either dataset-level or column-level.

Available Metrics: See the All metrics table. For a preview, check Example notebooks.

Row-level evals: To generate row-level scores for text data, check Text Descriptors.
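
As a preview, here is a hedged sketch that summarizes per-row text length and sentiment scores for a text column; the "review_text" column name is a placeholder, and the Descriptors shown assume the evidently.descriptors module:

from evidently.descriptors import Sentiment, TextLength
from evidently.metrics import ColumnSummaryMetric

text_report = Report(metrics=[
    ColumnSummaryMetric(column_name=TextLength().on("review_text")),  # text length per row
    ColumnSummaryMetric(column_name=Sentiment().on("review_text")),   # sentiment score per row
])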

Dataset-level metrics. Some Metrics evaluate the entire dataset: for example, a Metric that checks for data drift across the whole dataset or calculates accuracy.

To create a custom Report with dataset-level metrics, create a Report object and list the metrics:

data_drift_dataset_report = Report(metrics=[
    DatasetDriftMetric(),
    DatasetSummaryMetric(),
])

Column-level Metrics. Some Metrics focus on individual columns, like evaluating distribution drift or summarizing specific columns. To include column-level Metrics, pass the name of the column to each such Metric:

data_drift_column_report = Report(metrics=[
    ColumnSummaryMetric(column_name="age"),
    ColumnDriftMetric(column_name="age"),
])

Generating multiple column-level Metrics: You can use a helper function to easily generate multiple column-level Metrics for a list of columns. See the page on Metric Generator.
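
For example, a hedged sketch of the helper described there, generating one ColumnSummaryMetric per listed column:

from evidently.metrics.base_metric import generate_column_metrics

multi_column_report = Report(metrics=[
    generate_column_metrics(ColumnSummaryMetric, columns=["age", "position"]),
])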

Combining Metrics and Presets. You can mix Metrics Presets and individual Metrics in the same Report, and also combine column-level and dataset-level Metrics.

my_report = Report(metrics=[
    DataQualityPreset(),
    DatasetDriftMetric(),
    ColumnDriftMetric(column_name="age"),
])

2. Set metric parameters

Metrics can have optional or required parameters. For example, the data drift detection algorithm selects a method automatically, but you can override this by specifying your preferred method (Optional). To calculate the number of values matching a regular expression, you must always define this expression (Required).

Example 1. How to specify a regular expression (required parameter):

data_integrity_column_report = Report(metrics=[
    ColumnRegExpMetric(column_name="education", reg_exp=r".*-.*", top=5),
    ColumnRegExpMetric(column_name="relationship", reg_exp=r".*child.*")
])

data_integrity_column_report.run(reference_data=adult_ref, current_data=adult_cur)
data_integrity_column_report

Example 2. How to specify a custom data drift detection method (optional parameter):

data_drift_column_report = Report(metrics=[
    ColumnDriftMetric('age'),
    ColumnDriftMetric('age', stattest='psi'),
])
data_drift_column_report.run(reference_data=adult_ref, current_data=adult_cur)

data_drift_column_report

Reference: The available parameters for each Metric are listed in the All metrics table.
