Regression Performance

Works for a **single model** or helps compare **two**

Displays a variety of plots related to the **performance** and **errors**

Helps explore areas of **under-** and **overestimation**

Summary

The **Regression Performance** report evaluates the quality of a regression model.

It can also compare the model's current performance with its own past performance or with the performance of an alternative model.

Requirements

To run this report, you need input features and **both target and prediction** columns.

To generate a comparative report, you will need **two** datasets. The **reference** dataset serves as a benchmark. We analyze the change by comparing the **current** production data to the **reference** data.

You can also run this report for a **single** `DataFrame`, with no comparison performed. In this case, pass it as `reference_data`.

How it looks

The report includes 12 components. All plots are interactive.

1. Model Quality Summary Metrics

We calculate a few standard model quality metrics: Mean Error (ME), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE).

For each quality metric, we also show one standard deviation of its value (in brackets) to estimate the stability of the performance.
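The metrics listed above can be reproduced by hand. The sketch below is a minimal illustration, not the report's own implementation. It assumes that the error is predicted minus actual (as in the error plot of this report), that the standard deviation is the sample standard deviation, and that rows with a zero actual value are skipped for percentage errors. The helper name `regression_metrics` is hypothetical; the dictionary keys mirror the names used in the JSON profile.

```python
from statistics import mean, stdev


def regression_metrics(actual, predicted):
    """Return ME, MAE, MAPE and the standard deviation of each (hypothetical helper)."""
    errors = [p - a for a, p in zip(actual, predicted)]  # error = predicted - actual
    abs_errors = [abs(e) for e in errors]
    # Assumption: rows with a zero actual value are skipped, because the
    # percentage error is undefined there.
    abs_perc_errors = [100 * abs(e / a) for e, a in zip(errors, actual) if a != 0]
    return {
        "mean_error": mean(errors),
        "mean_abs_error": mean(abs_errors),
        "mean_abs_perc_error": mean(abs_perc_errors),
        "error_std": stdev(errors),  # sample standard deviation (an assumption)
        "abs_error_std": stdev(abs_errors),
        "abs_perc_error_std": stdev(abs_perc_errors),
    }


metrics = regression_metrics(actual=[100, 200, 400], predicted=[110, 190, 400])
print(metrics["mean_error"], metrics["mean_abs_perc_error"])
```

A mean error close to zero together with a large MAE, as in this toy input, indicates errors that cancel out on average rather than a model without errors.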

2. Predicted vs Actual

Predicted versus actual values in a scatter plot.

3. Predicted vs Actual in Time

Predicted and Actual values over time or by index, if no datetime is provided.

4. Error (Predicted - Actual)

Model error values over time or by index, if no datetime is provided.

5. Absolute Percentage Error

Absolute percentage error values over time or by index, if no datetime is provided.

6. Error Distribution

Distribution of the model error values.

7. Error Normality
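The `error_normality` fields in the JSON profile (`order_statistic_medians`, `slope`, `intercept`, `r`) resemble the outputs of a normal probability-plot (Q-Q) fit such as `scipy.stats.probplot`. Under that assumption, the following dependency-free sketch shows the idea: sort the errors, pair them with theoretical normal quantiles, and fit a straight line. The helper name `qq_fit` and the plotting positions `(i + 0.5) / n` are illustrative choices and may differ from what the report uses.

```python
from statistics import NormalDist, mean


def qq_fit(errors):
    """Fit a line to a normal Q-Q plot of the errors (hypothetical helper).

    Returns (theoretical_quantiles, slope, intercept, r).
    """
    n = len(errors)
    ordered = sorted(errors)
    # Assumption: simple plotting positions (i + 0.5) / n; other conventions exist.
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, my = mean(theoretical), mean(ordered)
    sxx = sum((x - mx) ** 2 for x in theoretical)
    syy = sum((y - my) ** 2 for y in ordered)
    sxy = sum((x - mx) * (y - my) for x, y in zip(theoretical, ordered))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5  # correlation between the two quantile sets
    return theoretical, slope, intercept, r


_, slope, intercept, r = qq_fit([3.0, -1.0, 0.5, 2.0, -2.5, 1.0, -0.5, 0.0])
print(round(r, 3))
```

A value of `r` close to 1 means the ordered errors fall near a straight line against the normal quantiles, which is what normally distributed errors would produce.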

8. Mean Error per Group

We show a summary of the model quality metrics for each of the two groups: Mean Error (ME), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE).

9. Predicted vs Actual per Group

We plot the predictions, coloring them by the group they belong to. The plot shows the regions where the model underestimates and overestimates the target function.
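One way to form such groups is to take the tails of the error distribution. The sketch below is an assumption-laden illustration: the 5% tail share, the helper name `split_by_error`, and the sign convention (error = predicted minus actual, so the most negative errors are underestimation) are not confirmed by this page, although the group names match those in the JSON profile.

```python
def split_by_error(errors, tail_share=0.05):
    """Label each error as underestimation, overestimation, or majority.

    Hypothetical helper; tail_share=0.05 is an assumed default.
    Expects at least three values so the two tails do not overlap.
    """
    n = len(errors)
    k = max(1, int(n * tail_share))  # number of rows in each tail
    order = sorted(range(n), key=lambda i: errors[i])
    groups = ["majority"] * n
    for i in order[:k]:
        groups[i] = "underestimation"  # most negative errors: predicted < actual
    for i in order[-k:]:
        groups[i] = "overestimation"  # most positive errors: predicted > actual
    return groups


errors = list(range(-10, 10))  # 20 toy error values
groups = split_by_error(errors)
print(groups.count("majority"))
```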

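10. Error Bias: Mean/Most Common Feature Value per Group

This table helps you quickly see the differences in feature values between the 3 groups (majority, underestimation, and overestimation):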

For the numerical features, it shows the mean value per group. For the categorical features, it shows the most common value.

If you have two datasets, the table displays the values for both REF (reference) and CURR (current).

If you observe a large difference between the groups, it means that the model error is sensitive to the values of a given feature.
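The per-group summary described above (mean for numerical features, most common value for categorical ones) can be sketched as follows. The helper name `feature_value_per_group` is hypothetical; the `"num"` and `"cat"` type labels follow the `feature_type` values in the JSON profile.

```python
from statistics import mean, mode


def feature_value_per_group(values, groups, feature_type):
    """Mean (numerical) or most common value (categorical) per error group."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    summarize = mean if feature_type == "num" else mode
    return {group: summarize(vals) for group, vals in by_group.items()}


groups = ["underestimation", "majority", "majority", "overestimation"]
temperature = feature_value_per_group([10, 20, 30, 40], groups, "num")
holiday = feature_value_per_group([0, 0, 0, 1], groups, "cat")
print(temperature, holiday)
```

In this toy input the overestimated row is the only one with `holiday` equal to 1, which is the kind of difference between groups the table is meant to surface.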

Here is the formula used to calculate the Range %:

$Range = 100 \cdot \left| \frac{V_{over} - V_{under}}{V_{max} - V_{min}} \right|$
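As a worked instance of the formula, here is a small sketch. It assumes that `v_over` and `v_under` are the feature's summary values in the overestimation and underestimation groups and that `v_max` and `v_min` are the feature's overall maximum and minimum; the page does not define these symbols, so treat that reading as an assumption. The function name `range_percent` is hypothetical.

```python
def range_percent(v_over, v_under, v_max, v_min):
    """Range % = 100 * |(v_over - v_under) / (v_max - v_min)|."""
    if v_max == v_min:
        # Assumption: a constant feature cannot separate the groups.
        return 0.0
    return 100 * abs((v_over - v_under) / (v_max - v_min))


# Group values of 30 and 20 for a feature that spans 0 to 40.
print(range_percent(v_over=30, v_under=20, v_max=40, v_min=0))  # 25.0
```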

11. Error Bias per Feature

For each feature, we show a histogram to visualize the **distribution of its values in the segments with extreme errors** and in the rest of the data. You can visually explore whether there is a relationship between high error and the values of a given feature.

Here is an example where extreme errors are dependent on the "temperature" feature.

12. Predicted vs Actual per Feature

For each feature, we also show the Predicted vs Actual scatterplot. We use colors to show the distribution of the values of a given feature. It helps visually detect and explore underperforming segments which might be sensitive to the values of the given feature.

When to use the report

Here are our suggestions on when to use it. You can also combine it with the Data Drift and Numerical Target Drift reports to get a comprehensive picture.

**1. To analyze the results of the model test.** You can explore the results of an online or offline test and contrast them with the performance in training. Though this is not the primary use case, you can also use this report to compare model performance in an A/B test or during a shadow model deployment.

**2. To generate regular reports on the performance of a production model.** You can run this report as a regular job (e.g. weekly or at every batch model run) to analyze the model's performance and share the results with other stakeholders.

JSON Profile

If you choose to generate a JSON profile, it will contain the following information:

    {
      "regression_performance": {
        "name": "regression_performance",
        "datetime": "datetime",
        "data": {
          "utility_columns": {
            "date": "date",
            "id": null,
            "target": "target",
            "prediction": "prediction"
          },
          "cat_feature_names": [],
          "num_feature_names": [],
          "metrics": {
            "reference": {
              "mean_error": mean_error,
              "mean_abs_error": mean_abs_error,
              "mean_abs_perc_error": mean_abs_perc_error,
              "error_std": error_std,
              "abs_error_std": abs_error_std,
              "abs_perc_error_std": abs_perc_error_std,
              "error_normality": {
                "order_statistic_medians": [],
                "slope": slope,
                "intercept": intercept,
                "r": r
              },
              "underperformance": {
                "majority": {
                  "mean_error": mean_error,
                  "std_error": std_error
                },
                "underestimation": {
                  "mean_error": mean_error,
                  "std_error": std_error
                },
                "overestimation": {
                  "mean_error": mean_error,
                  "std_error": std_error
                }
              }
            },
            "current": {
              "mean_error": mean_error,
              "mean_abs_error": mean_abs_error,
              "mean_abs_perc_error": mean_abs_perc_error,
              "error_std": error_std,
              "abs_error_std": abs_error_std,
              "abs_perc_error_std": abs_perc_error_std,
              "error_normality": {
                "order_statistic_medians": [],
                "slope": slope,
                "intercept": intercept,
                "r": r
              },
              "underperformance": {
                "majority": {
                  "mean_error": mean_error,
                  "std_error": std_error
                },
                "underestimation": {
                  "mean_error": mean_error,
                  "std_error": std_error
                },
                "overestimation": {
                  "mean_error": mean_error,
                  "std_error": std_error
                }
              }
            },
            "error_bias": {
              "feature_name": {
                "feature_type": "num",
                "ref_majority": ref_majority,
                "ref_under": ref_under,
                "ref_over": ref_over,
                "ref_range": ref_range,
                "prod_majority": prod_majority,
                "prod_under": prod_under,
                "prod_over": prod_over,
                "prod_range": prod_range
              },

              "holiday": {
                "feature_type": "cat",
                "ref_majority": 0,
                "ref_under": 0,
                "ref_over": 0,
                "ref_range": 0,
                "prod_majority": 0,
                "prod_under": 0,
                "prod_over": 1,
                "prod_range": 1
              }
            }
          }
        }
      },
      "timestamp": "timestamp"
    }
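Once a profile is exported, its metrics can be read with any JSON parser. The sketch below follows the key layout shown in the profile structure above. The numeric values are made up for illustration, only a subset of keys is included, and the helper name `metric_changes` is hypothetical.

```python
import json

# Hypothetical profile excerpt: the keys follow the documented layout,
# the numbers are invented for this example.
profile_json = """
{
  "regression_performance": {
    "name": "regression_performance",
    "data": {
      "metrics": {
        "reference": {"mean_error": -0.5, "mean_abs_error": 10.0, "mean_abs_perc_error": 4.0},
        "current": {"mean_error": 3.0, "mean_abs_error": 15.0, "mean_abs_perc_error": 6.5}
      }
    }
  }
}
"""


def metric_changes(profile):
    """Return current minus reference for each summary metric."""
    metrics = profile["regression_performance"]["data"]["metrics"]
    reference, current = metrics["reference"], metrics["current"]
    names = ("mean_error", "mean_abs_error", "mean_abs_perc_error")
    return {name: current[name] - reference[name] for name in names}


changes = metric_changes(json.loads(profile_json))
print(changes)
```

A script like this could run after each scheduled report to log how far the current metrics have moved from the reference.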

Examples

See a tutorial "How to break a model in 20 days" where we create a demand prediction model and analyze its gradual decay.

