
Model Optimization in Imbalanced Regression

  • Conference paper

Discovery Science (DS 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13601)

Abstract

Imbalanced domain learning aims to produce models that accurately predict instances which, though underrepresented, are of the utmost importance for the domain. Research in this field has mainly focused on classification tasks; comparatively, the number of studies carried out for regression tasks is negligible. One of the main reasons for this is the lack of loss functions capable of focusing on minimizing the errors of extreme (rare) values. Recently, an evaluation metric was introduced: the Squared Error Relevance Area (SERA). This metric places greater emphasis on the errors committed at extreme values while also accounting for performance over the whole domain of the target variable, thus preventing severe bias. However, its effectiveness as an optimization metric is unknown. In this paper, our goal is to study the impact of using SERA as an optimization criterion in imbalanced regression tasks. Using gradient boosting algorithms as proof of concept, we perform an experimental study with 36 data sets from different domains and of different sizes. Results show that models that use SERA as an objective function are practically better than the models produced by their respective standard boosting algorithms at predicting extreme values. This confirms that SERA can be embedded as a loss function into optimization-based learning algorithms for imbalanced regression scenarios.


Notes

  1. https://github.com/anibalsilva1/IRModelOptimization.


Acknowledgements

This work was supported by the CHIST-ERA grant CHIST-ERA-19-XAI-012, and project CHIST-ERA/0004/2019 funded by FCT.

Author information

Corresponding author

Correspondence to Aníbal Silva.

Appendices

A SERA numerical approximation

SERA and its derivatives are approximated by the trapezoidal rule with a uniform grid of T equally spaced intervals with a step of 0.001, as follows.

$$\begin{aligned} \begin{aligned} SERA&= \int _0^1 SER_{t}\, dt \\&\approx \frac{1}{2T} \sum _{k=1}^T \left( SER_{t_{k-1}} + SER_{t_{k}}\right) \\&= \frac{1}{2T} \left( SER_{t_{0}} + 2SER_{t_{1}} +...+ 2SER_{t_{T-1}} + SER_{t_{T}}\right) \\&= \frac{1}{T} \bigg (\sum _{k=1}^{T-1} SER_{t_{k}} + \frac{SER_{t_{0}} + SER_{t_{T}}}{2}\bigg ) \\&= \frac{1}{T} \bigg (\frac{1}{2}\!\sum _{y_i \in \mathcal {D}^{t_0}}(\hat{y}_{i} - y_i)^2 +\!\sum _{y_i \in \mathcal {D}^{t_1}}(\hat{y}_{i} - y_i)^2 + ... \\&\qquad +\!\sum _{y_i \in \mathcal {D}^{t_{T-1}}}(\hat{y}_{i} - y_i)^2 + \frac{1}{2}\!\sum _{y_i \in \mathcal {D}^{t_T}}(\hat{y}_{i} - y_i)^2 \bigg ) \end{aligned} \end{aligned}$$
(9)
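For concreteness, the sum in Eq. (9) is straightforward to compute once the relevance of each target value is known. Below is a minimal, illustrative Python/NumPy sketch of this approximation; it is not the authors' implementation (their code is linked in the Notes), and it assumes the relevance values \(\phi(y_i) \in [0, 1]\) are supplied externally, e.g. by a relevance function.

```python
import numpy as np

def sera(y_true, y_pred, phi, T=1000):
    """Trapezoidal approximation of SERA (Eq. (9)).

    y_true, y_pred : 1-D arrays of targets and predictions.
    phi            : relevance phi(y_i) of each target, in [0, 1].
    T              : number of equally spaced intervals (step 1/T).
    """
    sq_err = (y_pred - y_true) ** 2
    thresholds = np.linspace(0.0, 1.0, T + 1)                # t_0, ..., t_T
    # SER_t: squared errors summed over D^t = {i : phi(y_i) >= t}
    ser = np.array([sq_err[phi >= t].sum() for t in thresholds])
    # composite trapezoidal rule: interior points weigh 1, endpoints 1/2
    return (ser[1:-1].sum() + 0.5 * (ser[0] + ser[-1])) / T
```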

Similarly, the derivative of SERA w.r.t. a given prediction \(\hat{y}_j\) is obtained by

$$\begin{aligned} \begin{aligned} \frac{\partial SERA}{\partial \hat{y}_j}&\approx \frac{1}{T}\frac{\partial }{\partial \hat{y}_j} \bigg (\frac{1}{2}\sum _{y_i \in \mathcal {D}^{t_0}}(\hat{y}_{i} - y_i)^2 + \sum _{y_i \in \mathcal {D}^{t_1}}(\hat{y}_{i} - y_i)^2 + \dots \\&\qquad + \sum _{y_i \in \mathcal {D}^{t_{T-1}}}(\hat{y}_{i} - y_i)^2 + \frac{1}{2}\sum _{y_i \in \mathcal {D}^{t_T}}(\hat{y}_{i} - y_i)^2 \bigg )\\&=\frac{1}{T} \bigg (\sum _{y_i \in \mathcal {D}^{t_0}} (\hat{y}_i - y_i)\delta _{ij} + 2\sum _{y_i \in \mathcal {D}^{t_1}} (\hat{y}_i - y_i)\delta _{ij} + \dots \\&\qquad + 2\sum _{y_i \in \mathcal {D}^{t_{T-1}}} (\hat{y}_i - y_i)\delta _{ij} + \sum _{y_i \in \mathcal {D}^{t_T}} (\hat{y}_i - y_i)\delta _{ij}\bigg ) \\&= \frac{1}{T} \bigg ((\hat{y}_j - y_j) \Big |_{y_j \in \mathcal {D}^{t_0}} + 2\sum _{k=1}^{T-1} (\hat{y}_j - y_j) \Big |_{y_j \in \mathcal {D}^{t_k}} + (\hat{y}_j - y_j) \Big |_{y_j \in \mathcal {D}^{t_T}} \bigg ) \\&= \frac{1}{T} \bigg (\boldsymbol{1}\left( y_j \in \mathcal {D}^{t_0}\right) + 2\sum _{k=1}^{T-1} \boldsymbol{1}\left( y_j \in \mathcal {D}^{t_k}\right) + \boldsymbol{1}\left( y_j \in \mathcal {D}^{t_T} \right) \bigg ) (\hat{y}_j - y_j) \end{aligned} \end{aligned}$$
(10)

Note that any instance \(y_j\) has non-negative relevance, i.e. \(\phi (y_j) \ge 0\), so the first term of Eq. (10) is always present. However, when \(\phi (y_j) < 1\), only a subset of the remaining summation terms contributes. With this in mind, we define

$$\begin{aligned} n_j = \sum _{k=1}^{K_j} \boldsymbol{1}\left( y_j \in \mathcal {D}^{t_k}\right) ~~ \end{aligned}$$
(11)

where \(n_j\) is the number of times instance \(y_j\) contributes to the SERA derivative, with \(K_j \in \left[ 1, T-1\right] \). Equation (10) then becomes

$$\begin{aligned} \frac{\partial SERA}{\partial \hat{y}_j} \approx \frac{1}{T} \left( 1 + 2 n_j + \boldsymbol{1}\left( y_j \in \mathcal {D}^{t_T} \right) \right) (\hat{y}_j - y_j)~~ \end{aligned}$$
(12)

In this context, the second derivative for a given prediction \(\hat{y}_j\) is obtained by

$$\begin{aligned} \frac{\partial ^2 SERA}{\partial \hat{y}_j^2} \approx \frac{1}{T} \left( 1 + 2 n_j + \boldsymbol{1}\left( y_j \in \mathcal {D}^{t_T} \right) \right) ~~ \end{aligned}$$
(13)
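The weights in Eqs. (12) and (13) only require the membership counts \(n_j\). A hedged NumPy sketch, under the same assumptions as the SERA sketch above (relevance values supplied externally):

```python
import numpy as np

def sera_grad_hess(y_true, y_pred, phi, T=1000):
    """First and second derivatives of SERA w.r.t. each prediction (Eqs. (12)-(13))."""
    thresholds = np.linspace(0.0, 1.0, T + 1)
    interior = thresholds[1:-1]                              # t_1, ..., t_{T-1}
    # n_j: number of interior thresholds for which y_j belongs to D^{t_k}
    n = (phi[:, None] >= interior[None, :]).sum(axis=1)
    # weight: 1 for t_0 (every instance), 2*n_j for the interior,
    # plus 1 if phi(y_j) >= t_T = 1 (maximum relevance)
    w = (1.0 + 2.0 * n + (phi >= 1.0)) / T
    grad = w * (y_pred - y_true)                             # Eq. (12)
    hess = w                                                 # Eq. (13)
    return grad, hess
```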

We now study the error incurred by using the approximations above (Eqs. (12) and (13)) instead of applying the trapezoidal rule directly to Eqs. (4) and (5). For that, we use the predictions obtained for \(\text {XGBoost}^{S}\). The rationale is the following: 1) for each data set, we compute the first and second derivatives using both methods; 2) we compute the absolute difference between the results of the two methods; 3) we average these differences over all instances.

The results obtained using this evaluation are depicted in the left box plot of Fig. 5.

Fig. 5. Left: Absolute error differences for the first and second derivatives between our approximations and the trapezoidal rule. Right: Execution time (in seconds) of the first and second derivatives for a given data set, under our approximation and the trapezoidal rule.

Results show that both the first and second derivative approximations incur a negligible error (\(\approx 10^{-12}\)). The right box plot of Fig. 5 shows the difference in execution time between the two methods for each data set, where execution time is measured as the time taken to evaluate both the first and second derivatives. As the size of a data set increases, the gap between our approximation and the trapezoidal rule becomes non-negligible.
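These derivatives are exactly what gradient boosting libraries expect from a custom objective, which is how SERA is embedded as a loss function. The sketch below illustrates the wiring for the xgboost Python interface; it is an assumption-laden illustration, not the authors' implementation (their code is linked in the Notes), and `make_sera_objective`, `phi_of`, and the reuse of `sera_grad_hess` from the previous sketch are names introduced here for illustration.

```python
import xgboost as xgb

def make_sera_objective(phi_of, T=1000):
    """Build an xgboost custom objective that minimizes SERA.

    phi_of : callable mapping an array of target values to relevance in [0, 1].
    """
    def sera_objective(preds, dtrain):
        y = dtrain.get_label()
        phi = phi_of(y)
        # gradient and Hessian from Eqs. (12) and (13); see sera_grad_hess above
        return sera_grad_hess(y, preds, phi, T=T)
    return sera_objective

# Usage (dtrain is an xgb.DMatrix, params a dict of booster parameters):
# booster = xgb.train(params, dtrain, num_boost_round=200,
#                     obj=make_sera_objective(phi_of))
```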

B Tables of Results

In this section, we report the out-of-sample MSE and SERA results for each data set.

Table 3. Out-of-sample MSE results, with the best model per data set in bold. Model (#wins): \(\text {XGBoost}^{M}\) (27), \(\text {XGBoost}^{S}\) (2), \(\text {LGBM}^{M}\) (6), \(\text {LGBM}^{S}\) (1).
Table 4. Out-of-sample SERA results, with the best model per data set in bold. Model (#wins): \(\text {XGBoost}^{M}\) (16), \(\text {XGBoost}^{S}\) (14), \(\text {LGBM}^{S}\) (5), \(\text {LGBM}^{M}\) (1).


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Silva, A., Ribeiro, R.P., Moniz, N. (2022). Model Optimization in Imbalanced Regression. In: Pascal, P., Ienco, D. (eds) Discovery Science. DS 2022. Lecture Notes in Computer Science, vol 13601. Springer, Cham. https://doi.org/10.1007/978-3-031-18840-4_1


  • DOI: https://doi.org/10.1007/978-3-031-18840-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18839-8

  • Online ISBN: 978-3-031-18840-4

