
Model Optimization in Imbalanced Regression

  • Conference paper

Discovery Science (DS 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13601)

Abstract

Imbalanced domain learning aims to produce models that accurately predict instances which, though underrepresented, are of the utmost importance for the domain. Research in this field has mainly focused on classification tasks; comparatively, the number of studies carried out for regression tasks is negligible. One of the main reasons for this is the lack of loss functions capable of focusing on minimizing the errors of extreme (rare) values. Recently, an evaluation metric was introduced: the Squared Error Relevance Area (SERA). This metric places greater emphasis on the errors committed at extreme values while also accounting for performance over the whole domain of the target variable, thus preventing severe bias. However, its effectiveness as an optimization metric is unknown. In this paper, our goal is to study the impact of using SERA as an optimization criterion in imbalanced regression tasks. Using gradient boosting algorithms as proof of concept, we perform an experimental study with 36 data sets from different domains and of different sizes. Results show that models that use SERA as an objective function are practically better than the models produced by their respective standard boosting algorithms at predicting extreme values. This confirms that SERA can be embedded as a loss function into optimization-based learning algorithms for imbalanced regression scenarios.


Notes

  1. https://github.com/anibalsilva1/IRModelOptimization.


Acknowledgements

This work was supported by the CHIST-ERA grant CHIST-ERA-19-XAI-012, and project CHIST-ERA/0004/2019 funded by FCT.

Author information

Corresponding author

Correspondence to Aníbal Silva.

Appendices

A SERA numerical approximation

SERA and its derivatives are approximated by the trapezoidal rule with a uniform grid of T equally spaced intervals with a step of 0.001, as follows.

$$\begin{aligned} \begin{aligned} SERA&= \int _0^1 SER_{t}\, dt \\&\approx \frac{1}{2T} \sum _{k=1}^T \left( SER_{t_{k-1}} + SER_{t_{k}}\right) \\&= \frac{1}{2T} \left( SER_{t_{0}} + 2SER_{t_{1}} +...+ 2SER_{t_{T-1}} + SER_{t_{T}}\right) \\&= \frac{1}{T} \bigg (\sum _{k=1}^{T-1} SER_{t_{k}} + \frac{SER_{t_{0}} + SER_{t_{T}}}{2}\bigg ) \\&= \frac{1}{T} \bigg (\frac{1}{2}\!\sum _{y_i \in \mathcal {D}^{t_0}}(\hat{y}_{i} - y_i)^2 +\!\sum _{y_i \in \mathcal {D}^{t_1}}(\hat{y}_{i} - y_i)^2 + ... \\&\qquad +\!\sum _{y_i \in \mathcal {D}^{t_{T-1}}}(\hat{y}_{i} - y_i)^2 + \frac{1}{2}\!\sum _{y_i \in \mathcal {D}^{t_T}}(\hat{y}_{i} - y_i)^2 \bigg ) \end{aligned} \end{aligned}$$
(9)
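For concreteness, the sum in Eq. (9) is straightforward to compute once the relevance of each target value is known. Below is a minimal, illustrative Python/NumPy sketch of this approximation; it is not the authors' implementation (their code is linked in the Notes), and it assumes the relevance values \(\phi(y_i) \in [0, 1]\) are supplied externally, e.g. by a relevance function.

```python
import numpy as np

def sera(y_true, y_pred, phi, T=1000):
    """Trapezoidal approximation of SERA (Eq. (9)).

    y_true, y_pred : 1-D arrays of targets and predictions.
    phi            : relevance phi(y_i) of each target, in [0, 1].
    T              : number of equally spaced intervals (step 1/T).
    """
    sq_err = (y_pred - y_true) ** 2
    thresholds = np.linspace(0.0, 1.0, T + 1)                # t_0, ..., t_T
    # SER_t: squared errors summed over D^t = {i : phi(y_i) >= t}
    ser = np.array([sq_err[phi >= t].sum() for t in thresholds])
    # composite trapezoidal rule: interior points weigh 1, endpoints 1/2
    return (ser[1:-1].sum() + 0.5 * (ser[0] + ser[-1])) / T
```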

Similarly, the derivative of SERA w.r.t. a given prediction \(\hat{y}_j\) is obtained by

$$\begin{aligned} \begin{aligned} \frac{\partial SERA}{\partial \hat{y}_j}&\approx \frac{1}{T}\frac{\partial }{\partial \hat{y}_j} \bigg (\frac{1}{2}\sum _{y_i \in \mathcal {D}^{t_0}}(\hat{y}_{i} - y_i)^2 + \sum _{y_i \in \mathcal {D}^{t_1}}(\hat{y}_{i} - y_i)^2 + \dots \\&\qquad + \sum _{y_i \in \mathcal {D}^{t_{T-1}}}(\hat{y}_{i} - y_i)^2 + \frac{1}{2}\sum _{y_i \in \mathcal {D}^{t_T}}(\hat{y}_{i} - y_i)^2 \bigg )\\&=\frac{1}{T} \bigg (\sum _{y_i \in \mathcal {D}^{t_0}} (\hat{y}_i - y_i)\delta _{ij} + 2\sum _{y_i \in \mathcal {D}^{t_1}} (\hat{y}_i - y_i)\delta _{ij} + \dots \\&\qquad + 2\sum _{y_i \in \mathcal {D}^{t_{T-1}}} (\hat{y}_i - y_i)\delta _{ij} + \sum _{y_i \in \mathcal {D}^{t_T}} (\hat{y}_i - y_i)\delta _{ij}\bigg ) \\&= \frac{1}{T} \bigg ((\hat{y}_j - y_j) \Big |_{y_j \in \mathcal {D}^{t_0}} + 2\sum _{k=1}^{T-1} (\hat{y}_j - y_j) \Big |_{y_j \in \mathcal {D}^{t_k}} + (\hat{y}_j - y_j) \Big |_{y_j \in \mathcal {D}^{t_T}} \bigg ) \\&= \frac{1}{T} \bigg (\boldsymbol{1}\left( y_j \in \mathcal {D}^{t_0}\right) + 2\sum _{k=1}^{T-1} \boldsymbol{1}\left( y_j \in \mathcal {D}^{t_k}\right) + \boldsymbol{1}\left( y_j \in \mathcal {D}^{t_T} \right) \bigg ) (\hat{y}_j - y_j) \end{aligned} \end{aligned}$$
(10)

Note that any instance \(y_j\) has non-negative relevance, i.e. \(\phi (y_j) \ge 0\), so the first term of Eq. (10) is always present. However, when \(\phi (y_j) < 1\), only a subset of the remaining summation terms contributes. With this in mind, we define

$$\begin{aligned} n_j = \sum _{k=1}^{K_j} \boldsymbol{1}\left( y_j \in \mathcal {D}^{t_k}\right) ~~ \end{aligned}$$
(11)

where \(n_j\) is the number of times instance \(y_j\) contributes to the SERA derivative, with \(K_j \in \left[ 1, T-1\right] \). Equation (10) then becomes

$$\begin{aligned} \frac{\partial SERA}{\partial \hat{y}_j} \approx \frac{1}{T} \left( 1 + 2 n_j + \boldsymbol{1}\left( y_j \in \mathcal {D}^{t_T} \right) \right) (\hat{y}_j - y_j)~~ \end{aligned}$$
(12)

In this context, the second derivative for a given prediction \(\hat{y}_j\) is obtained by

$$\begin{aligned} \frac{\partial ^2 SERA}{\partial \hat{y}_j^2} \approx \frac{1}{T} \left( 1 + 2 n_j + \boldsymbol{1}\left( y_j \in \mathcal {D}^{t_T} \right) \right) ~~ \end{aligned}$$
(13)
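The weights in Eqs. (12) and (13) only require the membership counts \(n_j\). A hedged NumPy sketch, under the same assumptions as the SERA sketch above (relevance values supplied externally):

```python
import numpy as np

def sera_grad_hess(y_true, y_pred, phi, T=1000):
    """First and second derivatives of SERA w.r.t. each prediction (Eqs. (12)-(13))."""
    thresholds = np.linspace(0.0, 1.0, T + 1)
    interior = thresholds[1:-1]                              # t_1, ..., t_{T-1}
    # n_j: number of interior thresholds for which y_j belongs to D^{t_k}
    n = (phi[:, None] >= interior[None, :]).sum(axis=1)
    # weight: 1 for t_0 (every instance), 2*n_j for the interior,
    # plus 1 if phi(y_j) >= t_T = 1 (maximum relevance)
    w = (1.0 + 2.0 * n + (phi >= 1.0)) / T
    grad = w * (y_pred - y_true)                             # Eq. (12)
    hess = w                                                 # Eq. (13)
    return grad, hess
```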

We now study the error incurred by using the approximations above (Eqs. (12) and (13)) instead of applying the trapezoidal rule directly to Eqs. (4) and (5). For that, we use the predictions obtained for \(\text {XGBoost}^{S}\). The rationale is the following: 1) for each data set, we compute the first and second derivatives using both methods; 2) we compute the absolute difference between the results of the two methods; 3) we average these differences over all instances.

The results obtained using this evaluation are depicted in the left box plot of Fig. 5.

Fig. 5. Left: Absolute error differences for the first and second derivatives between our approximations and the trapezoidal rule. Right: Execution time (in seconds) of the first and second derivatives for a given data set, under our approximation and the trapezoidal rule.

Results show that both the first and second derivative approximations incur a negligible error (\(\approx 10^{-12}\)). The right box plot of Fig. 5 shows the difference in execution time between the two methods for each data set, where execution time is measured as the time taken to evaluate both the first and second derivatives. As the size of a data set increases, the gap between our approximation and the trapezoidal rule becomes non-negligible.
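These derivatives are exactly what gradient boosting libraries expect from a custom objective, which is how SERA is embedded as a loss function. The sketch below illustrates the wiring for the xgboost Python interface; it is an assumption-laden illustration, not the authors' implementation (their code is linked in the Notes), and `make_sera_objective`, `phi_of`, and the reuse of `sera_grad_hess` from the previous sketch are names introduced here for illustration.

```python
import xgboost as xgb

def make_sera_objective(phi_of, T=1000):
    """Build an xgboost custom objective that minimizes SERA.

    phi_of : callable mapping an array of target values to relevance in [0, 1].
    """
    def sera_objective(preds, dtrain):
        y = dtrain.get_label()
        phi = phi_of(y)
        # gradient and Hessian from Eqs. (12) and (13); see sera_grad_hess above
        return sera_grad_hess(y, preds, phi, T=T)
    return sera_objective

# Usage (dtrain is an xgb.DMatrix, params a dict of booster parameters):
# booster = xgb.train(params, dtrain, num_boost_round=200,
#                     obj=make_sera_objective(phi_of))
```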

B Tables of Results

In this section, we report the out-of-sample MSE and SERA results for each data set.

Table 3. Out-of-sample MSE results, with the best model per data set in bold. Model (#wins): \(\text {XGBoost}^{M}\) (27), \(\text {XGBoost}^{S}\) (2), \(\text {LGBM}^{M}\) (6), \(\text {LGBM}^{S}\) (1).
Table 4. Out-of-sample SERA results, with the best model per data set in bold. Model (#wins): \(\text {XGBoost}^{M}\) (16), \(\text {XGBoost}^{S}\) (14), \(\text {LGBM}^{S}\) (5), \(\text {LGBM}^{M}\) (1).


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Silva, A., Ribeiro, R.P., Moniz, N. (2022). Model Optimization in Imbalanced Regression. In: Pascal, P., Ienco, D. (eds) Discovery Science. DS 2022. Lecture Notes in Computer Science, vol 13601. Springer, Cham. https://doi.org/10.1007/978-3-031-18840-4_1


  • DOI: https://doi.org/10.1007/978-3-031-18840-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18839-8

  • Online ISBN: 978-3-031-18840-4

