
Cost-Sensitive Learning based on Performance Metric for Imbalanced Data


Abstract

Performance metrics are usually evaluated only after the neural network has been trained with an error cost function. This procedure can lead to suboptimal model selection, particularly for imbalanced classification problems. This work proposes using these metrics, which are typically derived from the confusion matrix, directly as cost functions. Commonly used metrics are covered, namely AUC, G-mean, F1-score and AG-mean. The only implementation change in model training occurs in the backpropagation error term. The results were compared to a standard MLP trained with the Rprop learning algorithm, and to SMOTE, SMTTL, WWE and RAMOBoost, on sixteen classical benchmark datasets. Based on average ranks, the proposed formulation outperformed Rprop and all sampling strategies (SMOTE, SMTTL and WWE) for all metrics. These results were statistically confirmed for AUC and G-mean with respect to Rprop; for F1-score and AG-mean, all algorithms were statistically equivalent. The proposal was also superior to RAMOBoost for G-mean in terms of average ranks. However, it was statistically faster than RAMOBoost for all metrics. It was also faster than SMTTL and statistically equivalent in runtime to Rprop, SMOTE and WWE. Moreover, the solutions obtained are generally non-dominated compared to those of all other techniques, for all metrics. These results show that the direct use of performance metrics as cost functions for neural network training improves both generalization capacity and computation time in imbalanced classification problems. The extension to other performance metrics derived directly from the confusion matrix is straightforward.
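
To make the mechanism concrete, below is a minimal sketch (not the authors' code) of how a confusion-matrix metric such as G-mean can replace the usual squared-error term in backpropagation. It assumes a single sigmoid output and smooths the confusion-matrix counts with the raw outputs; the function name gmean_error_term, the smoothing scheme, and the cost E = 1 - G-mean are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def gmean_error_term(y, t, eps=1e-12):
    """Backpropagation output-error term for a smoothed G-mean cost.

    y : ndarray, predicted probabilities in (0, 1), shape (n,)
    t : ndarray, binary targets {0, 1}, shape (n,)
    """
    pos = t == 1
    neg = ~pos
    n_pos = pos.sum()
    n_neg = neg.sum()

    # Smoothed confusion-matrix rates: TP ~ sum(y) over positives,
    # TN ~ sum(1 - y) over negatives, so both rates are differentiable.
    tpr = y[pos].sum() / n_pos
    tnr = (1.0 - y[neg]).sum() / n_neg
    g = np.sqrt(max(tpr * tnr, eps))

    # Gradient of E = 1 - sqrt(TPR * TNR) with respect to each output y_i.
    delta = np.empty_like(y)
    delta[pos] = -(tnr / (2.0 * g)) / n_pos   # push outputs up on positives
    delta[neg] = (tpr / (2.0 * g)) / n_neg    # push outputs down on negatives
    return delta  # used in backprop in place of the usual (y - t) term
```

With an error term of this kind in place, the rest of training (e.g. with Rprop) is unchanged, which matches the abstract's claim that only the backpropagation error term differs; other metrics built from the confusion matrix would follow by differentiating their own expressions in TPR and TNR.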


Availability of data and material

All datasets used in this work are available in the public UCI Machine Learning Repository (http://archive.ics.uci.edu/ml).

Notes

  1. Still, the post-hoc test was also computed for AG-mean.


Acknowledgements

The authors would like to thank the following Brazilian research funding agencies for their financial support: CNPq (The National Council for Scientific and Technological Development), FAPEMIG (The Minas Gerais Research Foundation) and CAPES (The Coordination for the Improvement of Higher Education Personnel).

Funding

This work was financially supported by the following Brazilian research funding agencies: CNPq (The National Council for Scientific and Technological Development), FAPEMIG (The Minas Gerais Research Foundation) and CAPES (The Coordination for the Improvement of Higher Education Personnel).

Author information


Contributions

The study conception and design were performed by Antonio Padua Braga and Cristiano Leite de Castro. Material preparation, data collection and results generation were carried out by Yuri Sousa Aurelio. Gustavo Matheus de Almeida contributed to the analysis of the results together with the other authors. All authors contributed to and approved the final version of the manuscript.

Corresponding author

Correspondence to Gustavo Matheus de Almeida.

Ethics declarations

Conflicts of interest/Competing interests

There are no conflicts of interest in this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Aurelio, Y.S., de Almeida, G.M., de Castro, C.L. et al. Cost-Sensitive Learning based on Performance Metric for Imbalanced Data. Neural Process Lett 54, 3097–3114 (2022). https://doi.org/10.1007/s11063-022-10756-2

