Simulated annealing based undersampling (SAUS): a hybrid multi-objective optimization method to tackle class imbalance

Chennuru, Venkata Krishnaveni; Timmappareddy, Sobha Rani

doi:10.1007/s10489-021-02369-4

Published: 04 June 2021

Volume 52, pages 2092–2110, (2022)

Applied Intelligence

Venkata Krishnaveni Chennuru ORCID: orcid.org/0000-0003-2209-483X¹ &
Sobha Rani Timmappareddy¹

Abstract

Learning from imbalanced datasets is a challenging problem in machine learning research, since traditional classifiers are biased towards the Majority class, which results in a low Minority class prediction rate. The inherent assumptions of equal class distribution and accuracy-driven evaluation are the identified reasons behind this degraded performance. Further, false negatives carry a higher penalty than false positives. A simple, logical way to mitigate this issue is to construct a balanced training set from the imbalanced one. However, many such balanced training sets can be formed from a given imbalanced set, and an optimal one has to be selected from among them. This is a computationally intractable problem, and the search is prone to getting stuck in local optima. To address these issues, a Simulated Annealing-based Under Sampling (SAUS) method is proposed. SAUS uses simulated annealing, a popular meta-heuristic search algorithm, with a novel cost function defined in terms of the Balanced Error Rate. This cost function strikes a balance between the Sensitivity and Specificity measures when evaluating the solution at each iteration of the subsampling process, and it avoids being trapped in local optima. The experimental results show that SAUS improves the average Sensitivity on the test set from 0.68 to 0.86, which demonstrates its efficacy in tackling the imbalance issue in the dataset. Area Under the ROC Curve (AUC) results also show that SAUS outperforms several popular undersampling methods. SAUS performs on par with state-of-the-art solutions for the class imbalance problem.
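As a rough illustration of the idea described in the abstract, the following Python sketch searches for a balanced majority-class subset with simulated annealing, scoring each candidate by its Balanced Error Rate. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact cost function, so the standard definition BER = 1 - (Sensitivity + Specificity) / 2 is assumed, and the nearest-centroid classifier, the single-swap neighbourhood, the geometric cooling schedule, and all names and parameter values (`sa_undersample`, `t0`, `alpha`, `steps`) are hypothetical choices made for illustration. Both classes are assumed to be non-empty.

```python
import math
import random


def balanced_error_rate(y_true, y_pred, positive=1):
    """Standard BER (assumed definition): 1 - (sensitivity + specificity) / 2."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return 1.0 - 0.5 * (sensitivity + specificity)


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def subset_cost(subset, majority, minority, X_val, y_val):
    """BER on validation data of a nearest-centroid stand-in classifier
    (hypothetical; the paper's classifier is not stated in the abstract)."""
    c_maj = centroid([majority[i] for i in subset])
    c_min = centroid(minority)

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Label 1 is the minority (positive) class, label 0 the majority class.
    preds = [1 if dist2(x, c_min) <= dist2(x, c_maj) else 0 for x in X_val]
    return balanced_error_rate(y_val, preds, positive=1)


def sa_undersample(majority, minority, X_val, y_val,
                   t0=1.0, alpha=0.95, steps=200, seed=0):
    """Return (indices of selected majority samples, best BER found)."""
    rng = random.Random(seed)
    k = min(len(minority), len(majority))  # balanced subset size
    current = rng.sample(range(len(majority)), k)
    cur_cost = subset_cost(current, majority, minority, X_val, y_val)
    best, best_cost = list(current), cur_cost
    temperature = t0
    for _ in range(steps):
        chosen = set(current)
        outside = [i for i in range(len(majority)) if i not in chosen]
        if not outside:
            break  # every majority sample is already selected
        # Neighbour move: swap one selected sample for an unselected one.
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)
        cand_cost = subset_cost(candidate, majority, minority, X_val, y_val)
        delta = cand_cost - cur_cost
        # Metropolis rule: always accept improvements, sometimes accept
        # worse subsets so the search can leave a local optimum.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, cur_cost = candidate, cand_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        temperature *= alpha  # geometric cooling (assumed schedule)
    return best, best_cost
```

Calling `sa_undersample(majority, minority, X_val, y_val)` with lists of feature vectors returns the selected majority indices and their BER; combining those samples with the full minority set gives a balanced training set. Because the cost averages the error rates of the two classes, a subset that simply favours the Majority class is not rewarded, which is the balance between Sensitivity and Specificity that the abstract refers to.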

Author information

Authors and Affiliations

SCIS, University of Hyderabad, Hyderabad, India
Venkata Krishnaveni Chennuru & Sobha Rani Timmappareddy

Corresponding author

Correspondence to Venkata Krishnaveni Chennuru.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chennuru, V.K., Timmappareddy, S.R. Simulated annealing based undersampling (SAUS): a hybrid multi-objective optimization method to tackle class imbalance. Appl Intell 52, 2092–2110 (2022). https://doi.org/10.1007/s10489-021-02369-4

Accepted: 17 March 2021
Published: 04 June 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s10489-021-02369-4
