Abstract
Class imbalance problems have attracted considerable attention from the research community, but few works have focused on feature selection for imbalanced datasets. To handle class imbalance, we developed a novel fitness function for feature selection using the chaotic salp swarm optimization algorithm, an efficient meta-heuristic that has been applied successfully to a wide range of optimization problems. This paper proposes an AdaBoost algorithm combined with chaotic salp swarm optimization: the most discriminating features are selected by the chaotic salp swarm optimizer, and AdaBoost classifiers are then trained on the selected features. Experiments show that the proposed technique can find feature subsets that maximize the performance of AdaBoost.
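The wrapper scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard salp swarm update rules (a leader moving around the best solution found so far, followers averaging with their predecessor), a logistic map in place of one uniform random coefficient as the chaotic component, and a simple 0.5 threshold to turn positions into feature masks. The names `chaotic_salp_search`, `to_mask`, and `toy_fitness` are hypothetical. The paper's fitness function is not given in the abstract, so `toy_fitness` is only a placeholder; in practice it would train AdaBoost on the selected features and return an imbalance-aware score.

```python
import math
import random


def logistic_map(x):
    """One step of the logistic map, a common chaotic sequence on (0, 1)."""
    return 4.0 * x * (1.0 - x)


def to_mask(position):
    """Threshold a continuous position in [0, 1] into a binary feature mask."""
    return [1 if v > 0.5 else 0 for v in position]


def chaotic_salp_search(fitness, n_features, n_salps=10, n_iter=30, seed=0):
    """Maximize fitness(mask) over binary feature masks (illustrative sketch)."""
    rng = random.Random(seed)
    salps = [[rng.random() for _ in range(n_features)] for _ in range(n_salps)]
    best_pos = max(salps, key=lambda p: fitness(to_mask(p)))[:]
    best_fit = fitness(to_mask(best_pos))
    chaos = 0.7  # assumed starting value for the logistic map
    for it in range(1, n_iter + 1):
        # c1 shrinks over time, shifting the search from exploration to exploitation.
        c1 = 2.0 * math.exp(-((4.0 * it / n_iter) ** 2))
        for i in range(n_salps):
            for j in range(n_features):
                if i == 0:
                    # Leader: step around the best solution; the chaotic value
                    # replaces the uniform coefficient c2 (bounds are 0 and 1).
                    chaos = logistic_map(chaos)
                    step = c1 * chaos
                    sign = 1.0 if rng.random() >= 0.5 else -1.0
                    salps[i][j] = best_pos[j] + sign * step
                else:
                    # Follower: move halfway toward the preceding salp.
                    salps[i][j] = 0.5 * (salps[i][j] + salps[i - 1][j])
                salps[i][j] = min(1.0, max(0.0, salps[i][j]))
            fit = fitness(to_mask(salps[i]))
            if fit > best_fit:
                best_fit, best_pos = fit, salps[i][:]
    return to_mask(best_pos), best_fit


def toy_fitness(mask):
    """Placeholder: reward two 'informative' features, penalize subset size."""
    informative = (0, 3)
    hits = sum(1 for j in informative if mask[j])
    return hits - 0.1 * sum(mask)


mask, fit = chaotic_salp_search(toy_fitness, n_features=6)
print(mask, fit)
```

Keeping the fitness as a plain callable separates the swarm search from the classifier, so a cross-validated AdaBoost score (for example, one based on AUC or G-mean) could be swapped in without touching the search loop.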
Ethics declarations
Conflict of interest
Rekha G declares that she has no conflict of interest. Krishna Reddy V declares that he has no conflict of interest. Chandrashekar Jatoth declares that he has no conflict of interest. Ugo Fiore declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Gillala, R., Vuyyuru, K.R., Jatoth, C. et al. An efficient chaotic salp swarm optimization approach based on ensemble algorithm for class imbalance problems. Soft Comput 25, 14955–14965 (2021). https://doi.org/10.1007/s00500-021-06080-x
DOI: https://doi.org/10.1007/s00500-021-06080-x