An efficient chaotic salp swarm optimization approach based on ensemble algorithm for class imbalance problems

Abstract

Class imbalance problems have attracted considerable attention from the research community, but few works have focused on feature selection with imbalanced datasets. To handle class imbalance problems, we developed a novel fitness function for feature selection using the chaotic salp swarm optimization algorithm, an efficient meta-heuristic that has been successfully applied to a wide range of optimization problems. This paper proposes an AdaBoost algorithm combined with chaotic salp swarm optimization: the most discriminating features are selected using salp swarm optimization, and AdaBoost classifiers are then trained on the selected features. Experiments show that the proposed technique is able to find the optimal features that maximize the performance of AdaBoost.
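
The wrapper idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify their fitness function. The sketch assumes a single-leader salp swarm over positions in [0, 1], a logistic map supplying the chaotic coefficient, and a 0.5 threshold to turn a position into a feature mask. The names `chaotic_ssa_select`, `logistic_map`, and `toy_fitness` are hypothetical, and `toy_fitness` merely stands in for a score that would, in the paper's setting, come from AdaBoost trained on the selected features.

```python
import math
import random


def logistic_map(x):
    """One step of the logistic map, a common chaotic sequence on (0, 1)."""
    return 4.0 * x * (1.0 - x)


def chaotic_ssa_select(fitness, n_features, n_salps=20, n_iter=50, seed=0):
    """Maximise fitness(mask) over binary feature masks; returns (mask, score)."""
    rng = random.Random(seed)

    def to_mask(pos):
        # Assumed binarisation: a feature is selected when its coordinate > 0.5.
        return [p > 0.5 for p in pos]

    # Continuous salp positions, one coordinate per feature, in [0, 1].
    swarm = [[rng.random() for _ in range(n_features)] for _ in range(n_salps)]
    best_pos = max(swarm, key=lambda p: fitness(to_mask(p)))[:]
    best_fit = fitness(to_mask(best_pos))
    chaos = 0.7  # arbitrary start, away from the map's fixed points 0 and 0.75

    for it in range(1, n_iter + 1):
        # c1 decays over time, shifting the leader from exploration to exploitation.
        c1 = 2.0 * math.exp(-((4.0 * it / n_iter) ** 2))
        for i, pos in enumerate(swarm):
            for j in range(n_features):
                if i == 0:
                    # Leader: move around the best position found so far; the
                    # chaotic value replaces the usual uniform random coefficient.
                    chaos = logistic_map(chaos)
                    step = c1 * chaos  # bounds are [0, 1], so (ub - lb) = 1, lb = 0
                    sign = 1.0 if rng.random() >= 0.5 else -1.0
                    pos[j] = best_pos[j] + sign * step
                else:
                    # Follower: average with the salp ahead of it in the chain.
                    pos[j] = 0.5 * (pos[j] + swarm[i - 1][j])
                pos[j] = min(1.0, max(0.0, pos[j]))  # keep inside the bounds
            fit = fitness(to_mask(pos))
            if fit > best_fit:
                best_fit, best_pos = fit, pos[:]
    return to_mask(best_pos), best_fit


def toy_fitness(mask):
    """Hypothetical stand-in: reward two 'informative' features, penalise size."""
    informative = {0, 3}
    hits = sum(1 for j in informative if mask[j])
    return hits - 0.1 * sum(mask)


mask, score = chaotic_ssa_select(toy_fitness, n_features=6)
print(mask, score)
```

In a real experiment the `fitness` callable would train and evaluate a classifier on the masked columns, ideally with an imbalance-aware metric. Keeping it as a plain callable lets the search loop stay independent of that choice.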

Notes

  1. https://sci2s.ugr.es/keel/imbalanced.php

Author information

Corresponding author

Correspondence to Chandrashekar Jatoth.

Ethics declarations

Conflict of interest

Rekha G declares that she has no conflict of interest. Krishna Reddy V declares that he has no conflict of interest. Chandrashekar Jatoth declares that he has no conflict of interest. Ugo Fiore declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Gillala, R., Vuyyuru, K.R., Jatoth, C. et al. An efficient chaotic salp swarm optimization approach based on ensemble algorithm for class imbalance problems. Soft Comput 25, 14955–14965 (2021). https://doi.org/10.1007/s00500-021-06080-x

Keywords

  • Imbalanced data
  • Feature selection
  • Ensemble algorithms
  • Classification
  • Salp swarm algorithm