An efficient binary chimp optimization algorithm for feature selection in biomedical data classification

  • Original Article
Neural Computing and Applications

Abstract

Accurate classification of high-dimensional biomedical data depends heavily on efficiently recognizing the data's main features, which can be used to assist in diagnosing related diseases. However, because biomedical data contain a large number of irrelevant or redundant features, classification approaches struggle to correctly identify patterns in the data without a feature selection algorithm. Feature selection approaches seek to eliminate irrelevant and redundant features while maintaining or enhancing classification accuracy. In this paper, a new wrapper feature selection method based on the chimp optimization algorithm (ChOA) is proposed for biomedical data classification. The ChOA is a recently proposed metaheuristic algorithm whose capability for solving feature selection problems has not yet been investigated. Two binary variants of the ChOA are introduced for the feature selection problem. In the first approach, two transfer functions (S-shaped and V-shaped) are used to convert the continuous version of ChOA to binary. In the second approach, a crossover operator is used in addition to the transfer function to improve the ChOA's exploratory behavior. To validate the efficiency of the proposed approaches, five publicly available high-dimensional biomedical datasets and a few datasets from other domains, such as life, text, and image, are employed. The proposed approaches are then compared with six well-known wrapper-based feature selection methods, namely multi-objective genetic algorithm (GA), particle swarm optimization (PSO), bat algorithm (BA), ant colony optimization (ACO), firefly algorithm (FA), and flower pollination (FP) algorithm, as well as two standard filter-based feature selection methods, using three different classifiers. The experimental results demonstrate that the proposed approaches can effectively remove the least significant features and improve classification accuracy. In most cases, the suggested wrapper feature selection techniques also outperform the GA, PSO, BA, ACO, FA, FP, and other existing methods in terms of the number of selected genes and classification accuracy.
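
The binarization step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which S-shaped and V-shaped functions or which crossover scheme the paper uses, so everything below is an assumption made for illustration: the sigmoid and |tanh| are common choices in the binary metaheuristics literature, the V-shaped rule flips the current bit (one widespread convention), and uniform crossover stands in for the unspecified crossover operator. All function names are hypothetical.

```python
import math
import random


def s_shaped(x: float) -> float:
    """Sigmoid transfer function (assumed S-shaped choice), computed stably."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def v_shaped(x: float) -> float:
    """|tanh(x)| transfer function (assumed V-shaped choice)."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-shaped rule: set each bit to 1 with probability s_shaped(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """V-shaped rule: flip each current bit with probability v_shaped(x)."""
    return [
        1 - bit if rng.random() < v_shaped(x) else bit
        for x, bit in zip(position, current_bits)
    ]


def uniform_crossover(parent_a, parent_b, rng):
    """Uniform crossover (assumed scheme): take each bit from either parent."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical continuous position of one search agent over 6 features.
    position = [-2.0, -0.5, 0.0, 0.5, 2.0, 4.0]
    mask_s = binarize_s(position, rng)
    mask_v = binarize_v(position, mask_s, rng)
    child = uniform_crossover(mask_s, mask_v, rng)
    # A 1 means the feature is kept; a 0 means it is dropped.
    print(mask_s, mask_v, child)
```

In a wrapper setting, each resulting bit mask would select a feature subset that is then scored by training and evaluating a classifier; that scoring step is omitted here because the abstract does not describe the fitness function.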

Author information

Contributions

Elnaz Pashaei and Elham Pashaei designed the model and the computational framework. Both authors carried out the implementation, performed the experiments, and wrote the manuscript.

Corresponding author

Correspondence to Elham Pashaei.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any data or other information from studies or experiments involving human or animal subjects.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pashaei, E., Pashaei, E. An efficient binary chimp optimization algorithm for feature selection in biomedical data classification. Neural Comput & Applic 34, 6427–6451 (2022). https://doi.org/10.1007/s00521-021-06775-0

  • DOI: https://doi.org/10.1007/s00521-021-06775-0
