Abstract
Accurate classification of high-dimensional biomedical data depends heavily on efficiently recognizing the data's main features, which can be used to assist in diagnosing related diseases. However, because biomedical data contain a large number of irrelevant or redundant features, classification approaches struggle to correctly identify patterns in the data without a feature selection algorithm. Feature selection approaches seek to eliminate irrelevant and redundant features in order to maintain or enhance classification accuracy. In this paper, a new wrapper feature selection method based on the chimp optimization algorithm (ChOA) is proposed for biomedical data classification. The ChOA is a recently proposed metaheuristic algorithm whose capability for solving feature selection problems has not yet been investigated. Two binary variants of the ChOA are introduced for the feature selection problem. In the first approach, two transfer functions (S-shaped and V-shaped) are used to convert the continuous version of the ChOA to binary. In the second approach, a crossover operator is used in addition to the transfer function to improve the ChOA's exploratory behavior. To validate the efficiency of the proposed approaches, five publicly available high-dimensional biomedical datasets are employed, along with a few datasets from other domains such as life, text, and image. The proposed approaches are compared, using three different classifiers, with six well-known wrapper-based feature selection methods, namely the multi-objective genetic algorithm (GA), particle swarm optimization (PSO), the bat algorithm (BA), ant colony optimization (ACO), the firefly algorithm (FA), and the flower pollination (FP) algorithm, as well as with two standard filter-based feature selection methods. The experimental results demonstrate that the proposed approaches can effectively remove the least significant features and improve classification accuracy. In most cases, the proposed wrapper feature selection techniques also outperform the GA, PSO, BA, ACO, FA, FP, and other existing methods in terms of the number of selected genes and classification accuracy.
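The binarization step of the first approach can be sketched as follows. This is a minimal illustration only, not the authors' implementation. It assumes the commonly used sigmoid form for the S-shaped function and the |tanh| form for the V-shaped function, since the abstract does not state which specific functions are used. The function names (`s_shaped`, `v_shaped`, `binarize_s`, `binarize_v`) are hypothetical.

```python
import math
import random


def s_shaped(x):
    """Assumed S-shaped transfer function (sigmoid): real value -> (0, 1)."""
    # Numerically stable form that avoids overflow for large negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def v_shaped(x):
    """Assumed V-shaped transfer function (|tanh|): real value -> [0, 1)."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-shaped rule: set each bit to 1 with probability s_shaped(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """V-shaped rule: flip the current bit with probability v_shaped(x)."""
    return [
        1 - b if rng.random() < v_shaped(x) else b
        for x, b in zip(position, current_bits)
    ]


# Usage: a continuous position over five features becomes a 0/1 feature mask.
rng = random.Random(0)
position = [2.5, -1.0, 0.0, 0.7, -3.2]
mask_s = binarize_s(position, rng)
mask_v = binarize_v(position, [0, 1, 0, 1, 0], rng)
```

In both rules a 1 means the corresponding feature is kept. The S-shaped rule sets each bit directly from the position value, whereas the V-shaped rule only decides whether to flip the existing bit, so a position component near zero leaves the current selection unchanged.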
Author information
Contributions
Elnaz Pashaei and Elham Pashaei designed the model and the computational framework. Both authors carried out the implementation, performed the experiments, and wrote the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any data, or other information from studies or experimentation, with the involvement of human or animal subjects.
About this article
Cite this article
Pashaei, E., Pashaei, E. An efficient binary chimp optimization algorithm for feature selection in biomedical data classification. Neural Comput & Applic 34, 6427–6451 (2022). https://doi.org/10.1007/s00521-021-06775-0
DOI: https://doi.org/10.1007/s00521-021-06775-0