Abstract
Microarray datasets contain irrelevant and redundant data, which makes it difficult for existing models to capture underlying patterns accurately and directly. Feature selection (FS) has become a necessary strategy for identifying and selecting the most relevant attributes. However, the high dimensionality of microarray datasets poses a serious challenge to most existing FS algorithms. To address this challenge, we propose a novel feature selection strategy called IG-MBKH, which integrates a feature-ranking pre-screening method based on information gain (IG) with a modified binary krill herd (MBKH) algorithm. When MBKH searches for feature subsets, a hyperbolic tangent function, an adaptive transfer factor, and a chaos memory weight factor are introduced to improve the search over candidate feature subsets. The results indicate that IG-MBKH improves convergence, the number of selected features, and classification accuracy compared with BKH, MBKH, and several recent algorithms. Furthermore, we evaluate the impact of different classifiers on the performance of the proposed strategy.
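The two stages named in the abstract can be illustrated with a minimal sketch. It assumes discretized feature columns, and every function name in it (`information_gain`, `prescreen`, `tanh_transfer`, `binarize`) is hypothetical rather than taken from the paper. The bit-flip rule shown is one common V-shaped transfer rule, used here as an assumption. The adaptive transfer factor, the chaos memory weight factor, and the krill motion equations are not reproduced.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def prescreen(columns, labels, k):
    """Return the indices of the k columns with the highest information gain."""
    scores = [information_gain(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]


def tanh_transfer(step):
    """V-shaped transfer: map a real-valued step to a value in [0, 1)."""
    return abs(math.tanh(step))


def binarize(position, steps, rng):
    """Flip bit j with probability |tanh(step_j)| (assumed rule, not the paper's)."""
    return [1 - b if rng.random() < tanh_transfer(s) else b
            for b, s in zip(position, steps)]


# Toy data: column 0 determines the label, column 1 is noise, column 2 is constant.
labels = [0, 0, 1, 1]
columns = [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 1, 1]]
top = prescreen(columns, labels, k=1)

# A zero step never flips a bit; a large step flips it with probability near 1.
mask = binarize([0, 1, 0], [0.0, 0.0, 5.0], random.Random(0))
```

In this toy setup the pre-screening step keeps only column 0, because it is the only column that reduces label entropy. A wrapper search would then operate on binary masks over the retained columns, updating each mask with a rule such as `binarize`.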
Notes
It is available at http://csse.szu.edu.cn/staff/zhuzx/Datasets.html.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61802114, 61802113, 61602156S, 61972134), the Science and Technology Development Plan Project of Henan Province (No. 202102210173), and the Scientific Research Foundation of the Higher Education Institutions of Henan Province (No. 18A520021). The manuscript was selected and recommended by CBC 2019 (The Fourth CCF Bioinformatics Conference).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zhang, G., Hou, J., Wang, J. et al. Feature Selection for Microarray Data Classification Using Hybrid Information Gain and a Modified Binary Krill Herd Algorithm. Interdiscip Sci Comput Life Sci 12, 288–301 (2020). https://doi.org/10.1007/s12539-020-00372-w
DOI: https://doi.org/10.1007/s12539-020-00372-w