Feature Selection for Microarray Data Classification Using Hybrid Information Gain and a Modified Binary Krill Herd Algorithm

  • Original research article
  • Published in Interdisciplinary Sciences: Computational Life Sciences

Abstract

Microarray datasets contain many irrelevant or redundant features, which makes it difficult for existing models to capture underlying patterns accurately and directly. Feature selection (FS) has therefore become a necessary strategy for identifying and selecting the most relevant attributes. However, the high dimensionality of microarray datasets poses a serious challenge to most existing FS algorithms. To address this challenge, we propose a novel feature selection strategy called IG-MBKH. The strategy integrates a feature-ranking pre-screening method based on information gain (IG) with a modified binary krill herd (MBKH) algorithm. When MBKH searches for feature subsets, a hyperbolic tangent function, an adaptive transfer factor, and a chaos memory weight factor are introduced to improve the search over candidate feature subsets. The results indicate that the IG-MBKH algorithm improves convergence, the number of selected features, and classification accuracy compared with BKH, MBKH, and several recent algorithms. Furthermore, we evaluate the impact of different classifiers on the performance of the proposed strategy.
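The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the equal-width binning, the function names, and the rule "set a bit to 1 with probability |tanh(x)|" are assumptions made here for illustration, and the adaptive transfer factor and chaos memory weight factor are not reproduced because the abstract does not define them.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels, n_bins=2):
    """IG = H(Y) - H(Y | X) for one continuous feature.

    The feature is discretized with equal-width bins; this binning
    scheme is an assumption of the sketch, not taken from the paper.
    """
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)[1:-1]
    binned = np.digitize(feature, edges)
    h_cond = 0.0
    for b in np.unique(binned):
        mask = binned == b
        # Weight each bin's label entropy by the fraction of samples in it.
        h_cond += mask.mean() * entropy(labels[mask])
    return entropy(labels) - h_cond


def rank_by_information_gain(X, y, top_k):
    """Return (indices of the top_k features, best first; all gains)."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-gains, kind="stable")
    return order[:top_k], gains


def tanh_transfer(position, rng):
    """Map real-valued positions to a 0/1 feature mask.

    Each bit is set to 1 with probability |tanh(position)|. This is one
    simple tanh-based rule (hypothetical here); the paper's exact update
    may differ.
    """
    prob = np.abs(np.tanh(position))
    return (rng.random(position.shape) < prob).astype(int)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 does not.
    X = np.array([[0.1, 5.0], [0.2, 1.0], [0.9, 5.0], [0.8, 1.0]])
    y = np.array([0, 0, 1, 1])
    top, gains = rank_by_information_gain(X, y, top_k=1)
    print("kept feature indices:", top, "gains:", gains)

    rng = np.random.default_rng(0)
    print("mask:", tanh_transfer(np.array([-50.0, 0.0, 50.0]), rng))
```

In a filter-then-wrapper pipeline of this kind, the ranking step would typically shrink the gene set first, and the binary search would then operate only on masks over the retained features.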

Notes

  1. It is available: http://csse.szu.edu.cn/staff/zhuzx/Datasets.html.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61802114, 61802113, 61602156S, 61972134), the Science and Technology Development Plan Project of Henan Province (No. 202102210173), and the Scientific Research Foundation of the Higher Education Institutions of Henan Province (No. 18A520021). The manuscript was selected and recommended by CBC 2019 (The Fourth CCF Bioinformatics Conference).

Author information

Corresponding authors

Correspondence to Chaokun Yan or Junwei Luo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Zhang, G., Hou, J., Wang, J. et al. Feature Selection for Microarray Data Classification Using Hybrid Information Gain and a Modified Binary Krill Herd Algorithm. Interdiscip Sci Comput Life Sci 12, 288–301 (2020). https://doi.org/10.1007/s12539-020-00372-w

  • DOI: https://doi.org/10.1007/s12539-020-00372-w
