An efficient gene selection technique for cancer recognition based on neighborhood mutual information

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Gene selection is a key problem in cancer recognition based on gene expression and in related tasks. In this work, a measure called neighborhood mutual information (NMI) is introduced to evaluate the relevance between genes and the related decision. The measure is then combined with the minimal-redundancy-maximal-relevance (mRMR) search strategy to construct an NMI-based mRMR gene selection algorithm (NMI_mRMR). It is also found that the first k best genes with respect to NMI are usually sufficient for cancer classification. mRMR can therefore be run on these genes alone, with the rest removed in a preprocessing step, which reduces computational time. Based on this observation, an efficient gene selection algorithm, denoted NMI_EmRMR, is proposed. Several cancer recognition tasks are gathered to test the proposed technique. The experimental results show that NMI_EmRMR is effective and efficient.
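The two-stage idea in the abstract (rank genes by NMI, keep the first k, then run mRMR on those only) can be illustrated with a minimal sketch. The abstract does not state the NMI formula, so the code below assumes the neighborhood-based definition commonly given as NMI(R;S) = -(1/n) Σ log(|δ_R(x_i)| · |δ_S(x_i)| / (n · |δ_{R∪S}(x_i)|)), with class labels treated as exact-match neighborhoods. The function names, the neighborhood radius `delta`, and the difference form of the mRMR score (relevance minus mean redundancy) are illustrative assumptions, not details confirmed by the article.

```python
import numpy as np


def neighborhood_mask(values, delta):
    """n x n boolean matrix: True where two samples differ by at most delta."""
    values = np.asarray(values, dtype=float)
    return np.abs(values[:, None] - values[None, :]) <= delta


def nmi(mask_a, mask_b):
    """Neighborhood mutual information between two neighborhood relations.

    Assumed definition: -(1/n) * sum_i log2(|A_i| * |B_i| / (n * |A_i & B_i|)),
    where A_i and B_i are the neighborhoods of sample i.
    """
    n = mask_a.shape[0]
    size_a = mask_a.sum(axis=1)
    size_b = mask_b.sum(axis=1)
    # Joint neighborhood = intersection; never empty because i is its own neighbor.
    size_ab = (mask_a & mask_b).sum(axis=1)
    return float(-np.mean(np.log2(size_a * size_b / (n * size_ab))))


def nmi_emrmr(X, y, n_select, k, delta=0.15):
    """Sketch of the two-stage selection: NMI pre-filter, then greedy mRMR.

    X: (samples, genes) array, assumed scaled so that delta is meaningful.
    k: number of top-ranked genes kept by the pre-filter (k >= 1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    label_mask = y[:, None] == y[None, :]

    # Stage 1: relevance of every gene to the class labels.
    relevance = np.array(
        [nmi(neighborhood_mask(X[:, j], delta), label_mask) for j in range(X.shape[1])]
    )
    top_k = np.argsort(-relevance, kind="stable")[:k]

    # Stage 2: greedy mRMR restricted to the k pre-filtered genes.
    masks = {int(j): neighborhood_mask(X[:, j], delta) for j in top_k}
    candidates = [int(j) for j in top_k]
    selected = [candidates.pop(0)]  # start from the most relevant gene
    while candidates and len(selected) < n_select:
        scores = [
            relevance[j] - np.mean([nmi(masks[j], masks[s]) for s in selected])
            for j in candidates
        ]
        selected.append(candidates.pop(int(np.argmax(scores))))
    return selected


if __name__ == "__main__":
    # Toy data: gene 0 separates the classes, genes 1 and 2 do not.
    X = np.array([[0.0, 0.0, 0.5], [0.0, 1.0, 0.5], [1.0, 0.0, 0.5], [1.0, 1.0, 0.5]])
    y = np.array([0, 0, 1, 1])
    print(nmi_emrmr(X, y, n_select=2, k=2))
```

Under these assumptions the saving comes from the redundancy terms: they are computed only among the k retained genes rather than among all genes, while the relevance ranking still requires one pass over every gene.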

Acknowledgments

Supported by the National Natural Science Foundation of China under Grants No. 60703013 and 61070089.

Author information

Corresponding author

Correspondence to Qinghua Hu.

About this article

Cite this article

Hu, Q., Pan, W., An, S. et al. An efficient gene selection technique for cancer recognition based on neighborhood mutual information. Int. J. Mach. Learn. & Cyber. 1, 63–74 (2010). https://doi.org/10.1007/s13042-010-0008-6

  • DOI: https://doi.org/10.1007/s13042-010-0008-6
