A Filter Feature Selection Method Based on MFA Score and Redundancy Excluding and It’s Application to Tumor Gene Expression Data Analysis

  • Original Research Article
  • Interdisciplinary Sciences: Computational Life Sciences

Abstract

Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. A filter feature selection method named marginal Fisher analysis score (MFA score), which is based on graph embedding, has been proposed and is widely used, mainly because it outperforms Fisher score. Considering the heavy redundancy in gene expression data, we propose a new filter feature selection technique in this paper. It is named MFA score+ and is based on MFA score and redundancy exclusion. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features, and then used a support vector machine as the classifier to classify the samples. Compared with MFA score, the t test and Fisher score, it achieved higher classification accuracy.
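The general idea of a filter score combined with redundancy exclusion can be illustrated with a minimal sketch. This page does not give the MFA score formula or the redundancy criterion used in MFA score+, so the sketch makes two labeled assumptions: the classical Fisher score (one of the baselines named in the abstract) stands in for MFA score, and a feature is treated as redundant when its absolute Pearson correlation with an already selected feature reaches a hypothetical threshold `max_abs_corr`. The function names are illustrative and are not taken from the paper.

```python
import numpy as np


def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter over within-class scatter.

    Stand-in for MFA score (assumption): the MFA score formula is not given here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    # Small constant avoids division by zero for features with no within-class spread.
    return between / (within + 1e-12)


def select_with_redundancy_exclusion(X, y, k, max_abs_corr=0.9):
    """Greedily pick up to k high-scoring features, skipping redundant ones.

    Redundancy test (assumption): |Pearson correlation| >= max_abs_corr
    with any feature that has already been selected.
    """
    X = np.asarray(X, dtype=float)
    order = np.argsort(fisher_score(X, y))[::-1]  # best score first
    corr = np.corrcoef(X, rowvar=False)           # feature-by-feature correlations
    selected = []
    for j in order:
        if all(abs(corr[j, s]) < max_abs_corr for s in selected):
            selected.append(int(j))
        if len(selected) == k:
            break
    return selected


# Toy data: feature 1 is a near copy of feature 0, feature 2 is informative
# but only moderately correlated with feature 0, feature 3 is pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
f0 = 3.0 * y + rng.normal(size=100)
f1 = f0 + 0.01 * rng.normal(size=100)
f2 = 2.0 * y + rng.normal(size=100)
f3 = rng.normal(size=100)
X = np.column_stack([f0, f1, f2, f3])

selected = select_with_redundancy_exclusion(X, y, k=2)
print(selected)
```

In this toy setting a plain top-2 ranking would tend to return both near-duplicate features, while the redundancy check keeps only one of them and moves on to the next informative feature. The selected columns could then be passed to a classifier such as a support vector machine, as the abstract describes.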

Abbreviations

MFA: Marginal Fisher analysis

FDA: Fisher discriminant analysis

SVM: Support vector machine

MFA score+: Marginal Fisher analysis score and redundancy excluding

Acknowledgments

This work is supported by the Project for the National Key Technology R&D Program under Grant No. 2011BAC12B0304 and the Scientific Plan of Beijing Municipal Commission of Education under Grant No. JC002011200903.

Author information

Corresponding author

Correspondence to Lei Su.

Cite this article

Li, J., Su, L. & Pang, Z. A Filter Feature Selection Method Based on MFA Score and Redundancy Excluding and It’s Application to Tumor Gene Expression Data Analysis. Interdiscip Sci Comput Life Sci 7, 391–396 (2015). https://doi.org/10.1007/s12539-015-0272-y

  • DOI: https://doi.org/10.1007/s12539-015-0272-y
