Selecting Few Genes for Microarray Gene Expression Classification

  • Conference paper
Current Topics in Artificial Intelligence (CAEPIA 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5988)

Abstract

Due to the high number of gene expressions contained in microarray data, feature extraction techniques are usually applied before inducing classifiers. A common criterion for deciding how many genes to select is to minimize the classifier error. However, given the risk of overfitting due to the small sample size, and the fact that the number of selected genes is usually larger than the suspected number of discriminating genes, this work proposes relaxing the minimum error rate criterion. The paper shows that, with a small number of feature selection and classification methods, it is possible to find configurations that select few genes without significantly worsening the error rate of the best classifier. Average ranking for 10 to 40 genes shows that SVM-RFE with Naïve Bayes and FCBF with SVM behave consistently well.
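The two ideas in the abstract, relaxing the minimum error criterion and comparing configurations by average rank, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the fixed `tolerance` stands in for whatever significance test the paper uses, and the function names and the error values in the usage note are hypothetical.

```python
from statistics import mean


def smallest_acceptable_gene_count(errors_by_count, tolerance=0.02):
    """Pick the smallest gene count whose error is close to the best error.

    errors_by_count maps a number of selected genes to an estimated error rate.
    `tolerance` is an assumed stand-in for a statistical significance test.
    """
    best_error = min(errors_by_count.values())
    acceptable = [
        count
        for count, error in errors_by_count.items()
        if error <= best_error + tolerance
    ]
    return min(acceptable)


def average_ranks(errors_by_config):
    """Average rank of each configuration over several settings.

    errors_by_config maps a configuration name to a list of error rates,
    one per setting (for example, one per gene count). Rank 1 is the lowest
    error in a setting; tied configurations share the mean of their ranks.
    """
    names = list(errors_by_config)
    n_settings = len(next(iter(errors_by_config.values())))
    ranks = {name: [] for name in names}
    for i in range(n_settings):
        column = {name: errors_by_config[name][i] for name in names}
        ordered = sorted(names, key=lambda name: column[name])
        pos = 0
        while pos < len(ordered):
            # Extend `end` over every configuration tied with ordered[pos].
            end = pos
            while (end + 1 < len(ordered)
                   and column[ordered[end + 1]] == column[ordered[pos]]):
                end += 1
            shared_rank = ((pos + 1) + (end + 1)) / 2
            for name in ordered[pos:end + 1]:
                ranks[name].append(shared_rank)
            pos = end + 1
    return {name: mean(values) for name, values in ranks.items()}
```

As a usage example with made-up numbers, `smallest_acceptable_gene_count({10: 0.12, 20: 0.10, 40: 0.09, 100: 0.085})` prefers 20 genes over the error-minimizing 100, because the error at 20 genes lies within the assumed tolerance of the best error.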

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alonso-González, C.J., Moro, Q.I., Prieto, O.J., Simón, M.A. (2010). Selecting Few Genes for Microarray Gene Expression Classification. In: Meseguer, P., Mandow, L., Gasca, R.M. (eds) Current Topics in Artificial Intelligence. CAEPIA 2009. Lecture Notes in Computer Science, vol 5988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14264-2_12

  • DOI: https://doi.org/10.1007/978-3-642-14264-2_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14263-5

  • Online ISBN: 978-3-642-14264-2

  • eBook Packages: Computer Science, Computer Science (R0)
