Combined Gene Selection Methods for Microarray Data Analysis

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4251)

Abstract

In recent years, the rapid development of DNA microarray technology has made it possible for scientists to monitor the expression levels of thousands of genes in a single experiment. As a new technology, it presents fresh challenges, since microarray data contain a large number of genes (tens of thousands) but only a small number of samples (on the order of hundreds). Both filter and wrapper gene selection methods aim to select the most informative genes from this mass of data in order to reduce the size of the expression database. Gene selection methods are used in both the data preprocessing and the classification stages. We conducted experiments with several existing gene selection methods, using them to preprocess microarray data for classification by the benchmark algorithms SVMs and C4.5. The study suggests that combining filter and wrapper methods generally improves the accuracy of gene expression microarray data classification. The study also indicates that not all filter gene selection methods help improve classification performance. The experimental results show that, among the tested gene selection methods, Correlation Coefficient is the best for improving classification accuracy with both the SVMs and C4.5 classification algorithms.
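The combination of a filter step and a wrapper step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the filter score is the absolute Pearson correlation between each gene and the class label (the paper's exact Correlation Coefficient definition is not given here), the wrapper is a greedy forward search, and a simple nearest-centroid classifier with leave-one-out accuracy stands in for SVMs and C4.5. The function names and the synthetic data generator `make_toy_data` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant gene (or constant label) carries no signal
    return sxy / math.sqrt(sxx * syy)


def filter_genes(X, y, k):
    """Filter step: indices of the k genes with the largest |correlation|."""
    n_genes = len(X[0])
    scores = [abs(pearson([row[g] for row in X], y)) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`.

    The classifier is a stand-in (assumption), not SVMs or C4.5.
    """
    correct = 0
    for i, sample in enumerate(X):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if not rows:
                continue  # class has no training samples left in this fold
            dist = 0.0
            for g in genes:
                centroid = sum(r[g] for r in rows) / len(rows)
                dist += (sample[g] - centroid) ** 2
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += int(best_label == y[i])
    return correct / len(X)


def wrapper_select(X, y, candidates, max_genes):
    """Wrapper step: greedy forward selection over the filtered candidates."""
    selected, best = [], 0.0
    while len(selected) < max_genes:
        trials = [(loo_accuracy(X, y, selected + [g]), g)
                  for g in candidates if g not in selected]
        if not trials:
            break
        acc, gene = max(trials, key=lambda t: t[0])
        if acc <= best:
            break  # stop when no candidate improves the accuracy
        selected.append(gene)
        best = acc
    return selected, best


def make_toy_data(n_per_class=10, n_genes=50, informative=(3, 7),
                  shift=10.0, seed=0):
    """Synthetic two-class data (hypothetical): only `informative` genes differ."""
    rng = random.Random(seed)
    y = [0] * n_per_class + [1] * n_per_class
    X = [[rng.gauss(0, 1) + (shift * label if g in informative else 0.0)
          for g in range(n_genes)] for label in y]
    return X, y


X, y = make_toy_data()
candidates = filter_genes(X, y, k=5)      # cheap filter shrinks the gene pool
selected, accuracy = wrapper_select(X, y, candidates, max_genes=3)
print(candidates, selected, accuracy)
```

The filter is applied first because it is cheap and reduces the search space for the far more expensive wrapper. Note that scoring genes and estimating accuracy on the same samples, as this sketch does, gives an optimistic estimate; a careful experiment would keep the selection inside each cross-validation fold.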

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, H., Li, J., Wang, H., Daggard, G. (2006). Combined Gene Selection Methods for Microarray Data Analysis. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11892960_117

  • DOI: https://doi.org/10.1007/11892960_117

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46535-5

  • Online ISBN: 978-3-540-46536-2

  • eBook Packages: Computer Science, Computer Science (R0)