
A Survey of Classification Techniques for Microarray Data Analysis

Chapter in Handbook of Statistical Bioinformatics

Abstract

With recent advances in biomedical technology, large volumes of ‘OMIC’ data from the genomic, transcriptomic, and proteomic domains can now be collected quickly and cheaply. One such technology is the microarray, which allows researchers to measure the expression of thousands of genes simultaneously. This volume of data raises a new problem: how to extract useful information from it. Data mining and machine learning techniques have long been applied in many computing applications, so it is natural to use some of them to help draw inferences from the information gathered in microarray experiments. This chapter surveys common classification techniques for microarray analysis, grounded in data mining methodology, together with related methods for improving their accuracy. Publicly available datasets are used to evaluate their performance.
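The workflow the abstract describes is to fit a classifier to an expression matrix with far more genes than samples and then estimate its accuracy. The Python below is a minimal sketch of that workflow under stated assumptions. It uses a plain nearest-centroid rule, chosen only for brevity and not necessarily one of the chapter's methods. It also uses a hypothetical synthetic dataset rather than any of the public datasets the chapter evaluates, and it estimates accuracy by leave-one-out cross-validation. All function names and parameters are illustrative.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Map each class label to its per-gene mean expression (the class centroid)."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign sample x to the class whose centroid is nearest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def loocv_accuracy(X, y):
    """Leave-one-out cross-validation: refit on n-1 samples, score the held-out one."""
    correct = 0
    for i in range(len(X)):
        centroids = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += int(predict(centroids, X[i]) == y[i])
    return correct / len(X)


def synthetic_expression(n_per_class=6, n_genes=200, n_informative=20, shift=3.0, seed=0):
    """Hypothetical data with far more genes than samples; only the first
    n_informative genes have a mean shifted between the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label, offset in (("class_A", 0.0), ("class_B", shift)):
        for _ in range(n_per_class):
            X.append([rng.gauss(0.0, 1.0) + (offset if g < n_informative else 0.0)
                      for g in range(n_genes)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = synthetic_expression()
    print(f"{len(y)} samples x {len(X[0])} genes, LOOCV accuracy = {loocv_accuracy(X, y):.2f}")
```

Leave-one-out is used here because microarray studies often have few samples. If a gene selection step were added, it would need to be repeated inside each fold. Otherwise the accuracy estimate becomes optimistically biased.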



Correspondence to Cheng Li.


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Yip, WK., Amin, S.B., Li, C. (2011). A Survey of Classification Techniques for Microarray Data Analysis. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_10
