Abstract
Recent advances in biomedical technology have made it possible to collect large volumes of ‘OMIC’ data from the genomic, transcriptomic, and proteomic domains quickly and cheaply. One such technology is the microarray, which lets researchers measure the expression of thousands of genes simultaneously. This abundance of data raises a new problem: how to extract useful information from it. Data mining and machine learning techniques have long been applied in many computing applications, and it is natural to use them to help draw inferences from the information gathered in microarray experiments. This chapter surveys common classification techniques for microarray analysis, together with related data mining methods for improving their accuracy. Publicly available datasets are used to evaluate their performance.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Yip, WK., Amin, S.B., Li, C. (2011). A Survey of Classification Techniques for Microarray Data Analysis. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_10
DOI: https://doi.org/10.1007/978-3-642-16345-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16344-9
Online ISBN: 978-3-642-16345-6
eBook Packages: Mathematics and Statistics (R0)