Abstract
Gene expression profiling using micro-arrays is a modern approach for molecular diagnostics. In clinical micro-array studies, researchers aim to predict disease type, survival, or treatment response using gene expression profiles. In this process, they encounter a series of obstacles and pitfalls. This chapter reviews fundamental issues from machine learning and recommends a procedure for the computational aspects of a clinical micro-array study.
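A classic pitfall in such studies is choosing the "informative" genes on all samples and only then estimating the prediction error by cross-validation. This leaks information from the test samples into the model and makes the error look better than it is. The sketch below is not the chapter's procedure. It is a minimal illustration that assumes a two-class problem, a simple t-like gene score, and a plain (unshrunken) nearest-centroid classifier, all chosen only for brevity. It repeats the gene selection inside every leave-one-out fold, so the held-out sample never influences which genes are used. The function names (`gene_scores`, `top_genes`, `nearest_centroid_predict`, `loocv_error`) and the toy data are hypothetical.

```python
import statistics


def gene_scores(X, y):
    """Hypothetical t-like score per gene: |mean1 - mean0| / (sd0 + sd1 + eps)."""
    scores = []
    for g in range(len(X[0])):
        a = [x[g] for x, lab in zip(X, y) if lab == 0]
        b = [x[g] for x, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-8
        scores.append(abs(statistics.fmean(b) - statistics.fmean(a)) / spread)
    return scores


def top_genes(X, y, k):
    """Indices of the k highest-scoring genes, computed on the samples given."""
    scores = gene_scores(X, y)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


def nearest_centroid_predict(X, y, genes, x_new):
    """Assign x_new to the class whose centroid (on the selected genes) is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroid = [statistics.fmean([m[g] for m in members]) for g in genes]
        dist = sum((x_new[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_error(X, y, k):
    """Leave-one-out error with gene selection repeated inside each fold."""
    errors = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        # Genes are chosen from the training fold only; sample i is never seen here.
        genes = top_genes(X_train, y_train, k)
        if nearest_centroid_predict(X_train, y_train, genes, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


# Toy data: 6 samples x 3 genes; only gene 0 separates the two classes.
X = [[0.0, 1.0, 3.0], [0.2, 2.0, 1.0], [0.1, 3.0, 2.0],
     [5.0, 1.5, 2.5], [5.2, 2.5, 1.5], [4.9, 3.5, 2.0]]
y = [0, 0, 0, 1, 1, 1]
print(top_genes(X, y, 1), loocv_error(X, y, 1))
```

Moving the `top_genes` call outside the loop, so that it runs once on all of `X`, reproduces the biased variant. On real micro-array data with thousands of genes and few patients, that variant can report low error even when the labels carry no signal.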
© 2008 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Lottaz, C., Kostka, D., Markowetz, F., Spang, R. (2008). Computational Diagnostics with Gene Expression Profiles. In: Keith, J.M. (ed.) Bioinformatics. Methods in Molecular Biology™, vol 453. Humana Press. https://doi.org/10.1007/978-1-60327-429-6_15
DOI: https://doi.org/10.1007/978-1-60327-429-6_15
Publisher Name: Humana Press
Print ISBN: 978-1-60327-428-9
Online ISBN: 978-1-60327-429-6
eBook Packages: Springer Protocols