Bioinformatics pp 281-296

Computational Diagnostics with Gene Expression Profiles

  • Claudio Lottaz
  • Dennis Kostka
  • Florian Markowetz
  • Rainer Spang
Part of the Methods in Molecular Biology™ book series (MIMB, volume 453)


Gene expression profiling using micro-arrays is a modern approach for molecular diagnostics. In clinical micro-array studies, researchers aim to predict disease type, survival, or treatment response using gene expression profiles. In this process, they encounter a series of obstacles and pitfalls. This chapter reviews fundamental issues from machine learning and recommends a procedure for the computational aspects of a clinical micro-array study.

Key words

Micro-arrays, gene expression profiles, statistical classification, supervised machine learning, gene selection, model assessment



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Claudio Lottaz (1)
  • Dennis Kostka (1)
  • Florian Markowetz (1)
  • Rainer Spang (1)
  1. Max Planck Institute for Molecular Genetics and Berlin Center for Genome-Based Bioinformatics, Berlin, Germany
