High-Dimensional Profiling for Computational Diagnosis

  • Protocol
Bioinformatics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1526))

Abstract

New technologies allow for high-dimensional profiling of patients. For instance, genome-wide gene expression in tumors or in blood can be measured with microarrays, provided all transcripts are known, or without this restriction using high-throughput RNA sequencing. Other technologies, such as NMR fingerprinting, allow for high-dimensional profiling of metabolites in blood or urine. These technologies open novel possibilities for molecular diagnostics. In clinical profiling studies, researchers aim to predict disease type, survival, or treatment response for new patients from high-dimensional profiles, and in doing so they encounter a series of obstacles and pitfalls. We review fundamental issues from machine learning and recommend a procedure for the computational aspects of a clinical profiling study.

Author information

Corresponding author

Correspondence to Claudio Lottaz.

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

Lottaz, C., Gronwald, W., Spang, R., Engelmann, J.C. (2017). High-Dimensional Profiling for Computational Diagnosis. In: Keith, J. (eds) Bioinformatics. Methods in Molecular Biology, vol 1526. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6613-4_12

  • DOI: https://doi.org/10.1007/978-1-4939-6613-4_12

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6611-0

  • Online ISBN: 978-1-4939-6613-4

  • eBook Packages: Springer Protocols
