High-Dimensional Profiling for Computational Diagnosis

Lottaz, Claudio; Gronwald, Wolfram; Spang, Rainer; Engelmann, Julia C.

doi:10.1007/978-1-4939-6613-4_12

Part of the book series: Methods in Molecular Biology (MIMB, volume 1526)

Abstract

New technologies allow for high-dimensional profiling of patients. For instance, genome-wide gene expression analysis in tumors or in blood is feasible with microarrays if all transcripts are known, or even without this restriction using high-throughput RNA sequencing. Other technologies, such as NMR fingerprinting, allow for high-dimensional profiling of metabolites in blood or urine. Such technologies open up novel possibilities for molecular diagnostics. In clinical profiling studies, researchers aim to predict disease type, survival, or treatment response for new patients from their high-dimensional profiles. In this process, they encounter a series of obstacles and pitfalls. We review fundamental issues from machine learning and recommend a procedure for the computational aspects of a clinical profiling study.

Author information

Authors and Affiliations

Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
Claudio Lottaz, Wolfram Gronwald, Rainer Spang & Julia C. Engelmann

Corresponding author

Correspondence to Claudio Lottaz.

Editor information

Editors and Affiliations

Monash University, Melbourne, Victoria, Australia
Jonathan M. Keith

About this protocol

Cite this protocol

Lottaz, C., Gronwald, W., Spang, R., Engelmann, J.C. (2017). High-Dimensional Profiling for Computational Diagnosis. In: Keith, J. (eds) Bioinformatics. Methods in Molecular Biology, vol 1526. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6613-4_12

DOI: https://doi.org/10.1007/978-1-4939-6613-4_12
Published: 29 November 2016
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6611-0
Online ISBN: 978-1-4939-6613-4
eBook Packages: Springer Protocols
