Statistical Aspects in Proteomic Biomarker Discovery

Jung, Klaus

doi:10.1007/978-1-4939-3106-4_19

Part of the book series: Methods in Molecular Biology (MIMB, volume 1362)

Abstract

In the pursuit of personalized medicine, i.e., the individual treatment of each patient, it is desirable to support many medical decisions with biomarkers that help to make a diagnosis, prediction, or prognosis. Proteomic biomarkers are of special interest because they can be detected not only in tissue samples but often also, with little effort, in diverse body fluids. Statistical methods play an important role in the discovery and validation of proteomic biomarkers. They are needed in the planning of experiments, in the processing of raw signals, and in the final data analysis. This review provides an overview of the most frequent experimental settings, including sample size considerations, and focuses on exploratory data analysis and classifier development.


Acknowledgements

The author would like to thank Prof Olga Vitek (Northeastern University, Boston) for very helpful comments on a previous version of the manuscript.

Author information

Authors and Affiliations

Department of Medical Statistics, Georg-August-University Göttingen, Humboldtallee 32, 37073, Göttingen, Germany
Klaus Jung Ph.D.

Corresponding author

Correspondence to Klaus Jung Ph.D.

Editor information

Editors and Affiliations

Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
Klaus Jung

About this protocol

Cite this protocol

Jung, K. (2016). Statistical Aspects in Proteomic Biomarker Discovery. In: Jung, K. (eds) Statistical Analysis in Proteomics. Methods in Molecular Biology, vol 1362. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3106-4_19

DOI: https://doi.org/10.1007/978-1-4939-3106-4_19
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3105-7
Online ISBN: 978-1-4939-3106-4
eBook Packages: Springer Protocols