Statistical Aspects in Proteomic Biomarker Discovery

  • Protocol
Statistical Analysis in Proteomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1362)

Abstract

In the pursuit of personalized medicine, i.e., the individualized treatment of a patient, many medical decisions are meant to be supported by biomarkers that help to establish a diagnosis, prediction, or prognosis. Proteomic biomarkers are of special interest because they can be detected not only in tissue samples but often also, with little effort, in various body fluids. Statistical methods play an important role in the discovery and validation of proteomic biomarkers. They are needed in the planning of experiments, in the processing of raw signals, and in the final data analysis. This review provides an overview of the most frequent experimental settings, including sample size considerations, and focuses on exploratory data analysis and classifier development.


Acknowledgements

The author would like to thank Prof Olga Vitek (Northeastern University, Boston) for very helpful comments on a previous version of the manuscript.

Author information

Corresponding author

Correspondence to Klaus Jung, Ph.D.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Jung, K. (2016). Statistical Aspects in Proteomic Biomarker Discovery. In: Jung, K. (ed) Statistical Analysis in Proteomics. Methods in Molecular Biology, vol 1362. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3106-4_19

  • DOI: https://doi.org/10.1007/978-1-4939-3106-4_19

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3105-7

  • Online ISBN: 978-1-4939-3106-4

  • eBook Packages: Springer Protocols
