Statistical Contributions to Proteomic Research

  • Jeffrey S. Morris
  • Keith A. Baggerly
  • Howard B. Gutstein
  • Kevin R. Coombes
Part of the Methods in Molecular Biology book series (MIMB, volume 641)


Proteomic profiling has the potential to impact the diagnosis, prognosis, and treatment of various diseases. A number of different proteomic technologies are available that allow us to look at many proteins at once, and all of them yield complex data that raise significant quantitative challenges. Inadequate attention to these quantitative issues can prevent these studies from achieving their desired goals, and can even lead to invalid results. In this chapter, we describe various ways the involvement of statisticians or other quantitative scientists in the study team can contribute to the success of proteomic research, and we outline some of the key statistical principles that should guide the experimental design and analysis of such studies.

Key words

Blocking, Data preprocessing, Experimental design, False discovery rate, Image processing, Mass spectrometry, Peak detection, Randomization, Spot detection, 2D gel electrophoresis, Validation



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Jeffrey S. Morris (1)
  • Keith A. Baggerly (2)
  • Howard B. Gutstein (3)
  • Kevin R. Coombes (2)

  1. Department of Biostatistics, The University of Texas M. D. Anderson Cancer Center, Houston, USA
  2. Department of Bioinformatics and Computational Biology, The University of Texas M. D. Anderson Cancer Center, Houston, USA
  3. Department of Anesthesiology and Perioperative Medicine, The University of Texas M. D. Anderson Cancer Center, Houston, USA
