Abstract
Proteomic profiling has the potential to impact the diagnosis, prognosis, and treatment of various diseases. Several proteomic technologies can measure many proteins at once, and all of them yield complex data that raise significant quantitative challenges. Inadequate attention to these quantitative issues can prevent studies from achieving their goals and can even lead to invalid results. In this chapter, we describe how involving statisticians or other quantitative scientists in the study team can contribute to the success of proteomic research, and we outline some of the key statistical principles that should guide the experimental design and analysis of such studies.
Acknowledgements
JSM’s effort was partially supported by a grant from the NCI, R01 CA107304-01. HBG’s effort was supported by NIDA grants DA18310 (cell–cell signaling neuroproteomics center). HBG and JSM are supported by AA13888.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Morris, J.S., Baggerly, K.A., Gutstein, H.B., Coombes, K.R. (2010). Statistical Contributions to Proteomic Research. In: Rai, A. (eds) The Urinary Proteome. Methods in Molecular Biology, vol 641. Humana Press. https://doi.org/10.1007/978-1-60761-711-2_9
DOI: https://doi.org/10.1007/978-1-60761-711-2_9
Publisher Name: Humana Press
Print ISBN: 978-1-60761-710-5
Online ISBN: 978-1-60761-711-2
eBook Packages: Springer Protocols