Abstract
We present an adaptation of the logistic regression model for the evaluation of mass spectrometry data in proteomics case-control studies. We parameterize the predictor as a linear combination of Gaussian basis functions along the mass/charge axis. The locations of these basis functions are treated as random variables and must be estimated from the data. A fully Bayesian implementation is pursued, which allows the number of functional components within the regression parameter vector to be specified as a random variable. Calculations are implemented through birth–death process modeling. We evaluate the model on data from a block-randomized case-control experiment and from a proteomic model-mouse study, both carried out at the Leiden University Medical Center. The first experiment compares mass spectra of serum samples from 63 colon cancer patients with spectra from 50 control patients. We present a posteriori analyses of the fitted models which allow researchers to select specific spectral regions for further investigation and for identification of the associated differentially expressed peptides. A sensitivity study links some of our results to those obtainable through standard maximum likelihood logistic regression after principal component reduction of the mass spectral data. The second experiment contrasts proteomic spectra from 18 dystrophin-deficient mdx mice with those from 74 controls.
Acknowledgements
This work was supported by funding from the European Community’s Seventh Framework Programme FP7/2011: Marie Curie Initial Training Network MEDIASRES (“Novel Statistical Methodology for Diagnostic/Prognostic and Therapeutic Studies and Systematic Reviews,” www.mediasres-itn.eu) under Grant Agreement Number 290025, and by funding from the European Union’s Seventh Framework Programme FP7/Health/F5/2012: MIMOmics (“Methods for Integrated Analysis of Multiple Omics Datasets,” http://www.mimomics.eu) under Grant Agreement Number 305280.
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Mertens, B.J.A. (2017). Logistic Regression Modeling on Mass Spectrometry Data in Proteomics Case-Control Discriminant Studies. In: Datta, S., Mertens, B. (eds) Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-45809-0_12
DOI: https://doi.org/10.1007/978-3-319-45809-0_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45807-6
Online ISBN: 978-3-319-45809-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)