Logistic Regression Modeling on Mass Spectrometry Data in Proteomics Case-Control Discriminant Studies

Mertens, Bart J. A.

doi:10.1007/978-3-319-45809-0_12

Part of the book series: Frontiers in Probability and the Statistical Sciences (FROPROSTAS)

Abstract

We present an adaptation of the logistic regression model for the evaluation of mass spectrometry data in proteomics case-control studies. We parameterize the predictor as a linear combination of Gaussian basis functions along the mass/charge axis. The locations of these basis functions are treated as random variables and are estimated from the data. A fully Bayesian implementation is pursued, which allows the number of functional components within the regression parameter vector to be treated as a random variable as well. Calculations are implemented through birth–death process modeling. We evaluate the model on data from a block-randomized case-control experiment and on data from a proteomic mouse-model study, both carried out at the Leiden University Medical Center. The first experiment compares mass spectra of serum samples of 63 colon cancer patients with spectra from 50 control patients. We present a posteriori analyses of the fitted models, which allow researchers to select specific spectral regions for further investigation and for identification of the associated differentially expressed peptides. A sensitivity study links some of our results to those that may be obtained through standard maximum likelihood logistic regression on a principal components reduction of the mass spectral data. The second experiment contrasts proteomic spectra from 18 dystrophin-deficient mdx mice with those from 74 controls.
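As a rough illustration of the parameterization described in the abstract, the sketch below builds a regression function over the mass/charge axis as a linear combination of Gaussian basis functions and passes the resulting linear predictor through the logistic link. It is a minimal sketch under stated assumptions, not the chapter's implementation: the function names, the common basis width, the fixed basis locations, and the toy data are all hypothetical, and the Bayesian estimation of the locations and of the number of components (the birth–death step) is not attempted here.

```python
import numpy as np

def gaussian_basis(mz, centers, width):
    """Evaluate Gaussian basis functions on an m/z grid.

    Returns an array of shape (number of centers, number of m/z points).
    The single shared `width` is an assumption made for this sketch.
    """
    mz = np.asarray(mz, dtype=float)
    centers = np.asarray(centers, dtype=float)
    return np.exp(-0.5 * ((mz[None, :] - centers[:, None]) / width) ** 2)

def case_probability(spectra, mz, centers, coefs, width, intercept=0.0):
    """Logistic probability of being a case for each spectrum (one per row)."""
    basis = gaussian_basis(mz, centers, width)        # shape (k, p)
    beta = np.asarray(coefs, dtype=float) @ basis     # regression function on the m/z grid, shape (p,)
    eta = intercept + np.asarray(spectra) @ beta      # linear predictor, shape (n,)
    return 1.0 / (1.0 + np.exp(-eta))                 # logistic link

# Toy example with simulated intensities (hypothetical values throughout).
rng = np.random.default_rng(0)
mz = np.linspace(1000.0, 2000.0, 201)
spectra = rng.random((5, mz.size))
probs = case_probability(
    spectra, mz,
    centers=[1200.0, 1700.0],   # hypothetical basis locations
    coefs=[0.05, -0.05],        # hypothetical basis coefficients
    width=10.0,
)
```

In the fully Bayesian setting of the chapter, the locations and the number of basis functions would be sampled rather than fixed as they are here; the sketch only shows how a given set of components maps a spectrum to a class probability.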

Acknowledgements

This work was supported by funding from the European Community’s Seventh Framework Programme FP7/2011: Marie Curie Initial Training Network MEDIASRES (“Novel Statistical Methodology for Diagnostic/Prognostic and Therapeutic Studies and Systematic Reviews,” www.mediasres-itn.eu) under Grant Agreement Number 290025, and by funding from the European Union’s Seventh Framework Programme FP7/Health/F5/2012: MIMOmics (“Methods for Integrated Analysis of Multiple Omics Datasets,” http://www.mimomics.eu) under Grant Agreement Number 305280.

Author information

Authors and Affiliations

Department of Medical Statistics, Leiden University Medical Center, 9600, 2300 RC, Leiden, The Netherlands
Bart J. A. Mertens

Corresponding author

Correspondence to Bart J. A. Mertens.

Editor information

Editors and Affiliations

Department of Biostatistics, University of Florida, Gainesville, Florida, USA
Susmita Datta
Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, RC Leiden, The Netherlands
Bart J. A. Mertens

About this chapter

Cite this chapter

Mertens, B.J.A. (2017). Logistic Regression Modeling on Mass Spectrometry Data in Proteomics Case-Control Discriminant Studies. In: Datta, S., Mertens, B. (eds) Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-45809-0_12

DOI: https://doi.org/10.1007/978-3-319-45809-0_12
Published: 16 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45807-6
Online ISBN: 978-3-319-45809-0
eBook Packages: Mathematics and Statistics (R0)