Logistic Regression Modeling on Mass Spectrometry Data in Proteomics Case-Control Discriminant Studies

Chapter in: Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry

Part of the book series: Frontiers in Probability and the Statistical Sciences (FROPROSTAS)

Abstract

We present an adaptation of the logistic regression model for the evaluation of mass spectrometry data in proteomics case-control studies. We parameterize the predictor as a linear combination of Gaussian basis functions along the mass/charge axis. The locations of these basis functions are treated as random variables and must be estimated from the data. We pursue a fully Bayesian implementation, which allows the number of functional components within the regression parameter vector to be treated as a random variable as well. Calculations are implemented through birth–death process modeling. We evaluate the model on data from a block-randomized case-control experiment and on a proteomic mouse-model study, both carried out at the Leiden University Medical Center. The first experiment compares mass spectra of serum samples from 63 colon cancer patients with spectra from 50 control patients. The second experiment contrasts proteomic spectra from 18 dystrophin-deficient mdx mice with those from 74 controls. We present a posteriori analyses of the fitted models, which allow researchers to select specific spectral regions for further investigation and for identification of the associated differentially expressed peptides. A sensitivity study links some of our results to those that may be obtained through standard maximum likelihood logistic regression on a principal components reduction of the mass spectral data.
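As a rough illustration of the parameterization described in the abstract, the following minimal sketch builds a regression coefficient function as a weighted sum of Gaussian basis functions along the mass/charge axis and passes the resulting linear predictor through the logistic link. It is not the chapter's implementation: the function names, the common basis width, the unnormalized form of the basis functions, and all numerical values are assumptions made for illustration. The number and locations of the basis functions are fixed by hand here, whereas the chapter treats them as random and estimates them through a Bayesian birth–death scheme.

```python
import math

def gaussian_basis(mz, center, width):
    """Unnormalized Gaussian bump centred at `center`, evaluated at `mz`."""
    return math.exp(-0.5 * ((mz - center) / width) ** 2)

def coefficient_function(mz_grid, centers, weights, width):
    """beta(m) = sum_k weights[k] * gaussian_basis(m, centers[k], width)."""
    return [
        sum(w * gaussian_basis(mz, c, width) for c, w in zip(centers, weights))
        for mz in mz_grid
    ]

def case_probability(spectrum, beta, intercept=0.0):
    """Logistic link applied to intercept + sum_j spectrum[j] * beta[j]."""
    eta = intercept + sum(x * b for x, b in zip(spectrum, beta))
    return 1.0 / (1.0 + math.exp(-eta))

# Toy setting; every value below is invented for illustration.
mz_grid = [1000.0 + 10.0 * j for j in range(101)]  # m/z from 1000 to 2000
centers = [1200.0, 1700.0]   # assumed basis locations (fixed, not estimated)
weights = [0.8, -0.5]        # assumed basis weights
beta = coefficient_function(mz_grid, centers, weights, width=20.0)

# Two synthetic spectra, each with a single peak near one basis location.
spectrum_a = [gaussian_basis(mz, 1200.0, 15.0) for mz in mz_grid]
spectrum_b = [gaussian_basis(mz, 1700.0, 15.0) for mz in mz_grid]

p_a = case_probability(spectrum_a, beta)  # peak under the positive weight
p_b = case_probability(spectrum_b, beta)  # peak under the negative weight
```

In this toy setting a spectrum with intensity near the positively weighted location receives a case probability above 0.5, and one with intensity near the negatively weighted location receives a probability below 0.5, which shows how a small number of localized components can carry the discrimination.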

Acknowledgements

This work was supported by funding from the European Community’s Seventh Framework Programme FP7/2011: Marie Curie Initial Training Network MEDIASRES (“Novel Statistical Methodology for Diagnostic/Prognostic and Therapeutic Studies and Systematic Reviews,” www.mediasres-itn.eu) under Grant Agreement Number 290025, and by funding from the European Union’s Seventh Framework Programme FP7/Health/F5/2012: MIMOmics (“Methods for Integrated Analysis of Multiple Omics Datasets,” http://www.mimomics.eu) under Grant Agreement Number 305280.

Corresponding author

Correspondence to Bart J. A. Mertens.

Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Mertens, B.J.A. (2017). Logistic Regression Modeling on Mass Spectrometry Data in Proteomics Case-Control Discriminant Studies. In: Datta, S., Mertens, B. (eds) Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-45809-0_12
