Probabilistic and Likelihood-Based Methods for Protein Identification from MS/MS Data

Gill, Ryan; Datta, Susmita

doi:10.1007/978-3-319-45809-0_4

Chapter
First Online: 16 December 2016

Part of the book series: Frontiers in Probability and the Statistical Sciences (FROPROSTAS)

Abstract

Protein identification is the process of identifying the peptides that gave rise to the observed mass spectra and, from them, the constituent proteins in a sample. Many approaches to this problem based on tandem mass spectrometry (MS/MS) data have been proposed in the literature. Many are two-step procedures that first identify peptides in a separate process and then use those results to identify proteins; in recent years there have also been attempts to develop one-step solutions that identify proteins and peptides in a sample simultaneously. We briefly introduce the probabilistic and likelihood-based two-step and one-step procedures and report comparative performance results for these procedures on different MS/MS data sets.
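
To make the two-step idea concrete, the following is a minimal sketch of the second step only, and not the chapter's own method. It assumes that peptide-level probabilities are already available from a separate peptide identification step, that peptide identifications are independent, and that a protein is present if at least one of its peptides is correctly identified. The function name, the peptide strings, and the protein labels are hypothetical.

```python
from collections import defaultdict


def protein_probabilities(peptide_probs, peptide_to_proteins):
    """Combine peptide-level probabilities into protein-level probabilities.

    Assumption (illustrative only): peptides are independent, so
    P(protein present) = 1 - prod(1 - p_i) over the protein's peptides.
    """
    # Running product of (1 - p_i) for each protein; starts at 1.0.
    miss = defaultdict(lambda: 1.0)
    for peptide, p in peptide_probs.items():
        for protein in peptide_to_proteins.get(peptide, ()):
            miss[protein] *= 1.0 - p
    return {protein: 1.0 - q for protein, q in miss.items()}


# Hypothetical output of step 1 (peptide identification).
peptide_probs = {"PEPTIDEA": 0.9, "PEPTIDEB": 0.5, "PEPTIDEC": 0.2}

# Hypothetical mapping from peptides to the proteins that contain them.
peptide_to_proteins = {
    "PEPTIDEA": ["P1"],
    "PEPTIDEB": ["P1", "P2"],
    "PEPTIDEC": ["P2"],
}

result = protein_probabilities(peptide_probs, peptide_to_proteins)
for protein in sorted(result):
    print(protein, round(result[protein], 3))
```

In this sketch the shared peptide contributes fully to both proteins, which is one reason two-step results can be overconfident; a one-step procedure would instead estimate peptide and protein evidence jointly.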

Author information

Authors and Affiliations

Department of Mathematics, University of Louisville, Louisville, KY, 40292, USA
Ryan Gill
Department of Biostatistics, University of Florida, Gainesville, FL, USA
Susmita Datta (Professor)

Corresponding author

Correspondence to Ryan Gill.

Editor information

Editors and Affiliations

Department of Biostatistics, University of Florida, Gainesville, Florida, USA
Susmita Datta
Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, RC Leiden, The Netherlands
Bart J. A. Mertens

About this chapter

Cite this chapter

Gill, R., Datta, S. (2017). Probabilistic and Likelihood-Based Methods for Protein Identification from MS/MS Data. In: Datta, S., Mertens, B. (eds) Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-45809-0_4

DOI: https://doi.org/10.1007/978-3-319-45809-0_4
Published: 16 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45807-6
Online ISBN: 978-3-319-45809-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
