Abstract
The process of identification of peptides from the mass spectra and the constituent proteins in a sample is called protein identification. In the current literature, there exist many proposed approaches for the protein identification problem based on tandem mass spectrometry (MS/MS) data. While there are many two-step protein identification procedures that first identify peptides in a separate process and then use the results in protein identification, in recent years there have been attempts to develop a one-step solution to the problem through simultaneous identification of proteins and peptides in a sample. We briefly introduce the probabilistic and likelihood-based two-step and one-step procedures and report some comparative performances of these procedures for different MS/MS data.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Yates, J. R., Ruse, C. I., & Nakorchevsky, M. (2009). Proteomics by mass spectrometry: Approaches, advances, and applications. Annual Review of Biomedical Engineering, 11(1), 49–79.
Eng, J. K., McCormack, A. L., & Yates, J. R., III. (1994). An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. Journal of the American Society for Mass Spectrometry, 5(11), 976–989.
Eng, J. K., Fischer, B., Grossmann, J., & Maccoss, M. J. (2008). A fast SEQUEST cross correlation algorithm. Journal of Proteome Research, 7(10), 4598–4602.
Diament, B. J., & Noble, W. S. (2011). Faster SEQUEST searching for peptide identification from tandem mass spectra. Journal of Proteome Research, 10(9), 3871–3879.
Craig, R., & Beavis, R. C. (2004). TANDEM: Matching proteins with tandem mass spectra. Bioinformatics, 20(9), 1466–1467.
Perkins, D. N., Pappin, D. J., Creasy, D. M., & Cottrell, J. S. (1999). Probability-based protein identification by searching sequence databases using mass spectrometry. Electrophoresis, 20(18), 3551–3567.
Clauser, K. R., Baker, P., & Burlingame, A. L. (1999). Role of accurate mass measurement (+/− 10 ppm) in protein identification strategies employing MS or MS/MS and database searching. Analytical Chemistry, 71(14), 2871–2882.
Kim, S., Gupta, N., & Pevzner, P. A. (2008). Spectral probabilities and generating functions of tandem mass spectra: A strike against decoy databases. Journal of Proteome Research, 7(8), 3354–3363.
Swaney, D. L., Wenger, C. D., & Coon, J. J. (2010). Value of using multiple proteases for large-scale mass spectrometry-based proteomics. Journal of Proteome Research, 9(3), 1323–1329.
Granholm, V., Kim, S., Navarro, J. C. F., Sjolund, E., Smith, R. D., & Kall, L. (2014). Fast and accurate database searches with MSGF+ Percolator. Journal of Proteome Research, 13(2), 890–897.
Keller, A., Purvine, S., Nesvizhskii, A. I., Stolyar, S., Goodlett, D. R., & Kolker, E. (2002). Experimental protein mixture for validating tandem mass spectral analysis. Omics, 6(2), 207–212.
Nesvizhskii, A. I., & Aebersold, R. (2004). Analysis, statistical validation and dissemination of large-scale proteomics data sets generated by tandem MS. Drug Discovery Today, 9(4), 173–181.
Nesvizhskii, A. I., Keller, A., Kolker, E., & Aebersold, R. (2003). A statistical model for identifying proteins by tandem mass spectrometry. Analytical Chemistry, 75(17), 4646–4658.
Shen, C., Wang, Z., Shankar, G., Zhang, X., & Li, L. (2008). A hierarchical statistical model to assess the confidence of peptides and proteins inferred from tandem mass spectrometry. Bioinformatics, 24(2), 202–208.
Sikdar, S., Gill, R., & Datta, S. (2015). Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms. Briefings in Bioinformatics, 17(2), 262–269.
Keller, A., Nesvizhskii, A. I., Kolker, E., & Aebersold, R. (2002). Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Analytical Chemistry, 74(20), 5383–5592.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
Shteynberg, D., Deutsch, E. W., Lam, H., Eng, J. K., Sun, Z., Tasman, N., et al. (2011). iProphet: Multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Molecular & Cellular Proteomics, 10(12), 1–15.
Mitra, R., Gill, R., Sikdar, S., & Datta, S. (2015). Bayesian hierarchical model for protein identifications. Under review.
Li, Q., MacCoss, M., & Stephens, M. (2010). A nested mixture model for protein identification using mass spectrometry. The Annals of Applied Statistics, 4(2), 962–987.
Huang, T., Wang, J., Yu, W., & He, Z. (2012). Protein inference: A review. Briefings in Bioinformatics, 13(5), 586–614.
Nesvizhskii, A. I., Vitek, O., & Aebersold, R. (2007). Analysis and validation of proteomic data generated by tandem mass spectrometry. Nature Methods, 4(10), 787–797.
Serang, O., & Noble, W. (2012). A review of statistical methods for protein identification using tandem mass spectrometry. Stat Interface, 5(1), 3–20.
Bern, M. W., & Kil, Y. J. (2011). Two-dimensional target decoy strategy for shotgun proteomics. Journal of Proteome Research, 10(12), 5296–5301.
Shi, J., & Wu, F.-X. (2012). A feedback framework for protein inference with peptides identified from tandem mass spectra. Proteome Science, 10, 68.
Shi, J., Chen, B., & Wu, F.-X. (2013). Unifying protein inference and peptide identification with feedback to update consistency between peptides. Proteomics, 13(2), 239–247.
Spivak, M., Weston, J., Tomazela, D., Maccoss, M. J., & Noble, W. S. (2012). Direct maximization of protein identifications from tandem mass spectra. Molecular & Cellular Proteomics, 11(2), M111.012161.
Purvine, S., Picone, A. F., & Kolker, E. (2004). Standard mixtures for proteome studies. OMICS, 8(1), 79–92.
Elias, J. E., Haas, W., Faherty, B. K., & Gygi, S. P. (2005). Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nature Methods, 2(9), 667–675.
Kall, L., Canterbury, J., Weston, J., Noble, M. J., & MacCoss, W. S. (2007). A semi-supervised machine learning technique for peptide identification from shotgun proteomics datasets. Nature Methods, 4, 923–925.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Gill, R., Datta, S. (2017). Probabilistic and Likelihood-Based Methods for Protein Identification from MS/MS Data. In: Datta, S., Mertens, B. (eds) Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-45809-0_4
Download citation
DOI: https://doi.org/10.1007/978-3-319-45809-0_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45807-6
Online ISBN: 978-3-319-45809-0
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)