Refining comparative proteomics by spectral counting to account for shared peptides and multiple search engines

  • Original Paper
  • Analytical and Bioanalytical Chemistry

Abstract

Spectral counting has become a widely used approach for measuring and comparing protein abundance in label-free shotgun proteomics. However, when analyzing complex samples, the ambiguity of matching between peptides and proteins greatly affects the assessment of peptide and protein inventories, differentiation, and quantification. Meanwhile, the configuration of database searching algorithms that assign peptides to MS/MS spectra may produce different results in comparative proteomic analysis. Here, we present three strategies to improve comparative proteomics through spectral counting. We show that comparing spectral counts for peptide groups rather than for protein groups forestalls problems introduced by shared peptides. We demonstrate the advantage and flexibility of this new method in two datasets. We present four models for combining the results of four popular search engines; these models lead to significant gains in spectral counting differentiation. Among these models, we demonstrate a powerful vote counting model that scales well for multiple search engines. We also show that semi-tryptic searching outperforms tryptic searching for comparative proteomics. Overall, these techniques considerably improve protein differentiation on the basis of spectral count tables.
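
The two central ideas of the abstract, counting spectra per peptide group and combining search engines by vote counting, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that a peptide group is the set of peptides mapping to exactly the same set of proteins, and that a vote is cast whenever one search engine's p-value for a group falls below a threshold. The function names, the `alpha` and `min_votes` parameters, and all example data are hypothetical.

```python
from collections import defaultdict


def peptide_group_counts(psms, peptide_to_proteins):
    """Sum spectral counts per peptide group.

    psms: iterable of peptide sequences, one entry per identified spectrum.
    peptide_to_proteins: dict mapping a peptide to the proteins containing it.
    Assumption: a group is keyed by the frozenset of proteins its peptides
    map to, so shared peptides get their own group instead of being
    credited to any single protein.
    """
    counts = defaultdict(int)
    for peptide in psms:
        group = frozenset(peptide_to_proteins[peptide])
        counts[group] += 1
    return dict(counts)


def vote_count(p_values_by_engine, alpha=0.05, min_votes=2):
    """Return the groups that at least min_votes engines call differential.

    p_values_by_engine: dict mapping an engine name to {group: p-value}.
    Assumption: one engine casts one vote for a group when p < alpha.
    """
    votes = defaultdict(int)
    for p_values in p_values_by_engine.values():
        for group, p in p_values.items():
            if p < alpha:
                votes[group] += 1
    return {group for group, n in votes.items() if n >= min_votes}


# Hypothetical example: CCR and DDK are shared between proteins P1 and P2.
peptide_to_proteins = {
    "AAK": ["P1"],
    "CCR": ["P1", "P2"],
    "DDK": ["P2", "P1"],
    "EEK": ["P2"],
}
psms = ["AAK", "CCR", "CCR", "DDK", "EEK"]
counts = peptide_group_counts(psms, peptide_to_proteins)

# Hypothetical p-values from three of the engines named in the abbreviations.
p_values_by_engine = {
    "MM": {"g1": 0.01, "g2": 0.20},
    "SQ": {"g1": 0.03, "g2": 0.04},
    "XT": {"g1": 0.50, "g2": 0.60},
}
differential = vote_count(p_values_by_engine)
```

In this toy data the three spectra from shared peptides land in their own group rather than inflating the count of either protein, and only the group supported by two engines passes the vote.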

Abbreviations

FP: False positive

MM: MyriMatch

SQ: Sequest

TP: True positive

TR: TagRecon

XT: X!Tandem

Acknowledgments

D. L. Tabb and Y.-Y. Chen were supported by U01 CA152647 from the National Cancer Institute. S. Dasari, Z.-Q. Ma, and L. J. Vega-Montoto were supported by R01 CA126218 from the National Cancer Institute.

Author information

Corresponding author

Correspondence to David L. Tabb.

Additional information

Published in the topical issue Quantitative Mass Spectrometry in Proteomics with guest editors Bernhard Kuster and Marcus Bantscheff.

Electronic supplementary material

ESM 1 (PDF, 1.30 MB)

About this article

Cite this article

Chen, YY., Dasari, S., Ma, ZQ. et al. Refining comparative proteomics by spectral counting to account for shared peptides and multiple search engines. Anal Bioanal Chem 404, 1115–1125 (2012). https://doi.org/10.1007/s00216-012-6011-x

  • DOI: https://doi.org/10.1007/s00216-012-6011-x
