Refining comparative proteomics by spectral counting to account for shared peptides and multiple search engines

Chen, Yao-Yi; Dasari, Surendra; Ma, Ze-Qiang; Vega-Montoto, Lorenzo J.; Li, Ming; Tabb, David L.

doi:10.1007/s00216-012-6011-x

Original Paper
Published: 03 May 2012

Analytical and Bioanalytical Chemistry, volume 404, pages 1115–1125 (2012)

Yao-Yi Chen¹, Surendra Dasari¹, Ze-Qiang Ma¹, Lorenzo J. Vega-Montoto¹, Ming Li² & David L. Tabb¹

Abstract

Spectral counting has become a widely used approach for measuring and comparing protein abundance in label-free shotgun proteomics. However, when analyzing complex samples, the ambiguity of matching between peptides and proteins greatly affects the assessment of peptide and protein inventories, differentiation, and quantification. Meanwhile, the configuration of database searching algorithms that assign peptides to MS/MS spectra may produce different results in comparative proteomic analysis. Here, we present three strategies to improve comparative proteomics through spectral counting. We show that comparing spectral counts for peptide groups rather than for protein groups forestalls problems introduced by shared peptides. We demonstrate the advantage and flexibility of this new method in two datasets. We present four models to combine four popular search engines that lead to significant gains in spectral counting differentiation. Among these models, we demonstrate a powerful vote counting model that scales well for multiple search engines. We also show that semi-tryptic searching outperforms tryptic searching for comparative proteomics. Overall, these techniques considerably improve protein differentiation on the basis of spectral count tables.
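
Two of the ideas above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a peptide group is the set of peptides mapping to exactly the same set of proteins, and that vote counting calls an item differential when at least a chosen number of search engines report a p-value below a threshold. The function names, the `alpha` and `min_votes` parameters, and all peptide, protein, and group identifiers are hypothetical.

```python
from collections import defaultdict


def peptide_group_counts(peptide_to_proteins, spectral_counts):
    """Sum spectral counts over peptides that share the same protein set.

    Assumption: a "peptide group" is keyed by the exact set of proteins
    its peptides map to, so shared peptides get their own group instead
    of being credited to any single protein.
    """
    groups = defaultdict(int)
    for peptide, count in spectral_counts.items():
        key = frozenset(peptide_to_proteins[peptide])
        groups[key] += count
    return dict(groups)


def vote_count(p_values_by_engine, alpha=0.05, min_votes=2):
    """Return the items that at least `min_votes` engines call differential.

    Assumption: each engine contributes one vote per item whose p-value
    is below `alpha`; the thresholds here are illustrative only.
    """
    votes = defaultdict(int)
    for p_values in p_values_by_engine.values():
        for item, p in p_values.items():
            if p < alpha:
                votes[item] += 1
    return {item for item, n in votes.items() if n >= min_votes}


# Hypothetical data: two peptides are shared between proteins P1 and P2.
peptide_to_proteins = {
    "PEPA": {"P1"},
    "PEPB": {"P1", "P2"},
    "PEPC": {"P1", "P2"},
    "PEPD": {"P2"},
}
spectral_counts = {"PEPA": 5, "PEPB": 3, "PEPC": 4, "PEPD": 1}
group_counts = peptide_group_counts(peptide_to_proteins, spectral_counts)

# Hypothetical per-engine p-values for two peptide groups.
p_values_by_engine = {
    "MM": {"g1": 0.01, "g2": 0.20},
    "SQ": {"g1": 0.03, "g2": 0.04},
    "XT": {"g1": 0.50, "g2": 0.60},
}
differential = vote_count(p_values_by_engine)
```

In this toy input the shared peptides PEPB and PEPC form their own group, separate from the peptides unique to P1 or P2, and adding a further search engine only adds one more entry to `p_values_by_engine`.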

Abbreviations

FP: False positive
MM: MyriMatch
SQ: Sequest
TP: True positive
TR: TagRecon
XT: X!Tandem

Acknowledgments

D. L. Tabb and Y.-Y. Chen were supported by U01 CA152647 from the National Cancer Institute. S. Dasari, Z.-Q. Ma, and L. J. Vega-Montoto were supported by R01 CA126218 from the National Cancer Institute.

Author information

Authors and Affiliations

Department of Biomedical Informatics, Vanderbilt University Medical School, Nashville, TN, 37232-8575, USA
Yao-Yi Chen, Surendra Dasari, Ze-Qiang Ma, Lorenzo J. Vega-Montoto & David L. Tabb
Division of Cancer Biostatistics, Vanderbilt University Medical School, Nashville, TN, 37232-6848, USA
Ming Li

Corresponding author

Correspondence to David L. Tabb.

Additional information

Published in the topical issue Quantitative Mass Spectrometry in Proteomics with guest editors Bernhard Kuster and Marcus Bantscheff.

Electronic supplementary material

ESM 1 (PDF, 1.30 MB)

About this article

Cite this article

Chen, YY., Dasari, S., Ma, ZQ. et al. Refining comparative proteomics by spectral counting to account for shared peptides and multiple search engines. Anal Bioanal Chem 404, 1115–1125 (2012). https://doi.org/10.1007/s00216-012-6011-x

Received: 30 January 2012
Revised: 22 March 2012
Accepted: 02 April 2012
Published: 03 May 2012
Issue Date: September 2012
DOI: https://doi.org/10.1007/s00216-012-6011-x
