Abstract
Shotgun proteomics experiments require the collection of thousands of tandem mass spectra, and these data sets will continue to grow as new instruments become available that can scan at even higher rates. Such data contain substantial redundancy, with spectra from a particular peptide being acquired many times during a single LC-MS/MS experiment. In this article, we present MS2Grouper, an algorithm that detects spectral duplication, assesses groups of related spectra, and replaces these groups with synthetic representative spectra. Errors in detecting spectral similarity are corrected using a paraclique criterion: spectra are assessed as groups only if they are part of a clique of at least three completely interrelated spectra or are subsequently added to such a clique by being similar to all but one of the clique members. A greedy algorithm constructs a representative spectrum for each group by iteratively removing the tallest peaks from the spectral collection and matching them to peaks in the other spectra. This strategy is shown to be effective in reducing spectral counts by up to 20% in LC-MS/MS datasets from protein standard mixtures and proteomes, reducing database search times without a concomitant reduction in identified peptides.
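The two steps described in the abstract, paraclique grouping and greedy construction of a representative spectrum, can be illustrated with a minimal sketch. This is not the MS2Grouper implementation. It assumes that pairwise similarity has already been reduced to a list of index pairs, that a spectrum is a list of `(m/z, intensity)` tuples, and that peaks match within a fixed m/z tolerance. The function names, the `tol` and `min_support` parameters, and the averaging of matched peaks are all hypothetical choices made for illustration.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Enumerate maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    if not p and not x:
        out.append(r)
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def paraclique_groups(n, edges, min_clique=3):
    """Group spectra 0..n-1 from similar pairs (assumed input format).

    A group starts as the largest remaining clique of at least
    `min_clique` spectra, then absorbs any spectrum similar to all
    but one (or all) of the clique members.
    """
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    groups, unassigned = [], set(range(n))
    while True:
        sub = {v: adj[v] & unassigned for v in unassigned}
        cliques = []
        bron_kerbosch(set(), set(unassigned), set(), sub, cliques)
        best = max(cliques, key=len)
        if len(best) < min_clique:
            return groups
        extra = {v for v in unassigned - best
                 if len(adj[v] & best) >= len(best) - 1}
        group = best | extra
        groups.append(sorted(group))
        unassigned -= group


def representative_spectrum(spectra, tol=0.5, min_support=2):
    """Greedy consensus: repeatedly take the tallest remaining peak and
    match it to the closest peak within `tol` in each other spectrum.
    `tol`, `min_support`, and the averaging are illustrative assumptions.
    """
    remaining = [list(s) for s in spectra]
    consensus = []
    while any(remaining):
        src, peak = max(
            ((i, p) for i, s in enumerate(remaining) for p in s),
            key=lambda ip: ip[1][1],
        )
        remaining[src].remove(peak)
        matched = [peak]
        for j, s in enumerate(remaining):
            if j == src:
                continue
            near = [q for q in s if abs(q[0] - peak[0]) <= tol]
            if near:
                q = min(near, key=lambda q: abs(q[0] - peak[0]))
                s.remove(q)
                matched.append(q)
        # Keep the peak only if enough spectra in the group support it.
        if len(matched) >= min_support:
            mz = sum(m for m, _ in matched) / len(matched)
            inten = sum(i for _, i in matched) / len(matched)
            consensus.append((mz, inten))
    return sorted(consensus)


# Toy data: spectra 0, 1, 2 form a clique; spectrum 3 is similar to
# two of its three members; spectrum 4 is similar to only one.
edges = [(0, 1), (0, 2), (1, 2), (3, 0), (3, 1), (4, 0)]
groups = paraclique_groups(6, edges)

spectra = [
    [(100.0, 10.0), (200.0, 5.0)],
    [(100.2, 8.0), (200.1, 7.0)],
    [(100.1, 9.0), (350.0, 1.0)],
]
rep = representative_spectrum(spectra)
```

In this toy case spectrum 3 joins the group because it misses only one clique member, while spectrum 4 stays ungrouped. The peak at m/z 350 is dropped from the representative spectrum because no other spectrum supports it under the assumed `min_support` threshold.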
Published online June 23, 2005
Cite this article
Tabb, D.L., Thompson, M.R., Khalsa-Moyers, G. et al. MS2Grouper: Group assessment and synthetic replacement of duplicate proteomic tandem mass spectra. J Am Soc Mass Spectrom 16, 1250–1261 (2005). https://doi.org/10.1016/j.jasms.2005.04.010
DOI: https://doi.org/10.1016/j.jasms.2005.04.010