A Statistical Comparison of SimTandem with State-of-the-Art Peptide Identification Tools

  • Jiří Novák
  • Timo Sachsenberg
  • David Hoksza
  • Tomáš Skopal
  • Oliver Kohlbacher
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 222)


Similarity search in theoretical mass spectra generated from protein sequence databases is a widely accepted approach for identifying peptides from query mass spectra produced by shotgun proteomics. Because query spectra contain many inaccuracies and databases have grown rapidly in recent years, more accurate mass spectra similarity measures and the use of database indexing techniques remain in demand. We present a statistical comparison of the parameterized Hausdorff distance with the freely available tools OMSSA and X!Tandem and with the cosine similarity. We show that a precursor mass filter combined with a modification of the previously proposed parameterized Hausdorff distance outperforms state-of-the-art tools in both search speed and the number of identified peptide sequences (even at a q-value of only 0.001). Our method is implemented in the freely available application SimTandem, which can be used in the TOPP framework based on OpenMS.
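As a rough illustration of two ingredients named in the abstract, the sketch below shows a precursor mass filter followed by the cosine similarity between binned spectra. It is not SimTandem's implementation and does not reproduce the parameterized Hausdorff distance, whose definition is not given here. The function names, the 1.0 bin width, the mass tolerance, and the toy peptide labels are all assumptions made for this example.

```python
from bisect import bisect_left, bisect_right
from math import sqrt


def precursor_mass_filter(candidates, query_mass, tolerance):
    """Keep candidates whose mass lies within query_mass +/- tolerance.

    `candidates` is a list of (mass, label) tuples sorted by mass, so the
    window can be located with two binary searches (hypothetical layout).
    """
    masses = [mass for mass, _ in candidates]
    lo = bisect_left(masses, query_mass - tolerance)
    hi = bisect_right(masses, query_mass + tolerance)
    return candidates[lo:hi]


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (assumed width)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity of two binned spectra stored as sparse dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy candidate list; the labels are placeholders, not real peptides.
    candidates = [(500.0, "A"), (800.2, "B"), (800.9, "C"), (1200.0, "D")]
    kept = precursor_mass_filter(candidates, query_mass=800.5, tolerance=0.5)
    print([label for _, label in kept])

    query = bin_spectrum([(100.2, 2.0), (200.7, 1.0)])
    theoretical = bin_spectrum([(100.4, 1.0), (300.1, 1.0)])
    print(cosine_similarity(query, theoretical))
```

The filter narrows the database to candidates with a compatible precursor mass before any spectrum comparison is made, which is why it affects search speed; the similarity function is then evaluated only on the survivors.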


Keywords: peptide identification; tandem mass spectrometry; similarity search; parameterized Hausdorff distance; precursor mass filter; SimTandem





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Jiří Novák (1)
  • Timo Sachsenberg (2)
  • David Hoksza (1)
  • Tomáš Skopal (1)
  • Oliver Kohlbacher (2)
  1. Faculty of Mathematics and Physics, Charles University in Prague, Prague, Czech Republic
  2. Applied Bioinformatics Group, Sand 14, Eberhard-Karls-Universität Tübingen, Tübingen, Germany
