Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics

  • Thilo Muth
  • Erdmann Rapp
  • Frode S. Berven
  • Harald Barsnes
  • Marc Vaudel
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 919)


Abstract

Protein identification via database searches has become the gold standard in mass spectrometry-based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing is gaining interest as a database-independent alternative. This chapter introduces the general principle of this so-called de novo sequencing, along with the pitfalls and challenges of the technique. It then presents the main tools available, with a focus on user-friendly open-source software that can be applied directly in everyday proteomic workflows.
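As a minimal sketch of the general principle, and not the chapter's algorithm or any listed tool's implementation, the Python below reads residues from the mass differences between consecutive peaks of an idealized, noise-free ladder of singly charged b-ions. The residue-mass table, the 0.02 Da tolerance, and the helper names `b_ion_ladder` and `read_ladder` are illustrative assumptions. Real de novo tools must also score noisy, incomplete spectra and resolve multi-residue gaps (for example, GG has almost exactly the same mass as N).

```python
"""Toy sketch: infer residues from mass gaps in an ideal b-ion ladder.

Assumption: noise-free, singly charged b-ions only; no scoring, no
modifications, and no multi-residue gap resolution.
"""

# Monoisotopic residue masses in Da. Leucine and isoleucine are isobaric,
# so they are merged into a single "L/I" entry.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

PROTON = 1.00728  # proton mass in Da; also the m/z of an "empty" b0 ion


def b_ion_ladder(residues):
    """Return singly charged b-ion m/z values for a list of residue symbols."""
    ladder, total = [], PROTON
    for aa in residues:
        total += RESIDUE_MASSES[aa]
        ladder.append(total)
    return ladder


def read_ladder(peaks, tolerance=0.02):
    """Translate gaps between consecutive peaks into residue symbols.

    Gaps matching no residue (or more than one) within `tolerance` Da are
    reported as "[+mass]" placeholders instead of being guessed.
    """
    ordered = [PROTON] + sorted(peaks)  # start the walk from the b0 position
    sequence = []
    for low, high in zip(ordered, ordered[1:]):
        gap = high - low
        hits = [aa for aa, mass in RESIDUE_MASSES.items()
                if abs(gap - mass) <= tolerance]
        sequence.append(hits[0] if len(hits) == 1 else f"[+{gap:.2f}]")
    return sequence


if __name__ == "__main__":
    spectrum = b_ion_ladder(["P", "E", "P", "T", "L/I", "D", "E"])
    print(read_ladder(spectrum))  # ['P', 'E', 'P', 'T', 'L/I', 'D', 'E']
```

If one fragment is missing, the sketch reports a two-residue mass (for example `[+198.10]` for P plus T) rather than a residue, which illustrates why incomplete fragmentation is a practical challenge for de novo sequencing.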


Keywords

de novo identification; Mass spectrum sequencing; Quality control; Visualization



Abbreviations

PSM    Peptide Spectrum Match
FDR    False Discovery Rate
m/z    Mass over Charge
BLAST  Basic Local Alignment Search Tool
PTM    Post-Translational Modification



Acknowledgments

T.M. and E.R. acknowledge support from the Max Planck Society. H.B. is supported by the Research Council of Norway.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Thilo Muth (1, 2)
  • Erdmann Rapp (1, 2)
  • Frode S. Berven (3, 4, 5)
  • Harald Barsnes (3)
  • Marc Vaudel (3)

  1. Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
  2. glyXera GmbH, Magdeburg, Germany
  3. Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
  4. KG Jebsen Centre for Multiple Sclerosis Research, Department of Clinical Medicine, University of Bergen, Bergen, Norway
  5. Norwegian Multiple Sclerosis Competence Centre, Department of Neurology, Haukeland University Hospital, Bergen, Norway
