Protein Identification by Spectral Networks Analysis

  • Nuno Bandeira
Part of the Methods in Molecular Biology book series (MIMB, volume 694)


While advances in tandem mass spectrometry (MS/MS) steadily increase the rate at which MS/MS spectra are generated, standard algorithmic approaches to peptide identification recently seemed to be reaching the limit of the information that can be extracted from MS/MS spectra. A closer look, however, reveals a common limiting practice: each spectrum is analyzed in isolation, even though high-throughput mass spectrometry regularly generates many spectra from related peptides. By capitalizing on this redundancy, we show that, much like protein sequences, unidentified MS/MS spectra can be aligned to identify modified and unmodified variants of the same peptide. Moreover, this alignment procedure can be iterated to accurately group multiple modification variants of the same peptides. Furthermore, combining shotgun proteomics with the alignment of spectra from overlapping peptides led to the development of Shotgun Protein Sequencing. Much as DNA reads are assembled into whole genomic sequences, we show that assembling MS/MS spectra yields the highest de novo sequencing accuracy reported to date while recovering nearly complete protein sequences. We further show that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus to catalyze applications of proteomics methodologies to studies of unknown proteins that are otherwise impractical.
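The pairwise spectral alignment and the grouping of modification variants described above can be illustrated with a minimal Python sketch. It assumes idealized, noise-free spectra that contain only prefix residue masses (no ion types, intensities or noise peaks), at most one modification per aligned pair, and a fixed mass tolerance. The helper names (`prefix_masses`, `has_peak`, `align_pair`, `spectral_network`), the rounded residue masses and the +80.0 Da shift (close to a phosphorylation) are hypothetical illustrations; this is not the chapter's published scoring or software.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

# Hypothetical, rounded monoisotopic residue masses (Da) for the toy example.
RESIDUE_MASS = {"G": 57.021, "A": 71.037, "S": 87.032, "P": 97.053,
                "E": 129.043, "K": 128.095}


def prefix_masses(seq: str, mod_pos: Optional[int] = None,
                  mod_mass: float = 0.0) -> Tuple[float, List[float]]:
    """Idealized spectrum: (total residue mass, internal prefix masses)."""
    total, prefixes = 0.0, []
    for i, aa in enumerate(seq):
        total += RESIDUE_MASS[aa] + (mod_mass if i == mod_pos else 0.0)
        prefixes.append(total)
    return total, prefixes[:-1]


def has_peak(peaks: List[float], mass: float, tol: float) -> bool:
    """True if sorted `peaks` holds a mass within `tol` of `mass`."""
    i = bisect_left(peaks, mass - tol)
    return i < len(peaks) and peaks[i] <= mass + tol


def align_pair(a: List[float], b: List[float], delta: float,
               tol: float = 0.5) -> int:
    """Best number of matched peaks when peaks of `a` below a breakpoint
    match `b` unshifted and peaks above it match `b` shifted by `delta`
    (a single modification site)."""
    a, b = sorted(a), sorted(b)
    unshifted = [has_peak(b, m, tol) for m in a]
    shifted = [has_peak(b, m + delta, tol) for m in a]
    # Try every breakpoint k: a[:k] unshifted, a[k:] shifted by delta.
    return max(sum(unshifted[:k]) + sum(shifted[k:])
               for k in range(len(a) + 1))


def spectral_network(spectra: Dict[str, Tuple[float, List[float]]],
                     min_score: int, tol: float = 0.5) -> List[List[str]]:
    """Link spectra whose pairwise alignment score reaches `min_score` and
    return the connected components (putative variant groups)."""
    names = sorted(spectra)
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            (mass_x, peaks_x), (mass_y, peaks_y) = spectra[x], spectra[y]
            # The precursor mass difference is the candidate modification shift.
            if align_pair(peaks_x, peaks_y, mass_y - mass_x, tol) >= min_score:
                parent[find(x)] = find(y)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


if __name__ == "__main__":
    spectra = {
        "pep": prefix_masses("PEASK"),
        "pep_mod": prefix_masses("PEASK", mod_pos=3, mod_mass=80.0),
        "other": prefix_masses("GGKAE"),
    }
    print(spectral_network(spectra, min_score=3))
```

Run as a script, this prints `[['other'], ['pep', 'pep_mod']]`: the unmodified and modified variants of the toy peptide align with all four prefix masses matched and fall into one component, while the unrelated peptide stays alone. Connected components here only stand in for the grouping of modification variants; the published method's scoring, filtering and iteration are not reproduced.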

Key words

Tandem mass spectrometry, MS/MS, Alignment, Assembly, Spectral networks, Shotgun protein sequencing, Algorithms

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Center for Computational Mass Spectrometry, University of California, San Diego, La Jolla, USA