Generating Peptide Candidates from Amino-Acid Sequence Databases for Protein Identification via Mass Spectrometry

  • Nathan Edwards
  • Ross Lippert
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2452)


Protein identification via mass spectrometry forms the foundation of high-throughput proteomics. Tandem mass spectrometry, when applied to a complex mixture of peptides, selects and fragments each peptide to reveal its amino-acid sequence structure. The successful analysis of such an experiment typically relies on amino-acid sequence databases to provide a set of biologically relevant peptides to examine. A key sub-problem, then, for amino-acid sequence database search engines that analyze tandem mass spectra is to efficiently generate all the peptide candidates from a sequence database with mass equal to one of a large set of observed peptide masses. We demonstrate that to solve the problem efficiently, we must deal with substring redundancy in the amino-acid sequence database and focus our attention on looking up the observed peptide masses quickly. We show that it is possible, with some preprocessing and memory overhead, to solve the peptide candidate generation problem in time asymptotically proportional to the size of the sequence database and the number of peptide candidates output.
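To make the peptide candidate generation problem concrete, the following is a minimal sketch of a naive baseline, not the algorithm developed in the paper. It enumerates substrings of each database sequence using prefix sums, looks each substring mass up in a sorted list of observed masses, and reports each distinct peptide once to sidestep substring redundancy. The function name `peptide_candidates`, the tolerance `tol`, and the length cap `max_len` are illustrative assumptions; the residue masses are standard monoisotopic values, with one water added per peptide.

```python
from bisect import bisect_left
from itertools import accumulate

# Standard monoisotopic residue masses (Da); a peptide's mass adds one water.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_candidates(sequences, observed_masses, tol=0.01, max_len=50):
    """Return (peptide, mass) pairs whose mass is within tol of an observed mass.

    Hypothetical helper for illustration: a brute-force scan, assumed to be
    much slower than the preprocessing-based approach the abstract describes.
    """
    observed = sorted(observed_masses)
    if not observed:
        return []
    seen = set()   # distinct peptides already reported (substring redundancy)
    hits = []
    for seq in sequences:
        # prefix[j] - prefix[i] is the summed residue mass of seq[i:j].
        prefix = [0.0] + list(accumulate(RESIDUE_MASS[a] for a in seq))
        n = len(seq)
        for i in range(n):
            for j in range(i + 1, min(n, i + max_len) + 1):
                mass = prefix[j] - prefix[i] + WATER
                if mass > observed[-1] + tol:
                    break  # masses only grow as the substring is extended
                # Binary search for the smallest observed mass >= mass - tol.
                k = bisect_left(observed, mass - tol)
                if k < len(observed) and observed[k] <= mass + tol:
                    peptide = seq[i:j]
                    if peptide not in seen:
                        seen.add(peptide)
                        hits.append((peptide, mass))
    return hits
```

For example, `peptide_candidates(["GAGA", "AGA"], [146.069])` reports the peptides `GA` and `AG` once each, even though both occur several times in the two sequences. The running time of this sketch grows with the number of substrings examined rather than with the database size plus the output size, which is the gap the paper sets out to close.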


Keywords: Sequence Database, Lookup Table, Tandem Mass Spectrum, Substring Density, Peptide Candidate
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Nathan Edwards (1)
  • Ross Lippert (1)
  1. Celera Genomics, Rockville
