Generating Peptide Candidates from Amino-Acid Sequence Databases for Protein Identification via Mass Spectrometry
Protein identification via mass spectrometry forms the foundation of high-throughput proteomics. Tandem mass spectrometry, when applied to a complex mixture of peptides, selects and fragments each peptide to reveal its amino-acid sequence structure. The successful analysis of such an experiment typically relies on amino-acid sequence databases to provide a set of biologically relevant peptides to examine. A key sub-problem, then, for amino-acid sequence database search engines that analyze tandem mass spectra is to efficiently generate all the peptide candidates from a sequence database with mass equal to one of a large set of observed peptide masses. We demonstrate that to solve the problem efficiently, we must deal with substring redundancy in the amino-acid sequence database and focus our attention on looking up the observed peptide masses quickly. We show that it is possible, with some preprocessing and memory overhead, to solve the peptide candidate generation problem in time asymptotically proportional to the size of the sequence database and the number of peptide candidates output.
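The candidate generation problem stated above can be illustrated with a minimal sketch. This is a naive baseline under stated assumptions, not the method of the paper: the function name `peptide_candidates`, its parameters, and the toy residue masses in the usage note are all hypothetical. It scans every substring start, extends the substring while its mass can still match, looks each mass up in a sorted list of observed masses, and reports each distinct peptide string only once, which is the simplest way to handle substring redundancy.

```python
from bisect import bisect_left


def peptide_candidates(sequences, residue_mass, observed, tol=0):
    """Return (peptide, mass) pairs for distinct substrings whose mass lies
    within `tol` of some observed mass.

    Assumptions (illustrative only): every residue mass is positive, every
    residue in `sequences` has an entry in `residue_mass`, and a peptide's
    mass is the plain sum of its residue masses.
    """
    targets = sorted(observed)
    if not targets:
        return []
    max_mass = targets[-1] + tol
    seen = set()       # peptides already reported (substring redundancy)
    candidates = []
    for seq in sequences:
        for start in range(len(seq)):
            mass = 0
            for end in range(start, len(seq)):
                mass += residue_mass[seq[end]]
                if mass > max_mass:
                    break  # masses are positive, so longer substrings only get heavier
                # Smallest observed mass that is >= mass - tol, if any.
                i = bisect_left(targets, mass - tol)
                if i < len(targets) and targets[i] <= mass + tol:
                    peptide = seq[start:end + 1]
                    if peptide not in seen:
                        seen.add(peptide)
                        candidates.append((peptide, mass))
    return candidates
```

For example, with the toy masses `{"A": 1, "B": 2, "C": 3}`, the sequences `["ABCA", "BCA"]`, and the observed masses `[3, 5]`, the function reports `AB`, `BC`, and `C` once each, even though `BC` and `C` occur in both sequences. This sketch still visits every repeated substring before discarding it, and it performs a binary search per substring, so it does not achieve the asymptotic running time claimed in the abstract.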
Keywords: Sequence Database; Lookup Table; Tandem Mass Spectrum; Substring Density; Peptide Candidate