Abstract
We consider a string matching problem in which the pattern is a template that matches many different strings with varying degrees of perfection. The quality of a match is given by a penalty matrix that assigns each pair of characters a score characterizing how well the characters match. Superfluous characters in the text and in the pattern may also occur, and the respective penalties for such gaps in the alignment are also given by the penalty matrix. For a text T of length n and a template P of length m, we wish to find the best alignment of T with P^n, the concatenation of n copies of P (m will typically be much smaller than n). Such an alignment can be obtained simply by solving a dynamic programming problem of size O(n^2 m), ignoring the periodic character of P^n. We show that the structure of P^n can be exploited and the problem reduced to essentially solving a dynamic programming problem of size O(mn). If the complexity of computing gap penalties is O(1), which is frequently the case, our algorithm runs in O(mn) time. The problem was motivated by a protein structure problem.
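The contrast between the two table sizes can be illustrated with a minimal sketch. It is not the paper's algorithm, only one plausible way to exploit the periodicity under stated assumptions: gap penalties are non-negative constants (`gap_text`, `gap_template`), the character penalty is a caller-supplied function `sub`, the template is non-empty, and the number of template copies is chosen freely rather than fixed at n. All function and parameter names here are hypothetical. The idea is to keep a single row of m columns and let the "skip a template character" step wrap from the last column back to the first.

```python
from math import inf


def unit_sub(a, b):
    """Illustrative penalty: 0 for equal characters, 1 otherwise (an assumption)."""
    return 0 if a == b else 1


def plain_align(text, pattern, sub, gap_text=1, gap_template=1):
    """Ordinary global alignment penalty; periodicity of `pattern` is ignored."""
    prev = [j * gap_template for j in range(len(pattern) + 1)]
    for ch in text:
        cur = [prev[0] + gap_text]
        for j, p in enumerate(pattern, start=1):
            cur.append(min(prev[j] + gap_text,          # ch is superfluous
                           cur[j - 1] + gap_template,   # p is superfluous
                           prev[j - 1] + sub(ch, p)))   # ch aligned with p
        prev = cur
    return prev[-1]


def wraparound_align(text, template, sub, gap_text=1, gap_template=1):
    """Minimum penalty of aligning `text` with k copies of `template`,
    minimized over k >= 0, using an n-by-m table (one row kept at a time).
    Assumes a non-empty template and non-negative penalties."""
    m = len(template)
    # row[j]: best penalty when the number of template characters consumed
    # so far is congruent to j modulo m.
    row = [j * gap_template for j in range(m)]
    for ch in text:
        new = [inf] * m
        for j in range(m):
            prev = (j - 1) % m
            new[j] = min(row[j] + gap_text,                    # ch superfluous
                         row[prev] + sub(ch, template[prev]))  # ch vs template[prev]
        # Skipping template characters moves right within the row and may
        # wrap from column m-1 to column 0; two sweeps settle every column.
        for _ in range(2):
            for j in range(m):
                new[j] = min(new[j], new[(j - 1) % m] + gap_template)
        row = new
    return row[0]  # column 0: a whole number of template copies was consumed


if __name__ == "__main__":
    print(wraparound_align("abcabxabc", "abc", unit_sub))
```

For small inputs, `wraparound_align` can be checked against `plain_align` run on explicit concatenations of the template; the former touches O(mn) cells, while the latter grows with the length of the concatenated pattern.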
Partially supported by NSF grant CCR-8908286
Partially supported by NSF grant CCR-9110255 and the New York State Science and Technology Foundation Center for Advanced Technology
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fischetti, V.A., Landau, G.M., Schmidt, J.P., Sellers, P.H. (1992). Identifying periodic occurrences of a template with applications to protein structure. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1992. Lecture Notes in Computer Science, vol 644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56024-6_9
DOI: https://doi.org/10.1007/3-540-56024-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56024-1
Online ISBN: 978-3-540-47357-2
eBook Packages: Springer Book Archive