Abstract
Ribonucleic acid (RNA) strings are strings over the four-letter alphabet {A, C, G, U} with a secondary structure of base-pairing between A-U and C-G pairs in the string. Edges are drawn between two bases that are paired in the secondary structure and these edges have traditionally been assumed to be noncrossing. The noncrossing base-pairing naturally leads to a tree-like representation of the secondary structure of RNA strings.
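The tree-like view of a noncrossing secondary structure can be illustrated with a minimal sketch. It assumes the structure is written in dot-bracket notation (a common convention, not one stated in this abstract), and the function names parse_structure and parent_of_each_pair are hypothetical. The sketch reads off the base pairs, checks that each is an A-U or C-G pair, and records for every pair the innermost pair enclosing it, which is the parent relation of the tree.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]

# Pairings named in the abstract: A-U and C-G (in either order).
ALLOWED = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def parse_structure(seq: str, dot_bracket: str) -> List[Pair]:
    """Return base pairs (i, j) with i < j, sorted by i.

    Assumption: the structure is given in dot-bracket notation, where
    '(' and ')' mark paired bases and '.' marks an unpaired base.
    Matching brackets with a stack yields noncrossing pairs only.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    stack: List[int] = []
    pairs: List[Pair] = []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()
            if (seq[i], seq[j]) not in ALLOWED:
                raise ValueError(f"{seq[i]}-{seq[j]} is not an allowed pair")
            pairs.append((i, j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def parent_of_each_pair(pairs: List[Pair]) -> Dict[Pair, Optional[Pair]]:
    """Map each pair to the innermost pair enclosing it (None at top level)."""
    parents: Dict[Pair, Optional[Pair]] = {}
    enclosing: List[Pair] = []  # pairs still open at the current position
    for i, j in sorted(pairs):
        # Discard pairs that closed before position i.
        while enclosing and enclosing[-1][1] < i:
            enclosing.pop()
        parents[(i, j)] = enclosing[-1] if enclosing else None
        enclosing.append((i, j))
    return parents


if __name__ == "__main__":
    pairs = parse_structure("GAUCGC", "(()())")
    for pair, parent in parent_of_each_pair(pairs).items():
        print(pair, "-> parent", parent)
```

In this example the outer G-C pair (0, 5) encloses the A-U pair (1, 2) and the C-G pair (3, 4), so it is the root with two children.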
In this paper, we address several notions of similarity between two RNA strings that take into account both the primary sequence and secondary base-pairing structure of the strings. We present efficient algorithms for exact matching and approximate matching between two RNA strings. We define a notion of alignment between two RNA strings and devise algorithms based on dynamic programming. We then present a method for optimally aligning a given RNA string with unknown secondary structure to one with known sequence and structure, thus attacking the structure prediction problem in the case when the structure of a closely related sequence is known. The techniques employed to prove our results include reductions to well-known string matching problems allowing wild cards and ranges, and speeding up dynamic programming by using the tree structures implicit in the secondary structure of RNA strings.
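For orientation, the following is a minimal sketch of the standard sequence-only global alignment recurrence (unit-cost edit distance) on which dynamic-programming alignment is usually built. It is not the algorithm of this paper: it ignores the secondary structure entirely, and the function name and the unit costs are assumptions chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between two sequences.

    Baseline only: it compares primary sequences and ignores base pairing.
    Costs (1 per substitution, insertion, deletion) are an assumption.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (a[i - 1] != b[j - 1])
            delete = prev[j] + 1      # drop a[i-1]
            insert = curr[j - 1] + 1  # add b[j-1]
            curr[j] = min(substitute, delete, insert)
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(edit_distance("GAUCGC", "GAUGC"))
```

The table has one cell per pair of prefixes, so the running time is proportional to the product of the two lengths. A structure-aware alignment would additionally have to score paired bases jointly, which this baseline does not attempt.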
Research supported by DIMACS (Center for Discrete Mathematics and Theoretical Computer Science), a National Science Foundation Science and Technology Center under NSF contract STC-8809648.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bafna, V., Muthukrishnan, S., Ravi, R. (1995). Computing similarity between RNA strings. In: Galil, Z., Ukkonen, E. (eds) Combinatorial Pattern Matching. CPM 1995. Lecture Notes in Computer Science, vol 937. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60044-2_30
DOI: https://doi.org/10.1007/3-540-60044-2_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60044-2
Online ISBN: 978-3-540-49412-6
eBook Packages: Springer Book Archive