Computing similarity between RNA strings

Bafna, Vineet; Muthukrishnan, S.; Ravi, R.

doi:10.1007/3-540-60044-2_30

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 937)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

Ribonucleic acid (RNA) strings are strings over the four-letter alphabet {A, C, G, U}, together with a secondary structure formed by A-U and C-G base pairs within the string. Edges are drawn between two bases that are paired in the secondary structure, and these edges have traditionally been assumed to be noncrossing. The noncrossing base-pairing naturally leads to a tree-like representation of the secondary structure of RNA strings.
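
The tree-like representation can be illustrated with a minimal sketch. It is not taken from the paper: the representation of a structure as a list of index pairs, the function names (partner_map, pairs_are_canonical, is_noncrossing, build_tree), and the dictionary-based tree nodes are all assumptions made for illustration, and each base is assumed to take part in at most one pair.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (i, j) with i < j: 0-based positions of two paired bases

# Only A-U and C-G pairs are allowed, as stated in the abstract.
CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def partner_map(pairs: List[Pair]) -> Dict[int, int]:
    """Map each paired position to its partner (assumes one pair per base)."""
    partner: Dict[int, int] = {}
    for i, j in pairs:
        partner[i] = j
        partner[j] = i
    return partner


def pairs_are_canonical(seq: str, pairs: List[Pair]) -> bool:
    """Check that every pair joins A with U or C with G."""
    return all((seq[i], seq[j]) in CANONICAL for i, j in pairs)


def is_noncrossing(pairs: List[Pair]) -> bool:
    """True if no two pairs (i, j) and (k, l) interleave as i < k < j < l."""
    partner = partner_map(pairs)
    stack: List[int] = []
    for pos in sorted(partner):
        if partner[pos] > pos:
            stack.append(pos)            # opening end of a pair
        elif not stack or stack[-1] != partner[pos]:
            return False                 # closing end does not match the innermost open pair
        else:
            stack.pop()
    return not stack


def build_tree(seq: str, pairs: List[Pair]) -> dict:
    """Build an ordered tree: paired bases become internal nodes, unpaired bases leaves.

    Assumes the pairs are noncrossing (see is_noncrossing).
    """
    partner = partner_map(pairs)
    root = {"label": "root", "children": []}
    stack = [root]
    for pos, base in enumerate(seq):
        mate = partner.get(pos)
        if mate is None:
            stack[-1]["children"].append({"label": base, "children": []})
        elif mate > pos:
            node = {"label": base + seq[mate], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)           # bases up to the mate nest inside this pair
        else:
            stack.pop()                  # reached the closing end of the innermost pair
    return root
```

For the hypothetical string GCAAAGC with pairs (0, 6) and (1, 5), build_tree yields a root whose single child is the pair node GC, which contains the pair node CG, which in turn holds three unpaired A leaves. Because the pairs nest, a single stack pass is enough to build the tree.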

In this paper, we address several notions of similarity between two RNA strings that take into account both the primary sequence and secondary base-pairing structure of the strings. We present efficient algorithms for exact matching and approximate matching between two RNA strings. We define a notion of alignment between two RNA strings and devise algorithms based on dynamic programming. We then present a method for optimally aligning a given RNA string with unknown secondary structure to one with known sequence and structure, thus attacking the structure prediction problem in the case when the structure of a closely related sequence is known. The techniques employed to prove our results include reductions to well-known string matching problems allowing wild cards and ranges, and speeding up dynamic programming by using the tree structures implicit in the secondary structure of RNA strings.
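
To make the exact matching problem concrete, here is a naive baseline sketch. It is not the paper's algorithm, which the abstract says relies on reductions to string matching with wild cards and ranges. The definition of an occurrence used here is an assumption: the pattern occurs at offset s if its bases match the text there and every base pair of the pattern is also a base pair of the text after shifting by s; unpaired pattern positions impose no structural constraint. The names partner_map and find_occurrences are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (i, j): 0-based positions of two paired bases


def partner_map(pairs: List[Pair]) -> Dict[int, int]:
    """Map each paired position to its partner (assumes one pair per base)."""
    partner: Dict[int, int] = {}
    for i, j in pairs:
        partner[i] = j
        partner[j] = i
    return partner


def find_occurrences(
    text: str,
    text_pairs: List[Pair],
    pattern: str,
    pattern_pairs: List[Pair],
) -> List[int]:
    """Return all offsets where the pattern matches in sequence and structure.

    Assumed definition (not from the paper): the primary sequences agree, and
    each pattern pair (a, b) corresponds to a text pair (s + a, s + b).
    Runs in O(n * m) time for text length n and pattern length m.
    """
    text_partner = partner_map(text_pairs)
    m = len(pattern)
    hits: List[int] = []
    for s in range(len(text) - m + 1):
        if text[s:s + m] != pattern:
            continue                      # primary sequence differs
        if all(text_partner.get(s + a) == s + b for a, b in pattern_pairs):
            hits.append(s)                # every pattern pair is present in the text
    return hits
```

With the hypothetical text GCAAAGCGCAAAGC paired at (0, 6), (1, 5), (7, 13), (8, 12) and the pattern GCAAAGC paired at (0, 6), (1, 5), the function reports offsets 0 and 7. If the second hairpin of the text is left unpaired, only offset 0 remains, which shows that structure and not just sequence decides a match.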

Research supported by DIMACS (Center for Discrete Mathematics and Theoretical Computer Science), a National Science Foundation Science and Technology Center under NSF contract STC-8809648.

Author information

Vineet Bafna
Present address: DIMACS Center, P. O. Box 1179, 08855, Piscataway, NJ
R. Ravi
Present address: DIMACS, Department of Computer Science, Princeton University, 08544, NJ

Authors and Affiliations

DIMACS Center, P. O. Box 1179, 08855, Piscataway, NJ
Vineet Bafna
DIMACS Center, P. O. Box 1179, 08855, Piscataway, NJ
S. Muthukrishnan

Editor information

Zvi Galil, Esko Ukkonen

Cite this paper

Bafna, V., Muthukrishnan, S., Ravi, R. (1995). Computing similarity between RNA strings. In: Galil, Z., Ukkonen, E. (eds) Combinatorial Pattern Matching. CPM 1995. Lecture Notes in Computer Science, vol 937. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60044-2_30

DOI: https://doi.org/10.1007/3-540-60044-2_30
Published: 31 May 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60044-2
Online ISBN: 978-3-540-49412-6
eBook Packages: Springer Book Archive
