
Computing similarity between RNA strings

  • Conference paper
Combinatorial Pattern Matching (CPM 1995)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 937)


Abstract

Ribonucleic acid (RNA) strings are strings over the four-letter alphabet {A, C, G, U}, together with a secondary structure formed by A-U and C-G base pairs within the string. An edge is drawn between every two bases that are paired in the secondary structure, and these edges have traditionally been assumed to be noncrossing. Noncrossing base pairing naturally leads to a tree-like representation of the secondary structure of an RNA string.
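The tree-like representation described above can be made concrete with a short sketch. It assumes a hypothetical input format: the sequence plus a list of 0-based index pairs. It admits only the A-U and C-G pairs named above (no G-U wobble pairs) and rejects crossing pairs. It returns a tree whose nodes are base pairs and whose children are the pairs nested directly inside them. The names `Node` and `structure_tree` are illustrative and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Only the A-U and C-G pairs named in the abstract; G-U wobble pairs are excluded (assumption).
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


@dataclass
class Node:
    pair: tuple[int, int]                               # closing base pair; the root uses (-1, n)
    unpaired: list[int] = field(default_factory=list)   # unpaired positions directly inside this pair
    children: list[Node] = field(default_factory=list)  # pairs nested directly inside this pair


def structure_tree(seq: str, pairs: list[tuple[int, int]]) -> Node:
    """Validate a secondary structure and build its tree in one left-to-right scan."""
    if set(seq) - set("ACGU"):
        raise ValueError("sequence must be over {A, C, G, U}")
    n = len(seq)
    partner = [-1] * n  # partner[k] is the position paired with k, or -1 if unpaired
    for i, j in pairs:
        if not 0 <= i < j < n:
            raise ValueError(f"invalid pair {(i, j)}")
        if partner[i] != -1 or partner[j] != -1:
            raise ValueError(f"position reused in pair {(i, j)}")
        if (seq[i], seq[j]) not in COMPLEMENTARY:
            raise ValueError(f"non-complementary pair {seq[i]}-{seq[j]} at {(i, j)}")
        partner[i], partner[j] = j, i

    root = Node((-1, n))
    stack = [root]  # chain of currently open pairs, innermost last
    for k in range(n):
        p = partner[k]
        if p == -1:
            stack[-1].unpaired.append(k)
        elif p > k:  # k opens the pair (k, p)
            node = Node((k, p))
            stack[-1].children.append(node)
            stack.append(node)
        else:  # k closes the pair (p, k); noncrossing means it must be the innermost open pair
            if stack[-1].pair != (p, k):
                raise ValueError(f"pair {(p, k)} crosses pair {stack[-1].pair}")
            stack.pop()
    return root
```

Consider `structure_tree("GACUC", [(0, 4), (1, 3)])`. It returns a root with one child, the pair (0, 4). That pair in turn contains the pair (1, 3), which encloses the unpaired base at position 2. The scan runs in linear time because the noncrossing condition is exactly what makes a single stack sufficient.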

In this paper, we address several notions of similarity between two RNA strings that take into account both the primary sequence and the secondary base-pairing structure of the strings. We present efficient algorithms for exact and approximate matching between two RNA strings. We define a notion of alignment between two RNA strings and devise algorithms for it based on dynamic programming. We then present a method for optimally aligning a given RNA string with unknown secondary structure to one with known sequence and structure. This attacks the structure prediction problem in the case where the structure of a closely related sequence is known. The techniques used to prove our results include reductions to well-known string matching problems with wild cards and ranges, and speedups of dynamic programming that exploit the tree structures implicit in the secondary structure of RNA strings.
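One way to see how exact matching of RNA strings can reduce to classical string matching is sketched below, under an assumed definition that may differ from the paper's. A structured pattern occurs at offset s in a structured text when the bases agree and each pattern pair (j, k) corresponds to the text pair (s+j, s+k). Positions unpaired in the pattern must also be unpaired in the text. Encoding every position as the symbol (base, offset to partner) turns this into ordinary matching over an enlarged alphabet. Knuth-Morris-Pratt then solves it in O(n + m) time. The partner-array input format and the names `encode`, `failure_function`, and `structured_occurrences` are hypothetical. The sketch does not cover the approximate matching with wild cards and ranges mentioned above.

```python
from __future__ import annotations


def encode(seq: str, partner: list[int]) -> list[tuple[str, int]]:
    """Map position k to (base, partner[k] - k); unpaired bases get offset 0."""
    if len(seq) != len(partner):
        raise ValueError("sequence and partner array must have equal length")
    # Offset 0 never denotes a real pair, because no base pairs with itself.
    # Relative offsets make pair (j, k) in the pattern and (s+j, s+k) in the text encode identically.
    return [(b, 0 if p == -1 else p - k) for k, (b, p) in enumerate(zip(seq, partner))]


def failure_function(pat: list[tuple[str, int]]) -> list[int]:
    """KMP failure table: fail[i] is the length of the longest proper border of pat[:i+1]."""
    fail = [0] * len(pat)
    k = 0
    for i in range(1, len(pat)):
        while k > 0 and pat[i] != pat[k]:
            k = fail[k - 1]
        if pat[i] == pat[k]:
            k += 1
        fail[i] = k
    return fail


def structured_occurrences(text: str, text_partner: list[int],
                           pattern: str, pattern_partner: list[int]) -> list[int]:
    """Return every offset s at which the structured pattern occurs exactly in the text."""
    t = encode(text, text_partner)
    p = encode(pattern, pattern_partner)
    if not p:
        return []
    fail = failure_function(p)
    hits, k = [], 0
    for i, sym in enumerate(t):
        while k > 0 and sym != p[k]:
            k = fail[k - 1]
        if sym == p[k]:
            k += 1
        if k == len(p):  # a full match ends at position i
            hits.append(i - len(p) + 1)
            k = fail[k - 1]
    return hits
```

Take the text `GACUCGACUC` with pairs (0, 4), (1, 3), (5, 9), (6, 8). The pattern `ACU` with its ends paired (`[2, -1, 0]`) occurs at offsets 1 and 6. The same bases with no pairs (`[-1, -1, -1]`) do not occur at all, because every A and U in the text is paired.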

Research supported by DIMACS (Center for Discrete Mathematics and Theoretical Computer Science), a National Science Foundation Science and Technology Center under NSF contract STC-8809648.




Editor information

Zvi Galil, Esko Ukkonen


Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bafna, V., Muthukrishnan, S., Ravi, R. (1995). Computing similarity between RNA strings. In: Galil, Z., Ukkonen, E. (eds) Combinatorial Pattern Matching. CPM 1995. Lecture Notes in Computer Science, vol 937. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60044-2_30


  • DOI: https://doi.org/10.1007/3-540-60044-2_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60044-2

  • Online ISBN: 978-3-540-49412-6

  • eBook Packages: Springer Book Archive
