Recent methods for RNA modeling using stochastic context-free grammars

  • Conference paper
Combinatorial Pattern Matching (CPM 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 807)

Abstract

Stochastic context-free grammars (SCFGs) can be applied to the problems of folding, aligning and modeling families of homologous RNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. This paper discusses our new algorithm, Tree-Grammar EM, for deducing SCFG parameters automatically from unaligned, unfolded training sequences. Tree-Grammar EM, a generalization of the HMM forward-backward algorithm, is based on tree grammars and is faster than the previously proposed inside-outside SCFG training algorithm. Independently, Sean Eddy and Richard Durbin have introduced a trainable “covariance model” (CM) to perform similar tasks. We compare and contrast our methods with theirs.
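
The abstract contrasts Tree-Grammar EM with the inside-outside algorithm for SCFG training. As a rough illustration of what an SCFG computes over RNA, the Python sketch below implements only the inside pass (the dynamic program that sums a sequence's probability over all of its secondary structures) for a hypothetical three-rule grammar, S → x S | x S y S | ε. The grammar, the rule probabilities T_UNPAIRED, T_PAIR and T_END, and the emission tables E_UNPAIRED and E_PAIR are invented for illustration. This is not the authors' grammar, and it is not Tree-Grammar EM.

```python
# Minimal sketch: inside algorithm for a hypothetical toy RNA SCFG.
# Grammar (assumed for illustration, not the paper's):
#   S -> x S        (x unpaired)        prob T_UNPAIRED * E_UNPAIRED[x]
#   S -> x S y S    (x pairs with y)    prob T_PAIR * E_PAIR[x + y]
#   S -> epsilon                        prob T_END

T_UNPAIRED = 0.5
T_PAIR = 0.2
T_END = 0.3

BASES = "ACGU"
E_UNPAIRED = {b: 0.25 for b in BASES}


def _pair_prob(pair: str) -> float:
    """Hypothetical pair emission: favor Watson-Crick, then G-U wobble."""
    if pair in ("AU", "UA", "GC", "CG"):
        return 0.2
    if pair in ("GU", "UG"):
        return 0.05
    return 0.01  # remaining 10 non-canonical pairs; all 16 sum to 1.0


E_PAIR = {x + y: _pair_prob(x + y) for x in BASES for y in BASES}


def inside_probability(seq: str) -> float:
    """Return P(seq) summed over every structure the toy grammar allows."""
    seq = seq.upper()
    n = len(seq)
    # table[i][j] = probability that S derives the half-open span seq[i:j].
    table = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        table[i][i] = T_END  # empty span: only S -> epsilon applies
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            x = seq[i]
            # S -> x S: seq[i] is unpaired, S derives seq[i+1:j].
            total = T_UNPAIRED * E_UNPAIRED[x] * table[i + 1][j]
            # S -> x S y S: seq[i] pairs with seq[k]; the inner S derives
            # seq[i+1:k] and the trailing S derives seq[k+1:j].
            for k in range(i + 1, j):
                total += (T_PAIR * E_PAIR[x + seq[k]]
                          * table[i + 1][k] * table[k + 1][j])
            table[i][j] = total
    return table[0][n]


if __name__ == "__main__":
    for s in ("GC", "GA", "GGAAACC"):
        print(s, inside_probability(s))
```

The triple loop runs in O(L³) time for a sequence of length L, compared with O(L·K²) for the forward algorithm of a K-state HMM; the pairing rule S → x S y S is what lets the grammar score nested base pairs, which an HMM's left-to-right emissions cannot express.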

We thank Anders Krogh, Harry Noller and Bryn Weiser for discussions and assistance, and Michael Waterman and David Searls for discussions. This work was supported by NSF grants CDA-9115268 and IRI-9123692 and NIH grant number GM17129. This material is based upon work supported under a National Science Foundation Graduate Research Fellowship.

References

  1. A. V. Aho and J. D. Ullman. The Theory of Parsing, Translation and Compiling, Vol. I: Parsing. Prentice Hall, Englewood Cliffs, N.J., 1972.

  2. J. K. Baker. Trainable grammars for speech recognition. Speech Communication Papers for the 97th Meeting of the Acoustical Society of America, pages 547–550, 1979.

  3. J. W. Brown, E. S. Haas, B. D. James, D. A. Hunt, J. S. Liu, and N. R. Pace. Phylogenetic analysis and evolution of RNase P RNA in proteobacteria. Journal of Bacteriology, 173:3855–3863, 1991.

  4. M. P. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjölander, and D. Haussler. Dirichlet mixture priors for HMMs. In preparation, 1993.

  5. M. P. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjölander, and D. Haussler. Using Dirichlet mixture priors to derive hidden Markov models for protein families. In L. Hunter, D. Searls, and J. Shavlik, editors, Proc. of First Int. Conf. on Intelligent Systems for Molecular Biology, pages 47–55, Menlo Park, CA, July 1993. AAAI/MIT Press.

  6. S. R. Eddy and R. Durbin. RNA sequence analysis using covariance models. Submitted to Nucleic Acids Research, 1994.

  7. J. Engelfriet and G. Rozenberg. Graph grammars based on node rewriting: An introduction to NLC graph grammars. In H. Ehrig, H. J. Kreowski, and G. Rozenberg, editors, Lecture Notes in Computer Science, volume 532, pages 12–23. Springer-Verlag, 1991.

  8. K. S. Fu. Syntactic pattern recognition and applications. Prentice-Hall, Englewood Cliffs, NJ, 1982.

  9. G. E. Fox and C. R. Woese. 5S RNA secondary structure. Nature, 256:505–507, 1975.

  10. M. Gouy. Secondary structure prediction of RNA. In M. J. Bishop and C. R. Rawlings, editors, Nucleic acid and protein sequence analysis, a practical approach, pages 259–284. IRL Press, Oxford, England, 1987.

  11. C. Guthrie and B. Patterson. Spliceosomal snRNAs. Annual Review of Genetics, 22:387–419, 1988.

  12. R. R. Gutell, A. Power, G. Z. Hertz, E. J. Putz, and G. D. Stormo. Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic Acids Research, 20:5785–5795, 1992.

  13. D. Haussler, A. Krogh, I. S. Mian, and K. Sjölander. Protein modeling using hidden Markov models: Analysis of globins. In Proceedings of the Hawaii International Conference on System Sciences, volume 1, pages 792–802, Los Alamitos, CA, 1993. IEEE Computer Society Press.

  14. T. Klinger and D. Brutlag. Detection of correlations in tRNA sequences with structural implications. In L. Hunter, D. Searls, and J. Shavlik, editors, First International Conference on Intelligent Systems for Molecular Biology, Menlo Park, 1993. AAAI Press.

  15. A. Krogh, M. Brown, I. S. Mian, K. Sjölander, and D. Haussler. Hidden Markov models in computational biology: Applications to protein modeling. Journal of Molecular Biology, 235:1501–1531, Feb. 1994.

  16. A. Lapedes. Private communication, 1992.

  17. N. Larsen, G. J. Olsen, B. L. Maidak, M. J. McCaughey, R. Overbeek, T. J. Macke, T. L. Marsh, and C. R. Woese. The ribosomal database project. Nucleic Acids Research, 21:3021–3023, 1993.

  18. R. H. Lathrop and T. F. Smith. A branch-and-bound algorithm for optimal protein threading with pairwise (contact potential) amino acid interactions. In Proceedings of the 27th Hawaii International Conference on System Sciences, Los Alamitos, CA, 1994. IEEE Computer Society Press.

  19. K. Lari and S. J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4:35–56, 1990.

  20. F. Michel, A. D. Ellington, S. Couture, and J. W. Szostak. Phylogenetic and genetic evidence for base-triples in the catalytic domain of group I introns. Nature, 347:578–580, 1990.

  21. F. Michel, K. Umesono, and H. Ozeki. Comparative and functional anatomy of group II catalytic introns – a review. Gene, 82:5–30, 1989.

  22. F. Michel and E. Westhof. Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis. Journal of Molecular Biology, 216:585–610, 1990.

  23. R. Nussinov, G. Pieczenik, J. R. Griggs, and D. J. Kleitman. Algorithms for loop matchings. SIAM Journal on Applied Mathematics, 35:68–82, 1978.

  24. L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE, 77(2):257–286, 1989.

  25. W. Saenger. Principles of nucleic acid structure. Springer Advanced Texts in Chemistry. Springer-Verlag, New York, 1984.

  26. Y. Sakakibara. Efficient learning of context-free grammars from positive structural examples. Information and Computation, 97:23–60, 1992.

  27. D. Sankoff. Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J. Appl. Math., 45:810–825, 1985.

  28. Y. Sakakibara, M. Brown, R. Hughey, I. S. Mian, K. Sjölander, R. Underwood, and D. Haussler. The application of stochastic context-free grammars to folding, aligning and modeling homologous RNA sequences. Submitted for publication, 1993.

  29. Y. Sakakibara, M. Brown, I. S. Mian, R. Underwood, and D. Haussler. Stochastic context-free grammars for modeling RNA. In Proceedings of the Hawaii International Conference on System Sciences, Los Alamitos, CA, 1994. IEEE Computer Society Press.

  30. Y. Sakakibara, M. Brown, R. Underwood, I. S. Mian, and D. Haussler. Stochastic context-free grammars for modeling RNA. Technical Report UCSC-CRL-93-16, UC Santa Cruz, Computer and Information Sciences Dept., Santa Cruz, CA 95064, 1993.

  31. T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer Verlag, New York, 1989.

  32. D. B. Searls and S. Dong. A syntactic pattern recognition system for DNA sequences. In Proc. 2nd Int. Conf. on Bioinformatics, Supercomputing and complex genome analysis. World Scientific, 1993. In press.

  33. D. B. Searls. The linguistics of DNA. American Scientist, 80:579–591, November–December 1992.

  34. D. B. Searls. The computational linguistics of biological sequences. In Artificial Intelligence and Molecular Biology, chapter 2, pages 47–120. AAAI Press, 1993.

  35. D. B. Searls. String variable grammar: a logic grammar formalism for DNA sequences, 1993. Unpublished.

  36. B. A. Shapiro and K. Zhang. Comparing multiple RNA secondary structures using tree comparisons. CABIOS, 6(4):309–318, 1990.

  37. A. J. Tranguch and D. R. Engelke. Comparative structural analysis of nuclear RNase P RNAs from yeast. Journal of Biological Chemistry, 268:14045–14055, 1993.

  38. D. H. Turner, N. Sugimoto, and S. M. Freier. RNA structure prediction. Annual Review of Biophysics and Biophysical Chemistry, 17:167–192, 1988.

  39. I. Tinoco Jr., O. C. Uhlenbeck, and M. D. Levine. Estimation of secondary structure in ribonucleic acids. Nature, 230:363–367, 1971.

  40. J. W. Thatcher and J. B. Wright. Generalized finite automata theory with an application to a decision problem of second-order logic. Mathematical Systems Theory, 2:57–81, 1968.

  41. M. S. Waterman. Computer analysis of nucleic acid sequences. Methods in Enzymology, 164:765–792, 1988.

  42. M. S. Waterman. Consensus methods for folding single-stranded nucleic acids. In M. S. Waterman, editor, Mathematical Methods for DNA Sequences, chapter 8. CRC Press, 1989.

  43. C. R. Woese, R. R. Gutell, R. Gupta, and H. F. Noller. Detailed analysis of the higher-order structure of 16S-like ribosomal ribonucleic acids. Microbiological Reviews, 47(4):621–669, 1983.

  44. S. Winker, R. Overbeek, C.R. Woese, G.J. Olsen, and N. Pfluger. Structure detection through automated covariance search. Computer Applications in the Biosciences, 6:365–371, 1990.

  45. J. R. Wyatt, J. D. Puglisi, and I. Tinoco Jr. RNA folding: pseudoknots, loops and bulges. BioEssays, 11(4):100–106, 1989.

  46. M. Zuker. On finding all suboptimal foldings of an RNA molecule. Science, 244:48–52, 1989.

  47. C. Zwieb. Structure and function of signal recognition particle RNA. Progress in Nucleic Acid Research and Molecular Biology, 37:207–234, 1989.

Editor information

Maxime Crochemore, Dan Gusfield

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sakakibara, Y. et al. (1994). Recent methods for RNA modeling using stochastic context-free grammars. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_25

  • DOI: https://doi.org/10.1007/3-540-58094-8_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58094-2

  • Online ISBN: 978-3-540-48450-9

  • eBook Packages: Springer Book Archive
