Recent methods for RNA modeling using stochastic context-free grammars

Sakakibara, Yasubumi; Brown, Michael; Hughey, Richard; Mian, Saira; Sjölander, Kimmen; Underwood, Rebecca C.; Haussler, David

doi:10.1007/3-540-58094-8_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 807)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

Stochastic context-free grammars (SCFGs) can be applied to the problems of folding, aligning and modeling families of homologous RNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. This paper discusses our new algorithm, Tree-Grammar EM, for deducing SCFG parameters automatically from unaligned, unfolded training sequences. Tree-Grammar EM, a generalization of the HMM forward-backward algorithm, is based on tree grammars and is faster than the previously proposed inside-outside SCFG training algorithm. Independently, Sean Eddy and Richard Durbin have introduced a trainable “covariance model” (CM) to perform similar tasks. We compare and contrast our methods with theirs.
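To illustrate how an SCFG can score and fold an RNA sequence, the following is a minimal sketch of the inside algorithm and a Viterbi-style fold for a toy grammar. It is not the Tree-Grammar EM algorithm or the grammar used in the paper. The single-nonterminal grammar (S -> x S y for a base pair, S -> x S for an unpaired base, S -> epsilon), all probability values, and the function names `inside_probability` and `best_fold` are assumptions chosen only for illustration; the grammar can describe one nested stem with unpaired bases on the 5' side, nothing more.

```python
from itertools import product

BASES = "ACGU"

# Hypothetical rule probabilities for the toy grammar (they sum to 1):
#   S -> x S y   (x pairs with y)
#   S -> x S     (x is unpaired)
#   S -> epsilon
P_PAIR, P_SINGLE, P_END = 0.4, 0.4, 0.2

# Hypothetical emission probabilities: Watson-Crick pairs are favoured.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
PAIR_EMIT = {
    (x, y): 0.22 if (x, y) in CANONICAL else 0.01
    for x, y in product(BASES, repeat=2)
}  # 4 * 0.22 + 12 * 0.01 = 1.0
SINGLE_EMIT = {x: 0.25 for x in BASES}


def _fill(seq, combine):
    """Fill table[i][j] for the subsequence seq[i:j].

    `combine` is `sum` for the inside algorithm (total probability over
    all derivations) or `max` for the Viterbi variant (best derivation).
    Returns the table and a flag table marking where pairing won.
    """
    n = len(seq)
    table = [[0.0] * (n + 1) for _ in range(n + 1)]
    paired = [[False] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        table[i][i] = P_END  # empty subsequence: S -> epsilon
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            single = P_SINGLE * SINGLE_EMIT[seq[i]] * table[i + 1][j]
            pair = 0.0
            if length >= 2:
                pair = P_PAIR * PAIR_EMIT[(seq[i], seq[j - 1])] * table[i + 1][j - 1]
            table[i][j] = combine((single, pair))
            paired[i][j] = pair > single
    return table, paired


def inside_probability(seq):
    """Probability that the toy grammar generates `seq`."""
    table, _ = _fill(seq, sum)
    return table[0][len(seq)]


def best_fold(seq):
    """Most probable parse of `seq`, written in dot-bracket notation."""
    _, paired = _fill(seq, max)
    structure = ["."] * len(seq)
    i, j = 0, len(seq)
    while i < j:  # trace back from the full sequence inwards
        if paired[i][j]:
            structure[i], structure[j - 1] = "(", ")"
            i, j = i + 1, j - 1
        else:
            i += 1
    return "".join(structure)


if __name__ == "__main__":
    for s in ("GAAAC", "GAAAG"):
        print(s, inside_probability(s), best_fold(s))
```

Under these assumed parameters, a sequence whose ends are complementary (such as `GAAAC`) receives a higher inside probability than one whose ends are not (`GAAAG`), which is the sense in which the grammar captures secondary structure as well as primary sequence. A realistic RNA grammar would also need bifurcation rules for multiple stems and parameters estimated from training sequences.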

We thank Anders Krogh, Harry Noller and Bryn Weiser for discussions and assistance, and Michael Waterman and David Searls for discussions. This work was supported by NSF grants CDA-9115268 and IRI-9123692 and NIH grant number GM17129. This material is based upon work supported under a National Science Foundation Graduate Research Fellowship.

Author information

Yasubumi Sakakibara
Present address: Fujitsu Labs Ltd., ISIS, 140, Miyamoto, Numazu, 410-03, Shizuoka, Japan

Authors and Affiliations

Computer and Information Sciences, University of California, Santa Cruz, CA 95064, USA
Yasubumi Sakakibara, Michael Brown, Kimmen Sjölander, Rebecca C. Underwood & David Haussler
Computer Engineering, University of California, Santa Cruz, CA 95064, USA
Richard Hughey
Sinsheimer Laboratories, University of California, Santa Cruz, CA 95064, USA
Saira Mian

Editor information

Maxime Crochemore, Dan Gusfield

About this paper

Cite this paper

Sakakibara, Y. et al. (1994). Recent methods for RNA modeling using stochastic context-free grammars. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_25

DOI: https://doi.org/10.1007/3-540-58094-8_25
Published: 07 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58094-2
Online ISBN: 978-3-540-48450-9
eBook Packages: Springer Book Archive
