SCFGs in RNA Secondary Structure Prediction: A Hands-on Approach

Sükösd, Zsuzsanna; Andersen, Ebbe S.; Lyngsø, Rune

doi:10.1007/978-1-62703-709-9_8

Protocol
First Online: 01 January 2013

Part of the book series: Methods in Molecular Biology (MIMB, volume 1097)

Abstract

Stochastic context-free grammars (SCFGs) were first established in the context of natural language modelling, and only later found their applications in RNA secondary structure prediction. In this chapter, we discuss the basic SCFG algorithms (CYK and inside–outside algorithms) in an application-centered manner and use the pfold grammar as a case study to show how the algorithms can be adapted to a grammar in a nonstandard form. We extend our discussion to the use of grammars with additional information (such as evolutionary information) to improve the quality of predictions. Finally, we provide a brief survey of programs that use stochastic context-free grammars for RNA secondary structure prediction and modelling.

Notes

1. The symbol ∗ is the Kleene star operator; it is widely used in regular expressions and denotes any (possibly empty) string produced by concatenating elements drawn from the set. The elements can occur any number of times and in any order in this string.
2. In practice, we need not store the array of backpointers at all. Starting from P[1, n, S], one can simply do the calculations “backwards” and determine which choice must have been made in each step.
3. They can also be sequence-structure pairs, depending on the model.
4. A similar approach has recently been applied to a thermodynamic model; see [7].
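
The idea in Note 2 can be illustrated with a minimal sketch. It assumes a grammar in Chomsky normal form and uses a hypothetical toy grammar (generating strings of the form aⁿbⁿ) rather than the pfold grammar discussed in the chapter; the names `BINARY`, `TERMINAL`, `cyk`, and `traceback` are illustrative only. The CYK table P[i, j, A] is filled without backpointers, and the parse is then recovered from P[1, n, S] by re-evaluating the recursion and picking the rule and split point that reproduce the stored score.

```python
import math

NEG_INF = -math.inf

# Hypothetical toy SCFG in Chomsky normal form (not the pfold grammar).
# Binary rules A -> B C with probability p:
BINARY = {
    "S": [("A", "X", 0.6), ("A", "B", 0.4)],
    "X": [("S", "B", 1.0)],
}
# Terminal rules A -> a with probability p:
TERMINAL = {
    "A": {"a": 1.0},
    "B": {"b": 1.0},
}


def cyk(seq):
    """Fill P[i, j, A] = best log-probability that A derives seq[i..j].

    Positions are 1-based and inclusive. No backpointers are stored;
    entries that cannot be derived are simply absent from the table.
    """
    n = len(seq)
    P = {}
    for i in range(1, n + 1):
        for A, emissions in TERMINAL.items():
            p = emissions.get(seq[i - 1], 0.0)
            if p > 0.0:
                P[i, i, A] = math.log(p)
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            for A, rules in BINARY.items():
                best = NEG_INF
                for B, C, p in rules:
                    for k in range(i, j):
                        left = P.get((i, k, B), NEG_INF)
                        right = P.get((k + 1, j, C), NEG_INF)
                        best = max(best, math.log(p) + left + right)
                if best > NEG_INF:
                    P[i, j, A] = best
    return P


def traceback(P, seq, i, j, A):
    """Recover the best parse by redoing the calculation "backwards"."""
    if i == j:
        return (A, seq[i - 1])
    target = P[i, j, A]
    for B, C, p in BINARY.get(A, []):
        for k in range(i, j):
            left = P.get((i, k, B), NEG_INF)
            right = P.get((k + 1, j, C), NEG_INF)
            if left == NEG_INF or right == NEG_INF:
                continue
            score = math.log(p) + left + right
            # The choice that reproduces the stored score must be optimal.
            if math.isclose(score, target, abs_tol=1e-12):
                return (A, traceback(P, seq, i, k, B),
                        traceback(P, seq, k + 1, j, C))
    raise ValueError("no rule reproduces the stored score")


if __name__ == "__main__":
    seq = "aabb"
    P = cyk(seq)
    print(math.exp(P[1, len(seq), "S"]))
    print(traceback(P, seq, 1, len(seq), "S"))
```

The trade-off is memory against time: the traceback repeats the inner maximisation only for the cells on the optimal parse, so its cost is small compared with filling the table, while the backpointer array is avoided entirely.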

References

1. Chomsky N (1956) Three models for the description of language. IRE Trans Inf Theory 2(3):113–124
2. Younger DH (1967) Recognition and parsing of context-free languages in time n³. Inf Control 10(2):189–208
3. Baker JK (1979) Trainable grammars for speech recognition. In: Speech communication papers for the 97th meeting of the Acoustical Society of America, Boston, MA, pp 547–550
4. Rivas E, Eddy SR (2000) The language of RNA: a formal grammar that includes pseudoknots. Bioinformatics 16(4):334–340
5. Sükösd Z, Knudsen B, Værum M, Kjems J, Andersen ES (2011) Multithreaded comparative RNA secondary structure prediction using stochastic context-free grammars. BMC Bioinformatics 12:103
6. Xia F, Dou Y, Zhou D, Li X (2010) Fine-grained parallel RNA secondary structure prediction using SCFGs on FPGA. Parallel Comput 36:516–530
7. Lu ZJ, Gloor JW, Mathews DH (2009) Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA 15(10):1805–1813
8. Sudkamp TA (2005) Languages and machines: an introduction to the theory of computer science, 3rd edn. Addison Wesley, Reading, MA
9. Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge
10. Knudsen B, Hein J (1999) RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 15(6):446–454
11. Knudsen B, Hein J (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res 31(13):3423–3428
12. Pedersen JS, Meyer I, Forsberg R, Simmonds P, Hein J (2004) A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res 32:4925–4936
13. Klosterman P, Uzilov A, Bendana Y, Bradley R, Chao S, Kosiol C, Goldman N, Holmes I (2006) Xrate: a fast prototyping, training and annotation tool for phylo-grammars. BMC Bioinformatics 7(1):428
14. Nawrocki EP, Kolbe DL, Eddy SR (2009) Infernal 1.0: inference of RNA alignments. Bioinformatics 25:1335–1337
15. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A (2005) Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res 33:D121–D124
16. Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A (2009) Rfam: updates to the RNA families database. Nucleic Acids Res 37:D136–D140
17. Lowe TM, Eddy SR (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25:955–964
18. Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D (2006) Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol 2(4):e33
19. Dowell RD, Eddy SR (2006) Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics 7:400
20. Sankoff D (1985) Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J Appl Math 45(5):810–825
21. Do CB, Woods DA, Batzoglou S (2006) Contrafold: RNA secondary structure prediction without physics-based models. Bioinformatics 22(14):e90–e98
22. Bradley RK, Pachter L, Holmes I (2008) Specific alignment of structured RNA: stochastic grammars and sequence annealing. Bioinformatics 24(23):2677–2683
23. Holmes I (2005) Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics 6:73

Acknowledgments

ZS would like to thank Robert Giegerich and Paula Tataru for their comments on the manuscript, and Christine Heitsch and her group at Georgia Tech for useful discussions.

Author information

Authors and Affiliations

Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
Zsuzsanna Sükösd
Interdisciplinary Nanoscience Center, Aarhus University, Aarhus, Denmark
Ebbe S. Andersen
Department of Statistics, University of Oxford, Oxford, UK
Rune Lyngsø

Editor information

Editors and Affiliations

Center for non-coding RNA in Technology and Health, IKVH University of Copenhagen, Frederiksberg, Denmark
Jan Gorodkin
University of Washington Dept. Computer Science & Engineering, Seattle, Washington, USA
Walter L. Ruzzo

About this protocol

Cite this protocol

Sükösd, Z., Andersen, E.S., Lyngsø, R. (2014). SCFGs in RNA Secondary Structure Prediction: A Hands-on Approach. In: Gorodkin, J., Ruzzo, W. (eds) RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods. Methods in Molecular Biology, vol 1097. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-709-9_8

DOI: https://doi.org/10.1007/978-1-62703-709-9_8
Published: 02 December 2013
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-708-2
Online ISBN: 978-1-62703-709-9
eBook Packages: Springer Protocols
