SCFGs in RNA Secondary Structure Prediction: A Hands-on Approach

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1097)

Abstract

Stochastic context-free grammars (SCFGs) were first established in the context of natural language modelling, and only later found applications in RNA secondary structure prediction. In this chapter, we discuss the basic SCFG algorithms (the CYK and inside–outside algorithms) in an application-centered manner and use the pfold grammar as a case study to show how the algorithms can be adapted to a grammar in a nonstandard form. We extend our discussion to the use of grammars with additional information (such as evolutionary information) to improve the quality of predictions. Finally, we provide a brief survey of programs that use stochastic context-free grammars for RNA secondary structure prediction and modelling.

Notes

  1. The symbol * is the Kleene star operator; it is widely used in regular expressions. Applied to a set, it denotes any (possibly empty) string that is produced by concatenating elements drawn from that set. The elements can occur any number of times and in any order in this string.

  2. In practice, we need not store the array of backpointers at all. Starting from P[1, n, S], one can simply do the calculations “backwards” and determine which choice must have been made in each step.

  3. They can also be sequence-structure pairs, depending on the model.

  4. A similar approach has recently been applied to a thermodynamic model, see [7].
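
The idea in note 2 can be illustrated with a minimal Python sketch. It assumes a grammar in Chomsky normal form and a toy rule set invented for this example (it is not the pfold grammar); the function names `cyk` and `traceback` and the dictionary layout of the rules are likewise hypothetical. The table is indexed as P[i, j, A] with 1-based, inclusive positions to match the P[1, n, S] notation. No backpointers are stored: the traceback re-evaluates the candidate rules and split points for each cell and follows the one that attains the maximum.

```python
from collections import defaultdict


def cyk(seq, binary, unary):
    """Viterbi-style CYK for a grammar in Chomsky normal form (assumed).

    binary: {(A, B, C): prob} for rules A -> B C
    unary:  {(A, a): prob}    for rules A -> a
    Returns P with P[i, j, A] = highest probability of any derivation
    of seq[i..j] from A (1-based, inclusive indices).
    """
    n = len(seq)
    P = defaultdict(float)  # missing cells count as probability 0
    for i, a in enumerate(seq, start=1):
        for (A, t), p in unary.items():
            if t == a:
                P[i, i, A] = max(P[i, i, A], p)
    for span in range(2, n + 1):
        for i in range(1, n - span + 2):
            j = i + span - 1
            for (A, B, C), p in binary.items():
                for k in range(i, j):
                    score = p * P[i, k, B] * P[k + 1, j, C]
                    if score > P[i, j, A]:
                        P[i, j, A] = score
    return P


def traceback(seq, P, binary, i, j, A):
    """Rebuild the best parse tree without stored backpointers.

    For each cell, the candidate rules and split points are evaluated
    again, and the best-scoring one is followed recursively.
    """
    if P[i, j, A] == 0.0:
        raise ValueError("no derivation for this cell")
    if i == j:
        return (A, seq[i - 1])
    _, k, B, C = max(
        (p * P[i, k, B] * P[k + 1, j, C], k, B, C)
        for (X, B, C), p in binary.items() if X == A
        for k in range(i, j)
    )
    return (A,
            traceback(seq, P, binary, i, k, B),
            traceback(seq, P, binary, k + 1, j, C))


# Toy grammar (illustrative only): S -> A U | S S, A -> 'a', U -> 'u'
binary_rules = {("S", "A", "U"): 0.5, ("S", "S", "S"): 0.5}
unary_rules = {("A", "a"): 1.0, ("U", "u"): 1.0}

seq = "auau"
P = cyk(seq, binary_rules, unary_rules)
tree = traceback(seq, P, binary_rules, 1, len(seq), "S")
print(P[1, len(seq), "S"], tree)
```

Recomputing the maximum during the traceback costs extra time per visited cell, but only the cells on the best parse are visited, so the overall running time is still dominated by filling the table, while the memory for a backpointer array is saved.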

References

  1. Chomsky N (1956) Three models for the description of language. IRE Trans Inf Theory 2(3):113–124

  2. Younger DH (1967) Recognition and parsing of context-free languages in time n³. Inf Control 10(2):189–208

  3. Baker JK (1979) Trainable grammars for speech recognition. Speech communication papers for the 97th meeting of the Acoustical Society of America, pp 547–550, Boston, MA, 1979

  4. Rivas E, Eddy SR (2000) The language of RNA: a formal grammar that includes pseudoknots. Bioinformatics 16(4):334–340

  5. Sükösd Z, Knudsen B, Værum M, Kjems J, Andersen ES (2011) Multithreaded comparative RNA secondary structure prediction using stochastic context-free grammars. BMC Bioinformatics 12:103

  6. Xia F, Dou Y, Zhou D, Li X (2010) Fine-grained parallel RNA secondary structure prediction using SCFGs on FPGA. Parallel Comput 36:516–530

  7. Lu ZJ, Gloor JW, Mathews DH (2009) Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA 15(10):1805–1813

  8. Sudkamp TA (2005) Languages and machines: an introduction to the theory of computer science, 3rd edn. Addison Wesley, Reading, MA

  9. Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge

  10. Knudsen B, Hein J (1999) RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 15(6):446–454

  11. Knudsen B, Hein J (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res 31(13):3423–3428

  12. Pedersen JS, Meyer I, Forsberg R, Simmonds P, Hein J (2004) A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res 32:4925–4936

  13. Klosterman P, Uzilov A, Bendana Y, Bradley R, Chao S, Kosiol C, Goldman N, Holmes I (2006) Xrate: a fast prototyping, training and annotation tool for phylo-grammars. BMC Bioinformatics 7(1):428

  14. Nawrocki EP, Kolbe DL, Eddy SR (2009) Infernal 1.0: inference of RNA alignments. Bioinformatics 25:1335–1337

  15. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A (2005) Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res 33:D121–D124

  16. Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A (2009) Rfam: updates to the RNA families database. Nucleic Acids Res 37:D136–D140

  17. Lowe TM, Eddy SR (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25:955–964

  18. Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D (2006) Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol 2(4):e33

  19. Dowell RD, Eddy SR (2006) Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics 7:400

  20. Sankoff D (1985) Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J Appl Math 45(5):810–825

  21. Do CB, Woods DA, Batzoglou S (2006) Contrafold: RNA secondary structure prediction without physics-based models. Bioinformatics 22(14):e90–e98

  22. Bradley RK, Pachter L, Holmes I (2008) Specific alignment of structured RNA: stochastic grammars and sequence annealing. Bioinformatics 24(23):2677–2683

  23. Holmes I (2005) Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics 6:73

Acknowledgments

ZS would like to thank Robert Giegerich and Paula Tataru for their comments on the manuscript, and Christine Heitsch and her group at Georgia Tech for useful discussions.

Copyright information

© 2014 Springer Science+Business Media New York

About this protocol

Cite this protocol

Sükösd, Z., Andersen, E.S., Lyngsø, R. (2014). SCFGs in RNA Secondary Structure Prediction: A Hands-on Approach. In: Gorodkin, J., Ruzzo, W. (eds) RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods. Methods in Molecular Biology, vol 1097. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-709-9_8

  • DOI: https://doi.org/10.1007/978-1-62703-709-9_8

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-708-2

  • Online ISBN: 978-1-62703-709-9

  • eBook Packages: Springer Protocols
