SCFGs in RNA Secondary Structure Prediction: A Hands-on Approach

  • Zsuzsanna Sükösd
  • Ebbe S. Andersen
  • Rune Lyngsø
Part of the Methods in Molecular Biology book series (MIMB, volume 1097)


Stochastic context-free grammars (SCFGs) were first developed for natural language modelling and only later found applications in RNA secondary structure prediction. In this chapter, we discuss the basic SCFG algorithms (the CYK and inside–outside algorithms) in an application-centered manner, using the pfold grammar as a case study to show how the algorithms can be adapted to a grammar in a nonstandard form. We then extend the discussion to grammars that use additional information, such as evolutionary information, to improve the quality of predictions. Finally, we provide a brief survey of programs that use stochastic context-free grammars for RNA secondary structure prediction and modelling.

Key words

SCFGs; CYK algorithm; Inside–outside algorithm; Pfold



ZS would like to thank Robert Giegerich and Paula Tataru for their comments on the manuscript, and Christine Heitsch and her group at Georgia Tech for useful discussions.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Zsuzsanna Sükösd (1)
  • Ebbe S. Andersen (2)
  • Rune Lyngsø (3)
  1. Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
  2. Interdisciplinary Nanoscience Center, Aarhus University, Aarhus, Denmark
  3. Department of Statistics, University of Oxford, Oxford, UK
