Journal of Mathematical Biology

Volume 69, Issue 6–7, pp 1743–1772

Asymptotic distribution of motifs in a stochastic context-free grammar model of RNA folding

  • Svetlana Poznanović
  • Christine E. Heitsch

Abstract

We analyze the distribution of RNA secondary structures given by the Knudsen–Hein stochastic context-free grammar used in the prediction program Pfold. Our main theorem gives relations between the expected numbers of structural motifs (base pairs, helices, and loops of different types) that hold independently of the grammar probabilities. These relations follow from proving that the distributions of base pairs, of helices, and of the different types of loops are asymptotically Gaussian in this model of RNA folding. The proofs use singularity analysis of probability generating functions. We also demonstrate that these asymptotic results capture the expected number of RNA base pairs in native ribosomal structures well, along with certain other aspects of their predicted secondary structures. In particular, we find that the predicted structures largely satisfy the expected relations, although the native structures do not.
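
As a rough illustration of the model described above, the Knudsen–Hein grammar is usually written with the rules S → LS | L, L → s | dFd, F → dFd | LS, where s is an unpaired base and d…d a base pair. The Python sketch below samples dot-bracket structures from that grammar and counts base pairs and helices. The rule probabilities, the function names, and the `max_len` cutoff are assumptions made for this example and are not taken from the article; the cutoff rejects very long derivations, so the samples are biased toward shorter structures, and the sketch does not condition on a fixed sequence length as the asymptotic analysis does.

```python
import random

# Illustrative rule probabilities (assumed for this sketch, not from the article).
# Each value is the probability of the named rule; the alternative rule for the
# same nonterminal gets the complementary probability.
PROBS = {"S->LS": 0.87, "L->s": 0.89, "F->dFd": 0.79}


def sample_structure(rng, probs=PROBS, max_len=2000):
    """Sample (dot_bracket, helices) from S->LS|L, L->s|dFd, F->dFd|LS.

    Uses an explicit stack (leftmost derivation) instead of recursion.
    Derivations that reach max_len symbols are discarded and retried.
    """
    while True:
        out, helices = [], 0
        stack = ["S"]
        while stack and len(out) < max_len:
            sym = stack.pop()
            if sym == ")":
                out.append(")")
            elif sym == "S":
                if rng.random() < probs["S->LS"]:
                    stack.append("S")      # S -> L S: expand L first, then S
                stack.append("L")          # S -> L otherwise
            elif sym == "L":
                if rng.random() < probs["L->s"]:
                    out.append(".")        # L -> s: one unpaired base
                else:
                    helices += 1           # L -> d F d opens a new helix
                    out.append("(")
                    stack += [")", "F"]
            else:  # sym == "F"
                if rng.random() < probs["F->dFd"]:
                    out.append("(")        # F -> d F d extends the helix
                    stack += [")", "F"]
                else:
                    stack += ["S", "L"]    # F -> L S starts a loop
        if not stack:                      # derivation finished within max_len
            return "".join(out), helices


def count_motifs(structure, helices):
    """Summarise a sampled structure by simple motif counts."""
    pairs = structure.count("(")
    return {
        "length": len(structure),
        "base_pairs": pairs,
        "unpaired": structure.count("."),
        "helices": helices,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    s, h = sample_structure(rng)
    print(s)
    print(count_motifs(s, h))
```

Counting one helix per application of L → dFd relies on the fact that F → LS always produces at least two L symbols, so a newly opened pair is never stacked directly on the enclosing one.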

Keywords

  • RNA secondary structure
  • Stochastic context-free grammar
  • Central limit theorem

Mathematics Subject Classification (2000)

92D20 05A16 60F05 

Notes

Acknowledgments

The authors would like to thank Christian Reidys for useful comments on an earlier version of these results, David Esposito for implementing the CYK parsing and running the predictions, and the reviewers for their thoughtful comments, which helped improve the presentation of this article.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Department of Mathematical Sciences, Clemson University, Clemson, USA
  2. School of Mathematics, Georgia Institute of Technology, Atlanta, USA
