Asymptotic distribution of motifs in a stochastic context-free grammar model of RNA folding
We analyze the distribution of RNA secondary structures given by the Knudsen–Hein stochastic context-free grammar used in the prediction program Pfold. Our main theorem gives relations between the expected numbers of structural motifs that hold independently of the grammar probabilities. These relations follow from proving that the numbers of base pairs, helices, and loops of different types are asymptotically Gaussian in this model of RNA folding. The proofs use singularity analysis of probability generating functions. We also demonstrate that these asymptotic results capture the expected number of base pairs in native ribosomal RNA structures well, as well as certain other aspects of their predicted secondary structures. In particular, the predicted structures largely satisfy the expected relations, whereas the native structures do not.
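The kind of quantity the abstract studies, a motif count averaged over all structures of a fixed length n generated by the grammar, can be made concrete with a small exact computation. The sketch below is illustrative only and is not the authors' method (they work analytically, via singularity analysis). It assumes the Pfold grammar in the form S → LS | L, L → s | dFd, F → dFd | LS, where s is an unpaired base and each d F d emits one base pair. The rule probabilities p, q, r are hypothetical placeholders, and the function name `expected_base_pairs` is likewise hypothetical. For each nonterminal and length, the dynamic program tracks the total derivation probability and the same sum weighted by base-pair count; their ratio is the expected number of base pairs given length n.

```python
def expected_base_pairs(n_max, p, q, r):
    """E[number of base pairs | length n] for n = 0..n_max under the ASSUMED grammar
    S -> L S (prob p)   | L (1 - p)
    L -> s (prob q)     | d F d (1 - q)
    F -> d F d (prob r) | L S (1 - r)
    p, q, r are hypothetical rule probabilities supplied by the caller.
    """
    # W*[n]: total probability of derivations of length n from that nonterminal
    # M*[n]: the same sum, with each derivation weighted by its base-pair count
    WS, WL, WF = [0.0] * (n_max + 1), [0.0] * (n_max + 1), [0.0] * (n_max + 1)
    MS, ML, MF = [0.0] * (n_max + 1), [0.0] * (n_max + 1), [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        # L -> s contributes only at length 1; L -> d F d adds one base pair
        WL[n] = q if n == 1 else 0.0
        if n >= 2:
            WL[n] += (1 - q) * WF[n - 2]
            ML[n] = (1 - q) * (MF[n - 2] + WF[n - 2])
        # Concatenation L S: convolve over the length k of the L part
        cw = sum(WL[k] * WS[n - k] for k in range(1, n))
        cm = sum(ML[k] * WS[n - k] + WL[k] * MS[n - k] for k in range(1, n))
        # S -> L S | L
        WS[n] = p * cw + (1 - p) * WL[n]
        MS[n] = p * cm + (1 - p) * ML[n]
        # F -> L S, plus F -> d F d (one more base pair) when n >= 2
        WF[n] = (1 - r) * cw
        MF[n] = (1 - r) * cm
        if n >= 2:
            WF[n] += r * WF[n - 2]
            MF[n] += r * (MF[n - 2] + WF[n - 2])
    # Conditional expectation = weighted count / total weight at each length
    return [MS[n] / WS[n] if WS[n] > 0 else 0.0 for n in range(n_max + 1)]


# Hypothetical probabilities, chosen only for illustration
e = expected_base_pairs(200, p=0.5, q=0.7, r=0.6)
for n in (50, 100, 200):
    print(n, e[n], e[n] / n)
```

Under this encoding a base pair needs at least two unpaired bases inside it, so `e[n]` is zero for n ≤ 3. Printing `e[n] / n` for growing n is one way to watch the per-nucleotide rate settle, in line with a mean that grows linearly in n; the sketch makes no claim about the limiting constant. The double loop costs O(n²) time, which is ample for lengths of a few hundred.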
Keywords: RNA secondary structure; Stochastic context-free grammar; Central limit theorem
Mathematics Subject Classification (2000): 92D20, 05A16, 60F05
The authors would like to thank Christian Reidys for useful comments on an earlier version of these results, David Esposito for implementing the CYK parsing and running the predictions, and the reviewers for their thoughtful comments which helped improve the presentation in this article.