Abstract
We adapt a surprising technique, the boustrophedon method, to speed up the sampling of RNA secondary structures from the Boltzmann low-energy ensemble. The technique is simple and its implementation straightforward, as it only requires permuting the order of some operations already performed in the stochastic traceback stage of these algorithms. It nevertheless greatly improves their worst-case complexity from \({\mathcal{O}}({n^2})\) to \({\mathcal{O}}({n\log(n)})\), where n is the length of the original sequence. Moreover, the average-case complexity of the generation is shown to improve from \({\mathcal{O}}({n\sqrt{n}})\) to \({\mathcal{O}}({n\log(n)})\) in a Boltzmann-weighted homopolymer model based on the Nussinov–Jacobson free-energy model. These results are extended to the more realistic Turner free-energy model through experiments performed on both structured (Drosophila melanogaster mRNA 5S) and hybrid (Staphylococcus aureus RNAIII) RNA sequences, using a boustrophedon-modified version of the popular software UnaFold. This improvement allows larger and more significant sets of structures to be sampled in a given time.
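The reordering described above can be illustrated with a minimal Python sketch. It is not the UnaFold implementation: the function names `boustrophedon_order` and `sample_index`, the flat `weights` list, and the explicit uniform draw `u` are all assumptions made for illustration. The sketch picks one alternative (for instance, a split position in a traceback step) with probability proportional to its weight, but scans the candidates alternately from both ends instead of left to right.

```python
def boustrophedon_order(n):
    """Yield indices 0, n-1, 1, n-2, ... (hypothetical helper).

    Index k is reached after roughly 2 * min(k + 1, n - k) steps,
    instead of k + 1 steps for a plain left-to-right scan.
    """
    lo, hi = 0, n - 1
    while lo <= hi:
        yield lo
        if lo != hi:
            yield hi
        lo += 1
        hi -= 1


def sample_index(weights, total, u):
    """Return index k with probability weights[k] / total.

    `total` is assumed to be already known (in a traceback it would be a
    precomputed partition-function entry), and `u` is a uniform draw in [0, 1).
    Only the visiting order differs from a sequential scan, so the sampled
    distribution is unchanged.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    r = u * total
    last = None
    for k in boustrophedon_order(len(weights)):
        last = k
        r -= weights[k]
        if r < 0:
            return k
    return last  # fallback for floating-point round-off


if __name__ == "__main__":
    import random
    w = [1.0, 2.0, 3.0, 4.0]
    print(sample_index(w, sum(w), random.random()))
```

Under these assumptions, the cost of one choice is proportional to the distance from the nearer end of the candidate range, which is the property the worst-case improvement relies on.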
Ponty, Y. Efficient sampling of RNA secondary structures from the Boltzmann ensemble of low-energy. J. Math. Biol. 56, 107–127 (2008). https://doi.org/10.1007/s00285-007-0137-z
DOI: https://doi.org/10.1007/s00285-007-0137-z