Abstract
In the absence of chaperone molecules, RNA folding is believed to depend on the distribution of kinetic traps in the energy landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In this paper, we compute the asymptotic expected number of hairpins in saturated structures. For instance, if every hairpin is required to contain at least θ=3 unpaired bases and the probability that any two positions can base-pair is p=3/8, then the asymptotic number of saturated structures is \(1.34685 \cdot n^{-3/2} \cdot 1.62178^{n}\), and the asymptotic expected number of hairpins follows a normal distribution with mean \(0.06695640 \cdot n + 0.01909350 \cdot\sqrt{n} \cdot\mathcal{N}\). Similar results are given for values θ=1,3, and p=1,1/2,3/8; for instance, when θ=1 and p=1, the asymptotic expected number of hairpins in saturated secondary structures is 0.123194⋅n, a value greater than the asymptotic expected number 0.105573⋅n of hairpins over all secondary structures. Since RNA binding targets are often found in hairpin regions, it follows that saturated structures present potentially more binding targets than nonsaturated structures, on average. Next, we describe a novel algorithm to compute the hairpin profile of a given RNA sequence: given RNA sequence \(a_1,\ldots,a_n\), for each integer k, we compute that secondary structure \(S_k\) having minimum energy in the Nussinov energy model, taken over all secondary structures having k hairpins. We expect that an extension of our algorithm to the Turner energy model may provide more accurate structure prediction for particular RNAs, such as tRNAs and purine riboswitches, known to have a particular number of hairpins.
Mathematica™ computations, C and Python source code, and additional supplementary information are available at the website http://bioinformatics.bc.edu/clotelab/RNAhairpinProfile/.
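The hairpin profile described above can be illustrated with a small dynamic program. The following Python sketch is not the authors' implementation; it is a minimal illustration under stated assumptions: the Nussinov energy is taken to be −1 per base pair, allowed pairs are Watson–Crick and GU wobble, and the names `hairpin_profile` and `PAIRS` are hypothetical. It returns only the maximum number of base pairs for each feasible hairpin count k, and omits the traceback that would recover the structure \(S_k\).

```python
NEG = float("-inf")
# Assumption: Watson-Crick and GU wobble pairs are allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def hairpin_profile(seq, theta=3):
    """Map k -> maximum number of base pairs over structures with exactly k hairpins.

    best[i][j][k] refers to the substring seq[i:j] (half-open interval).
    A structure with 0 hairpins is necessarily empty, so best[i][j][0] = 0.
    """
    n = len(seq)
    kmax = n // (theta + 2)  # each hairpin needs at least theta + 2 positions
    best = [[[NEG] * (kmax + 1) for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(i, n + 1):
            best[i][j][0] = 0
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            last = j - 1
            cell = best[i][j]
            # Case 1: the last position is unpaired.
            for k in range(1, kmax + 1):
                cell[k] = best[i][last][k]
            # Case 2: the last position pairs with r, leaving >= theta unpaired between.
            for r in range(i, last - theta):
                if (seq[r], seq[last]) not in PAIRS:
                    continue
                left, inner = best[i][r], best[r + 1][last]
                for h in range(1, kmax + 1):
                    inside = inner[h]
                    if h == 1:
                        # An empty interior makes (r, last) itself a hairpin.
                        inside = max(inside, 0)
                    if inside == NEG:
                        continue
                    for k1 in range(kmax - h + 1):
                        if left[k1] == NEG:
                            continue
                        cand = left[k1] + 1 + inside
                        if cand > cell[k1 + h]:
                            cell[k1 + h] = cand
    return {k: int(v) for k, v in enumerate(best[0][n]) if v != NEG}

if __name__ == "__main__":
    # Nussinov energy of S_k is minus the reported number of base pairs.
    print(hairpin_profile("GGGAAACCCGGGAAACCC"))
```

Values of k that do not appear in the returned dictionary admit no secondary structure with exactly k hairpins under the assumed pairing rules. The sketch runs in time cubic in the sequence length and quadratic in the maximum hairpin count, which is adequate for short sequences only.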
Notes
In the Nussinov energy landscape, due to degeneracy of the model, the minimum energy structure may not be unique. Indeed, in Clote (2006), we show that even RNA homopolymers have quadratically many minimum energy structures.
In Theorem 10 of Nebel (2002), it is shown that the number of unpaired nucleotides is asymptotically equal to \(\frac{n}{\sqrt{5}}\), from which the stated result follows. One can compare as well with the asymptotic number of hairpins in k-noncrossing structures, given in Table 2 of Nebel et al. (2011b).
In Theorem 16 of Nebel (2002), it is shown that the expected number of hairpins over all secondary structures is asymptotically equivalent to \((1-\frac{2 \sqrt{5}}{5}) \cdot n \sim0.105573 \cdot n\).
Subscript notation is used for partial derivatives.
References
Andronescu, M., Bereg, V., Hoos, H. H., & Condon, A. (2008). RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinform., 9, 340.
Clote, P. (2005). An efficient algorithm to compute the landscape of locally optimal RNA secondary structures with respect to the Nussinov–Jacobson energy model. J. Comput. Biol., 12(1), 83–101.
Clote, P. (2006). Combinatorics of saturated secondary structures of RNA. J. Comput. Biol., 13(9), 1640–1657.
Clote, P., Kranakis, E., Krizanc, D., & Salvy, B. (2009). Asymptotics of canonical and saturated RNA secondary structures. J. Bioinform. Comput. Biol., 7(5), 869–893.
Clote, P., Dobrev, S., Dotu, I., Kranakis, E., Krizanc, D., & Urrutia, J. (2012). On the page number of RNA secondary structures with pseudoknots. J. Math. Biol., 65(6–7), 1337–1357.
Danilova, L. V., Pervouchine, D. D., Favorov, A. V., & Mironov, A. A. (2006). RNAKinetics: a web server that models secondary structure kinetics of an elongating RNA. J. Bioinform. Comput. Biol., 4(2), 589–596.
Drmota, M. (1997). Systems of functional equations. Random Struct. Algorithms, 10(1–2), 103–124.
Drmota, M., Fusy, É., Jué, J., Kang, M., & Kraus, V. (2011). Asymptotic study of subcritical graph classes. SIAM J. Discrete Math., 25(4), 1615–1651.
Flajolet, P., & Sedgewick, R. (2009). Analytic combinatorics. Cambridge: Cambridge University Press.
Flamm, C., Fontana, W., Hofacker, I. L., & Schuster, P. (2000). RNA folding at elementary step resolution. RNA, 6, 325–338.
Fusy, E., & Clote, P. (2012). Combinatorics of locally optimal RNA secondary structures. J. Math. Biol., 2012 Dec 22 [Epub ahead of print]. PMID: 23263300.
Gardner, P. P., Daub, J., Tate, J., Moore, B. L., Osuch, I. H., Griffiths-Jones, S., Finn, R. D., Nawrocki, E. P., Kolbe, D. L., Eddy, S. R., & Bateman, A. (2011). Rfam: wikipedia, clans and the “decimal” release. Nucleic Acids Res., 39(Database), D141–D145.
Griffiths-Jones, S. (2006). miRBase: the microRNA sequence database. Methods Mol. Biol., 342, 129–138.
Gutell, R. R. (1994). Collection of small subunit (16S- and 16S-like) ribosomal RNA structures. Nucleic Acids Res., 22, 3502–3507.
Hofacker, I. L. (2003). Vienna RNA secondary structure server. Nucleic Acids Res., 31, 3429–3431.
Hofacker, I. L., Schuster, P., & Stadler, P. F. (1998). Combinatorics of RNA secondary structures. Discrete Appl. Math., 88, 207–237.
Jin, E. Y., & Reidys, C. M. (2008). Asymptotic enumeration of RNA structures with pseudoknots. Bull. Math. Biol., 70(4), 951–970.
Knudsen, B., & Hein, J. (2003). Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res., 31(13), 3423–3428.
Lang, S. (2002). Algebra (revised 3rd ed.). Berlin: Springer.
Li, T. J., & Reidys, C. M. (2011). Combinatorial analysis of interacting RNA molecules. Math. Biosci., 233(1), 47–58.
Li, T. J., & Reidys, C. M. (2012). Combinatorics of RNA–RNA interaction. J. Math. Biol., 64(3), 529–556.
Lorenz, W. A., Ponty, Y., & Clote, P. (2008). Asymptotics of RNA shapes. J. Comput. Biol., 15(1), 31–63.
Lowe, T., & Eddy, S. (1997). tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res., 25(5), 955–964.
Markham, N. R., & Zuker, M. (2008). UNAFold: software for nucleic acid folding and hybridization. Methods Mol. Biol., 453, 3–31.
Mathews, D. H. (2004). Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA, 10(8), 1178–1190.
Muller, U. R., & Fitch, W. M. (1982). Evolutionary selection for perfect hairpin structures in viral DNAs. Nature, 298(5874), 582–585.
Nebel, M. E. (2002). Combinatorial properties of RNA secondary structure. J. Comput. Biol., 9(3), 541–573.
Nebel, M. E. (2004). Investigation of the Bernoulli model for RNA secondary structures. Bull. Math. Biol., 66(5), 925–964.
Nebel, M. E., Reidys, C. M., & Wang, R. R. (2011a). Loops in canonical RNA pseudoknot structures. J. Comput. Biol., 18(12), 1793–1806.
Nebel, M. E., Reidys, C. M., & Wang, R. R. (2011b). Loops in canonical RNA pseudoknot structures. J. Comput. Biol., 18(12), 1793–1806.
Nussinov, R., & Jacobson, A. B. (1980). Fast algorithm for predicting the secondary structure of single stranded RNA. Proc. Natl. Acad. Sci. USA, 77(11), 6309–6313.
Reidys, C. M., & Wang, R. R. (2010). Shapes of RNA pseudoknot structures. J. Comput. Biol., 17(11), 1575–1590.
Rivas, E., Lang, R., & Eddy, S. R. (2012). A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more. RNA, 18(2), 193–212.
Rodland, E. A. (2006). Pseudoknots in RNA secondary structures: representation, enumeration, and prevalence. J. Comput. Biol., 13(6), 1197–1213.
Rose, P. W., Beran, B., Bi, C., Bluhm, W. F., Dimitropoulos, D., Goodsell, D. S., Prlic, A., Quesada, M., Quinn, G. B., Westbrook, J. D., Young, J., Yukich, B., Zardecki, C., Berman, H. M., & Bourne, P. E. (2011). The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Res., 39(Database), D392–D401.
Sprinzl, M., Horn, C., Brown, M., Ioudovitch, A., & Steinberg, S. (1998). Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res., 26, 148–153.
Stein, P. R., & Waterman, M. S. (1978). On some new sequences generalizing the Catalan and Motzkin numbers. Discrete Math., 26, 261–272.
Torarinsson, E., Yao, Z., Wiklund, E. D., Bramsen, J. B., Hansen, C., Kjems, J., Tommerup, N., Ruzzo, W. L., & Gorodkin, J. (2008). Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Res., 18(2), 242–251.
Waldispuhl, J., & Clote, P. (2007). Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model. J. Comput. Biol., 14(2), 190–215.
Waterman, M. S. (1995). Introduction to computational biology. London/Boca Raton: Chapman and Hall/CRC Press.
Weinberg, F., & Nebel, M. E. (2011). Applying length-dependent stochastic context-free grammars to RNA secondary structure prediction. Algorithms, 4(4), 223–238.
Xayaphoummine, A., Bucher, T., & Isambert, H. (2005). Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots. Nucleic Acids Res., 33(Web), W605–W610.
Xia, T., SantaLucia, J. Jr., Burkard, M. E., Kierzek, R., Schroeder, S. J., Jiao, X., Cox, C., & Turner, D. H. (1999). Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry, 37, 14719–14735.
Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 31(13), 3406–3415.
Acknowledgements
We would like to thank D.H. Mathews for generously sharing his RNA data collection, and B. Salvy and E. Fusy for discussions of Drmota’s theorem. Partial support for the research of P. Clote is from NSF grants DMS-0817971 and DBI-1262439. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. Funding for the research of E. Kranakis was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) and Mathematics of Information Technology and Complex Systems (MITACS).
Appendix: Computing the Number of Hairpins in Saturated Structures
To produce Fig. 1, we computed by dynamic programming the expected number of hairpins in saturated structures for a homopolymer of size n. In the interests of brevity, we refer the interested reader to Clote (2006) for background material on recurrence relations for the number of saturated structures. The recurrence relations require the auxiliary notion of saturated structure with no visible positions, defined as follows. A position is visible if it is unpaired and external to every base pair. Accordingly, a secondary structure S on sequence \(a_1,\ldots,a_n\) has no visible positions if, for every 1≤i≤n at which \(a_i\) is unpaired, there is a base pair (x,y) of S for which x<i<y.
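The notion of visible position can be made concrete on dot-bracket strings. The sketch below is illustrative only: it reads "no visible positions" as meaning that every unpaired position is enclosed by at least one base pair, an assumption that agrees with the base case E(3,1)=1 used in the recurrences, and the function names are hypothetical.

```python
def parse_dot_bracket(structure):
    """Return the base pairs (x, y), 1-indexed, of a balanced dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    return pairs

def visible_positions(structure):
    """List unpaired positions that are not enclosed by any base pair."""
    pairs = parse_dot_bracket(structure)
    return [
        i
        for i, ch in enumerate(structure, start=1)
        if ch == "." and not any(x < i < y for x, y in pairs)
    ]

def has_no_visible_positions(structure):
    """True when every unpaired position lies inside some base pair."""
    return not visible_positions(structure)

if __name__ == "__main__":
    print(visible_positions(".(.).."))
    print(has_no_visible_positions("(.)"))
```

For example, the structure `(.)` has no visible positions, whereas in `.(.)..` the three unpaired positions outside the base pair are all visible.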
Let D(n,k) denote the number of saturated secondary structures on a homopolymer of size n having exactly k hairpins. Let E(n,k) denote the number of those saturated secondary structures having exactly k hairpins that have no visible positions. Define D(0,0)=D(1,0)=D(2,0)=D(3,1)=1 and E(0,0)=E(3,1)=1; for all other values of 0≤n≤3 and 0≤k≤3, let D(n,k)=E(n,k)=0.
The inductive case is given by:
Since the justification for these recursions is similar to that of Clote (2006), we do not provide further details. These recursions are implemented using dynamic programming to compute the number of saturated structures on a homopolymer of size n having exactly k hairpins. It follows that the expected number of hairpins for a homopolymer of size n is
\[\frac{\sum_{k=0}^{n} k \cdot D(n,k)}{S(n)},\]
where \(S(n)=\sum_{k=0}^{n} D(n,k)\) is the total number of saturated structures for a homopolymer of size n. The Python code is available on the web supplement.
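For small n, the counts of saturated structures by number of hairpins can be cross-checked by exhaustive enumeration. The following sketch is not the dynamic program of the web supplement; it is a brute-force illustration with hypothetical function names, assuming a homopolymer in which any two positions may pair provided at least θ unpaired positions lie between them (θ=1 by default). Its running time is exponential in n.

```python
from collections import Counter
from fractions import Fraction

def structures(i, j, theta):
    """Yield every secondary structure on positions i..j-1 as a tuple of pairs."""
    if i >= j:
        yield ()
        return
    # Position i is unpaired.
    for rest in structures(i + 1, j, theta):
        yield rest
    # Position i pairs with r, with at least theta unpaired positions between them.
    for r in range(i + theta + 1, j):
        for inner in structures(i + 1, r, theta):
            for right in structures(r + 1, j, theta):
                yield ((i, r),) + inner + right

def is_saturated(pairs, n, theta):
    """True if no base pair can be added without a pseudoknot or base triple."""
    paired = {p for pair in pairs for p in pair}
    free = [i for i in range(n) if i not in paired]
    for idx, i in enumerate(free):
        for j in free[idx + 1:]:
            if j - i <= theta:
                continue
            crosses = any(x < i < y < j or i < x < j < y for x, y in pairs)
            if not crosses:
                return False  # (i, j) could still be added
    return True

def count_hairpins(pairs):
    """A hairpin is a base pair that encloses no other base pair."""
    return sum(1 for x, y in pairs if not any(x < u and v < y for u, v in pairs))

def hairpin_counts(n, theta=1):
    """Counter mapping k to the number of saturated structures with k hairpins."""
    counts = Counter()
    for pairs in structures(0, n, theta):
        if is_saturated(pairs, n, theta):
            counts[count_hairpins(pairs)] += 1
    return counts

def expected_hairpins(n, theta=1):
    """Exact expected number of hairpins, sum_k k*D(n,k) / S(n)."""
    counts = hairpin_counts(n, theta)
    total = sum(counts.values())
    return Fraction(sum(k * c for k, c in counts.items()), total)

if __name__ == "__main__":
    for n in range(0, 11):
        print(n, dict(hairpin_counts(n)), float(expected_hairpins(n)))
```

Exact rational arithmetic is used for the expectation so that small cases can be compared digit for digit against values produced by a recurrence-based implementation.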
Definition of Resultant
In the proof of Theorem 3, we compute the resultant of two multivariate polynomials. For the benefit of the reader, we define this concept here. For any commutative ring A, indeterminate X, and two multivariate polynomials
\[p_1(X) = a_n X^n + \cdots + a_1 X + a_0, \qquad p_2(X) = b_m X^m + \cdots + b_1 X + b_0,\]
with coefficients in A and \(a_n, b_m \neq 0\), respectively having roots \(\alpha_1,\ldots,\alpha_n\) and \(\beta_1,\ldots,\beta_m\) in the algebraic closure of A, the resultant of \(p_1,p_2\) with respect to X is defined to be
\[\operatorname{Res}(p_1,p_2) = a_n^{m}\, b_m^{n} \prod_{i=1}^{n} \prod_{j=1}^{m} (\alpha_i - \beta_j).\]
In applications, for instance, \(g_1,g_2\) could be functions in variables S,R,u,z, but construed to be polynomials over indeterminate R with coefficients from the ring \(\mathbb{Z}[z,u,S]\). In such a case, the resultant \(\operatorname{Res}(g_1,g_2)\) of \(g_1,g_2\) is a polynomial in \(\mathbb{Z}[z,u,S]\), whose roots are the z-, u- and S-coordinates of the intersection of curves corresponding to \(g_1,g_2\). Moreover, it is known that there exist polynomials \(q_{1},q_{2} \in\mathbb{Z}[z,u,S][R]\) such that
\[q_1 \cdot g_1 + q_2 \cdot g_2 = \operatorname{Res}(g_1,g_2).\]
For more background on resultants, see Lang (2002).
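As a numerical illustration, the resultant of two univariate polynomials equals the determinant of their Sylvester matrix, a standard fact. The sketch below applies it to polynomials with rational coefficients; it does not handle the symbolic coefficients from \(\mathbb{Z}[z,u,S]\) needed in the proof of Theorem 3, for which a computer algebra system is the appropriate tool, and the function names are hypothetical.

```python
from fractions import Fraction

def sylvester_matrix(p, q):
    """Sylvester matrix of p and q, given as coefficient lists, highest degree first."""
    n, m = len(p) - 1, len(q) - 1
    size = n + m
    rows = []
    for shift in range(m):  # m shifted copies of the coefficients of p
        rows.append([0] * shift + list(p) + [0] * (size - n - 1 - shift))
    for shift in range(n):  # n shifted copies of the coefficients of q
        rows.append([0] * shift + list(q) + [0] * (size - m - 1 - shift))
    return rows

def determinant(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in rows]
    size = len(a)
    det = Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, size):
            factor = a[r][c] / a[c][c]
            for k in range(c, size):
                a[r][k] -= factor * a[c][k]
    return det

def resultant(p, q):
    """Resultant of p and q with respect to their single indeterminate."""
    return determinant(sylvester_matrix(p, q))

if __name__ == "__main__":
    # p1 = X^2 - 3X + 2 has roots 1 and 2; p2 = X - 3 has root 3.
    print(resultant([1, -3, 2], [1, -3]))  # (1 - 3) * (2 - 3) = 2
    # A common root (here X = 1) makes the resultant vanish.
    print(resultant([1, 0, -1], [1, -1]))
```

The first example agrees with the product formula, since both leading coefficients equal 1 and (1−3)(2−3)=2; the second shows that the resultant is zero exactly when the two polynomials share a root.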
Cite this article
Clote, P., Kranakis, E. & Krizanc, D. Asymptotic Number of Hairpins of Saturated RNA Secondary Structures. Bull Math Biol 75, 2410–2430 (2013). https://doi.org/10.1007/s11538-013-9899-1
DOI: https://doi.org/10.1007/s11538-013-9899-1