Asymptotic Number of Hairpins of Saturated RNA Secondary Structures

  • Original Article
Bulletin of Mathematical Biology

Abstract

In the absence of chaperone molecules, RNA folding is believed to depend on the distribution of kinetic traps in the energy landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In this paper, we compute the asymptotic expected number of hairpins in saturated structures. For instance, if every hairpin is required to contain at least θ=3 unpaired bases and the probability that any two positions can base-pair is p=3/8, then the asymptotic number of saturated structures is \(1.34685 \cdot n^{-3/2} \cdot 1.62178^{n}\), and the asymptotic expected number of hairpins follows a normal distribution with mean \(0.06695640 \cdot n + 0.01909350 \cdot\sqrt{n} \cdot\mathcal{N}\). Similar results are given for values θ=1,3 and p=1,1/2,3/8; for instance, when θ=1 and p=1, the asymptotic expected number of hairpins in saturated secondary structures is 0.123194⋅n, a value greater than the asymptotic expected number 0.105573⋅n of hairpins over all secondary structures. Since RNA binding targets are often found in hairpin regions, it follows that saturated structures present potentially more binding targets than nonsaturated structures, on average. Next, we describe a novel algorithm to compute the hairpin profile of a given RNA sequence: given an RNA sequence \(a_1,\ldots,a_n\), for each integer k, we compute the secondary structure \(S_k\) having minimum energy in the Nussinov energy model, taken over all secondary structures having k hairpins. We expect that an extension of our algorithm to the Turner energy model may provide more accurate structure prediction for particular RNAs, such as tRNAs and purine riboswitches, known to have a particular number of hairpins.
Mathematica computations, C and Python source code, and additional supplementary information are available at the website http://bioinformatics.bc.edu/clotelab/RNAhairpinProfile/.

Notes

  1. In the Nussinov energy landscape, due to degeneracy of the model, the minimum energy structure may not be unique. Indeed, in Clote (2006), we show that even RNA homopolymers have quadratically many minimum energy structures.

  2. In Theorem 10 of Nebel (2002), it is shown that the number of unpaired nucleotides is asymptotically equal to \(\frac{n}{\sqrt{5}}\), from which the stated result follows. One can compare as well with the asymptotic number of hairpins in k-noncrossing structures, given in Table 2 of Nebel et al. (2011b).

  3. In Theorem 16 of Nebel (2002), it is shown that the expected number of hairpins over all secondary structures is asymptotically equivalent to \((1-\frac{2 \sqrt{5}}{5}) \cdot n \sim0.105573 \cdot n\).

  4. Subscript notation is used for partial derivatives.

  5. We follow Drmota (1997), in using the term simple, whereas the term admissible was used in Fusy and Clote (2012).

References

  • Andronescu, M., Bereg, V., Hoos, H. H., & Condon, A. (2008). RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinform., 9, 340.

  • Clote, P. (2005). An efficient algorithm to compute the landscape of locally optimal RNA secondary structures with respect to the Nussinov–Jacobson energy model. J. Comput. Biol., 12(1), 83–101.

  • Clote, P. (2006). Combinatorics of saturated secondary structures of RNA. J. Comput. Biol., 13(9), 1640–1657.

  • Clote, P., Kranakis, E., Krizanc, D., & Salvy, B. (2009). Asymptotics of canonical and saturated RNA secondary structures. J. Bioinform. Comput. Biol., 7(5), 869–893.

  • Clote, P., Dobrev, S., Dotu, I., Kranakis, E., Krizanc, D., & Urrutia, J. (2012). On the page number of RNA secondary structures with pseudoknots. J. Math. Biol., 65(6–7), 1337–1357.

  • Danilova, L. V., Pervouchine, D. D., Favorov, A. V., & Mironov, A. A. (2006). RNAKinetics: a web server that models secondary structure kinetics of an elongating RNA. J. Bioinform. Comput. Biol., 4(2), 589–596.

  • Drmota, M. (1997). Systems of functional equations. Random Struct. Algorithms, 10(1–2), 103–124.

  • Drmota, M., Fusy, É., Jué, J., Kang, M., & Kraus, V. (2011). Asymptotic study of subcritical graph classes. SIAM J. Discrete Math., 25(4), 1615–1651.

  • Flajolet, P., & Sedgewick, R. (2009). Analytic combinatorics. Cambridge: Cambridge University Press.

  • Flamm, C., Fontana, W., Hofacker, I. L., & Schuster, P. (2000). RNA folding at elementary step resolution. RNA, 6, 325–338.

  • Fusy, E., & Clote, P. (2012). Combinatorics of locally optimal RNA secondary structures. J. Math. Biol., 2012 Dec 22 [Epub ahead of print]. PMID: 23263300.

  • Gardner, P. P., Daub, J., Tate, J., Moore, B. L., Osuch, I. H., Griffiths-Jones, S., Finn, R. D., Nawrocki, E. P., Kolbe, D. L., Eddy, S. R., & Bateman, A. (2011). Rfam: wikipedia, clans and the “decimal” release. Nucleic Acids Res., 39(Database), D141–D145.

  • Griffiths-Jones, S. (2006). miRBase: the microRNA sequence database. Methods Mol. Biol., 342, 129–138.

  • Gutell, R. R. (1994). Collection of small subunit (16S- and 16S-like) ribosomal RNA structures. Nucleic Acids Res., 22, 3502–3507.

  • Hofacker, I. L. (2003). Vienna RNA secondary structure server. Nucleic Acids Res., 31, 3429–3431.

  • Hofacker, I. L., Schuster, P., & Stadler, P. F. (1998). Combinatorics of RNA secondary structures. Discrete Appl. Math., 88, 207–237.

  • Jin, E. Y., & Reidys, C. M. (2008). Asymptotic enumeration of RNA structures with pseudoknots. Bull. Math. Biol., 70(4), 951–970.

  • Knudsen, B., & Hein, J. (2003). Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res., 31(13), 3423–3428.

  • Lang, S. (2002). Algebra (revised 3rd ed.). Berlin: Springer.

  • Li, T. J., & Reidys, C. M. (2011). Combinatorial analysis of interacting RNA molecules. Math. Biosci., 233(1), 47–58.

  • Li, T. J., & Reidys, C. M. (2012). Combinatorics of RNA–RNA interaction. J. Math. Biol., 64(3), 529–556.

  • Lorenz, W. A., Ponty, Y., & Clote, P. (2008). Asymptotics of RNA shapes. J. Comput. Biol., 15(1), 31–63.

  • Lowe, T., & Eddy, S. (1997). tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res., 25(5), 955–964.

  • Markham, N. R., & Zuker, M. (2008). UNAFold: software for nucleic acid folding and hybridization. Methods Mol. Biol., 453, 3–31.

  • Mathews, D. H. (2004). Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA, 10(8), 1178–1190.

  • Muller, U. R., & Fitch, W. M. (1982). Evolutionary selection for perfect hairpin structures in viral DNAs. Nature, 298(5874), 582–585.

  • Nebel, M. E. (2002). Combinatorial properties of RNA secondary structure. J. Comput. Biol., 9(3), 541–573.

  • Nebel, M. E. (2004). Investigation of the Bernoulli model for RNA secondary structures. Bull. Math. Biol., 66(5), 925–964.

  • Nebel, M. E., Reidys, C. M., & Wang, R. R. (2011a). Loops in canonical RNA pseudoknot structures. J. Comput. Biol., 18(12), 1793–1806.

  • Nebel, M. E., Reidys, C. M., & Wang, R. R. (2011b). Loops in canonical RNA pseudoknot structures. J. Comput. Biol., 18(12), 1793–1806.

  • Nussinov, R., & Jacobson, A. B. (1980). Fast algorithm for predicting the secondary structure of single stranded RNA. Proc. Natl. Acad. Sci. USA, 77(11), 6309–6313.

  • Reidys, C. M., & Wang, R. R. (2010). Shapes of RNA pseudoknot structures. J. Comput. Biol., 17(11), 1575–1590.

  • Rivas, E., Lang, R., & Eddy, S. R. (2012). A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more. RNA, 18(2), 193–212.

  • Rodland, E. A. (2006). Pseudoknots in RNA secondary structures: representation, enumeration, and prevalence. J. Comput. Biol., 13(6), 1197–1213.

  • Rose, P. W., Beran, B., Bi, C., Bluhm, W. F., Dimitropoulos, D., Goodsell, D. S., Prlic, A., Quesada, M., Quinn, G. B., Westbrook, J. D., Young, J., Yukich, B., Zardecki, C., Berman, H. M., & Bourne, P. E. (2011). The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Res., 39(Database), D392–D401.

  • Sprinzl, M., Horn, C., Brown, M., Ioudovitch, A., & Steinberg, S. (1998). Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res., 26, 148–153.

  • Stein, P. R., & Waterman, M. S. (1978). On some new sequences generalizing the Catalan and Motzkin numbers. Discrete Math., 26, 261–272.

  • Torarinsson, E., Yao, Z., Wiklund, E. D., Bramsen, J. B., Hansen, C., Kjems, J., Tommerup, N., Ruzzo, W. L., & Gorodkin, J. (2008). Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Res., 18(2), 242–251.

  • Waldispuhl, J., & Clote, P. (2007). Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model. J. Comput. Biol., 14(2), 190–215.

  • Waterman, M. S. (1995). Introduction to computational biology. London/Boca Raton: Chapman and Hall/CRC Press.

  • Weinberg, F., & Nebel, M. E. (2011). Applying length-dependent stochastic context-free grammars to RNA secondary structure prediction. Algorithms, 4(4), 223–238.

  • Xayaphoummine, A., Bucher, T., & Isambert, H. (2005). Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots. Nucleic Acids Res., 33(Web), W605–W610.

  • Xia, T., SantaLucia, J. Jr., Burkard, M. E., Kierzek, R., Schroeder, S. J., Jiao, X., Cox, C., & Turner, D. H. (1999). Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry, 37, 14719–14735.

  • Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 31(13), 3406–3415.

Acknowledgements

We would like to thank D.H. Mathews for generously sharing his RNA data collection, and B. Salvy and E. Fusy for discussions of Drmota’s theorem. Partial support for the research of P. Clote is from NSF grants DMS-0817971 and DBI-1262439. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. Funding for the research of E. Kranakis was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) and Mathematics of Information Technology and Complex Systems (MITACS).

Author information

Corresponding author

Correspondence to Peter Clote.

Appendix: Computing the Number of Hairpins in Saturated Structures

To produce Fig. 1, we computed by dynamic programming the expected number of hairpins in saturated structures for a homopolymer of size n. For brevity, we refer the interested reader to Clote (2006) for background material on recurrence relations for the number of saturated structures. The recurrence relations require the auxiliary notion of a saturated structure with no visible positions, defined as follows. A secondary structure S on sequence \(a_1,\ldots,a_n\) has no visible positions if, for every \(1 \le i \le n\) for which \(a_i\) is unpaired, there is a base pair (x,y) of S for which x<i<y; that is, every unpaired position lies inside some base pair.

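Let D(n,k) denote the number of saturated secondary structures on a homopolymer of size n having exactly k hairpins. Let E(n,k) denote the number of those structures that, in addition, have no visible positions. Define D(0,0)=D(1,0)=D(2,0)=D(3,1)=1 and E(0,0)=E(3,1)=1; for all other values of 0≤n≤3 and 0≤k≤3, let D(n,k)=E(n,k)=0. Since every structure counted by E(n,k) is also counted by D(n,k), the value D(3,1)=E(3,1)=1 accounts for the structure whose only base pair is (1,3). Terms with a negative number of hairpins are taken to be 0.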

The inductive case is given by:

$$\begin{aligned} D(n,k) =&E(n-1,k)+E(n-2,k) + \sum_{r=1}^{n-2} D(r-1,k-1)D(n-r-1,0) \\ &{}+ \sum_{r=1}^{n-2} \sum _{s=0}^{k-1} D(r-1,s)D(n-r-1,k-s) \\ E(n,k) =& \sum_{r=1}^{n-2} E(r-1,k-1)D(n-r-1,0) \\ &{} +\sum_{r=1}^{n-2} \sum _{s=0}^{k-1} E(r-1,s)D(n-r-1,k-s). \end{aligned}$$

Since the justification for these recursions is similar to that of Clote (2006), we do not provide further details. These recursions are implemented using dynamic programming to compute the number of saturated structures on a homopolymer of size n having exactly k hairpins. It follows that the expected number of hairpins for a homopolymer of size n is

$$\sum_{k=0}^n k \cdot\frac{D(n,k)}{S(n)} $$

where \(S(n)=\sum_{k=0}^{n} D(n,k)\) is the total number of saturated structures for a homopolymer of size n. The Python code is available on the web supplement.
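The recurrences and the expectation formula above can be sketched in Python as follows. This is a minimal illustration, not the code of the web supplement. It assumes θ=1, treats D(n,k) and E(n,k) as 0 for negative arguments, and takes D(3,1)=1 (the structure whose only base pair is (1,3)) as a base case, an assumption under which the counts agree with exhaustive enumeration for small n. All function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations

THETA = 1  # assumption: every hairpin loop contains at least one unpaired base

# Base cases for n <= 3; every (n, k) not listed is 0.
# Assumption: D(3,1) = 1 counts the structure {(1,3)}, mirroring E(3,1) = 1.
D_BASE = {(0, 0): 1, (1, 0): 1, (2, 0): 1, (3, 1): 1}
E_BASE = {(0, 0): 1, (3, 1): 1}


def _paired_case(left, n, k):
    """Position n pairs with r; `left` counts 1..r-1 and D counts r+1..n-1."""
    total = 0
    for r in range(1, n - 1):
        inner = n - r - 1
        total += left(r - 1, k - 1) * D(inner, 0)  # (r, n) closes a hairpin
        total += sum(left(r - 1, s) * D(inner, k - s) for s in range(k))
    return total


@lru_cache(maxsize=None)
def D(n, k):
    """Number of saturated structures of length n with exactly k hairpins."""
    if n < 0 or k < 0:
        return 0
    if n <= 3:
        return D_BASE.get((n, k), 0)
    return E(n - 1, k) + E(n - 2, k) + _paired_case(D, n, k)


@lru_cache(maxsize=None)
def E(n, k):
    """As D, restricted to structures with no visible positions."""
    if n < 0 or k < 0:
        return 0
    if n <= 3:
        return E_BASE.get((n, k), 0)
    return _paired_case(E, n, k)


def expected_hairpins(n):
    """Exact value of sum_k k * D(n,k) / S(n)."""
    counts = [D(n, k) for k in range(n + 1)]
    return Fraction(sum(k * c for k, c in enumerate(counts)), sum(counts))


def structures(i, j):
    """Yield each secondary structure on positions i..j as a tuple of pairs."""
    if i > j:
        yield ()
        return
    yield from structures(i + 1, j)            # position i is unpaired
    for k in range(i + THETA + 1, j + 1):      # position i pairs with k
        for inner in structures(i + 1, k - 1):
            for outer in structures(k + 1, j):
                yield ((i, k),) + inner + outer


def is_saturated(pairs, n):
    """True if no pair of unpaired positions can be added without crossing."""
    paired = {p for pair in pairs for p in pair}
    free = [p for p in range(1, n + 1) if p not in paired]
    for x, y in combinations(free, 2):
        crosses = any(a < x < b < y or x < a < y < b for a, b in pairs)
        if y - x > THETA and not crosses:
            return False
    return True


def brute_force_counts(n):
    """Map k to the number of saturated structures of length n with k hairpins."""
    counts = {}
    for pairs in structures(1, n):
        if is_saturated(pairs, n):
            # A hairpin is a base pair with no other base pair nested inside it.
            k = sum(not any(a < c and d < b for c, d in pairs) for a, b in pairs)
            counts[k] = counts.get(k, 0) + 1
    return counts


# Cross-check the recurrences against exhaustive enumeration for small n.
for n in range(9):
    assert brute_force_counts(n) == {k: D(n, k) for k in range(n + 1) if D(n, k)}
print(expected_hairpins(6))  # 9/8
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the expectation is not affected by rounding; the enumeration is exponential in n and serves only as a sanity check on the recurrences.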

Definition of Resultant

In the proof of Theorem 3, we compute the resultant of two multivariable polynomials. For the benefit of the reader, we define this concept here. For any commutative ring A, indeterminate X and two multivariate polynomials

$$\begin{aligned} p_1 =& v_n X^n + \cdots+ v_1 X + v_0 \\ p_2 =& u_m X^m + \cdots+ u_1 X + u_0 \end{aligned}$$

respectively having roots \(\alpha_1,\ldots,\alpha_n\) and \(\beta_1,\ldots,\beta_m\) in the algebraic closure of A, the resultant of \(p_1,p_2\) with respect to X is defined to be

$$v_n^m u_m^n \prod _{i=1}^n \prod_{j=1}^m (\alpha_i-\beta_j). $$

In applications, for instance, \(g_1,g_2\) could be functions in variables S,R,u,z, but construed to be polynomials over the indeterminate R with coefficients from the ring \(\mathbb{Z}[z,u,S]\). In such a case, the resultant \(\mathit{Res}(g_1,g_2)\) of \(g_1,g_2\) is a polynomial in \(\mathbb{Z}[z,u,S]\), whose roots are the z-, u- and S-coordinates of the intersection of the curves corresponding to \(g_1,g_2\). Moreover, it is known that there exist polynomials \(q_{1},q_{2} \in\mathbb{Z}[z,u,S][R]\) such that

$$ g_1 \cdot q_1 + g_2 \cdot q_2 = \mathit{Res}(g_1,g_2). \tag{7} $$

For more background on resultants, see Lang (2002).
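As a small numerical illustration of the definition, the following Python sketch computes the resultant of two univariate polynomials with integer or rational coefficients as the determinant of their Sylvester matrix, a standard equivalent characterization. It is not taken from the paper or its supplement; the function names are hypothetical, and coefficient lists are assumed to be given highest degree first with a nonzero leading coefficient.

```python
from fractions import Fraction


def sylvester_matrix(p, q):
    """Sylvester matrix of p and q (coefficient lists, highest degree first)."""
    n, m = len(p) - 1, len(q) - 1
    # m shifted copies of p followed by n shifted copies of q.
    rows = [[0] * s + list(p) + [0] * (m - 1 - s) for s in range(m)]
    rows += [[0] * s + list(q) + [0] * (n - 1 - s) for s in range(n)]
    return rows


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def resultant(p, q):
    """Resultant of p and q with respect to their single indeterminate."""
    return determinant(sylvester_matrix(p, q))


# p1 = (X - 1)(X - 2), p2 = X - 3: (1 - 3)(2 - 3) = 2
print(resultant([1, -3, 2], [1, -3]))   # 2
# p1 = 2X - 4 (root 2), p2 = 3X^2 - 3 (roots 1, -1): 2^2 * 3 * (2 - 1)(2 + 1) = 36
print(resultant([2, -4], [3, 0, -3]))   # 36
# A common root forces the resultant to vanish.
print(resultant([1, -3, 2], [1, -1]))   # 0
```

The last call shows the property used when eliminating a variable: the resultant is zero exactly when the two polynomials share a root.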

About this article

Cite this article

Clote, P., Kranakis, E. & Krizanc, D. Asymptotic Number of Hairpins of Saturated RNA Secondary Structures. Bull Math Biol 75, 2410–2430 (2013). https://doi.org/10.1007/s11538-013-9899-1
