Asymptotic Number of Hairpins of Saturated RNA Secondary Structures

Clote, Peter; Kranakis, Evangelos; Krizanc, Danny

doi:10.1007/s11538-013-9899-1

Original Article
Published: 19 October 2013

Volume 75, pages 2410–2430, (2013)
Bulletin of Mathematical Biology

Peter Clote¹,
Evangelos Kranakis² &
Danny Krizanc³

Abstract

In the absence of chaperone molecules, RNA folding is believed to depend on the distribution of kinetic traps in the energy landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or a base triple. In this paper, we compute the asymptotic expected number of hairpins in saturated structures. For instance, if every hairpin is required to contain at least θ=3 unpaired bases and the probability that any two positions can base-pair is p=3/8, then the asymptotic number of saturated structures is $1.34685 \cdot n^{-3/2} \cdot 1.62178^{n}$, and the number of hairpins asymptotically follows a normal distribution, $0.06695640 \cdot n + 0.01909350 \cdot\sqrt{n} \cdot\mathcal{N}$. Similar results are given for values θ=1,3, and p=1,1/2,3/8; for instance, when θ=1 and p=1, the asymptotic expected number of hairpins in saturated secondary structures is 0.123194⋅n, a value greater than the asymptotic expected number 0.105573⋅n of hairpins over all secondary structures. Since RNA binding targets are often found in hairpin regions, it follows that saturated structures present potentially more binding targets than nonsaturated structures, on average. Next, we describe a novel algorithm to compute the hairpin profile of a given RNA sequence: given RNA sequence $a_1,\ldots,a_n$, for each integer k, we compute the secondary structure $S_k$ having minimum energy in the Nussinov energy model, taken over all secondary structures having k hairpins. We expect that an extension of our algorithm to the Turner energy model may provide more accurate structure prediction for particular RNAs, such as tRNAs and purine riboswitches, known to have a particular number of hairpins.
Mathematica™ computations, C and Python source code, and additional supplementary information are available at the website http://bioinformatics.bc.edu/clotelab/RNAhairpinProfile/.

Notes

In the Nussinov energy landscape, due to degeneracy of the model, the minimum energy structure may not be unique. Indeed, in Clote (2006), we show that even RNA homopolymers have quadratically many minimum energy structures.
In Theorem 10 of Nebel (2002), it is shown that the number of unpaired nucleotides is asymptotically equal to $\frac{n}{\sqrt{5}}$, from which the stated result follows. One can compare as well with the asymptotic number of hairpins in k-noncrossing structures, given in Table 2 of Nebel et al. (2011b).
In Theorem 16 of Nebel (2002), it is shown that the expected number of hairpins over all secondary structures is asymptotically equivalent to $(1-\frac{2 \sqrt{5}}{5}) \cdot n \sim0.105573 \cdot n$.
Subscript notation is used for partial derivatives.
We follow Drmota (1997) in using the term simple, whereas the term admissible was used in Fusy and Clote (2012).

References

Andronescu, M., Bereg, V., Hoos, H. H., & Condon, A. (2008). RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinform., 9, 340.
Clote, P. (2005). An efficient algorithm to compute the landscape of locally optimal RNA secondary structures with respect to the Nussinov–Jacobson energy model. J. Comput. Biol., 12(1), 83–101.
Clote, P. (2006). Combinatorics of saturated secondary structures of RNA. J. Comput. Biol., 13(9), 1640–1657.
Clote, P., Kranakis, E., Krizanc, D., & Salvy, B. (2009). Asymptotics of canonical and saturated RNA secondary structures. J. Bioinform. Comput. Biol., 7(5), 869–893.
Clote, P., Dobrev, S., Dotu, I., Kranakis, E., Krizanc, D., & Urrutia, J. (2012). On the page number of RNA secondary structures with pseudoknots. J. Math. Biol., 65(6–7), 1337–1357.
Danilova, L. V., Pervouchine, D. D., Favorov, A. V., & Mironov, A. A. (2006). RNAKinetics: a web server that models secondary structure kinetics of an elongating RNA. J. Bioinform. Comput. Biol., 4(2), 589–596.
Drmota, M. (1997). Systems of functional equations. Random Struct. Algorithms, 10(1–2), 103–124.
Drmota, M., Fusy, É., Jué, J., Kang, M., & Kraus, V. (2011). Asymptotic study of subcritical graph classes. SIAM J. Discrete Math., 25(4), 1615–1651.
Flajolet, P., & Sedgewick, R. (2009). Analytic combinatorics. Cambridge: Cambridge University Press.
Flamm, C., Fontana, W., Hofacker, I. L., & Schuster, P. (2000). RNA folding at elementary step resolution. RNA, 6, 325–338.
Fusy, E., & Clote, P. (2012). Combinatorics of locally optimal RNA secondary structures. J. Math. Biol., 2012 Dec 22 [Epub ahead of print]. PMID: 23263300.
Gardner, P. P., Daub, J., Tate, J., Moore, B. L., Osuch, I. H., Griffiths-Jones, S., Finn, R. D., Nawrocki, E. P., Kolbe, D. L., Eddy, S. R., & Bateman, A. (2011). Rfam: wikipedia, clans and the “decimal” release. Nucleic Acids Res., 39(Database), D141–D145.
Griffiths-Jones, S. (2006). miRBase: the microRNA sequence database. Methods Mol. Biol., 342, 129–138.
Gutell, R. R. (1994). Collection of small subunit (16S- and 16S-like) ribosomal RNA structures. Nucleic Acids Res., 22, 3502–3507.
Hofacker, I. L. (2003). Vienna RNA secondary structure server. Nucleic Acids Res., 31, 3429–3431.
Hofacker, I. L., Schuster, P., & Stadler, P. F. (1998). Combinatorics of RNA secondary structures. Discrete Appl. Math., 88, 207–237.
Jin, E. Y., & Reidys, C. M. (2008). Asymptotic enumeration of RNA structures with pseudoknots. Bull. Math. Biol., 70(4), 951–970.
Knudsen, B., & Hein, J. (2003). Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res., 31(13), 3423–3428.
Lang, S. (2002). Algebra (revised 3rd ed.). Berlin: Springer.
Li, T. J., & Reidys, C. M. (2011). Combinatorial analysis of interacting RNA molecules. Math. Biosci., 233(1), 47–58.
Li, T. J., & Reidys, C. M. (2012). Combinatorics of RNA–RNA interaction. J. Math. Biol., 64(3), 529–556.
Lorenz, W. A., Ponty, Y., & Clote, P. (2008). Asymptotics of RNA shapes. J. Comput. Biol., 15(1), 31–63.
Lowe, T., & Eddy, S. (1997). tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res., 25(5), 955–964.
Markham, N. R., & Zuker, M. (2008). UNAFold: software for nucleic acid folding and hybridization. Methods Mol. Biol., 453, 3–31.
Mathews, D. H. (2004). Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA, 10(8), 1178–1190.
Muller, U. R., & Fitch, W. M. (1982). Evolutionary selection for perfect hairpin structures in viral DNAs. Nature, 298(5874), 582–585.
Nebel, M. E. (2002). Combinatorial properties of RNA secondary structure. J. Comput. Biol., 9(3), 541–573.
Nebel, M. E. (2004). Investigation of the Bernoulli model for RNA secondary structures. Bull. Math. Biol., 66(5), 925–964.
Nebel, M. E., Reidys, C. M., & Wang, R. R. (2011a). Loops in canonical RNA pseudoknot structures. J. Comput. Biol., 18(12), 1793–1806.
Nebel, M. E., Reidys, C. M., & Wang, R. R. (2011b). Loops in canonical RNA pseudoknot structures. J. Comput. Biol., 18(12), 1793–1806.
Nussinov, R., & Jacobson, A. B. (1980). Fast algorithm for predicting the secondary structure of single stranded RNA. Proc. Natl. Acad. Sci. USA, 77(11), 6309–6313.
Reidys, C. M., & Wang, R. R. (2010). Shapes of RNA pseudoknot structures. J. Comput. Biol., 17(11), 1575–1590.
Rivas, E., Lang, R., & Eddy, S. R. (2012). A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more. RNA, 18(2), 193–212.
Rodland, E. A. (2006). Pseudoknots in RNA secondary structures: representation, enumeration, and prevalence. J. Comput. Biol., 13(6), 1197–1213.
Rose, P. W., Beran, B., Bi, C., Bluhm, W. F., Dimitropoulos, D., Goodsell, D. S., Prlic, A., Quesada, M., Quinn, G. B., Westbrook, J. D., Young, J., Yukich, B., Zardecki, C., Berman, H. M., & Bourne, P. E. (2011). The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Res., 39(Database), D392–D401.
Sprinzl, M., Horn, C., Brown, M., Ioudovitch, A., & Steinberg, S. (1998). Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res., 26, 148–153.
Stein, P. R., & Waterman, M. S. (1978). On some new sequences generalizing the Catalan and Motzkin numbers. Discrete Math., 26, 261–272.
Torarinsson, E., Yao, Z., Wiklund, E. D., Bramsen, J. B., Hansen, C., Kjems, J., Tommerup, N., Ruzzo, W. L., & Gorodkin, J. (2008). Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Res., 18(2), 242–251.
Waldispuhl, J., & Clote, P. (2007). Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model. J. Comput. Biol., 14(2), 190–215.
Waterman, M. S. (1995). Introduction to computational biology. London/Boca Raton: Chapman and Hall/CRC Press.
Weinberg, F., & Nebel, M. E. (2011). Applying length-dependent stochastic context-free grammars to RNA secondary structure prediction. Algorithms, 4(4), 223–238.
Xayaphoummine, A., Bucher, T., & Isambert, H. (2005). Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots. Nucleic Acids Res., 33(Web), W605–W610.
Xia, T., SantaLucia, J. Jr., Burkard, M. E., Kierzek, R., Schroeder, S. J., Jiao, X., Cox, C., & Turner, D. H. (1999). Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry, 37, 14719–14735.
Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 31(13), 3406–3415.

Acknowledgements

We would like to thank D.H. Mathews for generously sharing his RNA data collection, and B. Salvy and E. Fusy for discussions of Drmota’s theorem. Partial support for the research of P. Clote is from NSF grants DMS-0817971 and DBI-1262439. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. Funding for the research of E. Kranakis was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) and Mathematics of Information Technology and Complex Systems (MITACS).

Author information

Authors and Affiliations

Department of Biology, Boston College, Chestnut Hill, MA, 02467, USA
Peter Clote
School of Computer Science, Carleton University, K1S 5B6, Ottawa, Ontario, Canada
Evangelos Kranakis
Department of Mathematics and Computer Science, Wesleyan University, Middletown, CT, 06459, USA
Danny Krizanc

Corresponding author

Correspondence to Peter Clote.

Appendix: Computing the Number of Hairpins in Saturated Structures

To produce Fig. 1, we computed by dynamic programming the expected number of hairpins in saturated structures for a homopolymer of size n. In the interest of brevity, we refer the interested reader to Clote (2006) for background material on recurrence relations for the number of saturated structures. The recurrence relations require the auxiliary notion of a saturated structure with no visible positions, defined as follows. A secondary structure S on sequence $a_1,\ldots,a_n$ has no visible positions if, for every 1≤i≤n for which $a_i$ is unpaired, there exists a base pair (x,y) with x<i<y; that is, every unpaired position is enclosed by some base pair.

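Let D(n,k) denote the number of saturated secondary structures on a homopolymer of size n having exactly k hairpins. Let E(n,k) denote the number of such structures that, in addition, have no visible positions. Define D(0,0)=D(1,0)=D(2,0)=D(3,1)=1 and E(0,0)=E(3,1)=1; for all other values of 0≤n≤3 and 0≤k≤3, let D(n,k)=E(n,k)=0. The values for n=3 reflect that the only saturated structure on three nucleotides consists of the single base pair (1,3), which closes one hairpin and leaves no visible position.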

The inductive case is given by:

$$\begin{aligned} D(n,k) =&E(n-1,k)+E(n-2,k) + \sum_{r=1}^{n-2} D(r-1,k-1)D(n-r-1,0) \\ &{}+ \sum_{r=1}^{n-2} \sum _{s=0}^{k-1} D(r-1,s)D(n-r-1,k-s) \\ E(n,k) =& \sum_{r=1}^{n-2} E(r-1,k-1)D(n-r-1,0) \\ &{} +\sum_{r=1}^{n-2} \sum _{s=0}^{k-1} E(r-1,s)D(n-r-1,k-s). \end{aligned}$$

Since the justification for these recursions is similar to that of Clote (2006), we do not provide further details. The recursions are implemented using dynamic programming to compute the number of saturated structures on a homopolymer of size n having exactly k hairpins. It follows that the expected number of hairpins for a homopolymer of size n is

$$\sum_{k=0}^n k \cdot\frac{D(n,k)}{S(n)} $$

where $S(n)=\sum_{k=0}^{n} D(n,k)$ is the total number of saturated structures for a homopolymer of size n. The Python code is available on the web supplement.
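As an illustration, the recurrences can be tabulated directly. The following Python sketch is not the authors' supplementary code, and the function names `hairpin_counts` and `expected_hairpins` are hypothetical. It assumes that only sizes n≤2 are seeded by hand and that the recurrences are applied from n=3 onward, which gives D(3,1)=E(3,1)=1 for the single saturated structure (.) on three nucleotides.

```python
from fractions import Fraction


def hairpin_counts(n_max):
    """Tabulate D[n][k] and E[n][k] for 0 <= n, k <= n_max.

    D[n][k]: saturated structures of size n with exactly k hairpins.
    E[n][k]: the same, restricted to structures with no visible positions.
    Assumption: sizes n <= 2 are seeded by hand; n >= 3 uses the recurrences.
    """
    D = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    E = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    for n in range(min(n_max, 2) + 1):
        D[n][0] = 1          # only the empty structure, no hairpin
    E[0][0] = 1              # empty sequence: vacuously no visible position
    for n in range(3, n_max + 1):
        for k in range(n + 1):
            d = E[n - 1][k] + E[n - 2][k]   # position n left unpaired
            e = 0
            for r in range(1, n - 1):       # n pairs with r, r = 1..n-2
                left, inner = r - 1, n - r - 1
                if k >= 1:
                    # empty interior: the pair (r, n) closes a new hairpin
                    d += D[left][k - 1] * D[inner][0]
                    e += E[left][k - 1] * D[inner][0]
                for s in range(k):          # s = 0..k-1 hairpins on the left
                    d += D[left][s] * D[inner][k - s]
                    e += E[left][s] * D[inner][k - s]
            D[n][k] = d
            E[n][k] = e
    return D, E


def expected_hairpins(n):
    """Expected number of hairpins: sum_k k * D(n,k) / S(n), as an exact fraction."""
    D, _ = hairpin_counts(n)
    total = sum(D[n])        # S(n)
    return Fraction(sum(k * D[n][k] for k in range(n + 1)), total)
```

For example, `expected_hairpins(4)` returns 1, since each of the three saturated structures on four nucleotides, namely (.)., .(.) and (..), contains exactly one hairpin.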

Definition of Resultant

In the proof of Theorem 3, we compute the resultant of two multivariate polynomials. For the benefit of the reader, we define this concept here. For any commutative ring A, indeterminate X, and two polynomials in X with coefficients in A,

$$\begin{aligned} p_1 =& v_n X^n + \cdots+ v_1 X + v_0 \\ p_2 =& u_m X^m + \cdots+ u_1 X + u_0 \end{aligned}$$

respectively having roots $\alpha_1,\ldots,\alpha_n$ and $\beta_1,\ldots,\beta_m$ in the algebraic closure of A, the resultant of $p_1,p_2$ with respect to X is defined to be

$$v_n^m u_m^n \prod _{i=1}^n \prod_{j=1}^m (\alpha_i-\beta_j). $$

In applications, for instance, $g_1,g_2$ could be functions in variables S,R,u,z, but construed to be polynomials over indeterminate R with coefficients from the ring $\mathbb{Z}[z,u,S]$. In such a case, the resultant $\mathit{Res}(g_1,g_2)$ of $g_1,g_2$ is a polynomial in $\mathbb{Z}[z,u,S]$, whose roots are the z-, u- and S-coordinates of the intersection of curves corresponding to $g_1,g_2$. Moreover, it is known that there exist polynomials $q_{1},q_{2} \in\mathbb{Z}[z,u,S][R]$ such that

$$ g_1 \cdot q_1 + g_2 \cdot q_2 = \mathit{Res}(g_1,g_2). \tag{7} $$

For more background on resultants, see Lang (2002).
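For polynomials with explicit coefficients, the resultant can also be computed without finding roots, as the determinant of the Sylvester matrix; this is a standard equivalent characterization. The following Python sketch illustrates it for integer or rational coefficients. The function names are hypothetical, and the code is not taken from the authors' Mathematica computations.

```python
from fractions import Fraction


def sylvester_matrix(p1, p2):
    """Sylvester matrix of p1 (degree n) and p2 (degree m).

    Polynomials are coefficient lists, highest degree first.
    """
    n, m = len(p1) - 1, len(p2) - 1
    size = n + m
    rows = []
    for i in range(m):       # m shifted copies of p1
        rows.append([0] * i + list(p1) + [0] * (size - n - 1 - i))
    for i in range(n):       # n shifted copies of p2
        rows.append([0] * i + list(p2) + [0] * (size - m - 1 - i))
    return rows


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det       # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def resultant(p1, p2):
    """Resultant of p1 and p2 with respect to their common indeterminate."""
    return determinant(sylvester_matrix(p1, p2))


# (X-1)(X-2) and X-3 share no root: the resultant is (1-3)*(2-3) = 2.
print(resultant([1, -3, 2], [1, -3]))   # 2
# (X-1)(X-2) and X-1 share the root 1: the resultant vanishes.
print(resultant([1, -3, 2], [1, -1]))   # 0
```

The vanishing of the resultant exactly when the two polynomials share a root is what makes it useful for eliminating a variable between two polynomial equations.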

About this article

Cite this article

Clote, P., Kranakis, E. & Krizanc, D. Asymptotic Number of Hairpins of Saturated RNA Secondary Structures. Bull Math Biol 75, 2410–2430 (2013). https://doi.org/10.1007/s11538-013-9899-1

Received: 21 September 2012
Accepted: 22 August 2013
Published: 19 October 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s11538-013-9899-1
