Expected distance between terminal nucleotides of RNA secondary structures

Journal of Mathematical Biology


In “The ends of a large RNA molecule are necessarily close”, Yoffe et al. (Nucleic Acids Res 39(1):292–299, 2011) used the programs \({\tt RNAfold}\) [resp. \({\tt RNAsubopt}\)] from the Vienna RNA Package to calculate the distance between the 5′ and 3′ ends of the minimum free energy secondary structure [resp. thermal equilibrium structures] of viral and random RNA sequences. Here, the 5′–3′ distance is defined to be the length of the shortest path from the 5′ node to the 3′ node in the undirected graph whose edge set consists of edges {i, i + 1}, corresponding to covalent backbone bonds, and of edges {i, j}, corresponding to canonical base pairs. From repeated simulations and a heuristic theoretical argument, Yoffe et al. conclude that the 5′–3′ distance is bounded by a fixed constant, independent of RNA sequence length. In this paper, we provide a rigorous mathematical framework to study the expected distance from the 5′ to the 3′ end of an RNA sequence. We present recurrence relations that precisely define this expected distance, both for the Turner nearest neighbor energy model and for a simple homopolymer model first defined by Stein and Waterman. We implement dynamic programming algorithms to compute (rather than approximate by repeated application of the Vienna RNA Package) the expected distance between the 5′ and 3′ ends of a given RNA sequence with respect to the Turner energy model. Using methods of analytic combinatorics, which depend on complex analysis, we prove that the asymptotic expected 5′–3′ distance \({\langle d_n \rangle}\) of length-\(n\) homopolymers is approximately equal to the constant 5.47211, while the asymptotic distance is 6.771096 if hairpins have a minimum of 3 unpaired bases and the probability that any two positions can form a base pair is 1/4. Finally, we analyze the 5′–3′ distance for secondary structures from the STRAND database and conclude that the 5′–3′ distance is correlated with RNA sequence length.
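To make the graph definition of the 5′–3′ distance concrete, here is a minimal brute-force sketch. It is not the authors' recurrences, dynamic programming, or analytic-combinatorics method: it simply enumerates every secondary structure of a length-n homopolymer, builds the backbone-plus-base-pair graph, measures the 5′–3′ distance by breadth-first search, and averages. It assumes (as an illustration) that every structure is equally likely and that a hairpin loop must contain at least `theta` unpaired bases, with `theta=1` as the default. The function names `structures`, `end_to_end_distance` and `expected_distance` are hypothetical. Because the number of structures grows exponentially, this is only practical for small n, so it yields exact finite-n averages rather than the asymptotic constants quoted above.

```python
"""Brute-force sketch of the expected 5'-3' distance for homopolymer structures.

Illustrative assumptions (not the paper's algorithm): all secondary structures
on n positions are equally likely, and every hairpin loop has at least `theta`
unpaired bases. Positions are 0-based: node 0 is the 5' end, node n-1 the 3' end.
"""
from collections import deque
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def structures(i, j, theta):
    """Return every non-crossing secondary structure on positions i..j.

    Each structure is a tuple of base pairs (a, b) with a < b.
    """
    if i > j:
        return ((),)  # empty interval: only the empty structure
    result = list(structures(i + 1, j, theta))  # case 1: position i unpaired
    for k in range(i + theta + 1, j + 1):  # case 2: i pairs with k
        for inside in structures(i + 1, k - 1, theta):
            for outside in structures(k + 1, j, theta):
                result.append(((i, k),) + inside + outside)
    return tuple(result)


def end_to_end_distance(n, pairs):
    """Shortest-path length from node 0 to node n-1, computed by BFS."""
    adj = [[] for _ in range(n)]
    for v in range(n - 1):  # backbone edges {v, v+1}
        adj[v].append(v + 1)
        adj[v + 1].append(v)
    for a, b in pairs:  # base-pair edges {a, b}
        adj[a].append(b)
        adj[b].append(a)
    dist = [None] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if dist[w] is None:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist[n - 1]


def expected_distance(n, theta=1):
    """Exact uniform average of the 5'-3' distance over all structures (n >= 1)."""
    all_structs = structures(0, n - 1, theta)
    total = sum(end_to_end_distance(n, s) for s in all_structs)
    return Fraction(total, len(all_structs))


if __name__ == "__main__":
    for n in range(1, 13):
        count = len(structures(0, n - 1, 1))
        print(n, count, float(expected_distance(n)))
```

For example, with n = 5 and `theta=1` there are eight structures whose distances sum to 19, so `expected_distance(5)` returns `Fraction(19, 8)`. Passing `theta=3` imposes a minimum hairpin of 3 unpaired bases, but this sketch does not model the pairing probability of 1/4 used in the second asymptotic result, so its values should not be compared directly with 6.771096.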

References

  • Andronescu M, Bereg V, Hoos HH, Condon A (2008) RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinform 9: 340

  • Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, Fagan P, Marvin J, Padilla D, Ravichandran V, Schneider B, Thanki N, Weissig H, Westbrook JD, Zardecki C (2002) The protein data bank. Acta Crystallogr D Biol Crystallogr 58(Pt 6 No 1): 899–907

  • Berman HM, Westbrook J, Feng Z, Iype L, Schneider B, Zardecki C (2003) The nucleic acid database. Methods Biochem Anal 44: 199–216

  • Cormen T, Leiserson C, Rivest R (1990) Introduction to algorithms. McGraw-Hill, New York

  • Corver J, Lenches E, Smith K, Robison RA, Sando T, Strauss EG, Strauss JH (2003) Fine mapping of a cis-acting sequence element in yellow fever virus RNA that is required for RNA replication and cyclization. J Virol 77(3): 2265–2270

  • Darty K, Denise A, Ponty Y (2009) VARNA: interactive drawing and editing of the RNA secondary structure. Bioinformatics 25(15): 1974–1975

  • Flajolet P, Sedgewick R (2009) Analytic combinatorics. Cambridge University Press, Cambridge. ISBN-13: 9780521898065

  • Gallie DR (1991) The cap and poly(A) tail function synergistically to regulate mRNA translational efficiency. Genes Dev 5(11): 2108–2116

  • Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A (2009) Rfam: updates to the RNA families database. Nucleic Acids Res 37(Database): D136–D140

  • Gerland U, Bundschuh R, Hwa T (2001) Force-induced denaturation of RNA. Biophys J 81: 1324–1332

  • Gutell R, Lee J, Cannone J (2002) The accuracy of ribosomal RNA comparative structure models. Curr Opin Struct Biol 12: 301–310

  • Hofacker I (2003) Vienna RNA secondary structure server. Nucleic Acids Res 31(13): 3429–3431

  • Hofacker IL, Schuster P, Stadler PF (1998) Combinatorics of RNA secondary structures. Discret Appl Math 88: 207–237

  • Hopcroft JE, Ullman JD (1969) Formal languages and their relation to automata. Addison-Wesley, Reading

  • Hsu MT, Parvin JD, Gupta S, Krystal M, Palese P (1987) Genomic RNAs of influenza viruses are held in a circular conformation in virions and in infected cells by a terminal panhandle. Proc Natl Acad Sci USA 84(22): 8140–8144

  • Kneller EL, Rakotondrafara AM, Miller WA (2006) Cap-independent translation of plant viral RNAs. Virus Res 119(1): 63–75

  • Lorenz WA, Ponty Y, Clote P (2008) Asymptotics of RNA shapes. J Comput Biol 15(1): 31–63

  • McCaskill J (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29: 1105–1119

  • Miller WA, White KA (2006) Long-distance RNA–RNA interactions in plant virus gene expression and replication. Annu Rev Phytopathol 44: 447–467

  • Nussinov R, Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single stranded RNA. Proc Natl Acad Sci USA 77(11): 6309–6313

  • Sprinzl M, Horn C, Brown M, Ioudovitch A, Steinberg S (1998) Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res 26: 148–153

  • Stein PR, Waterman MS (1978) On some new sequences generalizing the Catalan and Motzkin numbers. Discret Math 26: 261–272

  • Xia T, SantaLucia J, Burkard M, Kierzek R, Schroeder S, Jiao X, Cox C, Turner D (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry 37(42): 14719–14735

  • Yoffe AM, Prinsen P, Gelbart WM, Ben-Shaul A (2011) The ends of a large RNA molecule are necessarily close. Nucleic Acids Res 39(1): 292–299

  • Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31(13): 3406–3415

Author information


Corresponding author

Correspondence to Peter Clote.

Additional information

Research supported by the Digiteo Foundation and by National Science Foundation grants DMS-0817971, DBI-0543506 and DMS-1016618.

Source code (Python, Maple, Mathematica and C programs) is available at

Electronic Supplementary Material

Below is the Electronic Supplementary Material.

ESM 1 (GZ 251 kb)

About this article

Cite this article

Clote, P., Ponty, Y. & Steyaert, JM. Expected distance between terminal nucleotides of RNA secondary structures. J. Math. Biol. 65, 581–599 (2012).

Mathematics Subject Classification (2000)