Journal of Mathematical Biology, Volume 65, Issue 3, pp 581–599

Expected distance between terminal nucleotides of RNA secondary structures

  • Peter Clote
  • Yann Ponty
  • Jean-Marc Steyaert


In “The ends of a large RNA molecule are necessarily close”, Yoffe et al. (Nucleic Acids Res 39(1):292–299, 2011) used the programs RNAfold [resp. RNAsubopt] from the Vienna RNA Package to calculate the distance between the 5′ and 3′ ends of the minimum free energy secondary structure [resp. thermal equilibrium structures] of viral and random RNA sequences. Here, the 5′–3′ distance is defined to be the length of the shortest path from the 5′ node to the 3′ node in the undirected graph whose edge set consists of edges {i, i + 1} corresponding to covalent backbone bonds and of edges {i, j} corresponding to canonical base pairs. From repeated simulations and a heuristic theoretical argument, Yoffe et al. conclude that the 5′–3′ distance is less than a fixed constant, independent of RNA sequence length. In this paper, we provide a rigorous mathematical framework to study the expected distance between the 5′ and 3′ ends of an RNA sequence. We present recurrence relations that precisely define this expected distance, both for the Turner nearest neighbor energy model and for a simple homopolymer model first defined by Stein and Waterman. We implement dynamic programming algorithms to compute (rather than approximate by repeated application of the Vienna RNA Package) the expected distance between the 5′ and 3′ ends of a given RNA sequence with respect to the Turner energy model. Using methods of analytic combinatorics, which depend on complex analysis, we prove that the asymptotic expected 5′–3′ distance \(\langle d_n \rangle\) of length n homopolymers is approximately equal to the constant 5.47211, while the asymptotic distance is 6.771096 if hairpins have a minimum of 3 unpaired bases and the probability that any two positions can form a base pair is 1/4. Finally, we analyze the 5′–3′ distance for secondary structures from the STRAND database, and conclude that the 5′–3′ distance is correlated with RNA sequence length.
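The graph definition of the 5′–3′ distance given above can be illustrated with a minimal sketch. It assumes the secondary structure is supplied in dot-bracket notation (the format used by the Vienna RNA Package) and runs a breadth-first search over backbone edges {i, i + 1} and base-pair edges {i, j}. The function names `parse_dot_bracket` and `end_to_end_distance` are hypothetical and chosen for illustration only. This computes the distance for a single given structure; it is not the paper's dynamic programming algorithm for the expected distance.

```python
from collections import deque


def parse_dot_bracket(structure):
    """Map each paired position (0-based) to its partner position."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return partner


def end_to_end_distance(structure):
    """Shortest-path length from the 5' node (first) to the 3' node (last)."""
    n = len(structure)
    if n == 0:
        raise ValueError("empty structure")
    partner = parse_dot_bracket(structure)
    dist = [-1] * n          # -1 marks nodes not yet visited
    dist[0] = 0
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == n - 1:
            return dist[i]
        neighbours = [i - 1, i + 1]          # covalent backbone edges
        if i in partner:
            neighbours.append(partner[i])    # base-pair edge
        for j in neighbours:
            if 0 <= j < n and dist[j] < 0:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist[n - 1]


if __name__ == "__main__":
    for s in [".....", "(...)", ".(...).", "(...)(...)"]:
        print(s, end_to_end_distance(s))
```

Breadth-first search is sufficient here because every edge has unit length. An unpaired chain of n nucleotides has distance n − 1, while a single base pair closing both ends reduces the distance to 1.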


RNA Boltzmann partition function Asymptotic combinatorics Dynamic programming 

Mathematics Subject Classification (2000)

05C30 49L20 



  1. Andronescu M, Bereg V, Hoos HH, Condon A (2008) RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinform 9: 340
  2. Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, Fagan P, Marvin J, Padilla D, Ravichandran V, Schneider B, Thanki N, Weissig H, Westbrook JD, Zardecki C (2002) The protein data bank. Acta Crystallogr D Biol Crystallogr 58(Pt): 899–907
  3. Berman HM, Westbrook J, Feng Z, Iype L, Schneider B, Zardecki C (2003) The nucleic acid database. Methods Biochem Anal 44: 199–216
  4. Cormen T, Leiserson C, Rivest R (1990) Algorithms. McGraw-Hill, New York
  5. Corver J, Lenches E, Smith K, Robison RA, Sando T, Strauss EG, Strauss JH (2003) Fine mapping of a cis-acting sequence element in yellow fever virus RNA that is required for RNA replication and cyclization. J Virol 77(3): 2265–2270
  6. Darty K, Denise A, Ponty Y (2009) VARNA: interactive drawing and editing of the RNA secondary structure. Bioinformatics 25(15): 1974–1975
  7. Flajolet P, Sedgewick R (2009) Analytic Combinatorics. Cambridge University, Cambridge. ISBN-13: 9780521898065
  8. Gallie DR (1991) The cap and poly(A) tail function synergistically to regulate mRNA translational efficiency. Genes Dev 5(11): 2108–2116
  9. Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A (2009) Rfam: updates to the RNA families database. Nucleic Acids Res 37(Database): D136–D140
  10. Gerland U, Bundschuh R, Hwa T (2001) Force-induced denaturation of RNA. Biophys J 81: 1324–1332
  11. Gutell R, Lee J, Cannone J (2005) The accuracy of ribosomal RNA comparative structure models. Curr Opin Struct Biol 12: 301–310
  12. Hofacker I (2003) Vienna RNA secondary structure server. Nucleic Acids Res 31(13): 3429–3431
  13. Hofacker IL, Schuster P, Stadler PF (1998) Combinatorics of RNA secondary structures. Discret Appl Math 88: 207–237
  14. Hopcroft JE, Ullman JD (1969) Formal languages and their relation to automata. Addison-Wesley, Reading
  15. Hsu MT, Parvin JD, Gupta S, Krystal M, Palese P (1987) Genomic RNAs of influenza viruses are held in a circular conformation in virions and in infected cells by a terminal panhandle. Proc Natl Acad Sci USA 84(22): 8140–8144
  16. Kneller EL, Rakotondrafara AM, Miller WA (2006) Cap-independent translation of plant viral RNAs. Virus Res 119(1): 63–75
  17. Lorenz WA, Ponty Y, Clote P (2008) Asymptotics of RNA shapes. J Comput Biol 15(1): 31–63
  18. McCaskill J (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29: 1105–1119
  19. Miller WA, White KA (2006) Long-distance RNA–RNA interactions in plant virus gene expression and replication. Annu Rev Phytopathol 44: 447–467
  20. Nussinov R, Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single stranded RNA. Proc Natl Acad Sci USA 77(11): 6309–6313
  21. Sprinzl M, Horn C, Brown M, Ioudovitch A, Steinberg S (1998) Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res 26: 148–153
  22. Stein PR, Waterman MS (1978) On some new sequences generalizing the Catalan and Motzkin numbers. Discret Math 26: 261–272
  23. Xia T, SantaLucia J, Burkard M, Kierzek R, Schroeder S, Jiao X, Cox C, Turner D (1999) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry 37(14): 719–735
  24. Yoffe AM, Prinsen P, Gelbart WM, Ben-Shaul A (2011) The ends of a large RNA molecule are necessarily close. Nucleic Acids Res 39(1): 292–299
  25. Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31(13): 3406–3415

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Department of Biology, Boston College, Chestnut Hill, USA
  2. Laboratoire d’Informatique (LIX), Ecole Polytechnique, Palaiseau, France
