# Expected distance between terminal nucleotides of RNA secondary structures

## Abstract

In “The ends of a large RNA molecule are necessarily close”, Yoffe et al. (Nucleic Acids Res 39(1):292–299, 2011) used the programs `RNAfold` [resp. `RNAsubopt`] from the Vienna RNA Package to calculate the distance between the 5′ and 3′ ends of the minimum free energy secondary structure [resp. thermal equilibrium structures] of viral and random RNA sequences. Here, the 5′–3′ distance is defined as the length of the shortest path from the 5′ node to the 3′ node in the undirected graph whose edge set consists of edges {*i*, *i* + 1} corresponding to covalent backbone bonds and edges {*i*, *j*} corresponding to canonical base pairs. From repeated simulations and a heuristic theoretical argument, Yoffe et al. conclude that the 5′–3′ distance is less than a fixed constant, independent of RNA sequence length. In this paper, we provide a rigorous mathematical framework to study the expected distance between the 5′ and 3′ ends of an RNA sequence. We present recurrence relations that precisely define this expected distance, both for the Turner nearest-neighbor energy model and for a simple homopolymer model first defined by Stein and Waterman. We implement dynamic programming algorithms to *compute* (rather than *approximate* by repeated application of the Vienna RNA Package) the *expected distance* between the 5′ and 3′ ends of a given RNA sequence with respect to the Turner energy model. Using methods of analytic combinatorics, which depend on complex analysis, we prove that the asymptotic expected 5′–3′ distance \(\langle d_n \rangle\) of length-*n* homopolymers is approximately equal to the constant 5.47211, while the asymptotic distance is 6.771096 if hairpins have a minimum of 3 unpaired bases and the probability that any two positions can form a base pair is 1/4. Finally, we analyze the 5′–3′ distance for secondary structures from the STRAND database and conclude that the 5′–3′ distance is correlated with RNA sequence length.
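The graph-based definition of the 5′–3′ distance can be illustrated with a short sketch. The Python code below is not the authors' dynamic programming algorithm: it is a brute-force illustration that assumes structures are given in dot-bracket notation, and the function names (`end_to_end_distance`, `structures`, `mean_distance`) are hypothetical. For the homopolymer case it further assumes that every secondary structure is weighted equally, an assumption the abstract does not spell out.

```python
from collections import deque
from functools import lru_cache


def pair_table(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return partner


def end_to_end_distance(structure):
    """Shortest path from the 5' node to the 3' node.

    Edges are backbone bonds {i, i+1} and base pairs {i, j};
    breadth-first search suffices because every edge has length 1.
    """
    n = len(structure)
    if n == 0:
        raise ValueError("empty structure")
    partner = pair_table(structure)
    dist = {0: 0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == n - 1:
            return dist[i]
        for j in (i - 1, i + 1, partner.get(i)):
            if j is not None and 0 <= j < n and j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    raise AssertionError("unreachable: the backbone connects all nodes")


@lru_cache(maxsize=None)
def structures(n, min_loop=1):
    """All homopolymer secondary structures of length n (exponential in n).

    Any two positions may pair, provided at least min_loop unpaired or
    enclosed positions lie strictly between them.
    """
    if n == 0:
        return ("",)
    # Case 1: the first position is unpaired.
    out = ["." + s for s in structures(n - 1, min_loop)]
    # Case 2: the first position pairs, enclosing `inner` positions.
    for inner in range(min_loop, n - 1):
        rest = n - 2 - inner
        for a in structures(inner, min_loop):
            for b in structures(rest, min_loop):
                out.append("(" + a + ")" + b)
    return tuple(out)


def mean_distance(n, min_loop=1):
    """Average 5'-3' distance over all structures, weighted uniformly."""
    all_s = structures(n, min_loop)
    return sum(end_to_end_distance(s) for s in all_s) / len(all_s)
```

For example, `end_to_end_distance(".((...)).")` returns 3 (one backbone step, one base pair, one backbone step), and `mean_distance(5)` returns 2.375, the average over the 8 structures of length 5. Exhaustive enumeration is only feasible for small *n*; recurrence relations such as those described in the abstract avoid listing structures explicitly.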

## Keywords

RNA, Boltzmann partition function, Asymptotic combinatorics, Dynamic programming

## Mathematics Subject Classification (2000)

05C30, 49L20


## References

- Andronescu M, Bereg V, Hoos HH, Condon A (2008) RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinform 9: 340
- Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, Fagan P, Marvin J, Padilla D, Ravichandran V, Schneider B, Thanki N, Weissig H, Westbrook JD, Zardecki C (2002) The protein data bank. Acta Crystallogr D Biol Crystallogr 58(Pt): 899–907
- Berman HM, Westbrook J, Feng Z, Iype L, Schneider B, Zardecki C (2003) The nucleic acid database. Methods Biochem Anal 44: 199–216
- Cormen T, Leiserson C, Rivest R (1990) Algorithms. McGraw-Hill, New York
- Corver J, Lenches E, Smith K, Robison RA, Sando T, Strauss EG, Strauss JH (2003) Fine mapping of a cis-acting sequence element in yellow fever virus RNA that is required for RNA replication and cyclization. J Virol 77(3): 2265–2270
- Darty K, Denise A, Ponty Y (2009) VARNA: interactive drawing and editing of the RNA secondary structure. Bioinformatics 25(15): 1974–1975
- Flajolet P, Sedgewick R (2009) Analytic Combinatorics. Cambridge University, Cambridge. ISBN-13: 9780521898065
- Gallie DR (1991) The cap and poly(A) tail function synergistically to regulate mRNA translational efficiency. Genes Dev 5(11): 2108–2116
- Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A (2009) Rfam: updates to the RNA families database. Nucleic Acids Res 37(Database): D136–D140
- Gerland U, Bundschuh R, Hwa T (2001) Force-induced denaturation of RNA. Biophys J 81: 1324–1332
- Gutell R, Lee J, Cannone J (2005) The accuracy of ribosomal RNA comparative structure models. Curr Opin Struct Biol 12: 301–310
- Hofacker I (2003) Vienna RNA secondary structure server. Nucleic Acids Res 31(13): 3429–3431
- Hofacker IL, Schuster P, Stadler PF (1998) Combinatorics of RNA secondary structures. Discret Appl Math 88: 207–237. http://citeseer.nj.nec.com/1454.html
- Hopcroft JE, Ullman JD (1969) Formal languages and their relation to automata. Addison-Wesley, Reading
- Hsu MT, Parvin JD, Gupta S, Krystal M, Palese P (1987) Genomic RNAs of influenza viruses are held in a circular conformation in virions and in infected cells by a terminal panhandle. Proc Natl Acad Sci USA 84(22): 8140–8144
- Kneller EL, Rakotondrafara AM, Miller WA (2006) Cap-independent translation of plant viral RNAs. Virus Res 119(1): 63–75
- Lorenz WA, Ponty Y, Clote P (2008) Asymptotics of RNA shapes. J Comput Biol 15(1): 31–63
- McCaskill J (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29: 1105–1119
- Miller WA, White KA (2006) Long-distance RNA–RNA interactions in plant virus gene expression and replication. Annu Rev Phytopathol 44: 447–467
- Nussinov R, Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single stranded RNA. Proc Natl Acad Sci USA 77(11): 6309–6313
- Sprinzl M, Horn C, Brown M, Ioudovitch A, Steinberg S (1998) Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res 26: 148–153
- Stein PR, Waterman MS (1978) On some new sequences generalizing the Catalan and Motzkin numbers. Discret Math 26: 261–272
- Xia T, SantaLucia J, Burkard M, Kierzek R, Schroeder S, Jiao X, Cox C, Turner D (1999) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry 37(14): 719–735
- Yoffe AM, Prinsen P, Gelbart WM, Ben-Shaul A (2011) The ends of a large RNA molecule are necessarily close. Nucleic Acids Res 39(1): 292–299
- Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31(13): 3406–3415