Abstract
The most probable secondary structure of an RNA molecule, given its nucleotide sequence, can be computed efficiently if a stochastic context-free grammar (SCFG) is used as the prior distribution of the secondary structure. The structures of some RNA molecules, however, contain so-called pseudoknots. Allowing all possible pseudoknot configurations is incompatible with context-free grammar models and makes the search for an optimal secondary structure NP-complete. We propose a probabilistic model for RNA secondary structures with pseudoknots and present a Markov chain Monte Carlo (MCMC) method for sampling RNA structures according to their posterior distribution for a given sequence. We favor Bayesian sampling over optimization methods in this context because it makes the uncertainty of RNA structure predictions assessable. We demonstrate the benefit of our method in examples with tmRNA and with simulated data. McQFold, an implementation of our method, is freely available from http://www.cs.uni-frankfurt.de/~metzler/McQFold.
Cite this article
Metzler, D., Nebel, M.E. Predicting RNA secondary structures with pseudoknots by MCMC sampling. J. Math. Biol. 56, 161–181 (2008). https://doi.org/10.1007/s00285-007-0106-6