Predicting RNA secondary structures with pseudoknots by MCMC sampling

Journal of Mathematical Biology

Abstract

The most probable secondary structure of an RNA molecule, given its nucleotide sequence, can be computed efficiently if a stochastic context-free grammar (SCFG) is used as the prior distribution of the secondary structure. The structures of some RNA molecules contain so-called pseudoknots. Allowing all possible pseudoknot configurations is not compatible with context-free grammar models and makes the search for an optimal secondary structure NP-complete. We suggest a probabilistic model for RNA secondary structures with pseudoknots and present a Markov chain Monte Carlo (MCMC) method for sampling RNA structures according to their posterior distribution for a given sequence. We favor Bayesian sampling over optimization methods in this context because it allows the uncertainty of RNA structure predictions to be assessed. We demonstrate the benefit of our method on tmRNA examples and on simulated data. McQFold, an implementation of our method, is freely available from http://www.cs.uni-frankfurt.de/~metzler/McQFold.
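To illustrate the general idea of sampling structures instead of computing a single optimum, here is a minimal Python sketch under strong simplifying assumptions. It is a plain Metropolis sampler over sets of base pairs with a made-up toy score (the hypothetical constants `PAIR_WEIGHT`, `MIN_LOOP`, `STACK_BONUS` and `KNOT_PENALTY`), standing in for a posterior distribution; it is not the SCFG-based model or the proposal moves used in McQFold. Crossing pairs are permitted, so pseudoknots can occur, and the fraction of samples containing each pair gives a rough measure of how certain that part of the prediction is.

```python
import math
import random
from collections import Counter

# Hypothetical toy parameters (NOT the McQFold model): canonical and wobble pairs,
# with G-C pairs scored highest.
PAIR_WEIGHT = {("G", "C"): 3.0, ("C", "G"): 3.0,
               ("A", "U"): 2.0, ("U", "A"): 2.0,
               ("G", "U"): 1.0, ("U", "G"): 1.0}
MIN_LOOP = 3        # assumed minimum number of unpaired bases in a hairpin loop
STACK_BONUS = 1.0   # assumed bonus when (i, j) is stacked on (i + 1, j - 1)
KNOT_PENALTY = 1.5  # assumed penalty for every pair of crossing base pairs


def candidate_pairs(seq):
    """All (i, j) with i < j that may pair: allowed bases enclosing >= MIN_LOOP bases."""
    n = len(seq)
    return [(i, j) for i in range(n) for j in range(i + MIN_LOOP + 1, n)
            if (seq[i], seq[j]) in PAIR_WEIGHT]


def crosses(p, q):
    """True if base pairs p and q cross (i < k < j < l), i.e. form a pseudoknot."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l


def score(seq, pairs):
    """Toy log-weight of a structure given as a set of (i, j) pairs."""
    s = sum(PAIR_WEIGHT[(seq[i], seq[j])] for i, j in pairs)
    s += STACK_BONUS * sum((i + 1, j - 1) in pairs for i, j in pairs)
    plist = sorted(pairs)
    s -= KNOT_PENALTY * sum(crosses(p, q)
                            for a, p in enumerate(plist) for q in plist[a + 1:])
    return s


def sample_structures(seq, n_steps, beta=1.0, seed=0):
    """Metropolis sampler targeting weights exp(beta * score(S)).

    Proposal: pick a candidate pair uniformly; remove it if present, add it if
    both bases are free, otherwise stay. This proposal is symmetric, so the
    acceptance probability is min(1, exp(beta * (new - old))).
    """
    rng = random.Random(seed)
    cands = candidate_pairs(seq)
    if not cands:
        return [frozenset()] * n_steps
    pairs, partner = set(), {}  # current structure and position -> partner map
    current = score(seq, pairs)
    samples = []
    for _ in range(n_steps):
        i, j = rng.choice(cands)
        if (i, j) in pairs:
            proposal = pairs - {(i, j)}
        elif i not in partner and j not in partner:
            proposal = pairs | {(i, j)}
        else:
            samples.append(frozenset(pairs))  # conflicting pair: chain stays put
            continue
        new = score(seq, proposal)
        if rng.random() < math.exp(min(0.0, beta * (new - current))):
            pairs, current = proposal, new
            partner = {k: v for a, b in pairs for k, v in ((a, b), (b, a))}
        samples.append(frozenset(pairs))
    return samples


def pair_frequencies(samples, burn_in):
    """Fraction of post-burn-in samples containing each base pair."""
    kept = samples[burn_in:]
    counts = Counter(p for s in kept for p in s)
    return {p: c / len(kept) for p, c in counts.items()}


if __name__ == "__main__":
    seq = "GGGAAACCCAGGGAAACCC"  # made-up example sequence
    samples = sample_structures(seq, n_steps=10000, seed=1)
    freqs = pair_frequencies(samples, burn_in=2000)
    for (i, j), f in sorted(freqs.items(), key=lambda kv: -kv[1])[:5]:
        print(f"({i}, {j}) {seq[i]}-{seq[j]}: {f:.2f}")
```

Pairs with frequencies near 1 are well supported under the toy model, while intermediate frequencies flag parts of the structure that the sampler cannot pin down. Increasing `beta` concentrates the chain on high-scoring structures; the burn-in length and number of steps here are arbitrary choices for illustration.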

Author information

Corresponding author

Correspondence to Dirk Metzler.

About this article

Cite this article

Metzler, D., Nebel, M.E. Predicting RNA secondary structures with pseudoknots by MCMC sampling. J. Math. Biol. 56, 161–181 (2008). https://doi.org/10.1007/s00285-007-0106-6

  • DOI: https://doi.org/10.1007/s00285-007-0106-6
