Predicting RNA secondary structures with pseudoknots by MCMC sampling

Metzler, Dirk; Nebel, Markus E.

doi:10.1007/s00285-007-0106-6

Published: 23 June 2007

Volume 56, pages 161–181, (2008)
Journal of Mathematical Biology

Dirk Metzler¹ &
Markus E. Nebel²

Abstract

The most probable secondary structure of an RNA molecule, given the nucleotide sequence, can be computed efficiently if a stochastic context-free grammar (SCFG) is used as the prior distribution of the secondary structure. The structures of some RNA molecules contain so-called pseudoknots. Allowing all possible configurations of pseudoknots is not compatible with context-free grammar models and makes the search for an optimal secondary structure NP-complete. We suggest a probabilistic model for RNA secondary structures with pseudoknots and present a Markov chain Monte Carlo (MCMC) method for sampling RNA structures according to their posterior distribution for a given sequence. We favor Bayesian sampling over optimization methods in this context, because it makes the uncertainty of RNA structure predictions assessable. We demonstrate the benefit of our method in examples with tmRNA and also with simulated data. McQFold, an implementation of our method, is freely available from http://www.cs.uni-frankfurt.de/~metzler/McQFold.
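
To make the idea of sampling structures (rather than optimizing a single one) concrete, the following is a minimal toy sketch in Python. It is not the model or the algorithm implemented in McQFold. It assumes a deliberately simplistic target distribution in which the probability of a structure is proportional to exp(beta × number of base pairs), and it uses a plain Metropolis sampler with a symmetric "toggle one pair" proposal. All function names, the `beta` parameter, and the `min_loop` constraint are hypothetical choices made for illustration. Crossing pairs are allowed, so sampled structures may contain pseudoknots.

```python
import math
import random

# Watson-Crick and G-U wobble pairs (an assumption of this toy model).
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def candidate_pairs(seq, min_loop=3):
    """All index pairs (i, j), i < j, that could pair, with at least
    min_loop unpaired bases between them."""
    return [(i, j)
            for i in range(len(seq))
            for j in range(i + min_loop + 1, len(seq))
            if (seq[i], seq[j]) in COMPLEMENTARY]


def has_pseudoknot(pairs):
    """True if two base pairs (i, j) and (k, l) cross: i < k < j < l."""
    ordered = sorted(pairs)
    return any(i < k < j < l
               for a, (i, j) in enumerate(ordered)
               for (k, l) in ordered[a + 1:])


def sample_structures(seq, n_steps=5000, beta=1.0, seed=0):
    """Toy Metropolis sampler over sets of base pairs.

    Target (assumed for illustration): P(S) proportional to
    exp(beta * |S|), where S is a set of pairs using each position
    at most once.  Each step picks one candidate pair uniformly and
    proposes to toggle it; this proposal is symmetric, so the
    acceptance probability is min(1, P(new) / P(old)).
    """
    rng = random.Random(seed)
    candidates = candidate_pairs(seq)
    structure, paired, samples = set(), set(), []
    for _ in range(n_steps):
        if candidates:
            i, j = rng.choice(candidates)
            if (i, j) in structure:
                # Proposed removal changes the score by -1.
                if rng.random() < min(1.0, math.exp(-beta)):
                    structure.remove((i, j))
                    paired -= {i, j}
            elif i not in paired and j not in paired:
                # Proposed addition changes the score by +1.
                if rng.random() < min(1.0, math.exp(beta)):
                    structure.add((i, j))
                    paired |= {i, j}
            # Otherwise the proposal is invalid and the chain stays put.
        samples.append(frozenset(structure))
    return samples


if __name__ == "__main__":
    samples = sample_structures("GGGAAACCCUUUGGGAAACCC", n_steps=20000)
    knotted = sum(has_pseudoknot(s) for s in samples) / len(samples)
    print(f"fraction of sampled structures with a pseudoknot: {knotted:.3f}")
```

Because the output is a collection of sampled structures, summaries such as the frequency of an individual base pair or the fraction of samples containing a pseudoknot can be read off directly, which is the sense in which sampling makes prediction uncertainty assessable. A realistic sampler would replace the toy score with a proper probabilistic model and discard an initial burn-in portion of the chain.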

Author information

Authors and Affiliations

Institut für Informatik, J. W. Goethe-Universität, Robert Mayer Str. 11-15, 60325, Frankfurt am Main, Germany
Dirk Metzler
Fachbereich Informatik, Technische Universität Kaiserslautern, Gottlieb-Daimler-Str., 67663, Kaiserslautern, Germany
Markus E. Nebel

Corresponding author

Correspondence to Dirk Metzler.

About this article

Cite this article

Metzler, D., Nebel, M.E. Predicting RNA secondary structures with pseudoknots by MCMC sampling. J. Math. Biol. 56, 161–181 (2008). https://doi.org/10.1007/s00285-007-0106-6

Received: 01 January 2007
Published: 23 June 2007
Issue Date: January 2008
DOI: https://doi.org/10.1007/s00285-007-0106-6
