Skip to main content

Linear-Time Algorithms for RNA Structure Prediction

  • Protocol
  • First Online:
RNA Structure Prediction

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2586))

Abstract

RNA secondary structure prediction is widely used to understand RNA function. Existing dynamic programming-based algorithms, both the classical minimum free energy (MFE) methods and partition function methods, suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications. Inspired by incremental parsing for context-free grammars in computational linguistics, we designed linear-time heuristic algorithms, LinearFold and LinearPartition, to approximate the MFE structure, partition function and base pairing probabilities. These programs are orders of magnitude faster than Vienna RNAfold and CONTRAfold on long sequences. More interestingly, LinearFold and LinearPartition lead to more accurate predictions on the longest sequence families for which the structures are well established (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500 +  nucleotides apart). This chapter provides protocols for using LinearFold and LinearPartition for secondary structure prediction.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Protocol
USD 49.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 109.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 139.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 199.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Petrov AI, Kay SJ, Kalvari I, Howe KL, Gray KA, Bruford EA, Kersey PJ, Cochrane G, Finn RD, Bateman A, Kozomara A, Griffiths-Jones S, Frankish A, Zwieb CW, Lau BY, Williams KP, Chan PP, Lowe TM, Cannone JJ, Gutell R, Machnicka MA, Bujnicki JM, Yoshihama M, Kenmochi N, Chai B, Cole JR, Szymanski M, Karlowski WM, Wood V, Huala E, Berardini TZ, Zhao Y, Chen R, Zhu W, Paraskevopoulou MD, Vlachos IS, Hatzigeorgiou AG, Ma L, Zhang Z, Puetz J, Stadler PF, McDonald D, Basu S, Fey P, Engel SR, Cherry JM, Volders PJ, Mestdagh P, Wower J, Clark MB, Quek XC, Dinger ME (2017) RNAcentral: a comprehensive database of non-coding RNA sequences. Nucleic Acids Res 45(D1):D128–D134

    Google Scholar 

  2. Bachellerie JP, Cavaillé J, Hüttenhofer A (2002) The expanding snoRNA world. Biochimie 84(8):775–790

    Article  CAS  PubMed  Google Scholar 

  3. Bellaousov S, Mathews DH (2010) ProbKnot: fast prediction of RNA secondary structure including pseudoknots. RNA 16(10):1870–1880

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Bernhart SH, Hofacker IL, Stadler PF (2006) Local RNA base pairing probabilities in large sequences. Bioinformatics 22(5):614–615

    Article  CAS  PubMed  Google Scholar 

  5. Clote P, Ponty Y, Steyaert J (2012) Expected distance between terminal nucleotides of RNA secondary structures. J Math Biol 65(3):581–599

    Article  PubMed  Google Scholar 

  6. Ding Y, Lawrence CE (2003) A statistical sampling algorithm for RNA secondary. Nucleic Acids Res 31(24):7280–7301

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Do C, Woods D, Batzoglou S (2006) CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 22(14):e90–e98

    Article  CAS  PubMed  Google Scholar 

  8. Doudna JA, Cech TR (2002) The chemical repertoire of natural ribozymes. Nature 418(6894):222–228

    Article  CAS  PubMed  Google Scholar 

  9. Eddy SR (2001) Non-coding RNA genes and the modern RNA world. Nat Rev Genet 2(12):919–929

    Article  CAS  PubMed  Google Scholar 

  10. Hofacker IL, Lorenz R (2014) Predicting RNA structure: advances and limitations. In: RNA folding: methods and protocols, pp 1–19

    Google Scholar 

  11. Huang L, Sagae K (2010) Dynamic programming for linear-time incremental parsing. In: Proceedings of ACL 2010. ACL, Uppsala, pp 1077–1086

    Google Scholar 

  12. Huang L, Zhang H, Deng D, Zhao K, Liu K, Hendrix DA, Mathews DH (2019) LinearFold: linear-time approximate RNA folding by 5-to-3 dynamic programming and beam search. Bioinformatics 35(14):i295–i304

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Kerpedjiev P, Hammer S, Hofacker IL (2015) Forna (force-directed RNA): simple and effective online RNA secondary structure diagrams. Bioinformatics 31(20):3377–3379

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Kiryu H, Kin T, Asai K (2008) Rfold: an exact algorithm for computing local base pairing probabilities. Bioinformatics 24(3):367–373

    Article  CAS  PubMed  Google Scholar 

  15. Knudsen B, Hein J (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res 31(13):3423–3428

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Lai W-JC, Kayedkhordeh M, Cornell EV, Farah E, Bellaousov S, Rietmeijer R, Salsi E, Mathews DH, Ermolenko DN (2018) mRNAs and lncRNAs intrinsically form secondary structures with short end-to-end distances. Nat Commun 9(1):1–11

    Article  Google Scholar 

  17. Lange SJ, Maticzka D, Mohl M, Gagnon JN, Brown CM, Backofen R (2012) Global or local? predicting secondary structure and accessibility in mRNAs. Nucleic Acids Res 40(12):5215–5226

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Leija-Martínez N, Casas-Flores S, Cadena-Nava RD, Roca JA, Mendez-Cabañas JA, Gomez E, Ruiz-Garcia J (2014) The separation between the 5-3 ends in long RNA molecules is short and nearly constant. Nucleic Acids Res 42(22):13963–13968

    Article  PubMed  PubMed Central  Google Scholar 

  19. Li TJ, Reidys CM (2018) The rainbow spectrum of RNA secondary structures. Bull Math Biol 80(6):1514–1538

    Article  CAS  PubMed  Google Scholar 

  20. Liu B, Mathews DH, Turner DH (2010) RNA pseudoknots: folding and finding. F1000 Biology Reports 2(8)

    Google Scholar 

  21. Lorenz R, Bernhart SH, Zu Siederdissen CH, Tafer H, Flamm C, Stadler PF, Hofacker IL (2011) ViennaRNA package 2.0. Algorithms Mol Biol 6(1):1

    Google Scholar 

  22. Lu ZJ, Gloor JW, Mathews DH (2009) Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA 15(10):1805–1813

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Lyumkis D (2019) Challenges and opportunities in cryo-EM single-particle analysis. J Biol Chem 294(13):5181–5197

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Mathews DH (2004) Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA 10(8):1178–1190

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Mathews DH (2006) Revolutions in RNA secondary structure prediction. J Mol Biol 359(3):526–532

    Article  CAS  PubMed  Google Scholar 

  26. Mathews DH (2006) RNA secondary structure analysis using RNAstructure. Curr Protoc Bioinform 13(1):12–6

    Article  Google Scholar 

  27. Mathews DH, Turner DH (2006) Prediction of RNA secondary structure by free energy minimization. Curr Opin Struct Biol 16(3):270–278

    Article  CAS  PubMed  Google Scholar 

  28. McCaskill JS (1990) The equilibrium partition function and base pair probabilities for RNA secondary structure. Biopolymers 29:11105–1119

    Google Scholar 

  29. Nussinov R, Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Nat Acad Sci USA 77(11):6309–6313

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Reuter JS, Mathews DH (2010) RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinform 11(1):1–9

    Article  Google Scholar 

  31. Sato K, Kato Y, Hamada M, Akutsu T, Asai K (2011) IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. Bioinformatics 27(13):i85–i93

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Seetin MG, Mathews DH (2012) RNA structure prediction: an overview of methods. In: Bacterial regulatory RNA: methods and protocols, pp 99–122

    Google Scholar 

  33. Sloma M, Mathews D (2016) Exact calculation of loop formation probability identifies folding motifs in RNA secondary structures. RNA 22:1808–1818

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Sperschneider J, Datta A (2010) DotKnot: pseudoknot prediction using the probability dot plot under a refined energy model. Nucleic Acids Res 38(7):e103–e114

    Article  PubMed  PubMed Central  Google Scholar 

  35. Turner DH, Mathews DH (2009) NNDB: The nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res 38:D280–D282

    Article  PubMed  PubMed Central  Google Scholar 

  36. Waterman MS, Smith TF (1986) Rapid dynamic programming algorithms for RNA secondary structure. Adv Appl Math 7(4):455–464

    Article  Google Scholar 

  37. Yoffe AM, Prinsen P, Gelbart WM, Ben-Shaul A (2011) The ends of a large RNA molecule are necessarily close. Nucleic Acids Res 39(1):292–299

    Article  CAS  PubMed  Google Scholar 

  38. Zhang H, Keane S (2019) Advances that facilitate the study of large RNA structure and dynamics by nuclear magnetic resonance spectroscopy. Wiley Interdiscip Rev RNA 10:e1541

    Article  PubMed  PubMed Central  Google Scholar 

  39. Zhang H, Zhang L, Mathews DH, Huang L (2020) LinearPartition: linear-time approximation of RNA folding partition function and base-pairing probabilities. Bioinformatics 36:i258–i267

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Zhang J, Ferré-D’Amaré AR (2014) New molecular engineering approaches for crystallographic studies of large RNAs. Curr Opin Struct Biol 26:9–15

    Article  PubMed  Google Scholar 

  41. Zhang L, Zhang H, Mathews DH, Huang L (2019) ThreshKnot: thresholded probknot for improved RNA secondary structure prediction. bioRxiv

    Google Scholar 

  42. Zuker M (1989) On finding all suboptimal foldings of an RNA molecule. Science 244(4900):48–52

    Article  CAS  PubMed  Google Scholar 

  43. Zuker M, Stiegler P (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res 9(1):133–148

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

This work is supported in part by National Institutes of Health (R35GM145283 to D.H.M.)

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Check for updates. Verify currency and authenticity via CrossMark

Cite this protocol

Zhang, H., Zhang, L., Liu, K., Li, S., Mathews, D.H., Huang, L. (2023). Linear-Time Algorithms for RNA Structure Prediction. In: Kawaguchi, R.K., Iwakiri, J. (eds) RNA Structure Prediction. Methods in Molecular Biology, vol 2586. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2768-6_2

Download citation

  • DOI: https://doi.org/10.1007/978-1-0716-2768-6_2

  • Published:

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2767-9

  • Online ISBN: 978-1-0716-2768-6

  • eBook Packages: Springer Protocols

Publish with us

Policies and ethics