RNA Structural Alignments, Part I: Sankoff-Based Approaches for Structural Alignments

  • Jakob Hull Havgaard
  • Jan Gorodkin
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1097)

Abstract

Simultaneous alignment and secondary structure prediction of RNA sequences is often referred to as “RNA structural alignment.” One class of methods for structural alignment is based on the principles proposed by Sankoff in 1985. The Sankoff algorithm simultaneously folds and aligns two or more sequences. Its advantage over approaches that separate the folding and alignment steps is that it makes better predictions; its disadvantage is that it is slower and requires more memory. The computational resources required are so high that more than a decade passed before the first implementation of a Sankoff-style algorithm was published. However, with the faster computers available today and the improved heuristics used in the implementations, Sankoff-based methods have become practical. This chapter describes the methods based on the Sankoff algorithm, along with the heuristics that all practical implementations use to run in reasonable time and memory.
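
As a rough illustration of why the resource demands are so high, the unrestricted Sankoff recursion is commonly quoted as needing on the order of L^(3N) time and L^(2N) memory for N sequences of length L. The Python sketch below only evaluates those bounds. The function name `sankoff_cost` is hypothetical, constant factors are ignored, and none of the heuristics used by practical implementations are modelled.

```python
def sankoff_cost(length: int, n_seqs: int = 2) -> tuple[int, int]:
    """Return (time_units, memory_cells) for the unrestricted Sankoff recursion.

    Assumption: the commonly quoted bounds O(L^(3N)) time and O(L^(2N))
    memory for N sequences of length L, with constant factors dropped.
    """
    if length < 1 or n_seqs < 1:
        raise ValueError("length and n_seqs must be positive integers")
    time_units = length ** (3 * n_seqs)    # L^(3N)
    memory_cells = length ** (2 * n_seqs)  # L^(2N)
    return time_units, memory_cells


if __name__ == "__main__":
    # Pairwise case (N = 2): compare two sequence lengths.
    for length in (100, 200):
        time_units, memory_cells = sankoff_cost(length)
        print(f"L={length}: time ~ {time_units:.1e}, memory ~ {memory_cells:.1e}")
```

Under these assumed bounds, doubling the length of two sequences multiplies the time by 64 and the memory by 16, which is why restricting the search space matters so much in practice.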

Key words

  • Structural RNA alignment
  • Simultaneous folding and alignment of RNA sequences
  • Sankoff algorithm

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Jakob Hull Havgaard (1)
  • Jan Gorodkin (1)
  1. Center for non-coding RNA in Technology and Health, IKVH, University of Copenhagen, Frederiksberg C, Denmark
