Exact Pattern Matching for RNA Structure Ensembles

  • Christina Schmiedl
  • Mathias Möhl
  • Steffen Heyne
  • Mika Amit
  • Gad M. Landau
  • Sebastian Will
  • Rolf Backofen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7262)

Abstract

ExpaRNA’s core algorithm computes, for two fixed RNA structures, a maximal non-overlapping set of maximal exact matchings. We introduce an algorithm ExpaRNA-P that solves the lifted problem of finding such sets of exact matchings in entire Boltzmann-distributed structure ensembles of two RNAs. Due to a novel kind of structural sparsification, the new algorithm maintains the time and space complexity of the algorithm for fixed input structures. Furthermore, we generalized the chaining algorithm of ExpaRNA in order to compute a compatible subset of ExpaRNA-P’s exact matchings. We show that ExpaRNA-P outperforms ExpaRNA in BRAliBase 2.1 benchmarks, where we pass the chained exact matchings as anchor constraints to the RNA alignment tool LocARNA. Compared to LocARNA, this novel approach shows similar accuracy but is six times faster.
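The chaining step mentioned above can be illustrated with a deliberately simplified sketch. It is not the ExpaRNA or ExpaRNA-P chaining algorithm, which also accounts for RNA structure. It only assumes that each exact matching covers one contiguous interval in each sequence, and it selects a maximum-size set of matchings that do not overlap and keep the same order in both sequences. The names `Match`, `precedes`, and `chain` are hypothetical and chosen for this illustration only.

```python
from typing import List, NamedTuple, Tuple


class Match(NamedTuple):
    """A hypothetical exact match: one closed interval in sequence A and one in B."""
    a_start: int
    a_end: int
    b_start: int
    b_end: int

    @property
    def size(self) -> int:
        # Number of matched positions (both intervals have the same length).
        return self.a_end - self.a_start + 1


def precedes(m: Match, n: Match) -> bool:
    """True if m ends strictly before n starts in both sequences."""
    return m.a_end < n.a_start and m.b_end < n.b_start


def chain(matches: List[Match]) -> Tuple[int, List[Match]]:
    """Return (total size, chain) of a maximum-size collinear, non-overlapping chain.

    Simplifying assumption: matches are plain intervals; structural nesting,
    which the real algorithms handle, is ignored here.
    """
    if not matches:
        return 0, []
    # If m precedes n, then m.a_start < n.a_start, so sorting by a_start
    # guarantees every possible predecessor is processed first.
    order = sorted(matches, key=lambda m: (m.a_start, m.b_start))
    best = [m.size for m in order]   # best[j]: best chain size ending at order[j]
    prev = [-1] * len(order)         # back-pointers for reconstructing the chain
    for j, n in enumerate(order):
        for i in range(j):
            if precedes(order[i], n) and best[i] + n.size > best[j]:
                best[j] = best[i] + n.size
                prev[j] = i
    j = max(range(len(order)), key=best.__getitem__)
    total = best[j]
    result = []
    while j != -1:
        result.append(order[j])
        j = prev[j]
    result.reverse()
    return total, result
```

This quadratic-time dynamic program is only meant to make the notion of a compatible, non-overlapping subset concrete; the abstract does not specify how ExpaRNA-P's generalized chaining is implemented.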

Keywords

  • Base Pair
  • Alignment Quality
  • Exact Pattern
  • Multiple Structural Alignment
  • Minimum Free Energy Structure
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Christina Schmiedl (1)
  • Mathias Möhl (1)
  • Steffen Heyne (1)
  • Mika Amit (2)
  • Gad M. Landau (2, 3)
  • Sebastian Will (1, 4)
  • Rolf Backofen (1, 5)

  1. Bioinformatics, Institute of Computer Science, Albert-Ludwigs-Universität, Freiburg, Germany
  2. Department of Computer Science, University of Haifa, Haifa, Israel
  3. Department of Computer Science and Engineering, NYU-Poly, Brooklyn, USA
  4. CSAIL and Mathematics Department, MIT, Cambridge, USA
  5. Center for Biological Signaling Studies (BIOSS), Albert-Ludwigs-Universität, Freiburg, Germany
