PMFastR: A New Approach to Multiple RNA Structure Alignment

  • Daniel DeBlasio
  • Jocelyne Bruand
  • Shaojie Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5724)

Abstract

Multiple RNA structure alignment is particularly challenging because covarying mutations make sequence information alone insufficient. Many existing tools for multiple RNA alignment first generate pairwise RNA structure alignments and then build the multiple alignment using only sequence information. Here we present PMFastR, an algorithm that iteratively applies a sequence-structure alignment procedure to build a multiple RNA structure alignment. PMFastR has low memory consumption, which allows it to align large sequences such as 16S and 23S rRNA. The algorithm also provides a method for exploiting a multi-core environment. Finally, we present results on benchmark data sets from BRAliBase, which show that PMFastR outperforms other state-of-the-art programs. Furthermore, we regenerate 607 Rfam seed alignments and show that our automated process creates multiple alignments similar to the manually curated Rfam seed alignments.
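The point about covarying mutations can be illustrated with a small sketch. It is not part of PMFastR; the two sequences, the hairpin structure, and the helper names (`parse_pairs`, `sequence_identity`, `conserved_pairs`) are hypothetical and chosen only to show how two RNAs can share a secondary structure while agreeing at few sequence positions.

```python
# Toy illustration (not from the paper): compensatory mutations keep the
# base pairs of a hairpin intact while sequence identity drops sharply.

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(dot_bracket):
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def sequence_identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def conserved_pairs(seq, pairs):
    """Count structure positions where seq can actually form a base pair."""
    return sum((seq[i], seq[j]) in PAIRS for i, j in pairs)


# Hypothetical hairpin: a 4-bp stem closing a 4-nt loop.
structure = "((((....))))"
seq_a = "GGGGAAAACCCC"
seq_b = "AUAUAAAAAUAU"  # every stem position mutated, pairing preserved

pairs = parse_pairs(structure)
identity = sequence_identity(seq_a, seq_b)

print(f"sequence identity: {identity:.2f}")
print(f"pairs formed by seq_a: {conserved_pairs(seq_a, pairs)} of {len(pairs)}")
print(f"pairs formed by seq_b: {conserved_pairs(seq_b, pairs)} of {len(pairs)}")
```

In this toy case the two sequences agree only in the loop, yet both can form every base pair of the stem. A method that scores sequence similarity alone has little signal to work with here, whereas one that also scores structure can still place the stems in register.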

Keywords

multiple RNA alignment, RNA sequence-structure alignment, iterative alignment

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Daniel DeBlasio (1)
  • Jocelyne Bruand (2)
  • Shaojie Zhang (1)
  1. University of Central Florida, Orlando, USA
  2. University of California, San Diego, La Jolla, USA
