Consensus Folding of Unaligned RNA Sequences Revisited

  • Vineet Bafna
  • Haixu Tang
  • Shaojie Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3500)

Abstract

RNA secondary structure prediction (sometimes referred to as “RNA folding”), one of the earliest problems in computational biology, has attracted renewed attention thanks to the recent discovery of many novel non-coding RNA molecules. The two common approaches to this problem are de novo prediction of RNA secondary structure based on energy minimization, and “consensus folding” (computing the common secondary structure for a set of unaligned RNA sequences). Consensus folding algorithms work well when the correct seed alignment is part of the input to the problem. However, computing the seed alignment is itself a challenging problem for diverged RNA families.

In this paper, we propose a novel framework to predict the common secondary structure for unaligned RNA sequences. By matching putative stacks in RNA sequences, we make use of both primary sequence information and thermodynamic stability for prediction at the same time. We show that our method can predict the correct common RNA secondary structures even when we are only given a limited number of unaligned RNA sequences, and it outperforms current algorithms in sensitivity and accuracy.
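
To make the notion of a "putative stack" concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a stack is a maximal run of consecutive, nested base pairs (Watson-Crick or G-U wobble) within a single sequence. The function name `putative_stacks` and the thresholds `min_len` and `min_loop` are hypothetical choices made for illustration; the abstract does not specify them, and the matching of stacks across sequences is not shown.

```python
# Illustrative sketch only: the names and thresholds here are assumptions,
# not taken from the paper.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def putative_stacks(seq, min_len=3, min_loop=3):
    """Return maximal stacks as (i, j, length) tuples.

    A stack (i, j, length) means seq[i + k] pairs with seq[j - k]
    for every k in range(length). Indices are 0-based.
    """
    seq = seq.upper()
    n = len(seq)

    def can_pair(i, j):
        # Positions must be in range, leave at least min_loop unpaired
        # bases between them, and form an allowed base pair.
        return 0 <= i < j < n and j - i - 1 >= min_loop and (seq[i], seq[j]) in PAIRS

    stacks = []
    for i in range(n):
        for j in range(i + 1, n):
            if not can_pair(i, j):
                continue
            # Only start at the outermost pair, so each stack is reported once.
            if can_pair(i - 1, j + 1):
                continue
            length = 1
            while can_pair(i + length, j - length):
                length += 1
            if length >= min_len:
                stacks.append((i, j, length))
    return stacks


print(putative_stacks("GGGAAAACCC"))  # [(0, 9, 3)]
```

In the toy hairpin above, the three G-C pairs closing the AAAA loop form the single reported stack. A consensus method in the spirit of the abstract would then compare such candidates across sequences, weighing sequence similarity and stability; that step is outside this sketch.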

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Vineet Bafna (1)
  • Haixu Tang (2)
  • Shaojie Zhang (1)
  1. Dept. of Computer Science and Engineering, University of California, San Diego, La Jolla, USA
  2. School of Informatics and Center for Genomics and Bioinformatics, Indiana University, Bloomington, USA