Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA

  • Weijia Xu
  • Stuart Ozer
  • Robin R. Gutell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5566)

Abstract

As the number of properly aligned sequences grows, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective on sequences from diverse phylogenetic classifications. We propose a new approach that uses coevolutionary rates among pairs of nucleotide positions, derived from the phylogenetic and evolutionary relationships of the organisms whose sequences are aligned. Using a novel data schema to manage the relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times larger and with 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, as well as other subtle constraints in the RNA structure.
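
The idea of counting coordinated changes between alignment positions along a phylogeny, inside a relational database, can be illustrated with a small sketch. It is not the schema or algorithm of the paper, which the abstract does not specify. It assumes that ancestral sequences are already known for every tree node, uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table name `site_change`, the function `covariation_counts`, and the toy tree are all hypothetical.

```python
import sqlite3


def covariation_counts(edges, seqs):
    """Count, per pair of alignment columns, the tree edges on which both change.

    edges: list of (parent, child) node names (hypothetical toy phylogeny).
    seqs:  dict mapping node name -> aligned sequence. Ancestral sequences are
           assumed to be given; a real analysis would have to infer them.
    Returns {(i, j): n_both} with i < j (0-based column indices).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE site_change (edge_id INTEGER, pos INTEGER)")
    # One row per (edge, column) where parent and child nucleotides differ.
    rows = [
        (edge_id, pos)
        for edge_id, (parent, child) in enumerate(edges)
        for pos, (a, b) in enumerate(zip(seqs[parent], seqs[child]))
        if a != b
    ]
    con.executemany("INSERT INTO site_change VALUES (?, ?)", rows)
    # Self-join: two different columns that changed on the same edge.
    query = """
        SELECT a.pos, b.pos, COUNT(*)
        FROM site_change AS a
        JOIN site_change AS b
          ON a.edge_id = b.edge_id AND a.pos < b.pos
        GROUP BY a.pos, b.pos
    """
    result = {(i, j): n for i, j, n in con.execute(query)}
    con.close()
    return result


# Toy data: columns 0 and 3 switch together (G-C <-> A-U) on two edges,
# while column 1 changes on its own on two other edges.
toy_edges = [("root", "n1"), ("root", "n2"), ("n1", "A"), ("n1", "B")]
toy_seqs = {
    "root": "GACC",
    "n1": "AACU",
    "n2": "GUCC",
    "A": "GACC",
    "B": "AGCU",
}

if __name__ == "__main__":
    print(covariation_counts(toy_edges, toy_seqs))
```

In this toy case only the column pair (0, 3) is reported, with a count of 2. Counting change events per tree edge, rather than comparing columns across all sequences at once, is what lets a single ancestral change inherited by many descendants count once.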

Keywords

Biological database; Bioinformatics; Sequence analysis; RNA

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Weijia Xu (1)
  • Stuart Ozer (2)
  • Robin R. Gutell (3)

  1. Texas Advanced Computing Center, The University of Texas at Austin, Austin, USA
  2. One Microsoft Way, Redmond, Washington, USA
  3. Center of Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, USA
