Pseudoknot Identification through Learning TAGRNA

  • Sahar Al Seesi
  • Sanguthevar Rajasekaran
  • Reda Ammar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5265)


Studying the structure of RNA sequences is an important problem because structure helps explain the functional properties of RNA. A pseudoknot is a type of RNA structure that cannot be modeled with Context-Free Grammars (CFGs) because it exhibits crossing dependencies. Pseudoknot structures are functionally important: they appear, for example, in viral genome RNAs and ribozyme active sites. Tree Adjoining Grammars (TAGs) are one example of a grammatical model that is more expressive than CFGs and can handle crossing dependencies. In this paper, we describe a new inference algorithm for TAGRNA, a sub-model of TAG. We also introduce an RNA structure identification framework, TAGRNAInf, in which the TAGRNA inference algorithm forms the core of the training phase. We present the results of using the proposed framework to identify RNA sequences with pseudoknot structures. Our results outperform those reported in [14], which addresses the same problem using a different grammatical formalism.
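
To make the notion of crossing dependencies concrete, the following is a minimal sketch and is not taken from the paper. It assumes a structure written in extended dot-bracket notation, where `(`/`)` and `[`/`]` are two independent bracket families, and it reports whether any two base pairs (i, j) and (k, l) interleave as i < k < j < l. Such interleaving is what a context-free grammar cannot generate in general. The function names `parse_pairs` and `has_crossing` are illustrative and hypothetical; they are not part of TAGRNAInf.

```python
from itertools import combinations


def parse_pairs(structure: str) -> list[tuple[int, int]]:
    """Convert extended dot-bracket notation into sorted (open, close) index pairs."""
    closer_to_opener = {")": "(", "]": "["}
    stacks: dict[str, list[int]] = {"(": [], "[": []}
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            # Opening bracket: remember its position on its own family's stack.
            stacks[ch].append(pos)
        elif ch in closer_to_opener:
            stack = stacks[closer_to_opener[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def has_crossing(pairs: list[tuple[int, int]]) -> bool:
    """Return True if two base pairs interleave, i.e. i < k < j < l."""
    return any(
        i < k < j < l or k < i < l < j
        for (i, j), (k, l) in combinations(pairs, 2)
    )


if __name__ == "__main__":
    nested = "((..))"              # hairpin: pairs are properly nested
    pseudoknot = "((..[[..))..]]"  # the two stems interleave
    print(has_crossing(parse_pairs(nested)))      # False
    print(has_crossing(parse_pairs(pseudoknot)))  # True
```

The pairwise check is quadratic in the number of base pairs, which is adequate for illustration; it only detects crossings in a known structure and does not perform the grammar inference or identification described in the abstract.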


Keywords: Threshold Function, Inference Algorithm, Hepatitis Delta Virus, Viral Genome RNAs, Formal Grammar


  1. Al Seesi, S.: Pseudoknot Identification through Learning TAGRNA, BECAT-CSE Technical Report, University of Connecticut (April 2008)
  2. Akutsu, T.: Dynamic Programming Algorithms for RNA Secondary Structure Prediction with Pseudoknots. Discrete Applied Mathematics 104, 45–62 (2000)
  3. Ambros, V., Bartel, B., Bartel, D.P., Burge, C.B., Carrington, J.C., Chen, X., Dreyfuss, G., Eddy, S.R., Griffiths-Jones, S., Marshall, M., Matzke, M., Ruvkun, G., Tuschl, T.: A Uniform System for microRNA Annotation. RNA 9(3), 277–279 (2003)
  4. van Batenburg, F.H.D., Gultyaev, A.P., Pleij, C.W.A., Ng, J., Oliehoek, J.: Pseudobase: a Database with RNA Pseudoknots. Nucl. Acids Res. 28(1), 201–204 (2000)
  5. Brazma, A., Jonassen, I., Vilo, J., Ukkonen, E.: Pattern Discovery in Biosequences. In: Honavar, V., Slutzki, G. (eds.) ICGI 1998. LNCS (LNAI), vol. 1433, pp. 255–270. Springer, Heidelberg (1998)
  6. Buratti, E., Dhir, A., Lewandowska, M.A., Baralle, F.E.: RNA Structure is a Key Regulatory Element in Pathological ATM and CFTR Pseudoexon Inclusion Events. Nucl. Acids Res. 35(13), 4369–4383 (2007)
  7. Cai, L., Malmberg, R., Wu, Y.: Stochastic Modeling of RNA Pseudoknotted Structures: a Grammatical Approach. Bioinformatics 19(supp. 1), 66–73 (2003)
  8. Dirks, R.M., Pierce, N.A.: A Partition Function Algorithm for Nucleic Acid Secondary Structure Including Pseudoknots. J. Comput. Chem. 24(13), 1664–1677 (2003)
  9. Gilbert, W.: The RNA World. Nature 319, 618 (1986)
  10. Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S.R., Bateman, A.: Rfam: Annotating Non-coding RNAs in Complete Genomes. Nucl. Acids Res. 33, D121–D124 (2005)
  11. Holbrook, S.R.: RNA Structure: the Long and the Short of it. Current Opinion in Structural Biology 15, 302–308 (2005)
  12. Joshi, A.K., Levy, L., Takahashi, M.: Tree Adjunct Grammars. Journal of Computer and System Sciences 10, 136–163 (1975)
  13. Laxminarayana, J.A., Nagaraja, G., Balaji, P.V.: Identification of Pseudoknots in RNA Secondary Structures: A Grammatical Inference Approach. In: Mukherjee, D.P., Pal, S. (eds.) Proceedings of 5th International Conference on Advances in Pattern Recognition (2003)
  14. Laxminarayana, J.A., Nagaraja, G., Balaji, P.V.: Inference of a Subclass of Even Linear Languages and its Application to Pseudoknot Identification. In: Department of Computer Science and Engineering, Indian Institute of Technology, Bombay, India (manuscript, 2003)
  15. Paillart, J.C., Skripkin, E., Ehresmann, B., Ehresmann, C., Marquet, R.: In vitro Evidence for a Long Range Pseudoknot in the 5’-Untranslated and Matrix Coding regions of HIV-1 Genomic RNA. J. Biol. Chem. 277, 5995–6004 (2002)
  16. Pedersen, J.S., Bejerano, G., Siepel, A., Rosenbloom, K., Lindblad-Toh, K., Lander, E.S., Kent, J., Miller, W., Haussler, D.: Identification and Classification of Conserved RNA Secondary Structures in the Human Genome. Public Library of Science. Computational Biology 2(4), 33 (2006)
  17. Rajasekaran, S.: Tree-Adjoining Language Parsing in o(n^6) Time. SIAM Journal on Computing 25(4), 862–873 (1996)
  18. Reeder, J., Giegerich, R.: Design, Implementation and Evaluation of a Practical Pseudoknot Folding Algorithm Based on Thermodynamics. BMC Bioinformatics 5, 104 (2004)
  19. Rivas, E., Eddy, S.: The Language of RNA: a Formal Grammar that Includes Pseudoknots. Bioinformatics 16(4), 334–340 (2000)
  20. Robertson, M.P., Igel, H., Baertsch, R., Haussler, D., Ares Jr., M., Scott, W.G.: The Structure of a Rigorously Conserved RNA Element within the SARS Virus Genome. Public Library of Science: Biology 3(1), 5 (2004)
  21. Sakakibara, Y., Brown, M., Hughey, R., Mian, I.S., Sjolander, K., Underwood, R.C., Haussler, D.: Stochastic Context-Free Grammars for tRNA Modeling. Nucl. Acids Res. 22, 5112–5120 (1994)
  22. Sakakibara, Y.: Grammatical Inference in Bioinformatics. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1051–1062 (2005)
  23. Searls, D.: The Linguistics of DNA. Am. Scient. 80, 579–591 (1992)
  24. Takakura, T., Asakawa, H., Seki, S., Kobayashi, S.: Efficient Tree Grammar Modeling of RNA Secondary Structures from Alignment Data. In: Proceedings of posters of RECOMB 2005, pp. 339–340 (2005)
  25. Tanaka, Y., Hori, T., Tagaya, M., Sakamoto, T., Kurihara, Y., Katahira, M., Uesugi, S.: Imino Proton NMR Analysis of HDV Ribozymes: Nested Double Pseudoknot Structure and Mg2+ Ion-Binding Site Close to the Catalytic Core in Solution. Nucl. Acids Res. 30, 766–774 (2002)
  26. Uemura, Y., Hasegawa, A., Kobayashi, S., Yokomori, T.: Tree Adjoining Grammars for RNA Structure Prediction. Theoretical Computer Science 210(2), 277–303 (1999)
  27. Vijay-Shanker, K., Joshi, A.K.: Some Computational Properties of Tree Adjoining Grammars. In: 23rd Meeting of the Association for Computational Linguistics, pp. 82–93 (1985)
  28. Williams, K.P., Bartel, D.P.: The tmRNA Website. Nucl. Acids Res. 26(1), 163–165 (1998)
  29. Williams, K.P.: The tmRNA Website: Invasion by an Intron. Nucl. Acids Res. 30(1), 179–182 (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Sahar Al Seesi (1)
  • Sanguthevar Rajasekaran (1)
  • Reda Ammar (1)
  1. Computer Science and Engineering Department, University of Connecticut, USA
