Logol: Expressive Pattern Matching in Sequences. Application to Ribosomal Frameshift Modeling

  • Catherine Belleannée
  • Olivier Sallou
  • Jacques Nicolas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8626)

Abstract

Most current pattern matching tools focus on comparing sequences efficiently. This is useful but insufficient: as knowledge of the functional and structural aspects of living systems improves, analysts in molecular biology are progressively shifting from mere classification tasks to modeling tasks. They need to express global sequence architectures and to test hypotheses about how their sequences are structured. Generic tools are therefore needed to build more expressive models of biological sequence families on the basis of their content and structure.

This article introduces Logol, a new application for pattern matching in possibly large sequences with customized biological patterns. Logol consists of a language for describing patterns and an associated parser that searches sequences (RNA, DNA or protein) for those patterns. The Logol language, based on a high-level grammatical formalism, can express flexible patterns (with mispairings and indels) composed of both sequential elements (such as motifs) and structural elements (such as repeats or pseudoknots). Its expressive power is illustrated through an application that uses the main components of the language: the identification of -1 programmed ribosomal frameshifting (PRF) events in messenger RNA sequences.
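To make the idea of a pattern that mixes sequential and structural elements concrete, here is a minimal Python sketch. It is not Logol syntax and not the authors' model. It assumes the slippery heptamer X XXY YYZ commonly described in the -1 PRF literature (XXX identical, YYY either AAA or UUU, Z not G), a short spacer, and a simple hairpin standing in for the pseudoknot, with the 3' arm allowed a few mispairings. The function names, length ranges and mismatch threshold are all hypothetical, and indels are not handled.

```python
import re

# Watson-Crick complement for RNA (G-U wobble pairs are ignored in this sketch).
COMPLEMENT = str.maketrans("ACGU", "UGCA")

# Assumed slippery heptamer X XXY YYZ: XXX identical, YYY is AAA or UUU, Z is not G.
# The lookahead lets overlapping candidate sites be reported.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2(AAA|UUU)[ACU]))")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string over ACGU."""
    return rna.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_prf_candidates(rna, spacer=(5, 9), stem_len=5, loop=(3, 8), max_mismatch=1):
    """Scan for: slippery site, spacer, 5' stem arm, loop, 3' stem arm.

    The 3' arm must match the reverse complement of the 5' arm with at
    most max_mismatch mispairings. All size ranges are illustrative.
    Returns tuples (site_start, heptamer, spacer_len, arm5, loop_len, arm3).
    """
    hits = []
    for m in SLIPPERY.finditer(rna):
        site_start = m.start()
        site_end = site_start + 7
        for sp in range(spacer[0], spacer[1] + 1):
            arm5_start = site_end + sp
            arm5 = rna[arm5_start:arm5_start + stem_len]
            if len(arm5) < stem_len:
                break  # ran off the end of the sequence
            target = reverse_complement(arm5)
            for lp in range(loop[0], loop[1] + 1):
                arm3_start = arm5_start + stem_len + lp
                arm3 = rna[arm3_start:arm3_start + stem_len]
                if len(arm3) < stem_len:
                    break
                if hamming(arm3, target) <= max_mismatch:
                    hits.append((site_start, m.group(1), sp, arm5, lp, arm3))
    return hits


# Toy sequence: slippery site, 6-nt spacer, stem GGGCC, 4-nt loop, stem GGCCC.
toy = "GG" + "UUUAAAC" + "GCGCGC" + "GGGCC" + "AAAA" + "GGCCC" + "AA"
candidates = find_prf_candidates(toy, max_mismatch=0)
```

Each tuple in `candidates` records where the slippery site starts, the heptamer found, and the spacer, stem and loop that completed the match. A declarative pattern language aims to express this kind of nested constraint directly instead of through hand-written loops.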

Logol supports the design of sophisticated patterns and their search in large nucleic or amino acid sequences. It is available on the GenOuest bioinformatics platform at http://logol.genouest.org. The core is a command-line application available for several operating systems. The Logol suite also includes user interfaces, e.g. one for drawing patterns graphically.

Keywords

Pattern Match · Sequence Matcher · Biological Sequence · Reverse Complement · Ribosomal Frameshift
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Catherine Belleannée (1)
  • Olivier Sallou (1)
  • Jacques Nicolas (1)
  1. Irisa/Inria/Université de Rennes 1, Rennes, France