Finding Common Motifs with Gaps Using Finite Automata

  • Pavlos Antoniou
  • Jan Holub
  • Costas S. Iliopoulos
  • Bořivoj Melichar
  • Pierre Peterlongo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4094)

Abstract

We present an algorithm that uses finite automata to find the common motifs with gaps occurring in all strings belonging to a finite set S = {S_1, S_2, ..., S_r}. To find these common motifs, we must first identify the factors that exist in each string, so the algorithm begins by constructing a factor automaton for each string S_i. To find the factors common to all the strings, the algorithm gathers the factors of every string into one data structure by computing an automaton that accepts the union of the languages of these factor automata. Using this automaton we create a new factor alphabet. Based on this factor alphabet, a finite automaton is created for each string S_i that accepts the sequences of non-overlapping factors occurring in that string. The intersection of these automata yields a finite automaton that accepts all the common subsequences with gaps, over the factor alphabet, that are present in every string of S. These common subsequences are the common motifs of the strings.
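The pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' construction: it simulates a nondeterministic factor automaton directly, and it replaces the union and intersection automata with plain set operations and brute-force enumeration. All function names, the `min_len` and `max_parts` parameters, and the choice to allow gaps of length zero are assumptions made for illustration only.

```python
from itertools import product


def factor_nfa_accepts(text, word):
    """Simulate the nondeterministic factor automaton of `text` on `word`.

    States are the positions 0..len(text). Every state is initial and final,
    and state i moves to state i + 1 on the symbol text[i].
    """
    active = set(range(len(text) + 1))  # every position is a start state
    for ch in word:
        active = {i + 1 for i in active if i < len(text) and text[i] == ch}
        if not active:
            return False
    return True  # every state is final, so any surviving run accepts


def factors(text, min_len):
    """All factors (substrings) of `text` with at least `min_len` symbols."""
    n = len(text)
    return {text[i:j] for i in range(n) for j in range(i + min_len, n + 1)}


def common_factors(strings, min_len=2):
    """Factors of the first string accepted by every other factor automaton.

    Assumption: this set plays the role of the factor alphabet.
    """
    candidates = factors(strings[0], min_len)
    return {f for f in candidates
            if all(factor_nfa_accepts(s, f) for s in strings[1:])}


def occurs_with_gaps(text, motif):
    """True if the factors of `motif` occur in `text` in order, no overlaps.

    Taking the leftmost occurrence of each factor is sufficient, because it
    leaves the longest possible suffix for the remaining factors.
    """
    pos = 0
    for f in motif:
        idx = text.find(f, pos)
        if idx < 0:
            return False
        pos = idx + len(f)  # the next factor must start after this one ends
    return True


def common_motifs(strings, min_len=2, max_parts=2):
    """Enumerate factor sequences (up to `max_parts` long) found in all strings.

    Brute-force stand-in for intersecting the per-string automata.
    """
    alphabet = sorted(common_factors(strings, min_len))
    found = []
    for k in range(1, max_parts + 1):
        for motif in product(alphabet, repeat=k):
            if all(occurs_with_gaps(s, motif) for s in strings):
                found.append(motif)
    return found


if __name__ == "__main__":
    strings = ["abcxde", "abcyyde", "zabcde"]
    print(sorted(common_factors(strings)))  # ['ab', 'abc', 'bc', 'de']
    print(common_motifs(strings))
```

The enumeration is exponential in `max_parts` and is only meant to make the definitions concrete on toy inputs; the automata-based construction in the paper is what avoids checking each candidate sequence one by one.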

Keywords

Space Complexity, Finite Automaton, String Match, Transition Diagram, Common Motif
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Pavlos Antoniou (1)
  • Jan Holub (2)
  • Costas S. Iliopoulos (1)
  • Bořivoj Melichar (2)
  • Pierre Peterlongo (3)
  1. Dept. of Computer Science, King’s College London, England, UK
  2. Department of Computer Science and Engineering, Czech Technical University in Prague, Prague 2, Czech Republic
  3. Institut Gaspard-Monge, Université de Marne-la-Vallée, Marne-la-Vallée, France
