On compact directed acyclic word graphs

  • Maxime Crochemore
  • Renaud Vérin
Pattern Matching and Learning
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1261)


The Directed Acyclic Word Graph (DAWG) is a space-efficient data structure to treat and analyze repetitions in a text, especially in DNA genomic sequences. Here, we consider the Compact Directed Acyclic Word Graph of a word. We give the first direct algorithm to construct it. It runs in time linear in the length of the string on a fixed alphabet. Our implementation requires half the memory space used by DAWGs.
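As a rough illustration of the structure the abstract starts from, the following minimal sketch builds the (non-compact) DAWG of a word with the classical on-line suffix-automaton construction. It is not the direct CDAWG algorithm announced in the abstract, and the names `State`, `build_dawg`, `is_factor`, and `count_unary_states` are hypothetical, chosen for this example only. The last helper counts states with exactly one outgoing transition, which, under the usual description of compaction, are the kind of states a compact DAWG collapses into multi-letter edges (terminal states aside).

```python
class State:
    """One DAWG state: longest word length, suffix link, outgoing transitions."""
    def __init__(self, length=0, link=-1):
        self.length = length   # length of the longest factor reaching this state
        self.link = link       # suffix link (index into the state list), -1 for the root
        self.next = {}         # letter -> target state index


def build_dawg(word):
    """Build the DAWG (suffix automaton) of `word`; state 0 is the initial state."""
    states = [State()]
    last = 0
    for ch in word:
        cur = len(states)
        states.append(State(states[last].length + 1))
        p = last
        # Add the new transition along the suffix-link chain where it is missing.
        while p != -1 and ch not in states[p].next:
            states[p].next[ch] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[ch]
            if states[q].length == states[p].length + 1:
                states[cur].link = q
            else:
                # Split q: the clone takes over the shorter factors.
                clone = len(states)
                copy = State(states[p].length + 1, states[q].link)
                copy.next = dict(states[q].next)
                states.append(copy)
                while p != -1 and states[p].next.get(ch) == q:
                    states[p].next[ch] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    return states


def is_factor(states, pattern):
    """Return True if `pattern` labels a path from the initial state."""
    s = 0
    for ch in pattern:
        if ch not in states[s].next:
            return False
        s = states[s].next[ch]
    return True


def count_unary_states(states):
    """Count states with exactly one outgoing transition (compaction candidates)."""
    return sum(1 for s in states if len(s.next) == 1)
```

For example, `is_factor(build_dawg("gattaca"), "tta")` follows three transitions from the initial state, while a pattern that does not occur in the word fails at the first missing transition. The ratio of `count_unary_states(states)` to `len(states)` gives only an informal feel for how much a compact representation might save; it is not the memory measurement reported in the abstract.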





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Maxime Crochemore (1)
  • Renaud Vérin (1)
  1. Institut Gaspard Monge, Université de Marne-La-Vallée, Noisy-Le-Grand
