Contracted Suffix Trees: A Simple and Dynamic Text Indexing Data Structure

  • Andrzej Ehrenfeucht
  • Ross M. McConnell
  • Sung-Whan Woo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5577)

Abstract

We address the problem of finding the locations of all instances of a string P in a text T, where preprocessing of T is allowed to facilitate the queries. Previous data structures for this problem include the suffix tree, the suffix array, and the compact DAWG. We modify a data structure called a sequence tree, which was proposed by Coffman and Eve for hashing, and adapt it to the new problem. We can then produce a list of k occurrences of any string P in T in O(||P|| + k) time. Because of properties shared by suffixes of a text that are not shared by arbitrary hash keys, we can build the structure in O(||T||) time, which is much faster than Coffman and Eve’s algorithm. These bounds are as good as those for the suffix tree, suffix array, and the compact DAWG. The advantages are the elementary nature of some of the algorithms for constructing and using the data structure and the asymptotic bounds we can give for updating the data structure when the text is edited.
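
To make the query problem concrete, the following minimal sketch preprocesses a text T once and then reports every position where a pattern P occurs. It uses a naively built suffix array, one of the earlier structures named in the abstract, and is not the contracted suffix tree of this paper. The function names `build_suffix_array` and `find_occurrences` are hypothetical, and the sketch does not attain the O(||T||) construction or O(||P|| + k) query bounds stated above.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction by sorting suffix strings; this is far slower than
    linear time and is used here only for illustration.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of every occurrence of pattern in text."""
    m = len(pattern)

    # Binary search for the first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Binary search for the first suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Suffixes in [first, lo) are exactly those that begin with pattern.
    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    T = "banana"
    index = build_suffix_array(T)          # preprocessing step
    print(find_occurrences(T, index, "ana"))
```

Each query here costs O(||P|| log ||T||) for the two binary searches plus the cost of sorting the k reported positions, which shows the preprocess-then-query pattern but falls short of the bounds claimed for the structure in the paper.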

Keywords

Hash Table, Suffix Tree, Candidate Position, Suffix Array, Consecutive Block
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Weiner, P.: Linear pattern-matching algorithms. In: Proceedings of the 14th IEEE Annual Symposium on Switching and Automata Theory, pp. 1–11. Institute of Electrical and Electronics Engineers, London (1973)
  2. Blumer, A., Blumer, J., Ehrenfeucht, A., Haussler, D., McConnell, R.: Complete inverted files for efficient text retrieval and analysis. Journal of the ACM 34, 578–595 (1987)
  3. Manber, U., Myers, E.: Suffix arrays: a new method for on-line search. SIAM J. Comput. 22, 935–948 (1993)
  4. Ferragina, P., Grossi, R., Montangero, M.: On updating suffix tree labels. Theor. Comput. Sci. 201(1-2), 249–262 (1998)
  5. Salson, M., Lecroq, T., Léonard, M., Mouchard, L.: Dynamic Burrows-Wheeler transform. Theoretical Computer Science (accepted, 2009)
  6. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 24, 530–536 (1978)
  7. Coffman, E., Eve, J.: File structures using hashing functions. Communications of the ACM 13, 427–432 (1970)
  8. Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms. McGraw-Hill, Boston (2001)
  9. Ehrenfeucht, A., McConnell, R.M.: String searching. In: Mehta, D., Sahni, S. (eds.) Handbook of Data Structures and Applications. CRC Press, Boca Raton (2005)
  10. Tarjan, R.E.: Data structures and network algorithms. Society for Industrial and Applied Math., Philadelphia (1983)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Andrzej Ehrenfeucht (1)
  • Ross M. McConnell (2)
  • Sung-Whan Woo (2)

  1. Dept. of Computer Science, University of Colorado at Boulder, Boulder, USA
  2. Dept. of Computer Science, Colorado State University, Fort Collins, USA
