A New Compressed Suffix Tree Supporting Fast Search and Its Construction Algorithm Using Optimal Working Space

  • Dong Kyue Kim
  • Heejin Park
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3537)

Abstract

The compressed suffix array and the compressed suffix tree for a given string S are full-text index data structures occupying O(n log|Σ|) bits, where n is the length of S and Σ is the alphabet from which the symbols of S are drawn. When they were first introduced, they were constructed from suffix arrays and suffix trees, which implies they were not constructed in optimal O(n log|Σ|)-bit working space. Recently, several methods were developed for constructing compressed suffix arrays and compressed suffix trees in optimal working space. By these methods, one can construct compressed suffix trees supporting pattern search in O(m′|Σ|) time, where m′ = m log^ε n, m is the length of a pattern, and log^ε n is the time to find the i-th smallest suffix of S from the compressed suffix array for any fixed 0 < ε ≤ 1. However, compressed suffix trees supporting pattern search in O(m′ log|Σ|) time are not constructed by these methods.

In this paper, we present a new compressed suffix tree supporting O(m′ log|Σ|)-time pattern search and its construction algorithm using optimal working space. To obtain this result, we developed a new succinct representation of suffix trees, which differs from the classic succinct representation based on the parentheses encoding of suffix trees. Our succinct representation technique is generally applicable to the succinct representation of other search trees.
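The gap between the |Σ| and log|Σ| factors above can be pictured as the cost of choosing which child to follow at one suffix tree node. The following is a minimal sketch, not the paper's data structure: it assumes each node's children are kept in a list sorted by the first character of their edge label, and the function names `find_child_linear` and `find_child_binary` are hypothetical. It also ignores the log^ε n cost of accessing the compressed suffix array.

```python
from bisect import bisect_left


def find_child_linear(labels, c):
    """Scan the child labels one by one: up to |Σ| comparisons per node."""
    for i, label in enumerate(labels):
        if label == c:
            return i
    return -1  # no child starts with c


def find_child_binary(labels, c):
    """Binary-search the sorted child labels: about log2|Σ| comparisons per node."""
    i = bisect_left(labels, c)
    if i < len(labels) and labels[i] == c:
        return i
    return -1  # no child starts with c


# Hypothetical node whose outgoing edges start with these characters (sorted).
labels = ["a", "c", "g", "t"]
print(find_child_linear(labels, "g"), find_child_binary(labels, "g"))
```

Both functions return the same child index; only the number of comparisons per node differs, which is where a factor of |Σ| versus log|Σ| per pattern character would come from under these assumptions.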

Keywords

Pattern Search, Construction Algorithm, Suffix Tree, Sign Array, Suffix Array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Dong Kyue Kim (1)
  • Heejin Park (2)
  1. School of Electrical and Computer Engineering, Pusan National University, Busan, South Korea
  2. College of Information and Communications, Hanyang University, Seoul, South Korea