Abstract
The suffix tree and the suffix array are fundamental full-text index data structures, and many algorithms have been developed on them to solve problems in string processing and information retrieval. Some problems are solved more efficiently with the suffix tree and others with the suffix array. We consider an index data structure that has the capabilities of both without requiring much space. For alphabets of negligible size, Abouelhoda et al. developed the enhanced suffix array for this purpose. It consists of the suffix array and the child table. The child table stores the parent-child relationships between the nodes of the suffix tree, so that every algorithm developed on the suffix tree can be run with a small and systematic modification. Since the child table consumes moderate space and is constructed very fast, the enhanced suffix array is almost as time- and space-efficient as the suffix array. However, when the size of the alphabet is not negligible, the enhanced suffix array loses the capabilities of the suffix tree: pattern search in the enhanced suffix array takes O(m|Σ|) time, where m is the length of the pattern and Σ is the alphabet, while pattern search in the suffix tree takes O(m log |Σ|) time.
In this paper, we improve the enhanced suffix array so that it retains the capabilities of both the suffix tree and the suffix array even when the size of the alphabet is not negligible. We do this by presenting a new child table, which lets the enhanced suffix array support pattern search in O(m log |Σ|) time. Our index data structure is almost as time- and space-efficient as the enhanced suffix array: it consumes the same space, and its construction is slightly slower (by less than 4%). From a different point of view, it can be considered the first practical index providing the capabilities of suffix trees when the size of the alphabet is not negligible, because the suffix tree supporting O(m log |Σ|)-time pattern search is not easy to implement and is thus rarely used in practice.
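The gap between O(m|Σ|) and O(m log |Σ|) comes from how the matching child of a node is located at each step of the search. The following Python sketch illustrates only that difference; it is not the child table of this paper or of Abouelhoda et al. All function names are hypothetical, the suffix array is built naively, the child boundaries are recomputed on the fly instead of being read from a precomputed table, and the search descends one character at a time as in an uncompressed trie.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def char_at(text, sa, k, depth):
    # Character at offset `depth` of the k-th smallest suffix ('' if it has ended).
    return text[sa[k] + depth: sa[k] + depth + 1]


def child_starts(text, sa, lo, hi, depth):
    # Start indices of the child intervals of sa[lo:hi], whose suffixes all
    # share their first `depth` characters. A real index would read these
    # from a precomputed child table; here they are recomputed (assumption).
    starts, prev = [], None
    for k in range(lo, hi):
        c = char_at(text, sa, k, depth)
        if c != prev:
            starts.append(k)
            prev = c
    return starts


def find_child_linear(text, sa, starts, hi, depth, c):
    # Visit children one after another: up to |alphabet| steps per character.
    for idx, s in enumerate(starts):
        if char_at(text, sa, s, depth) == c:
            end = starts[idx + 1] if idx + 1 < len(starts) else hi
            return s, end
    return None


def find_child_binary(text, sa, starts, hi, depth, c):
    # Children are sorted by branching character, so binary search over
    # them needs only O(log |alphabet|) comparisons per character.
    a, b = 0, len(starts)
    while a < b:
        mid = (a + b) // 2
        if char_at(text, sa, starts[mid], depth) < c:
            a = mid + 1
        else:
            b = mid
    if a < len(starts) and char_at(text, sa, starts[a], depth) == c:
        end = starts[a + 1] if a + 1 < len(starts) else hi
        return starts[a], end
    return None


def search(text, sa, pattern, find_child):
    # Narrow the suffix-array interval one pattern character at a time.
    lo, hi = 0, len(sa)
    for depth, c in enumerate(pattern):
        starts = child_starts(text, sa, lo, hi, depth)
        child = find_child(text, sa, starts, hi, depth, c)
        if child is None:
            return []
        lo, hi = child
    return sorted(sa[lo:hi])
```

For example, with `text = "banana"` and `sa = suffix_array(text)`, both `search(text, sa, "ana", find_child_linear)` and `search(text, sa, "ana", find_child_binary)` return `[1, 3]`. The two variants always agree on the result; they differ only in how many children they inspect per pattern character.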
This research was supported by the Program for the Training of Graduate Students in Regional Innovation which was conducted by the Ministry of Commerce, Industry and Energy of the Korean Government.
References
Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. of Discrete Algorithms, 53–86 (2004)
Abouelhoda, M., Ohlebusch, E., Kurtz, S.: Optimal exact string matching based on suffix arrays. In: Symp. on String Processing and Information Retrieval, pp. 31–43 (2002)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press, Cambridge (2001)
Farach, M.: Optimal suffix tree construction with large alphabets. In: IEEE Symp. Found. Computer Science, pp. 137–143 (1997)
Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. Assoc. Comput. Mach. 47, 987–1011 (2000)
Gonnet, G., Baeza-Yates, R., Snider, T.: New indices for text: Pat trees and pat arrays. In: Frakes, W.B., Baeza-Yates, R.A. (eds.) Information Retrieval: Data Structures & Algorithms, pp. 66–82. Prentice Hall, Englewood Cliffs (1992)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge Univ. Press, Cambridge (1997)
Larsson, N.J., Sadakane, K.: Faster Suffix Sorting, Technical Report, number LU-CS-TR:99-214, Department of Computer Science, Lund University, Sweden (1999)
Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Symp. Combinatorial Pattern Matching, pp. 181–192 (2001)
Kärkkäinen, J., Sanders, P.: Simpler linear work suffix array construction. Int. Colloq. Automata Languages and Programming, 943–955 (2003)
Kim, D.K., Jo, J., Park, H.: A fast algorithm for constructing suffix arrays for fixed-size alphabets. In: Workshop on Efficient and Experimental Algorithms, pp. 301–314 (2004)
Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Symp. Combinatorial Pattern Matching, pp. 186–199 (2003)
Ko, P., Aluru, S.: Space-efficient linear time construction of suffix arrays. In: Symp. Combinatorial Pattern Matching, pp. 200–210 (2003)
Manber, U., Myers, G.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22, 935–948 (1993)
McCreight, E.M.: A space-economical suffix tree construction algorithm. J. Assoc. Comput. Mach. 23, 262–272 (1976)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)
Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th IEEE Symp. Switching and Automata Theory, pp. 1–11 (1973)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, D.K., Jeon, J.E., Park, H. (2004). An Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays for Alphabets of Non-negligible Size. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_22
DOI: https://doi.org/10.1007/978-3-540-30213-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive