An Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays for Alphabets of Non-negligible Size

Kim, Dong Kyue; Jeon, Jeong Eun; Park, Heejin

doi:10.1007/978-3-540-30213-1_22

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3246)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

The suffix tree and the suffix array are fundamental full-text index data structures, and many algorithms have been developed on them to solve problems in string processing and information retrieval. Some problems are solved more efficiently with the suffix tree and others with the suffix array. We consider an index data structure that has the capabilities of both the suffix tree and the suffix array without requiring much space. For alphabets of negligible size, Abouelhoda et al. developed the enhanced suffix array for this purpose. It consists of the suffix array and the child table. The child table stores the parent-child relationships between the nodes of the suffix tree, so that every algorithm developed on the suffix tree can be run with a small and systematic modification. Since the child table consumes moderate space and is constructed very fast, the enhanced suffix array is almost as time- and space-efficient as the suffix array. However, when the size of the alphabet is not negligible, the enhanced suffix array loses the capabilities of the suffix tree: pattern search in the enhanced suffix array takes O(m|Σ|) time, where m is the length of the pattern and Σ is the alphabet, while pattern search in the suffix tree takes O(m log |Σ|) time.
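As a rough illustration of how a suffix array can stand in for suffix tree nodes, the following Python sketch builds a suffix array and its longest-common-prefix (LCP) array naively, derives the children of an interval by scanning the LCP values, and searches a pattern by walking down those intervals. All function names are hypothetical. The child intervals are recomputed by a scan here, whereas the child table of Abouelhoda et al. stores this information in an array, so the sketch shows the structure only and not the time bounds.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a small example only.
    return sorted(range(len(text)), key=lambda i: text[i:])

def lcp_array(text, sa):
    # lcp[k] = length of the longest common prefix of suffixes sa[k-1] and sa[k].
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = text[sa[k - 1]:], text[sa[k]:]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        lcp[k] = n
    return lcp

def child_intervals(lcp, i, j):
    # Children of the interval [i, j] (inclusive): split where the LCP value
    # is minimal over positions i+1..j. Assumption: found by a plain scan.
    if i == j:
        return []
    ell = min(lcp[i + 1:j + 1])
    cuts = [k for k in range(i + 1, j + 1) if lcp[k] == ell]
    bounds = [i] + cuts + [j + 1]
    return [(bounds[t], bounds[t + 1] - 1) for t in range(len(bounds) - 1)]

def find_interval(text, sa, lcp, pattern):
    # Walk down from the root interval; return the suffix array interval whose
    # suffixes all start with pattern, or None if the pattern does not occur.
    i, j, depth, m = 0, len(sa) - 1, 0, len(pattern)
    while True:
        if i == j:  # leaf: compare the single remaining suffix directly
            return (i, j) if text.startswith(pattern, sa[i]) else None
        ell = min(lcp[i + 1:j + 1])  # prefix length shared by the whole interval
        stop = min(ell, m)
        if text[sa[i] + depth:sa[i] + stop] != pattern[depth:stop]:
            return None
        if stop == m:
            return (i, j)
        depth = ell
        # Inspect the children one by one: up to |alphabet| candidates per node.
        for a, b in child_intervals(lcp, i, j):
            pos = sa[a] + ell
            if pos < len(text) and text[pos] == pattern[ell]:
                i, j = a, b
                break
        else:
            return None

text = "banana$"  # '$' is an assumed unique end marker
sa = suffix_array(text)
lcp = lcp_array(text, sa)
hit = find_interval(text, sa, lcp, "ana")
print(sorted(sa[k] for k in range(hit[0], hit[1] + 1)))
```

The loop over the children is the step that contributes the |Σ| factor in the bound quoted above: each node may have one child per alphabet symbol, and they are inspected in turn.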

In this paper, we improve the enhanced suffix array so that it has the capabilities of both the suffix tree and the suffix array even when the size of the alphabet is not negligible. We do this by presenting a new child table, which lets the enhanced suffix array support pattern search in O(m log |Σ|) time. Our index data structure is almost as time- and space-efficient as the enhanced suffix array: it consumes the same space, and its construction is slightly slower (by less than 4%). From a different point of view, it can be considered the first practical index offering the capabilities of suffix trees when the size of the alphabet is not negligible, because the suffix tree supporting O(m log |Σ|)-time pattern search is not easy to implement and is thus rarely used in practice.
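The gap between the O(m|Σ|) and O(m log |Σ|) bounds comes down to one step repeated for each pattern character: locating, among up to |Σ| children of a node, the child that begins with the next character. The Python sketch below contrasts a one-by-one scan with a binary search over children kept sorted by first character, counting probes for each. It is only an analogy for the two per-node costs. The abstract does not describe the layout of the new child table, so nothing here should be read as the authors' construction, and both function names are hypothetical.

```python
def find_child_linear(first_chars, c):
    # Scan sorted first characters one by one: up to len(first_chars) probes.
    probes = 0
    for idx, ch in enumerate(first_chars):
        probes += 1
        if ch == c:
            return idx, probes
        if ch > c:  # sorted order: c cannot appear later
            break
    return None, probes

def find_child_binary(first_chars, c):
    # Lower-bound binary search: about log2(len(first_chars)) probes.
    lo, hi, probes = 0, len(first_chars), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if first_chars[mid] < c:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(first_chars) and first_chars[lo] == c:
        return lo, probes
    return None, probes

# A node with one child per symbol of an assumed 256-symbol alphabet.
children = [chr(code) for code in range(256)]
print(find_child_linear(children, chr(255)))
print(find_child_binary(children, chr(255)))
```

Repeating the first lookup at each of the m steps of a search gives the O(m|Σ|) behaviour, and repeating the second gives O(m log |Σ|).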

This research was supported by the Program for the Training of Graduate Students in Regional Innovation which was conducted by the Ministry of Commerce, Industry and Energy of the Korean Government.

References

Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. of Discrete Algorithms 2, 53–86 (2004)
Abouelhoda, M., Ohlebusch, E., Kurtz, S.: Optimal exact string matching based on suffix arrays. In: Symp. on String Processing and Information Retrieval, pp. 31–43 (2002)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press, Cambridge (2001)
Farach, M.: Optimal suffix tree construction with large alphabets. In: IEEE Symp. Found. Computer Science, pp. 137–143 (1997)
Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. Assoc. Comput. Mach. 47, 987–1011 (2000)
Gonnet, G., Baeza-Yates, R., Snider, T.: New indices for text: Pat trees and pat arrays. In: Frakes, W.B., Baeza-Yates, R.A. (eds.) Information Retrieval: Data Structures & Algorithms, pp. 66–82. Prentice Hall, Englewood Cliffs (1992)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge Univ. Press, Cambridge (1997)
Larsson, N.J., Sadakane, K.: Faster Suffix Sorting, Technical Report, number LU-CS-TR:99-214, Department of Computer Science, Lund University, Sweden (1999)
Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Symp. Combinatorial Pattern Matching, pp. 181–192 (2001)
Kärkkäinen, J., Sanders, P.: Simpler linear work suffix array construction. In: Int. Colloq. Automata Languages and Programming, pp. 943–955 (2003)
Kim, D.K., Jo, J., Park, H.: A fast algorithm for constructing suffix arrays for fixed-size alphabets. In: Workshop on Efficient and Experimental Algorithms, pp. 301–314 (2004)
Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Symp. Combinatorial Pattern Matching, pp. 186–199 (2003)
Ko, P., Aluru, S.: Space-efficient linear time construction of suffix arrays. In: Symp. Combinatorial Pattern Matching, pp. 200–210 (2003)
Manber, U., Myers, G.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22, 935–948 (1993)
McCreight, E.M.: A space-economical suffix tree construction algorithm. J. Assoc. Comput. Mach. 23, 262–272 (1976)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)
Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th IEEE Symp. Switching and Automata Theory, pp. 1–11 (1973)


Author information

Authors and Affiliations

School of Electrical and Computer Engineering, Pusan National University, Busan, 609-735, South Korea
Dong Kyue Kim & Jeong Eun Jeon
College of Information and Communications, Hanyang University, Seoul, 133-791, South Korea
Heejin Park

Editor information

Editors and Affiliations

Georgia Institute of Technology and Università di Padova,
Alberto Apostolico
Department of Information Engineering, University of Padova,
Massimo Melucci

Cite this paper

Kim, D.K., Jeon, J.E., Park, H. (2004). An Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays for Alphabets of Non-negligible Size. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_22

DOI: https://doi.org/10.1007/978-3-540-30213-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive
