Abstract
The time complexity of suffix tree construction has been shown to be equivalent to that of sorting: O(n) for a constant-size alphabet or an integer alphabet and O(n log n) for a general alphabet. However, previous algorithms for constructing suffix arrays take O(n log n) time even for a constant-size alphabet.
In this paper we present a linear-time algorithm for constructing suffix arrays over integer alphabets that does not use suffix trees as intermediate data structures. Since the case of a constant-size alphabet is subsumed by that of an integer alphabet, our result implies that the time complexity of directly constructing suffix arrays matches that of constructing suffix trees.
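As an illustration of the data structure itself, the following is a minimal sketch of direct suffix array construction by prefix doubling. It is not the linear-time algorithm of this paper, which the abstract does not describe; it sorts with a comparison sort in every round and therefore runs in O(n log² n) time. The function name `suffix_array` and the restriction to Python strings are assumptions made for the example.

```python
def suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Illustrative prefix-doubling sketch (O(n log^2 n)), not a linear-time method.
    """
    n = len(text)
    if n == 0:
        return []
    # rank[i] orders suffix i by its first k characters (initially k = 1).
    rank = [ord(c) for c in text]
    sa = list(range(n))
    k = 1
    while True:
        # Compare suffixes by their first 2k characters using two k-ranks;
        # -1 marks "past the end", which sorts before every real rank.
        def key(i):
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        # Re-rank: equal keys share a rank, distinct keys get increasing ranks.
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all suffixes are now distinguished
            break
        k *= 2
    return sa


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

For "banana" the sorted suffixes are "a", "ana", "anana", "banana", "na" and "nana", which start at positions 5, 3, 1, 0, 4 and 2. No suffix tree is built at any point; the array is obtained purely by repeated sorting of rank pairs.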
Supported by KOSEF grant R01-2002-000-00589-0.
Supported by BK21 Project and IMT2000 Project AB02.
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Kim, D.K., Sim, J.S., Park, H., Park, K. (2003). Linear-Time Construction of Suffix Arrays. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_14
DOI: https://doi.org/10.1007/3-540-44888-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40311-1
Online ISBN: 978-3-540-44888-4
eBook Packages: Springer Book Archive