Linear-Time Construction of Suffix Arrays

Extended Abstract
  • Dong Kyue Kim
  • Jeong Seop Sim
  • Heejin Park
  • Kunsoo Park
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2676)

Abstract

The time complexity of suffix tree construction has been shown to be equivalent to that of sorting: O(n) for a constant-size alphabet or an integer alphabet, and O(n log n) for a general alphabet. However, previous algorithms for directly constructing suffix arrays take O(n log n) time even for a constant-size alphabet.
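A typical O(n log n) direct construction is prefix doubling, the technique of Manber and Myers: once the suffixes are ranked by their first k characters, ranking them by their first 2k characters only requires sorting the pairs (rank of suffix i, rank of suffix i + k). The Python sketch below illustrates that bound under these assumptions; it is not code from this paper, and the names `suffix_array_doubling` and `counting_sort` are hypothetical. Each round is two stable counting sorts (O(n) time), and O(log n) rounds suffice because all ranks are distinct once 2k reaches n.

```python
def counting_sort(items, key, num_keys):
    """Stable counting sort of `items` by an integer key in [0, num_keys)."""
    starts = [0] * (num_keys + 1)
    for x in items:
        starts[key(x) + 1] += 1
    for b in range(num_keys):
        starts[b + 1] += starts[b]  # starts[b] = number of items with key < b
    out = [0] * len(items)
    for x in items:
        k = key(x)
        out[starts[k]] = x
        starts[k] += 1
    return out


def suffix_array_doubling(s):
    """Suffix array of `s` by prefix doubling (hypothetical helper, O(n log n) rounds)."""
    n = len(s)
    if n == 0:
        return []
    # Rank single characters; sorting the distinct characters is a convenience step.
    char_rank = {c: r for r, c in enumerate(sorted(set(s)))}
    rank = [char_rank[c] for c in s]
    num_ranks = len(char_rank)
    sa = list(range(n))
    k = 1
    while True:
        def second(i):
            # Rank of the suffix k positions later, shifted by one;
            # 0 means "past the end", which sorts before any real rank.
            return rank[i + k] + 1 if i + k < n else 0

        # LSD radix sort on the pair (rank[i], second(i)).
        sa = counting_sort(sa, second, num_ranks + 1)
        sa = counting_sort(sa, lambda i: rank[i], num_ranks)

        # Re-rank: suffixes with equal pairs share a rank.
        new_rank = [0] * n
        r = 0
        for prev, cur in zip(sa, sa[1:]):
            if rank[prev] != rank[cur] or second(prev) != second(cur):
                r += 1
            new_rank[cur] = r
        rank, num_ranks = new_rank, r + 1
        if num_ranks == n:  # all prefixes of length 2k are now distinct
            return sa
        k *= 2
```

For example, `suffix_array_doubling("banana")` returns `[5, 3, 1, 0, 4, 2]`, the start positions of a, ana, anana, banana, na and nana in sorted order.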

In this paper we present a linear-time algorithm for constructing suffix arrays over integer alphabets that does not use suffix trees as intermediate data structures. Since the constant-size alphabet case is subsumed by the integer alphabet case, our result implies that the time complexity of directly constructing suffix arrays matches that of constructing suffix trees.
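The abstract does not describe the construction itself, so the sketch below does not reproduce the authors' algorithm. To illustrate the kind of result claimed (linear time on an integer alphabet, no intermediate suffix tree), it implements a different, contemporaneous method: the skew (DC3) algorithm of Kärkkäinen and Sanders, following the structure of their published reference code. The function names `suffix_array_dc3`, `_dc3` and `_radix_pass` are hypothetical. The method recursively sorts the suffixes starting at positions i mod 3 ≠ 0, sorts the remaining suffixes with one radix pass, and merges the two lists with constant-time comparisons.

```python
def _radix_pass(a, s, offset, K):
    """Stable counting sort of indices `a` by key s[i + offset] in [0, K]."""
    count = [0] * (K + 1)
    for i in a:
        count[s[i + offset]] += 1
    total = 0
    for c in range(K + 1):  # exclusive prefix sums: bucket start offsets
        count[c], total = total, total + count[c]
    out = [0] * len(a)
    for i in a:
        key = s[i + offset]
        out[count[key]] = i
        count[key] += 1
    return out


def _dc3(s, n, K):
    """Suffix array of s[0:n], where 1 <= s[i] <= K and s[n:n+3] == [0, 0, 0]."""
    n0, n1, n2 = (n + 2) // 3, (n + 1) // 3, n // 3
    n02 = n0 + n2
    # Sample positions i % 3 != 0 (plus a dummy position n when n % 3 == 1).
    sample = [i for i in range(n + n0 - n1) if i % 3 != 0]
    # Radix sort the sample by its character triples s[i], s[i+1], s[i+2].
    for offset in (2, 1, 0):
        sample = _radix_pass(sample, s, offset, K)
    # Name the triples; s12 holds mod-1 names, then mod-2 names, then 3 zeros.
    s12 = [0] * (n02 + 3)
    name, last = 0, None
    for i in sample:
        triple = (s[i], s[i + 1], s[i + 2])
        if triple != last:
            name, last = name + 1, triple
        s12[i // 3 if i % 3 == 1 else i // 3 + n0] = name
    if name < n02:
        # Names not unique: recurse, then store 1-based ranks of sampled suffixes.
        sa12 = _dc3(s12, n02, name)
        for r, i in enumerate(sa12):
            s12[i] = r + 1
    else:
        # Names already unique, so they are the ranks.
        sa12 = [0] * n02
        for i in range(n02):
            sa12[s12[i] - 1] = i
    # Sort mod-0 suffixes by (s[i], rank of suffix i + 1).
    sa0 = _radix_pass([3 * i for i in sa12 if i < n0], s, 0, K)

    def pos(t):  # text position of the t-th smallest sampled suffix
        return sa12[t] * 3 + 1 if sa12[t] < n0 else (sa12[t] - n0) * 3 + 2

    # Merge; t skips the dummy suffix (only present when n % 3 == 1), which sorts first.
    sa, p, t = [], 0, n0 - n1
    while p < n0 and t < n02:
        i, j = pos(t), sa0[p]
        if sa12[t] < n0:  # i % 3 == 1: compare (char, rank of next suffix)
            sampled_first = (s[i], s12[sa12[t] + n0]) <= (s[j], s12[j // 3])
        else:  # i % 3 == 2: compare (char, char, rank of suffix two ahead)
            sampled_first = (s[i], s[i + 1], s12[sa12[t] - n0 + 1]) <= (
                s[j], s[j + 1], s12[j // 3 + n0])
        if sampled_first:
            sa.append(i)
            t += 1
        else:
            sa.append(j)
            p += 1
    sa.extend(sa0[p:])
    sa.extend(pos(u) for u in range(t, n02))
    return sa


def suffix_array_dc3(text):
    """Suffix array of `text` via the skew/DC3 method (illustrative sketch)."""
    n = len(text)
    if n < 2:
        return list(range(n))
    # Map characters to integers 1..K (0 is reserved for end padding);
    # with a genuine integer alphabet the integers could be passed directly.
    ranks = {c: r + 1 for r, c in enumerate(sorted(set(text)))}
    s = [ranks[c] for c in text] + [0, 0, 0]
    return _dc3(s, n, len(ranks))
```

The recursion handles about 2n/3 of the suffixes, and every other step is a constant number of radix passes over keys bounded by the alphabet size or n, so the running time satisfies T(n) = T(2n/3) + O(n) = O(n).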

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Dong Kyue Kim (1)
  • Jeong Seop Sim (2)
  • Heejin Park (3)
  • Kunsoo Park (3)

  1. School of Electrical and Computer Engineering, Pusan National University, Pusan
  2. Electronics and Telecommunications Research Institute, Daejeon, Korea
  3. School of Computer Science and Engineering, Seoul National University, Seoul