Towards Real-Time Suffix Tree Construction

  • Amihood Amir
  • Tsvi Kopelowitz
  • Moshe Lewenstein
  • Noa Lewenstein
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3772)

Abstract

The quest for a real-time suffix tree construction algorithm is over three decades old. To date there is no convincing, understandable solution to this problem. This paper takes a step in this direction by constructing a suffix tree online in O(log n) time for every single input symbol. Clearly, it is impossible to achieve better than O(log n) time per symbol in the comparison model; therefore no true real-time algorithm can exist for infinite alphabets. The best that can be hoped for is thus that the construction time for every symbol does not exceed O(log n) (as opposed to the amortized O(log n) time per symbol achieved by currently known algorithms). To our knowledge, our algorithm is the first that spends O(log n) time in the worst case on every single input symbol.
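
The abstract states the comparison-model lower bound without argument. One standard way to see it, offered here as a sketch rather than as the paper's own proof, is a reduction from element distinctness: if all n input symbols are distinct, the root of the suffix tree has one child per symbol, so the finished tree reveals whether any symbol repeats. Writing \(T(n)\) for the total construction time (a symbol introduced only for this note) and using the known \(\Omega(n \log n)\) comparison bound for element distinctness:

\[
T(n) = \Omega(n \log n)
\quad\Longrightarrow\quad
\frac{T(n)}{n} = \Omega(\log n),
\]

so even the amortized cost per symbol cannot fall below logarithmic, and a worst-case cost of O(log n) per symbol matches this bound up to a constant factor.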

We also provide a simple algorithm that constructs an indexing structure (the BIS) online in O(log n) time per input symbol, where n is the number of text symbols input thus far. This structure, together with fast LCP (Longest Common Prefix) queries on it, provides the backbone for the suffix tree construction. Together, our two data structures provide a searching algorithm for a pattern of length m whose time is \(O(\min(m \log |\Sigma|, m + \log n) + tocc)\), where tocc is the number of occurrences of the pattern.
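
To make the idea of an online index over suffixes concrete, here is a deliberately naive Python sketch. It is not the paper's BIS and does not meet the O(log n) bound: it keeps whole suffix strings in a sorted list and pays linear time per insertion. The class name `NaiveOnlineSuffixIndex`, its methods, and the choice to grow the text by prepending symbols (so that existing suffixes never change) are all assumptions made for illustration only.

```python
import bisect


class NaiveOnlineSuffixIndex:
    """Toy online suffix index (hypothetical; not the paper's BIS).

    Assumption: the text grows by prepending one symbol at a time,
    so every previously inserted suffix stays unchanged.
    """

    def __init__(self):
        self.text = ""
        self.suffixes = []  # all suffixes of self.text, kept sorted

    def prepend(self, symbol):
        # The only new suffix is the whole text; insert it in sorted order.
        # A balanced structure with fast LCP queries would replace this
        # linear-time list insertion and character-by-character comparison.
        self.text = symbol + self.text
        bisect.insort(self.suffixes, self.text)

    def occurrences(self, pattern):
        # Suffixes that start with the pattern form one contiguous run
        # beginning at the first suffix >= pattern.
        start = bisect.bisect_left(self.suffixes, pattern)
        positions = []
        for suffix in self.suffixes[start:]:
            if not suffix.startswith(pattern):
                break
            positions.append(len(self.text) - len(suffix))
        return sorted(positions)


index = NaiveOnlineSuffixIndex()
for symbol in reversed("banana"):
    index.prepend(symbol)
print(index.occurrences("ana"))  # [1, 3]
```

The search mirrors the shape of the stated bound: locating the start of the run is the part that depends on m and n, and walking the run accounts for the tocc term.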

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Amihood Amir (1, 2)
  • Tsvi Kopelowitz (1)
  • Moshe Lewenstein (1)
  • Noa Lewenstein (3)

  1. Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel
  2. College of Computing, Georgia Tech, Atlanta
  3. Department of Computer Science, Netanya College, Israel