Towards Real-Time Suffix Tree Construction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3772)

Abstract

The quest for a real-time suffix tree construction algorithm is over three decades old. To date there is no convincing, understandable solution to this problem. This paper makes a step in this direction by constructing a suffix tree online in O(log n) time for every single input symbol. Clearly, it is impossible to achieve better than O(log n) time per symbol in the comparison model, so no true real-time algorithm can exist for infinite alphabets. The best that can be hoped for is therefore that the construction time for every symbol does not exceed O(log n) (as opposed to the amortized O(log n) time per symbol achieved by currently known algorithms). To our knowledge, our algorithm is the first that spends O(log n) time in the worst case on every single input symbol.
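
One standard way to see the lower bound mentioned above (a sketch, not necessarily the authors' own argument): the root of a suffix tree has one child per distinct symbol of the text, so constructing the tree for a text of n symbols decides whether all n symbols are distinct. Element distinctness requires \(\Omega(n \log n)\) comparisons in the comparison model, and averaging over the n input symbols gives

\[ \frac{\Omega(n \log n)}{n} = \Omega(\log n) \]

comparisons for at least one symbol, so a worst-case bound of \(o(\log n)\) per symbol is not achievable for unbounded alphabets.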

We also provide a simple algorithm that constructs an indexing structure (the BIS) online in O(log n) time per input symbol, where n is the number of text symbols input thus far. This structure, together with fast LCP (Longest Common Prefix) queries on it, provides the backbone for the suffix tree construction. Together, our two data structures provide a searching algorithm for a pattern of length m whose time is \(O(\min(m \log |\Sigma|, m + \log n) + tocc)\), where tocc is the number of occurrences of the pattern.
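
To make the shape of such a query concrete, the following is a minimal sketch of pattern search over lexicographically sorted suffixes. It is not the BIS and not the authors' algorithm: the index here is a static, naively built sorted list, the function names `build_sorted_suffixes` and `find_occurrences` are hypothetical, and without LCP information each comparison costs up to m symbol comparisons, so a query takes O(m log n + tocc) rather than the bound stated above.

```python
def build_sorted_suffixes(text):
    """Return suffix start positions in lexicographic order of the suffixes.

    Naive construction (sorting full suffix strings); a stand-in for a real
    index, used here only to illustrate the query.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffixes, pattern):
    """Return all start positions of `pattern` in `text`, in increasing order."""
    m = len(pattern)

    def prefix(k):
        # Length-m prefix of the k-th smallest suffix. Truncating sorted
        # suffixes to m symbols keeps them in non-decreasing order, so
        # binary search over these prefixes is valid.
        start = suffixes[k]
        return text[start:start + m]

    # Lower bound: first suffix whose m-prefix is >= pattern.
    lo, hi = 0, len(suffixes)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-prefix is > pattern.
    hi = len(suffixes)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Suffixes in [first, lo) all start with the pattern: tocc = lo - first.
    return sorted(suffixes[first:lo])


text = "banana"
suffixes = build_sorted_suffixes(text)
print(find_occurrences(text, suffixes, "ana"))
```

The two binary searches account for the log n factor and the final slice for the tocc term; LCP queries of the kind mentioned in the abstract are what allow the per-step comparison cost to be shared across steps instead of paid in full each time.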

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amir, A., Kopelowitz, T., Lewenstein, M., Lewenstein, N. (2005). Towards Real-Time Suffix Tree Construction. In: Consens, M., Navarro, G. (eds) String Processing and Information Retrieval. SPIRE 2005. Lecture Notes in Computer Science, vol 3772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575832_9

  • DOI: https://doi.org/10.1007/11575832_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29740-6

  • Online ISBN: 978-3-540-32241-2

  • eBook Packages: Computer Science, Computer Science (R0)
