Online Suffix Tree Construction for Streaming Sequences

  • Conference paper

Part of the Communications in Computer and Information Science book series (CCIS, volume 6)

Abstract

In this study, we present an online suffix tree construction approach in which multiple sequences are indexed by a single suffix tree. Because of poor memory locality and high space consumption, online suffix tree construction on disk is a challenging process. Moreover, construction performance suffers when the alphabet size is large. To overcome these difficulties, we first present a space-efficient node representation for use in Ukkonen's suffix tree construction algorithm. Next, we show that performance can be increased by incorporating semantic knowledge, such as the frequently used letters of an alphabet. In particular, we estimate the frequently accessed nodes of the tree and introduce a strategy for inserting sequences into the tree, which speeds up access to those nodes. Finally, we analyze the effect of buffering strategies and page sizes on performance, running a series of experiments under various buffering strategies and page sizes. The experimental results show that our approach outperforms existing ones.
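
To make concrete what indexing multiple sequences in a single tree means, the following is a minimal illustrative sketch and not the method of the paper. It assumes a naive, uncompressed suffix trie built in quadratic time and held entirely in memory; it does not implement Ukkonen's algorithm, the space-efficient node representation, or any disk paging. The names `TrieNode`, `GeneralizedSuffixTrie`, `add_sequence`, and `find` are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Set, Tuple


class TrieNode:
    """One node of an uncompressed suffix trie (hypothetical helper)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # (sequence id, start offset) of every suffix passing through this node
        self.occurrences: Set[Tuple[int, int]] = set()


class GeneralizedSuffixTrie:
    """Naive single index over many sequences; an assumption-laden sketch."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.sequences: List[str] = []

    def add_sequence(self, seq: str) -> int:
        """Insert every suffix of seq online, after earlier sequences."""
        seq_id = len(self.sequences)
        self.sequences.append(seq)
        for start in range(len(seq)):
            node = self.root
            for ch in seq[start:]:
                node = node.children.setdefault(ch, TrieNode())
                node.occurrences.add((seq_id, start))
        return seq_id

    def find(self, pattern: str) -> List[Tuple[int, int]]:
        """Return sorted (sequence id, offset) pairs where pattern occurs."""
        node = self.root
        for ch in pattern:
            child = node.children.get(ch)
            if child is None:
                return []
            node = child
        return sorted(node.occurrences)


index = GeneralizedSuffixTrie()
index.add_sequence("banana")   # sequence id 0
index.add_sequence("bandana")  # sequence id 1
print(index.find("ana"))
```

Sequences can be added one after another without rebuilding the index, which is the online property in its simplest form. A real suffix tree compresses unary paths into edges and reaches linear construction time, which this sketch deliberately omits.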

Keywords

  • Suffix trees
  • sequence databases
  • time series indexing
  • poor memory locality

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ozcan, G., Alpkocak, A. (2008). Online Suffix Tree Construction for Streaming Sequences. In: Sarbazi-Azad, H., Parhami, B., Miremadi, SG., Hessabi, S. (eds) Advances in Computer Science and Engineering. CSICC 2008. Communications in Computer and Information Science, vol 6. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89985-3_9

  • DOI: https://doi.org/10.1007/978-3-540-89985-3_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89984-6

  • Online ISBN: 978-3-540-89985-3

  • eBook Packages: Computer Science, Computer Science (R0)