Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing

  • Conference paper
Combinatorial Algorithms (IWOCA 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9843)

Abstract

We present a new data structure called the packed compact trie (packed c-trie), which stores a set S of k strings of total length n in \(n \log \sigma + O(k \log n)\) bits of space and supports fast pattern matching queries and updates, where \(\sigma \) is the alphabet size. Assume that \(\alpha = \log _\sigma n\) letters are packed in a single machine word on the standard word RAM model, and let \(f(k,n)\) denote the query and update time of a dynamic predecessor/successor data structure of our choice that stores k integers from the universe [1, n] in \(O(k \log n)\) bits of space. Then, given a string of length m, our packed c-tries support pattern matching queries and insert/delete operations in \(O(\frac{m}{\alpha } f(k,n))\) worst-case time and in \(O(\frac{m}{\alpha } + f(k,n))\) expected time. Our experiments show that our packed c-tries are faster than standard compact tries (a.k.a. Patricia trees) on real data sets. We also discuss applications of our packed c-tries.
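The packing assumption in the abstract can be illustrated with a small sketch. It is not the authors' procedure, which this page does not show. It only demonstrates why packing several letters into one word lets two strings be compared a block of letters at a time rather than letter by letter. The function names `pack`, `lcp_packed`, and `lcp`, the 8-bit letter width, and the 8-letter block size are all assumptions chosen for illustration.

```python
def pack(s: str, bits: int = 8, width: int = 8) -> int:
    """Pack up to `width` letters of s into one integer.

    The first letter lands in the highest bits; short inputs are
    zero-padded. Assumes every character code fits in `bits` bits.
    """
    word = 0
    for i in range(width):
        code = ord(s[i]) if i < len(s) else 0
        word = (word << bits) | code
    return word


def lcp_packed(a: int, b: int, bits: int = 8, width: int = 8) -> int:
    """Number of leading letters on which two packed words agree."""
    diff = a ^ b
    if diff == 0:
        return width
    # Bits above the most significant set bit of `diff` are equal.
    equal_bits = bits * width - diff.bit_length()
    return equal_bits // bits


def lcp(x: str, y: str, bits: int = 8, width: int = 8) -> int:
    """Longest common prefix length, comparing `width` letters per step."""
    limit = min(len(x), len(y))
    i = 0
    while i < limit:
        n = lcp_packed(pack(x[i:i + width], bits, width),
                       pack(y[i:i + width], bits, width),
                       bits, width)
        if n < width:
            # Clamp so zero padding never counts as a match.
            return min(i + n, limit)
        i += width
    return limit


print(lcp("compact", "compress"))  # letters compared in blocks of 8
```

On a word RAM, the XOR and the most-significant-bit computation each take constant time per word, which is where the \(\frac{m}{\alpha }\) factor comes from. Python integers are arbitrary precision, so this sketch models the idea rather than the machine-level cost.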

Notes

  1. The \(O(\log m)\) expected bound for insertion/deletion stated in [4] assumes that the prefix search for the string has already been performed.

  2. For sufficiently long patterns of length \(m = \varTheta (n)\), our packed c-trie achieves sublinear \(o(n)\) worst-case time, while the wexponential search tree requires \(O(n)\) time.

  3. In the literature the locus is represented by \((u, c, h)\), where c is the first letter of the label of e. Since our packed c-trie does not maintain a search structure for branches, we represent the locus directly on e.

  4. Since \(kM \ge n\) always holds, the n term is hidden in the time complexity.

  5. Since all the factors of the LZDF are distinct, \(k = O(\frac{n}{\log _\sigma n})\) holds [22].

  6. Pizza&Chili Corpus, http://pizzachili.dcc.uchile.cl.

  7. Laboratory for webalgorithmics, uk-2005.urls.gz, http://law.di.unimi.it/datasets.php.

  8. jawiki, https://dumps.wikimedia.org/jawiki/.

References

  1. Alstrup, S., Gavoille, C., Kaplan, H., Rauhe, T.: Nearest common ancestors: a survey and a new distributed algorithm. Theory Comp. Sys. 37, 441–456 (2002)

  2. Andersson, A., Thorup, M.: Dynamic ordered sets with exponential search trees. J. ACM 54(3), 13 (2007)

  3. Beame, P., Fich, F.E.: Optimal bounds for the predecessor problem and related problems. J. Comput. Syst. Sci. 65(1), 38–72 (2002)

  4. Belazzougui, D., Boldi, P., Vigna, S.: Dynamic Z-Fast tries. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 159–172. Springer, Heidelberg (2010)

  5. Ben-Kiki, O., Bille, P., Breslauer, D., Gasieniec, L., Grossi, R., Weimann, O.: Optimal packed string matching. In: FSTTCS 2011, pp. 423–432 (2011)

  6. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the STOC 2004, pp. 91–100 (2004)

  7. Cole, R., Hariharan, R.: Dynamic LCA queries on trees. SIAM J. Comput. 34(4), 894–923 (2005)

  8. Ferragina, P., Grossi, R.: The string B-tree: a new data structure for string search in external memory and its applications. J. ACM 46(2), 236–280 (1999)

  9. Fischer, J., Gawrychowski, P.: Alphabet-dependent string searching with wexponential search trees. In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 160–171. Springer, Heidelberg (2015)

  10. Fredman, M.L., Willard, D.E.: Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci. 47(3), 424–436 (1993)

  11. Goto, K., Bannai, H., Inenaga, S., Takeda, M.: LZD factorization: simple and practical online grammar compression with variable-to-fixed encoding. In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 219–230. Springer, Heidelberg (2015)

  12. Hon, W.-K., Lam, T.-W., Shah, R., Tam, S.-L., Vitter, J.S.: Succinct index for dynamic dictionary matching. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 1034–1043. Springer, Heidelberg (2009)

  13. Inenaga, S., Takeda, M.: On-line linear-time construction of word suffix trees. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 60–71. Springer, Heidelberg (2006)

  14. Jansson, J., Sadakane, K., Sung, W.: Linked dynamic tries with applications to LZ-compression in sublinear time and space. Algorithmica 71(4), 969–988 (2015)

  15. Kärkkäinen, J., Ukkonen, E.: Sparse suffix trees. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp. 219–230. Springer, Heidelberg (1996)

  16. Morrison, D.R.: PATRICIA: practical algorithm to retrieve information coded in alphanumeric. J. ACM 15(4), 514–534 (1968)

  17. Uemura, T., Arimura, H.: Sparse and truncated suffix trees on variable-length codes. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 246–260. Springer, Heidelberg (2011)

  18. Ukkonen, E.: On-line construction of suffix-trees. Algorithmica 13(3), 249–260 (1995)

  19. Weiner, P.: Linear pattern-matching algorithms. In: Proceedings of 14th IEEE Annual Symposium on Switching and Automata Theory, pp. 1–11 (1973)

  20. Willard, D.E.: Log-logarithmic worst-case range queries are possible in space \(\varTheta (N)\). Inf. Process. Lett. 17, 81–84 (1983)

  21. Willard, D.E.: New trie data structures which support very fast search operations. J. Comput. Syst. Sci. 28, 379–394 (1984)

  22. Ziv, J., Lempel, A.: Compression of individual sequences via variable-length coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)

Author information

Corresponding author

Correspondence to Takuya Takagi.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Takagi, T., Inenaga, S., Sadakane, K., Arimura, H. (2016). Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing. In: Mäkinen, V., Puglisi, S., Salmela, L. (eds) Combinatorial Algorithms. IWOCA 2016. Lecture Notes in Computer Science, vol 9843. Springer, Cham. https://doi.org/10.1007/978-3-319-44543-4_17

  • DOI: https://doi.org/10.1007/978-3-319-44543-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44542-7

  • Online ISBN: 978-3-319-44543-4

  • eBook Packages: Computer Science (R0)
