Top Tree Compression of Tries

Abstract

We present a compressed representation of tries based on top tree compression [ICALP 2013] that works on a standard, comparison-based pointer machine model of computation and supports efficient prefix search queries. Namely, we show how to preprocess a set of strings of total length n over an alphabet of size \(\sigma\) into a compressed data structure of worst-case optimal size \(O(n/\log _\sigma n)\) that, given a pattern string P of length m, determines whether P is a prefix of one of the strings in time \(O(\min (m\log \sigma ,m + \log n))\). We show that this query time is in fact optimal regardless of the size of the data structure. Existing solutions either use \(\Omega (n)\) space or rely on word RAM techniques, such as tabulation, hashing, address arithmetic, or word-level parallelism, and hence do not work on a pointer machine. Our result is the first solution on a pointer machine that achieves worst-case o(n) space. Along the way, we develop several data structures that work on a pointer machine and are of independent interest. These include an optimal data structure for random access to a grammar-compressed string and an optimal data structure for a variant of the level ancestor problem.
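
To make the query concrete, the following is a minimal sketch of prefix search on a plain, uncompressed trie. It is not the top tree structure of the paper: it uses \(\Omega (n)\) space and only illustrates the \(O(m\log \sigma )\) comparison-based baseline that the compressed structure matches. The names `TrieNode`, `build_trie`, and `is_prefix` are illustrative assumptions, and the sorted lists searched with `bisect` stand in for the balanced search tree over children that a pointer machine would use.

```python
from bisect import bisect_left


class TrieNode:
    """One trie node; outgoing edges are kept sorted by their label."""

    def __init__(self):
        self.labels = []    # sorted edge labels (single characters)
        self.children = []  # children[i] is reached via labels[i]

    def child(self, ch):
        """Return the child reached by edge label ch, or None if absent."""
        i = bisect_left(self.labels, ch)  # O(log sigma) comparisons
        if i < len(self.labels) and self.labels[i] == ch:
            return self.children[i]
        return None

    def add_child(self, ch):
        """Return the child for edge label ch, creating it if necessary."""
        i = bisect_left(self.labels, ch)
        if i < len(self.labels) and self.labels[i] == ch:
            return self.children[i]
        node = TrieNode()
        self.labels.insert(i, ch)
        self.children.insert(i, node)
        return node


def build_trie(strings):
    """Build an uncompressed trie over the given strings."""
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.add_child(ch)
    return root


def is_prefix(root, pattern):
    """Return True if pattern is a prefix of some stored string."""
    node = root
    for ch in pattern:  # m steps, each one search among at most sigma edges
        node = node.child(ch)
        if node is None:
            return False
    return True
```

For example, after `root = build_trie(["banana", "band", "bee"])`, the call `is_prefix(root, "ban")` succeeds while `is_prefix(root, "bat")` fails at the third character.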

Notes

  1. Here we use edge labels instead of node labels. The two definitions are equivalent, and edge labels are more natural for tries.

  2. Here \(\alpha _k(n)\) for any constant k denotes the inverse of the \(k^{th}\) row of Ackermann’s function, defined as \(\alpha _k(n)=1+\alpha _k(\alpha _{k-1}(n))\) so that \(\alpha _1(n)= n/2\), \(\alpha _2(n)=\log n\), \(\alpha _3(n)=\log ^* n\), and so on.
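
The recurrence in note 2 can be evaluated directly. The sketch below unrolls \(\alpha _k(n)=1+\alpha _k(\alpha _{k-1}(n))\) into a loop. Two details are assumptions not stated in the note: \(\alpha _1(n)\) is rounded up to \(\lceil n/2\rceil\) so that the values stay integral, and the recursion bottoms out at \(\alpha _k(n)=0\) for \(n\le 1\) when \(k\ge 2\). The function name `alpha` is illustrative.

```python
def alpha(k, n):
    """Evaluate the k-th inverse Ackermann row from note 2 for integer n >= 1.

    Assumptions: alpha_1(n) = ceil(n / 2), and alpha_k(n) = 0 for n <= 1
    when k >= 2.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        return (n + 1) // 2  # ceil(n / 2)
    # alpha_k(n) = 1 + alpha_k(alpha_{k-1}(n)): count how many times
    # alpha_{k-1} must be applied before the argument drops to 1.
    steps = 0
    while n > 1:
        n = alpha(k - 1, n)
        steps += 1
    return steps
```

Under these assumptions `alpha(2, n)` counts halvings, matching \(\log n\) on powers of two, and `alpha(3, n)` counts repeated logarithms, matching \(\log ^* n\).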

References

  1. Afshani, P., Arge, L., Larsen, K.G.: Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model. In: Proceedings of the 28th SoCG, pp. 323–332 (2012)

  2. Alstrup, S., Holm, J.: Improved algorithms for finding level ancestors in dynamic trees. In: Proceedings of the 27th ICALP, pp. 73–84 (2000)

  3. Alstrup, S., Holm, J., Lichtenberg, K.D., Thorup, M.: Maintaining information in fully dynamic trees with top trees. ACM Trans. Algorithms 1(2), 243–264 (2005)

  4. Aoe, J.I.: An efficient digital search algorithm by using a double-array structure. IEEE Trans. Softw. Eng. 15(9), 1066–1077 (1989)

  5. Arz, J., Fischer, J.: LZ-compressed string dictionaries. In: Proceedings of the 24th DCC, pp. 322–331 (2014)

  6. Arz, J., Fischer, J.: Lempel–Ziv-78 compressed string dictionaries. Algorithmica 80, 1–36 (2018)

  7. Askitis, N., Sinha, R.: Engineering scalable, cache and space efficient tries for strings. VLDB J. 19(5), 633–660 (2010)

  8. Belazzougui, D., Boldi, P., Vigna, S.: Dynamic Z-fast tries. In: Proceedings of the 17th SPIRE, pp. 159–172 (2010)

  9. Belazzougui, D., Cunial, F., Gagie, T., Prezza, N., Raffinot, M.: Composite repetition-aware data structures. In: Proceedings of the 26th CPM, pp. 26–39 (2015)

  10. Belazzougui, D., Gagie, T., Gawrychowski, P., Kärkkäinen, J., Ordóñez, A., Puglisi, S.J., Tabei, Y.: Queries on LZ-bounded encodings. In: Proceedings of the 25th DCC, pp. 83–92 (2015)

  11. Belazzougui, D., Gagie, T., Gog, S., Manzini, G., Sirén, J.: Relative FM-indexes. In: Proceedings of the 21st SPIRE, pp. 52–64 (2014)

  12. Bender, M.A., Farach-Colton, M.: The level ancestor problem simplified. Theor. Comput. Sci. 321(1), 5–12 (2004)

  13. Benoit, D., Demaine, E.D., Munro, J.I., Raman, R., Raman, V., Rao, S.S.: Representing trees of higher degree. Algorithmica 43(4), 275–292 (2005)

  14. Bent, S.W., Sleator, D.D., Tarjan, R.E.: Biased search trees. SIAM J. Comput. 14(3), 545–568 (1985)

  15. Bille, P., Ettienne, M.B., Gørtz, I.L., Vildhøj, H.W.: Time-space trade-offs for Lempel–Ziv compressed indexing. Theor. Comput. Sci. 713, 66–77 (2018)

  16. Bille, P., Fernstrøm, F., Gørtz, I.L.: Tight bounds for top tree compression. In: Proceedings of the 24th SPIRE, pp. 97–102 (2017)

  17. Bille, P., Gawrychowski, P., Gørtz, I.L., Landau, G.M., Weimann, O.: Top tree compression of tries. In: Proceedings of the 30th ISAAC (2019)

  18. Bille, P., Gørtz, I.L., Skjoldjensen, F.R.: Deterministic indexing for packed strings. In: Proceedings of the 28th CPM (2017)

  19. Bille, P., Gørtz, I.L., Weimann, O., Landau, G.M.: Tree compression with top trees. Inf. Comput. 243, 166–177 (2015). (Announced at ICALP 2013)

  20. Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings and trees. SIAM J. Comput. 44(3), 513–539 (2015). (Announced at SODA 2011)

  21. Chazelle, B.: Lower bounds for orthogonal range searching: I. The reporting case. J. ACM 37(2), 200–212 (1990)

  22. Chazelle, B., Rosenberg, B.: Simplex range reporting on a pointer machine. Comput. Geom. 5(5), 237–247 (1996)

  23. Christiansen, A.R., Ettienne, M.B.: Compressed indexing with signature grammars. In: Proceedings of the 13th LATIN, pp. 331–345 (2018)

  24. Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fund. Inform. 111(3), 313–337 (2011)

  25. Claude, F., Navarro, G.: Improved grammar-based compressed indexes. In: Proceedings of the 19th SPIRE, pp. 180–192 (2012)

  26. Darragh, J.J., Cleary, J.G., Witten, I.H.: Bonsai: a compact representation of trees. Softw. Pract. Exp. 23(3), 277–291 (1993)

  27. Dietz, P.F.: Finding level-ancestors in dynamic trees. In: Proceedings of the 2nd WADS, pp. 32–40 (1991)

  28. Downey, P.J., Sethi, R., Tarjan, R.E.: Variations on the common subexpression problem. J. ACM 27(4), 758–771 (1980)

  29. Dudek, B., Gawrychowski, P.: Slowing down top trees for better worst-case compression. In: Proceedings of the 29th CPM, pp. 16:1–16:8 (2018)

  30. Farruggia, A., Gagie, T., Navarro, G., Puglisi, S.J., Sirén, J.: Relative suffix trees. Comput. J. 61(5), 773–788 (2017)

  31. Fredkin, E.: Trie memory. Commun. ACM 3(9), 490–499 (1960)

  32. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Proceedings of the 6th LATA, pp. 240–251 (2012)

  33. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: LZ77-based self-indexing with faster pattern matching. In: Proceedings of the 11th LATIN, pp. 731–742 (2014)

  34. Grossi, R., Ottaviano, G.: Fast compressed tries through path decompositions. ACM J. Exp. Algorithm. 19, 3–4 (2015)

  35. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)

  36. Hagerup, T.: Sorting and searching on the word RAM. In: Proceedings of the 15th STACS, pp. 366–398 (1998)

  37. He, M., Munro, J.I., Zhou, G.: Data structures for path queries. ACM Trans. Algorithms 12(4), 53:1–53:32 (2016)

  38. Hood, R., Melville, R.: Real-time queue operation in pure LISP. Inf. Process. Lett. 13(2), 50–54 (1981)

  39. Hübschle-Schneider, L., Raman, R.: Tree compression with top trees revisited. In: Proceedings of the 14th SEA, pp. 15–27 (2015)

  40. Kanda, S., Morita, K., Fuketa, M.: Compressed double-array tries for string dictionaries supporting fast lookup. Knowl. Inf. Syst. 51(3), 1023–1042 (2017)

  41. Kanda, S., Morita, K., Fuketa, M.: Practical implementation of space-efficient dynamic keyword dictionaries. In: Proceedings of the 24th SPIRE, pp. 221–233 (2017)

  42. Kärkkäinen, J., Ukkonen, E.: Lempel–Ziv parsing and sublinear-size index structures for string matching. In: Proceedings of the 3rd WSP, pp. 141–155 (1996)

  43. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)

  44. Knuth, D.E.: The Art of Computer Programming, vol. 1. Addison Wesley, Boston (1969)

  45. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

  46. Mäkinen, V.: Compact suffix array—a space-efficient full-text index. Fund. Inform. 56(1–2), 191–210 (2003)

  47. Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nordic J. Comput. 12(1), 40–66 (2005)

  48. Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of individual genomes. In: Proceedings of the 13th RECOMB, pp. 121–137 (2009)

  49. Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of highly repetitive sequence collections. J. Comput. Biol. 17(3), 281–308 (2010)

  50. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39, 1 (2007)

  51. Navarro, G., Prezza, N.: Universal compressed text indexing. Theor. Comput. Sci. 762, 41–50 (2019)

  52. Nishimoto, T., Tomohiro, I., Inenaga, S., Bannai, H., Takeda, M.: Dynamic index and LZ factorization in compressed space. Discrete Appl. Math. 274, 116–129 (2019)

  53. Poyias, A., Raman, R.: Improved practical compact dynamic tries. In: Proceedings of the 22nd SPIRE, pp. 324–336 (2015)

  54. Preparata, F.P., Hong, S.J.: Convex hulls of finite sets of points in two and three dimensions. Commun. ACM 20(2), 87–93 (1977)

  55. Raman, R., Raman, V., Satti, S.R.: Succinct indexable dictionaries with applications to encoding K-ary trees, prefix sums and multisets. ACM Trans. Algorithms 3(4), 43 (2007)

  56. Sadakane, K.: Compressed text databases with efficient query algorithms based on the compressed suffix array. In: Proceedings of the 11th ISAAC, pp. 410–421 (2000)

  57. Sirén, J., Välimäki, N., Mäkinen, V., Navarro, G.: Run-length compressed indexes are superior for highly repetitive sequence collections. In: Proceedings of the 15th SPIRE, pp. 164–175 (2008)

  58. Takagi, T., Goto, K., Fujishige, Y., Inenaga, S., Arimura, H.: Linear-size CDAWG: new repetition-aware indexing and grammar compression. In: Proceedings of the 24th SPIRE, pp. 304–316 (2017)

  59. Takagi, T., Inenaga, S., Sadakane, K., Arimura, H.: Packed compact tries: a fast and efficient data structure for online string processing. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 100(9), 1785–1793 (2017)

  60. Tarjan, R.E.: A class of algorithms which require nonlinear time to maintain disjoint sets. J. Comput. Syst. Sci. 18(2), 110–127 (1979)

  61. Tsuruta, K., Köppl, D., Kanda, S., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Dynamic packed compact tries revisited (2019). arXiv preprint arXiv:1904.07467

  62. Yata, S.: Dictionary compression by nesting prefix/patricia tries. In: Proceedings of the 17th Meeting of the Association for Natural Language (2011)

  63. Yoshinaga, N., Kitsuregawa, M.: A self-adaptive classifier for efficient text-stream processing. In: Proceedings of the 25th COLING, pp. 1091–1102 (2014)

Author information

Corresponding author

Correspondence to Philip Bille.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Philip Bille and Inge Li Gørtz: Supported by the Danish Research Council (DFF—4005-00267, DFF—1323-00178). Gad M. Landau: Supported by the Israel Science Foundation Grant 1475/18 and by Grant No. 2018141 from the United States-Israel Binational Science Foundation. Oren Weimann: Supported by the Israel Science Foundation Grant 592/17.

An extended abstract appeared at ISAAC 2019 [17].

About this article

Cite this article

Bille, P., Gawrychowski, P., Gørtz, I.L. et al. Top Tree Compression of Tries. Algorithmica 83, 3602–3628 (2021). https://doi.org/10.1007/s00453-021-00869-w

  • DOI: https://doi.org/10.1007/s00453-021-00869-w
