Top Tree Compression of Tries

Bille, Philip; Gawrychowski, Paweł; Gørtz, Inge Li; Landau, Gad M.; Weimann, Oren

doi:10.1007/s00453-021-00869-w

Published: 20 August 2021

Algorithmica, volume 83, pages 3602–3628 (2021)

Philip Bille (ORCID: orcid.org/0000-0002-1120-5154)¹,
Paweł Gawrychowski²,
Inge Li Gørtz¹,
Gad M. Landau³,⁴ &
Oren Weimann³

Abstract

We present a compressed representation of tries based on top tree compression [ICALP 2013] that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries. Namely, we show how to preprocess a set of strings of total length n over an alphabet of size \(\sigma\) into a compressed data structure of worst-case optimal size \(O(n/\log _\sigma n)\) that given a pattern string P of length m determines if P is a prefix of one of the strings in time \(O(\min (m\log \sigma ,m + \log n))\). We show that this query time is in fact optimal regardless of the size of the data structure. Existing solutions either use \(\Omega (n)\) space or rely on word RAM techniques, such as tabulation, hashing, address arithmetic, or word-level parallelism, and hence do not work on a pointer machine. Our result is the first solution on a pointer machine that achieves worst-case o(n) space. Along the way, we develop several interesting data structures that work on a pointer machine and are of independent interest. These include an optimal data structure for random access to a grammar-compressed string and an optimal data structure for a variant of the level ancestor problem.
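For orientation, the prefix search query described above can be sketched on a plain, uncompressed trie. This is only a baseline illustration and not the top-tree-compressed structure of the paper: it uses \(\Theta (n)\) space, and Python lists with `bisect` merely stand in for a comparison-based search over a node's children. The names `TrieNode`, `build_trie`, and `is_prefix` are hypothetical, chosen for this sketch. Searching the sorted children of each node by comparisons costs \(O(\log \sigma )\) per pattern character, which corresponds to the \(O(m\log \sigma )\) term of the query bound.

```python
from bisect import bisect_left


class TrieNode:
    """Node of an uncompressed trie; outgoing edges are kept sorted by label."""

    __slots__ = ("labels", "children")

    def __init__(self):
        self.labels = []    # sorted single-character edge labels
        self.children = []  # children[i] is reached via the edge labels[i]

    def child(self, ch):
        """Return the child reached by edge `ch`, or None (comparisons only)."""
        i = bisect_left(self.labels, ch)
        if i < len(self.labels) and self.labels[i] == ch:
            return self.children[i]
        return None

    def add_child(self, ch):
        """Return the child reached by edge `ch`, creating it if necessary."""
        i = bisect_left(self.labels, ch)
        if i < len(self.labels) and self.labels[i] == ch:
            return self.children[i]
        node = TrieNode()
        self.labels.insert(i, ch)
        self.children.insert(i, node)
        return node


def build_trie(strings):
    """Insert every string character by character, starting at the root."""
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.add_child(ch)
    return root


def is_prefix(root, pattern):
    """Return True if `pattern` is a prefix of at least one stored string."""
    node = root
    for ch in pattern:
        node = node.child(ch)
        if node is None:
            return False
    return True


trie = build_trie(["banana", "bandana", "band"])
print(is_prefix(trie, "ban"))  # True
print(is_prefix(trie, "bad"))  # False
```

Every root-to-node path spells a prefix of some stored string, so the query succeeds exactly when the walk never falls off the trie. The contribution of the paper is to support the same query in \(O(n/\log _\sigma n)\) space on a pointer machine, which this sketch does not attempt.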

Notes

Here we use edge labels instead of node labels. The two definitions are equivalent, and edge labels are more natural for tries.
Here \(\alpha _k(n)\) for any constant k denotes the inverse of the \(k^{th}\) row of Ackermann’s function, defined as \(\alpha _k(n)=1+\alpha _k(\alpha _{k-1}(n))\) so that \(\alpha _1(n)= n/2\), \(\alpha _2(n)=\log n\), \(\alpha _3(n)=\log ^* n\), and so on.
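As a quick check of the values listed in the second note, the recurrence can be unrolled for \(k=2\) and \(k=3\). The sketch below takes \(\alpha _1(n)=n/2\) as given, ignores rounding, and assumes the recursion stops once the argument drops to a constant; this stopping convention is an assumption made here, not something stated in the note.

```latex
\begin{align*}
\alpha_2(n) &= 1 + \alpha_2(\alpha_1(n)) = 1 + \alpha_2(n/2) = 2 + \alpha_2(n/4) = \dots = \log n + O(1),\\
\alpha_3(n) &= 1 + \alpha_3(\alpha_2(n)) = 1 + \alpha_3(\log n) = 2 + \alpha_3(\log\log n) = \dots = \log^{*} n + O(1).
\end{align*}
```

In each row, \(\alpha _k(n)\) counts how many times \(\alpha _{k-1}\) must be applied to n before the argument becomes constant.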

References

Afshani, P., Arge, L., Larsen, K.G.: Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model. In: Proceedings of the 28th SoCG, pp. 323–332 (2012)
Alstrup, S., Holm, J.: Improved algorithms for finding level ancestors in dynamic trees. In: Proceedings of the 27th ICALP, pp. 73–84 (2000)
Alstrup, S., Holm, J., Lichtenberg, K.D., Thorup, M.: Maintaining information in fully dynamic trees with top trees. ACM Trans. Algorithms 1(2), 243–264 (2005)
Aoe, J.I.: An efficient digital search algorithm by using a double-array structure. IEEE Trans. Softw. Eng. 15(9), 1066–1077 (1989)
Arz, J., Fischer, J.: LZ-compressed string dictionaries. In: Proceedings of the 24th DCC, pp. 322–331 (2014)
Arz, J., Fischer, J.: Lempel–Ziv-78 compressed string dictionaries. Algorithmica 80, 1–36 (2018)
Askitis, N., Sinha, R.: Engineering scalable, cache and space efficient tries for strings. VLDB J. 19(5), 633–660 (2010)
Belazzougui, D., Boldi, P., Vigna, S.: Dynamic Z-fast tries. In: Proceedings of the 17th SPIRE, pp. 159–172 (2010)
Belazzougui, D., Cunial, F., Gagie, T., Prezza, N., Raffinot, M.: Composite repetition-aware data structures. In: Proceedings of the 26th CPM, pp. 26–39 (2015)
Belazzougui, D., Gagie, T., Gawrychowski, P., Kärkkäinen, J., Ordóñez, A., Puglisi, S.J., Tabei, Y.: Queries on LZ-bounded encodings. In: Proceedings of the 25th DCC, pp. 83–92 (2015)
Belazzougui, D., Gagie, T., Gog, S., Manzini, G., Sirén, J.: Relative FM-indexes. In: Proceedings of the 21st SPIRE, pp. 52–64 (2014)
Bender, M.A., Farach-Colton, M.: The level ancestor problem simplified. Theor. Comput. Sci. 321(1), 5–12 (2004)
Benoit, D., Demaine, E.D., Munro, J.I., Raman, R., Raman, V., Rao, S.S.: Representing trees of higher degree. Algorithmica 43(4), 275–292 (2005)
Bent, S.W., Sleator, D.D., Tarjan, R.E.: Biased search trees. SIAM J. Comput. 14(3), 545–568 (1985)
Bille, P., Ettienne, M.B., Gørtz, I.L., Vildhøj, H.W.: Time-space trade-offs for Lempel–Ziv compressed indexing. Theor. Comput. Sci. 713, 66–77 (2018)
Bille, P., Fernstrøm, F., Gørtz, I.L.: Tight bounds for top tree compression. In: Proceedings of the 24th SPIRE, pp. 97–102 (2017)
Bille, P., Gawrychowski, P., Gørtz, I.L., Landau, G.M., Weimann, O.: Top tree compression of tries. In: Proceedings of the 30th ISAAC (2019)
Bille, P., Gørtz, I.L., Skjoldjensen, F.R.: Deterministic indexing for packed strings. In: Proceedings of the 28th CPM (2017)
Bille, P., Gørtz, I.L., Weimann, O., Landau, G.M.: Tree compression with top trees. Inf. Comput. 243, 166–177 (2015). (Announced at ICALP 2013)
Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings and trees. SIAM J. Comput. 44(3), 513–539 (2015). (Announced at SODA 2011)
Chazelle, B.: Lower bounds for orthogonal range searching: I. The reporting case. J. ACM 37(2), 200–212 (1990)
Chazelle, B., Rosenberg, B.: Simplex range reporting on a pointer machine. Comput. Geom. 5(5), 237–247 (1996)
Christiansen, A.R., Ettienne, M.B.: Compressed indexing with signature grammars. In: Proceedings of the 13th LATIN, pp. 331–345 (2018)
Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fund. Inform. 111(3), 313–337 (2011)
Claude, F., Navarro, G.: Improved grammar-based compressed indexes. In: Proceedings of the 19th SPIRE, pp. 180–192 (2012)
Darragh, J.J., Cleary, J.G., Witten, I.H.: Bonsai: a compact representation of trees. Softw. Pract. Exp. 23(3), 277–291 (1993)
Dietz, P.F.: Finding level-ancestors in dynamic trees. In: Proceedings of the 2nd WADS, pp. 32–40 (1991)
Downey, P.J., Sethi, R., Tarjan, R.E.: Variations on the common subexpression problem. J. ACM 27(4), 758–771 (1980)
Dudek, B., Gawrychowski, P.: Slowing down top trees for better worst-case compression. In: Proceedings of the 29th CPM, pp. 16:1–16:8 (2018)
Farruggia, A., Gagie, T., Navarro, G., Puglisi, S.J., Sirén, J.: Relative suffix trees. Comput. J. 61(5), 773–788 (2017)
Fredkin, E.: Trie memory. Commun. ACM 3(9), 490–499 (1960)
Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Proceedings of the 6th LATA, pp. 240–251 (2012)
Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: LZ77-based self-indexing with faster pattern matching. In: Proceedings of the 11th LATIN, pp. 731–742 (2014)
Grossi, R., Ottaviano, G.: Fast compressed tries through path decompositions. ACM J. Exp. Algorithm. 19, 3–4 (2015)
Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)
Hagerup, T.: Sorting and searching on the word RAM. In: Proceedings of the 15th STACS, pp. 366–398 (1998)
He, M., Munro, J.I., Zhou, G.: Data structures for path queries. ACM Trans. Algorithms 12(4), 53:1–53:32 (2016)
Hood, R., Melville, R.: Real-time queue operation in pure LISP. Inf. Process. Lett. 13(2), 50–54 (1981)
Hübschle-Schneider, L., Raman, R.: Tree compression with top trees revisited. In: Proceedings of the 14th SEA, pp. 15–27 (2015)
Kanda, S., Morita, K., Fuketa, M.: Compressed double-array tries for string dictionaries supporting fast lookup. Knowl. Inf. Syst. 51(3), 1023–1042 (2017)
Kanda, S., Morita, K., Fuketa, M.: Practical implementation of space-efficient dynamic keyword dictionaries. In: Proceedings of the 24th SPIRE, pp. 221–233 (2017)
Kärkkäinen, J., Ukkonen, E.: Lempel–Ziv parsing and sublinear-size index structures for string matching. In: Proceedings of the 3rd WSP, pp. 141–155 (1996)
Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)
Knuth, D.E.: The Art of Computer Programming, vol. 1. Addison Wesley, Boston (1969)
Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
Mäkinen, V.: Compact suffix array—a space-efficient full-text index. Fund. Inform. 56(1–2), 191–210 (2003)
Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nordic J. Comput. 12(1), 40–66 (2005)
Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of individual genomes. In: Proceedings of the 13th RECOMB, pp. 121–137 (2009)
Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of highly repetitive sequence collections. J. Comput. Biol. 17(3), 281–308 (2010)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39, 1 (2007)
Navarro, G., Prezza, N.: Universal compressed text indexing. Theor. Comput. Sci. 762, 41–50 (2019)
Nishimoto, T., Tomohiro, I., Inenaga, S., Bannai, H., Takeda, M.: Dynamic index and LZ factorization in compressed space. Discrete Appl. Math. 274, 116–129 (2019)
Poyias, A., Raman, R.: Improved practical compact dynamic tries. In: Proceedings of the 22nd SPIRE, pp. 324–336 (2015)
Preparata, F.P., Hong, S.J.: Convex hulls of finite sets of points in two and three dimensions. Commun. ACM 20(2), 87–93 (1977)
Raman, R., Raman, V., Satti, S.R.: Succinct indexable dictionaries with applications to encoding K-ary trees, prefix sums and multisets. ACM Trans. Algorithms 3(4), 43 (2007)
Sadakane, K.: Compressed text databases with efficient query algorithms based on the compressed suffix array. In: Proceedings of the 11th ISAAC, pp. 410–421 (2000)
Sirén, J., Välimäki, N., Mäkinen, V., Navarro, G.: Run-length compressed indexes are superior for highly repetitive sequence collections. In: Proceedings of the 15th SPIRE, pp. 164–175 (2008)
Takagi, T., Goto, K., Fujishige, Y., Inenaga, S., Arimura, H.: Linear-size CDAWG: new repetition-aware indexing and grammar compression. In: Proceedings of the 24th SPIRE, pp. 304–316 (2017)
Takagi, T., Inenaga, S., Sadakane, K., Arimura, H.: Packed compact tries: a fast and efficient data structure for online string processing. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 100(9), 1785–1793 (2017)
Tarjan, R.E.: A class of algorithms which require nonlinear time to maintain disjoint sets. J. Comput. Syst. Sci. 18(2), 110–127 (1979)
Tsuruta, K., Köppl, D., Kanda, S., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Dynamic packed compact tries revisited (2019). arXiv preprint arXiv:1904.07467
Yata, S.: Dictionary compression by nesting prefix/patricia tries. In: Proceedings of the 17th Meeting of the Association for Natural Language (2011)
Yoshinaga, N., Kitsuregawa, M.: A self-adaptive classifier for efficient text-stream processing. In: Proceedings of the 25th COLING, pp. 1091–1102 (2014)

Author information

Authors and Affiliations

DTU Compute, Technical University of Denmark, Lyngby, Denmark
Philip Bille & Inge Li Gørtz
University of Wrocław, Wrocław, Poland
Paweł Gawrychowski
University of Haifa, Haifa, Israel
Gad M. Landau & Oren Weimann
NYU Tandon School of Engineering, New York University, New York, USA
Gad M. Landau

Corresponding author

Correspondence to Philip Bille.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Philip Bille and Inge Li Gørtz: Supported by the Danish Research Council (DFF—4005-00267, DFF—1323-00178). Gad M. Landau: Supported by the Israel Science Foundation Grants 1475/18, and No. 2018141 from the United States-Israel Binational Science Foundation. Oren Weimann: Supported by the Israel Science Foundation Grant 592/17.

An extended abstract appeared at ISAAC 2019 [17].

About this article

Cite this article

Bille, P., Gawrychowski, P., Gørtz, I.L. et al. Top Tree Compression of Tries. Algorithmica 83, 3602–3628 (2021). https://doi.org/10.1007/s00453-021-00869-w

Received: 20 December 2019
Accepted: 10 August 2021
Published: 20 August 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s00453-021-00869-w
