Abstract
We present a compressed representation of tries based on top tree compression [ICALP 2013] that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries. Namely, we show how to preprocess a set of strings of total length n over an alphabet of size \(\sigma\) into a compressed data structure of worst-case optimal size \(O(n/\log _\sigma n)\) that, given a pattern string P of length m, determines if P is a prefix of one of the strings in time \(O(\min (m\log \sigma ,m + \log n))\). We show that this query time is in fact optimal regardless of the size of the data structure. Existing solutions either use \(\Omega (n)\) space or rely on word RAM techniques, such as tabulation, hashing, address arithmetic, or word-level parallelism, and hence do not work on a pointer machine. Our result is the first solution on a pointer machine that achieves worst-case o(n) space. Along the way, we develop several data structures that work on a pointer machine and are of independent interest. These include an optimal data structure for random access to a grammar-compressed string and an optimal data structure for a variant of the level ancestor problem.
Notes
Here we use edge labels instead of node labels. The two definitions are equivalent, and edge labels are more natural for tries.
Here \(\alpha _k(n)\) for any constant k denotes the inverse of the \(k^{th}\) row of Ackermann’s function, defined as \(\alpha _k(n)=1+\alpha _k(\alpha _{k-1}(n))\) so that \(\alpha _1(n)= n/2\), \(\alpha _2(n)=\log n\), \(\alpha _3(n)=\log ^* n\), and so on.
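The recurrence in this note can be evaluated directly on small inputs. The following is a minimal sketch, not taken from the article: the function name `alpha`, the rounding of \(n/2\) down to an integer, and the base case \(\alpha _k(n)=0\) for \(n \le 1\) are assumptions made only so that the recursion terminates on integers.

```python
def alpha(k: int, n: int) -> int:
    """Evaluate alpha_k(n) from the recurrence
    alpha_k(n) = 1 + alpha_k(alpha_{k-1}(n)), with alpha_1(n) = n/2.

    Assumptions (not stated in the note): n/2 is rounded down,
    and alpha_k(n) = 0 whenever n <= 1.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        return n // 2  # first row: halving
    if n <= 1:
        return 0  # assumed base case
    # Apply the (k-1)-th row once, then count how many more
    # applications are needed to reach the base case.
    return 1 + alpha(k, alpha(k - 1, n))


if __name__ == "__main__":
    for n in (2, 16, 65536):
        print(n, alpha(1, n), alpha(2, n), alpha(3, n))
```

Under these assumptions, `alpha(2, n)` counts how many times n can be halved before reaching 1, which matches \(\log n\) up to rounding, and `alpha(3, n)` counts how many times that logarithm can be iterated, which matches \(\log ^* n\).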
References
Afshani, P., Arge, L., Larsen, K.G.: Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model. In: Proceedings of the 28th SoCG, pp. 323–332 (2012)
Alstrup, S., Holm, J.: Improved algorithms for finding level ancestors in dynamic trees. In: Proceedings of the 27th ICALP, pp. 73–84 (2000)
Alstrup, S., Holm, J., Lichtenberg, K.D., Thorup, M.: Maintaining information in fully dynamic trees with top trees. ACM Trans. Algorithms 1(2), 243–264 (2005)
Aoe, J.I.: An efficient digital search algorithm by using a double-array structure. IEEE Trans. Softw. Eng. 15(9), 1066–1077 (1989)
Arz, J., Fischer, J.: LZ-compressed string dictionaries. In: Proceedings of the 24th DCC, pp. 322–331 (2014)
Arz, J., Fischer, J.: Lempel–Ziv-78 compressed string dictionaries. Algorithmica 80, 1–36 (2018)
Askitis, N., Sinha, R.: Engineering scalable, cache and space efficient tries for strings. VLDB J. 19(5), 633–660 (2010)
Belazzougui, D., Boldi, P., Vigna, S.: Dynamic Z-fast tries. In: Proceedings of the 17th SPIRE, pp. 159–172 (2010)
Belazzougui, D., Cunial, F., Gagie, T., Prezza, N., Raffinot, M.: Composite repetition-aware data structures. In: Proceedings of the 26th CPM, pp. 26–39 (2015)
Belazzougui, D., Gagie, T., Gawrychowski, P., Kärkkäinen, J., Ordóñez, A., Puglisi, S.J., Tabei, Y.: Queries on LZ-bounded encodings. In: Proceedings of the 25th DCC, pp. 83–92 (2015)
Belazzougui, D., Gagie, T., Gog, S., Manzini, G., Sirén, J.: Relative FM-indexes. In: Proceedings of the 21st SPIRE, pp. 52–64 (2014)
Bender, M.A., Farach-Colton, M.: The level ancestor problem simplified. Theor. Comput. Sci. 321(1), 5–12 (2004)
Benoit, D., Demaine, E.D., Munro, J.I., Raman, R., Raman, V., Rao, S.S.: Representing trees of higher degree. Algorithmica 43(4), 275–292 (2005)
Bent, S.W., Sleator, D.D., Tarjan, R.E.: Biased search trees. SIAM J. Comput. 14(3), 545–568 (1985)
Bille, P., Ettienne, M.B., Gørtz, I.L., Vildhøj, H.W.: Time-space trade-offs for Lempel–Ziv compressed indexing. Theor. Comput. Sci. 713, 66–77 (2018)
Bille, P., Fernstrøm, F., Gørtz, I.L.: Tight bounds for top tree compression. In: Proceedings of the 24th SPIRE, pp. 97–102 (2017)
Bille, P., Gawrychowski, P., Gørtz, I.L., Landau, G.M., Weimann, O.: Top tree compression of tries. In: Proceedings of the 30th ISAAC (2019)
Bille, P., Gørtz, I.L., Skjoldjensen, F.R.: Deterministic indexing for packed strings. In: Proceedings of the 28th CPM (2017)
Bille, P., Gørtz, I.L., Weimann, O., Landau, G.M.: Tree compression with top trees. Inf. Comput. 243, 166–177 (2015). (Announced at ICALP 2013)
Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings and trees. SIAM J. Comput. 44(3), 513–539 (2015). (Announced at SODA 2011)
Chazelle, B.: Lower bounds for orthogonal range searching: I. The reporting case. J. ACM 37(2), 200–212 (1990)
Chazelle, B., Rosenberg, B.: Simplex range reporting on a pointer machine. Comput. Geom. 5(5), 237–247 (1996)
Christiansen, A.R., Ettienne, M.B.: Compressed indexing with signature grammars. In: Proceedings of the 13th LATIN, pp. 331–345 (2018)
Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fund. Inform. 111(3), 313–337 (2011)
Claude, F., Navarro, G.: Improved grammar-based compressed indexes. In: Proceedings of the 19th SPIRE, pp. 180–192 (2012)
Darragh, J.J., Cleary, J.G., Witten, I.H.: Bonsai: a compact representation of trees. Softw. Pract. Exp. 23(3), 277–291 (1993)
Dietz, P.F.: Finding level-ancestors in dynamic trees. In: Proceedings of the 2nd WADS, pp. 32–40 (1991)
Downey, P.J., Sethi, R., Tarjan, R.E.: Variations on the common subexpression problem. J. ACM 27(4), 758–771 (1980)
Dudek, B., Gawrychowski, P.: Slowing down top trees for better worst-case compression. In: Proceedings of the 29th CPM, pp. 16:1–16:8 (2018)
Farruggia, A., Gagie, T., Navarro, G., Puglisi, S.J., Sirén, J.: Relative suffix trees. Comput. J. 61(5), 773–788 (2017)
Fredkin, E.: Trie memory. Commun. ACM 3(9), 490–499 (1960)
Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Proceedings of the 6th LATA, pp. 240–251 (2012)
Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: LZ77-based self-indexing with faster pattern matching. In: Proceedings of the 11th LATIN, pp. 731–742 (2014)
Grossi, R., Ottaviano, G.: Fast compressed tries through path decompositions. ACM J. Exp. Algorithm. 19, 3–4 (2015)
Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)
Hagerup, T.: Sorting and searching on the word RAM. In: Proceedings of the 15th STACS, pp. 366–398 (1998)
He, M., Munro, J.I., Zhou, G.: Data structures for path queries. ACM Trans. Algorithms 12(4), 53:1–53:32 (2016)
Hood, R., Melville, R.: Real-time queue operation in pure LISP. Inf. Process. Lett. 13(2), 50–54 (1981)
Hübschle-Schneider, L., Raman, R.: Tree compression with top trees revisited. In: Proceedings of the 14th SEA, pp. 15–27 (2015)
Kanda, S., Morita, K., Fuketa, M.: Compressed double-array tries for string dictionaries supporting fast lookup. Knowl. Inf. Syst. 51(3), 1023–1042 (2017)
Kanda, S., Morita, K., Fuketa, M.: Practical implementation of space-efficient dynamic keyword dictionaries. In: Proceedings of the 24th SPIRE, pp. 221–233 (2017)
Kärkkäinen, J., Ukkonen, E.: Lempel–Ziv parsing and sublinear-size index structures for string matching. In: Proceedings of the 3rd WSP, pp. 141–155 (1996)
Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)
Knuth, D.E.: The Art of Computer Programming, vol. 1. Addison Wesley, Boston (1969)
Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
Mäkinen, V.: Compact suffix array—a space-efficient full-text index. Fund. Inform. 56(1–2), 191–210 (2003)
Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nordic J. Comput. 12(1), 40–66 (2005)
Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of individual genomes. In: Proceedings of the 13th RECOMB, pp. 121–137 (2009)
Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of highly repetitive sequence collections. J. Comput. Biol. 17(3), 281–308 (2010)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39, 1 (2007)
Navarro, G., Prezza, N.: Universal compressed text indexing. Theor. Comput. Sci. 762, 41–50 (2019)
Nishimoto, T., Tomohiro, I., Inenaga, S., Bannai, H., Takeda, M.: Dynamic index and LZ factorization in compressed space. Discrete Appl. Math. 274, 116–129 (2019)
Poyias, A., Raman, R.: Improved practical compact dynamic tries. In: Proceedings of the 22nd SPIRE, pp. 324–336 (2015)
Preparata, F.P., Hong, S.J.: Convex hulls of finite sets of points in two and three dimensions. Commun. ACM 20(2), 87–93 (1977)
Raman, R., Raman, V., Satti, S.R.: Succinct indexable dictionaries with applications to encoding K-ary trees, prefix sums and multisets. ACM Trans. Algorithms 3(4), 43 (2007)
Sadakane, K.: Compressed text databases with efficient query algorithms based on the compressed suffix array. In: Proceedings of the 11th ISAAC, pp. 410–421 (2000)
Sirén, J., Välimäki, N., Mäkinen, V., Navarro, G.: Run-length compressed indexes are superior for highly repetitive sequence collections. In: Proceedings of the 15th SPIRE, pp. 164–175 (2008)
Takagi, T., Goto, K., Fujishige, Y., Inenaga, S., Arimura, H.: Linear-size CDAWG: new repetition-aware indexing and grammar compression. In: Proceedings of the 24th SPIRE, pp. 304–316 (2017)
Takagi, T., Inenaga, S., Sadakane, K., Arimura, H.: Packed compact tries: a fast and efficient data structure for online string processing. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 100(9), 1785–1793 (2017)
Tarjan, R.E.: A class of algorithms which require nonlinear time to maintain disjoint sets. J. Comput. Syst. Sci. 18(2), 110–127 (1979)
Tsuruta, K., Köppl, D., Kanda, S., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Dynamic packed compact tries revisited (2019). arXiv preprint arXiv:1904.07467
Yata, S.: Dictionary compression by nesting prefix/patricia tries. In: Proceedings of the 17th Meeting of the Association for Natural Language (2011)
Yoshinaga, N., Kitsuregawa, M.: A self-adaptive classifier for efficient text-stream processing. In: Proceedings of the 25th COLING, pp. 1091–1102 (2014)
Additional information
Philip Bille and Inge Li Gørtz: Supported by the Danish Research Council (DFF—4005-00267, DFF—1323-00178). Gad M. Landau: Supported by the Israel Science Foundation Grants 1475/18, and No. 2018141 from the United States-Israel Binational Science Foundation. Oren Weimann: Supported by the Israel Science Foundation Grant 592/17.
An extended abstract appeared at ISAAC 2019 [17].
Cite this article
Bille, P., Gawrychowski, P., Gørtz, I.L. et al. Top Tree Compression of Tries. Algorithmica 83, 3602–3628 (2021). https://doi.org/10.1007/s00453-021-00869-w
DOI: https://doi.org/10.1007/s00453-021-00869-w