Dynamic Dictionaries for Multisets and Counting Filters with Constant Time Operations

Algorithmica

Abstract

We resolve the open problem posed by Arbitman, Naor, and Segev [FOCS 2010] of designing a dynamic dictionary for multisets in the following setting: (1) The dictionary supports multiplicity queries and allows insertions and deletions to the multiset. (2) The dictionary is designed to support multisets of cardinality at most n (i.e., including multiplicities). (3) The space required for the dictionary is \((1+o(1))\cdot n\log \frac{u}{n} + \varTheta (n)\) bits, where u denotes the cardinality of the universe of the elements. This space is \(1+o(1)\) times the information-theoretic lower bound for static dictionaries over multisets of cardinality n if \(u=\omega (n)\). (4) All operations are completed in constant time in the worst case with high probability in the word RAM model. A direct consequence of our construction is the first dynamic counting filter (i.e., a dynamic data structure that supports approximate multiplicity queries with a one-sided error) that, with high probability, supports operations in constant time and requires space that is \(1+o(1)\) times the information-theoretic lower bound for filters plus O(n) bits. The main technical component of our solution is based on efficiently storing variable-length bounded binary counters and its analysis via weighted balls-into-bins experiments in which the weight of a ball is logarithmic in its multiplicity.
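As a numerical illustration of the space bound (not part of the paper), the Python sketch below computes \(\log_2\binom{u+n-1}{n}\). This is the logarithm of the number of multisets of cardinality n over a universe of size u, the count argued in Note 1. The sketch then compares that quantity with \(n\log_2\frac{u}{n}\). The helper names num_multisets, lower_bound_bits, and brute_force_count are hypothetical, and the sample values of u and n are arbitrary.

```python
import math
from itertools import combinations_with_replacement


def num_multisets(u: int, n: int) -> int:
    """Multisets of cardinality n over a universe of size u: C(u+n-1, n)."""
    return math.comb(u + n - 1, n)


def lower_bound_bits(u: int, n: int) -> float:
    """log2 of the number of multisets, i.e. the information-theoretic bound in bits."""
    return math.log2(num_multisets(u, n))


def brute_force_count(u: int, n: int) -> int:
    """Enumerate the multisets explicitly (feasible only for tiny u and n)."""
    return sum(1 for _ in combinations_with_replacement(range(u), n))


if __name__ == "__main__":
    # Check the closed form against explicit enumeration on small cases.
    for u in range(1, 6):
        for n in range(5):
            assert brute_force_count(u, n) == num_multisets(u, n)

    # Compare the bound with the leading term n*log2(u/n) (powers of two for readability).
    for n, u in [(2**10, 2**20), (2**12, 2**32), (2**10, 2**10)]:
        bound = lower_bound_bits(u, n)
        leading = n * math.log2(u / n)
        print(f"n=2^{n.bit_length() - 1}, u=2^{u.bit_length() - 1}: "
              f"bound={bound:.0f} bits, n*log2(u/n)={leading:.0f} bits, "
              f"gap per element={(bound - leading) / n:.3f} bits")
```

The binomial coefficient \(\binom{u+n-1}{n}\) lies between \((u/n)^n\) and \((e(u+n-1)/n)^n\). As a result, the printed gap per element stays between 0 and \(\log_2(2e)\approx 2.44\) bits whenever \(u\ge n\). This is an elementary estimate, not the paper's analysis.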

Notes

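  1. To see why this is the case, consider a \(u\times (n+1)\) grid (u rows and \(n+1\) columns of vertices) and all the shortest paths that go from the bottom-left vertex to the top-right vertex. Such a path consists of \(u+n-1\) edges and can be completely described by its n horizontal edges, where each horizontal edge corresponds to one occurrence, in the input multiset, of the element of the universe that indexes its row. Hence the number of such paths, \(\binom{u+n-1}{n}\), equals the number of multisets of cardinality n over a universe of size u.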

  2. All logarithms are base 2 unless otherwise stated. \(\ln x\) is used to denote the natural logarithm.

  3. This equality holds when u is significantly larger than n.

  4. By with high probability (whp), we mean with probability at least \(1-1/n^{\varOmega (1)}\). The constant in the exponent can be controlled by the designer and only affects the o(1) term in the space of the dictionary or the filter.

  5. For example, storing n copies of the same element would lead to almost all the elements being stored in the second-level spare, causing the spare to overflow.

  6. The dense case is especially relevant in practical approximate membership (filter) settings in which \(u/n=1/\varepsilon \) due to the reduction of Carter et al. [8].

  7. Note that we allow \(\varepsilon \) to be as small as n/u (below this threshold, we can simply use a dictionary instead).

  8. To be more exact, for each bit of the counter, the construction in Pagh et al. [27] allocates a dictionary on sets such that the value of the bit can be retrieved by performing a lookup in the dictionary. Updating a bit of the counter is done by inserting or deleting elements in the associated dictionary.

  9. Data structures for predecessor and successor queries such as [31] can support multisets but they do not meet the required performance guarantees for multiplicity queries.

  10. We require that \(\textsf{op}_t=\textsf{delete}(x_t)\) only if \(\mathcal{M}_{t-1}(x_t)>0\).

  11. The probability space is induced only by the random choices (i.e., the choice of hash functions) that the filter makes. Note also that if \(\textsf{op}_t=\textsf{op}_{t'}=\textsf{count}(x)\), then the events \(\text{Err}_{t}\) and \(\text{Err}_{t'}\) need not be independent.

  12. Note, however, that we define a \(\textsf{CD}\) to be full if the sum of its counter lengths is 12B (even if not all of its space is used). These constants were chosen to simplify the analysis.

  13. Note that the fact that we maintain Invariant 2 in a “lazy” fashion does not affect this analysis. If an insertion to the spare fails due to a non-spare element residing in it, we move the non-spare element to the first level. Thus, the temporary presence of non-spare elements does not affect the performance of the spare.
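Note 6 refers to the reduction of Carter et al. [8]. Under that reduction, a filter with error \(\varepsilon\) is obtained by hashing the universe into a range of size about \(n/\varepsilon\) and storing the hashed multiset exactly. The Python sketch below illustrates only that reduction, and it rests on several assumptions. First, the class HashedCountingFilter and its methods are hypothetical. Second, an ordinary dict of counts stands in for the space-efficient multiset dictionary of the paper. Third, the hash is a Carter-Wegman style function \(((ax+b)\bmod p)\bmod r\), chosen for convenience rather than taken from the paper. The sketch does not attempt the paper's space bound or its worst-case constant time.

```python
import math
import random
from collections import defaultdict

# Prime modulus (assumption: universe elements are integers in [0, MERSENNE_P)).
MERSENNE_P = (1 << 61) - 1


class HashedCountingFilter:
    """Hypothetical toy counting filter: hash into [r], r ~ n/eps, count hash values."""

    def __init__(self, n: int, eps: float, seed: int = 0):
        self.n = n                      # bound on cardinality, including multiplicities
        self.r = math.ceil(n / eps)     # size of the hashed universe
        rng = random.Random(seed)
        self.a = rng.randrange(1, MERSENNE_P)
        self.b = rng.randrange(0, MERSENNE_P)
        self.counts = defaultdict(int)  # stand-in for an exact multiset dictionary over [r]
        self.size = 0

    def _h(self, x: int) -> int:
        # Carter-Wegman style universal hash ((a*x + b) mod p) mod r.
        return ((self.a * x + self.b) % MERSENNE_P) % self.r

    def insert(self, x: int) -> None:
        if self.size >= self.n:
            raise OverflowError("multiset cardinality bound n exceeded")
        self.counts[self._h(x)] += 1
        self.size += 1

    def delete(self, x: int) -> None:
        # The caller must only delete elements currently in the multiset (cf. Note 10).
        key = self._h(x)
        if self.counts.get(key, 0) == 0:
            raise KeyError(x)
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]
        self.size -= 1

    def count(self, x: int) -> int:
        # Never smaller than the true multiplicity of x (one-sided error).
        return self.counts.get(self._h(x), 0)


if __name__ == "__main__":
    f = HashedCountingFilter(n=1000, eps=0.01, seed=1)
    for x in range(500):               # each of 0..499 inserted twice
        f.insert(x)
        f.insert(x)
    assert all(f.count(x) >= 2 for x in range(500))
    queries = range(10**6, 10**6 + 10_000)  # elements that were never inserted
    false_pos = sum(1 for y in queries if f.count(y) > 0)
    print("empirical false-positive rate:", false_pos / len(queries))
```

The error is one-sided because every element whose hash value collides with \(h(x)\) adds to the same counter. As a result, count(x) can overestimate but never underestimate the multiplicity of x. For a universal hash, a union bound shows that a query for an element not in the multiset returns a nonzero count with probability at most \(n/r\le\varepsilon\).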

References

  1. Arbitman, Y., Naor, M., Segev, G.: De-amortized cuckoo hashing: Provable worst-case performance and experimental results. In: International colloquium on automata, languages and programming, pp. 107–118. Springer (2009)

  2. Arbitman, Y., Naor, M., Segev, G.: Backyard cuckoo hashing: Constant worst-case operations with a succinct representation. In: 2010 IEEE 51st Annual symposium on foundations of computer science, pp. 787–796. IEEE (2010)

  3. Bercea, I.O., Even, G.: Fully-dynamic space-efficient dictionaries and filters with constant number of memory accesses. arXiv:1911.05060 (2019)

  4. Bercea, I.O., Even, G.: A dynamic space-efficient filter with constant time operations. In: 17th Scandinavian symposium and workshops on algorithm theory, SWAT 2020, June 22-24, 2020, Tórshavn, Faroe Islands, pp. 11:1–11:17 (2020). https://doi.org/10.4230/LIPIcs.SWAT.2020.11

  5. Blandford, D.K., Blelloch, G.E.: Compact dictionaries for variable-length keys and data with applications. ACM Trans. Algorithms 4(2), 1–25 (2008). https://doi.org/10.1145/1361192.1361194

  6. Bonomi, F., Mitzenmacher, M., Panigrahy, R., Singh, S., Varghese, G.: An improved construction for counting Bloom filters. In: European symposium on algorithms, pp. 684–695. Springer (2006)

  7. Broder, A., Mitzenmacher, M.: Using multiple hash functions to improve IP lookups. In: Proceedings IEEE INFOCOM 2001. Conference on computer communications. Twentieth annual joint conference of the IEEE computer and communications society (Cat. No. 01CH37213), vol. 3, pp. 1454–1463. IEEE (2001)

  8. Carter, L., Floyd, R., Gill, J., Markowsky, G., Wegman, M.: Exact and approximate membership testers. In: Proceedings of the tenth annual ACM symposium on theory of computing, pp. 59–65. ACM (1978)

  9. Cohen, S., Matias, Y.: Spectral Bloom filters. In: Proceedings of the 2003 ACM SIGMOD International conference on Management of data, pp. 241–252 (2003)

  10. Dalal, K., Devroye, L., Malalla, E., McLeish, E.: Two-way chaining with reassignment. SIAM J. Comp. 35(2), 327–340 (2005)

  11. Demaine, E.D., auf der Heide, F.M., Pagh, R., Pătraşcu, M.: De dictionariis dynamicis pauco spatio utentibus. In: Latin American symposium on theoretical informatics, pp. 349–361. Springer (2006)

  12. Dietzfelbinger, M., auf der Heide, F.M.: A new universal class of hash functions and dynamic hashing in real time. In: International colloquium on automata, languages and programming, pp. 6–19. Springer (1990)

  13. Dietzfelbinger, M., Rink, M.: Applications of a splitting trick. In: International colloquium on automata, languages and programming, pp. 354–365. Springer (2009)

  14. Dietzfelbinger, M., Weidling, C.: Balanced allocation and dictionaries with tightly packed constant size bins. Theor. Comp. Sci. 380(1–2), 47–68 (2007)

  15. Dubhashi, D., Ranjan, D.: Balls and bins: a study in negative dependence. Random Struct. Algorithms 13(2), 99–124 (1998)

  16. Elias, P.: Efficient storage and retrieval by content and address of static files. J. ACM (JACM) 21(2), 246–260 (1974)

  17. Fan, L., Cao, P., Almeida, J., Broder, A.Z.: Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw. 8(3), 281–293 (2000)

  18. Fano, R.M.: On the number of bits required to implement an associative memory. Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, MA (1971)

  19. Fotakis, D., Pagh, R., Sanders, P., Spirakis, P.: Space efficient hash tables with worst case constant access time. Theory Comp. Sys. 38(2), 229–248 (2005)

  20. Hagerup, T., Rüb, C.: A guided tour of Chernoff bounds. Inf. Process. Lett. 33(6), 305–308 (1990)

  21. Hagerup, T., Tholey, T.: Efficient minimal perfect hashing in nearly minimal space. In: Annual symposium on theoretical aspects of computer science, pp. 317–326. Springer (2001)

  22. Kaplan, E., Naor, M., Reingold, O.: Derandomized constructions of k-wise (almost) independent permutations. Algorithmica 55(1), 113–133 (2009)

  23. Kirsch, A., Mitzenmacher, M.: Using a queue to de-amortize cuckoo hashing in hardware. In: Proceedings of the forty-fifth annual Allerton conference on communication, control and computing, vol. 75 (2007)

  24. Knuth, D.E.: The art of computer programming, vol. 3: Searching and sorting. Addison-Wesley, Reading, MA (1973)

  25. Lovett, S., Porat, E.: A lower bound for dynamic approximate membership data structures. In: 2010 IEEE 51st Annual symposium on foundations of computer science, pp. 797–804. IEEE (2010)

  26. Naor, M., Reingold, O.: On the construction of pseudorandom permutations: Luby-Rackoff revisited. J. Cryptol. 12(1), 29–66 (1999)

  27. Pagh, A., Pagh, R., Rao, S.S.: An optimal Bloom filter replacement. In: SODA, pp. 823–829. SIAM (2005)

  28. Pagh, R.: Low redundancy in static dictionaries with constant query time. SIAM J. Comp. 31(2), 353–363 (2001)

  29. Pagh, R., Rodler, F.F.: Cuckoo hashing. In: European symposium on algorithms, pp. 121–133. Springer (2001)

  30. Panigrahy, R.: Efficient hashing with lookups in two memory accesses. In: Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 830–839. Society for industrial and applied mathematics (2005)

  31. Pătraşcu, M., Thorup, M.: Dynamic integer sets with optimal rank, select, and predecessor search. In: 2014 IEEE 55th Annual symposium on foundations of computer science, pp. 166–175. IEEE (2014)

  32. Raman, R., Rao, S.S.: Succinct dynamic dictionaries and trees. In: International colloquium on automata, languages and programming, pp. 357–368. Springer (2003)

  33. Schmidt, J.P., Siegel, A., Srinivasan, A.: Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discr. Math. 8(2), 223–250 (1995)

  34. Siegel, A.: On universal classes of extremely random constant-time hash functions. SIAM J. Comp. 33(3), 505–543 (2004)

Author information

Corresponding author

Correspondence to Ioana O. Bercea.

Additional information

This research was supported by a grant from the United States-Israel Binational Science Foundation (BSF), Jerusalem, Israel, and the United States National Science Foundation (NSF).

An extended abstract of this paper appeared in WADS 2021.

About this article

Cite this article

Bercea, I.O., Even, G. Dynamic Dictionaries for Multisets and Counting Filters with Constant Time Operations. Algorithmica 85, 1786–1804 (2023). https://doi.org/10.1007/s00453-022-01057-0

  • DOI: https://doi.org/10.1007/s00453-022-01057-0
