Explicit and Efficient Hash Families Suffice for Cuckoo Hashing with a Stash

Aumüller, Martin; Dietzfelbinger, Martin; Woelfel, Philipp

doi:10.1007/s00453-013-9840-x

Published: 08 October 2013

Algorithmica, Volume 70, pages 428–456 (2014)

Abstract

It is shown that for cuckoo hashing with a stash as proposed by Kirsch et al. (Proc. 16th European Symposium on Algorithms (ESA), pp. 611–622, Springer, Berlin, 2008) families of very simple hash functions can be used, maintaining the favorable performance guarantees: with constant stash size s the probability of a rehash is $O(1/n^{s+1})$, the lookup time and the deletion time are O(s) in the worst case, and the amortized expected insertion time is O(s) as well. Instead of the full randomness needed for the analysis of Kirsch et al. and of Kutzelnigg (Discrete Math. Theor. Comput. Sci., 12(3):81–102, 2010) (resp. Θ(log n)-wise independence for standard cuckoo hashing) the new approach even works with 2-wise independent hash families as building blocks. Both construction and analysis build upon the work of Dietzfelbinger and Woelfel (Proc. 35th ACM Symp. on Theory of Computing (STOC), pp. 629–638, 2003). The analysis, which can also be applied to the fully random case, utilizes a graph counting argument and is much simpler than previous proofs. The results can be generalized to situations where the stash size is non-constant.

Notes

κ-wise independent families of hash functions are defined in Sect. 2.
Personal communication with Mikkel Thorup, 2012.
The notation “$\exists T \subseteq S \colon\mathcal{A}_{T} \cap\mathrm{bad}_{T}$” stands for the formally correct $\bigcup_{T \subseteq S} (\mathcal{A}_{T} \cap\mathrm{bad}_{T})$. Generally, in slight abuse of notation, we will often use the name of an event “$\mathcal{A}_{T}$” (or “$\mathcal{A}_{T} \cap\mathrm{bad}_{T}$”) also for the statement “$\mathcal{A}_{T}$ occurs” (or “$\mathcal{A}_{T} \cap \mathrm{bad}_{T}$ occurs”).
When the stash has non-constant size, this yields non-constant lookup time. One way to circumvent this is to organize the stash itself as a hash table, which introduces failure probabilities of other types. See [1] for a detailed discussion of this issue.
http://www.boost.org.
Source code available at: http://eiche.theoinf.tu-ilmenau.de/ch-stash/.

References

[1] Arbitman, Y.: Efficient dictionary data structures based on cuckoo hashing. Master’s thesis, Weizmann Institute of Science (2010)
[2] Carter, L., Wegman, M.N.: Universal classes of hash functions. J. Comput. Syst. Sci. 18(2), 143–154 (1979)
[3] Devroye, L., Morin, P.: Cuckoo hashing: Further analysis. Inf. Process. Lett. 86(4), 215–219 (2003)
[4] Diestel, R.: Graph Theory. Springer, Berlin (2005)
[5] Dietzfelbinger, M., Hagerup, T., Katajainen, J., Penttonen, M.: A reliable randomized algorithm for the closest-pair problem. J. Algorithms 25(1), 19–51 (1997)
[6] Dietzfelbinger, M., Rink, M.: Applications of a splitting trick. In: Proc. 36th International Colloquium on Automata, Languages and Programming (ICALP). LNCS, vol. 5555, pp. 354–365. Springer, Berlin (2009)
[7] Dietzfelbinger, M., Schellbach, U.: On risks of using cuckoo hashing with simple universal hash classes. In: Proc. 20th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 795–804 (2009)
[8] Dietzfelbinger, M., Weidling, C.: Balanced allocation and dictionaries with tightly packed constant size bins. Theor. Comput. Sci. 380(1–2), 47–68 (2007)
[9] Dietzfelbinger, M., Woelfel, P.: Almost random graphs with simple hash functions. In: Proc. 35th ACM Symp. on Theory of Computing (STOC), New York, NY, USA, pp. 629–638 (2003)
[10] Fotakis, D., Pagh, R., Sanders, P., Spirakis, P.G.: Space efficient hash tables with worst case constant access time. Theory Comput. Syst. 38(2), 229–248 (2005)
[11] Goodrich, M.T., Mitzenmacher, M.: Privacy-preserving access of outsourced data via oblivious RAM simulation. In: Proc. 38th International Colloquium on Automata, Languages and Programming (ICALP), pp. 576–587 (2011)
[12] Kirsch, A., Mitzenmacher, M., Wieder, U.: More robust hashing: cuckoo hashing with a stash. In: Proc. 16th European Symposium on Algorithms (ESA). LNCS, vol. 5193, pp. 611–622. Springer, Berlin (2008)
[13] Kirsch, A., Mitzenmacher, M., Wieder, U.: More robust hashing: cuckoo hashing with a stash. SIAM J. Comput. 39(4), 1543–1561 (2009)
[14] Klassen, T.Q., Woelfel, P.: Independence of tabulation-based hash classes. In: Proc. 10th Theoretical Informatics—Latin American Symposium (LATIN). LNCS, vol. 7256, pp. 506–517. Springer, Berlin (2012)
[15] Kutzelnigg, R.: A further analysis of cuckoo hashing with a stash and random graphs of excess r. Discrete Math. Theor. Comput. Sci. 12(3), 81–102 (2010)
[16] Mitzenmacher, M., Vadhan, S.P.: Why simple hash functions work: exploiting the entropy in a data stream. In: Proc. 19th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 746–755 (2008)
[17] Pagh, R., Rodler, F.F.: Cuckoo hashing. J. Algorithms 51(2), 122–144 (2004)
[18] Pǎtraşcu, M., Thorup, M.: The power of simple tabulation hashing. J. ACM 59(3), 14 (2012)
[19] Siegel, A.: On universal classes of extremely random constant-time hash functions. SIAM J. Comput. 33(3), 505–543 (2004)
[20] Thorup, M., Zhang, Y.: Tabulation based 4-universal hashing with applications to second moment estimation. In: Proc. 15th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 615–624 (2004)
[21] Thorup, M., Zhang, Y.: Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM J. Comput. 41(2), 293–331 (2012)
[22] Wegman, M.N., Carter, L.: New hash functions and their use in authentication and set equality. J. Comput. Syst. Sci. 22, 265–279 (1981)
[23] Woelfel, P.: Asymmetric balanced allocation with simple hash functions. In: Proc. 17th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 424–433 (2006)

Acknowledgements

We thank Pascal Klaue for implementing the algorithms and carrying out the experiments presented in Sect. 7. We thank the anonymous reviewers, whose suggestions helped a lot in improving the presentation of this work. We especially thank one reviewer who pointed out the extensions to non-constant stash size and κ-wise independent hash families.

Author information

Authors and Affiliations

Faculty of Computer Science and Automation, Technische Universität Ilmenau, 98694, Ilmenau, Germany
Martin Aumüller & Martin Dietzfelbinger
Department of Computer Science, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
Philipp Woelfel

Corresponding author

Correspondence to Martin Aumüller.

Additional information

M. Dietzfelbinger was supported in part by DFG grant DI 412/10-2. P. Woelfel was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC). A preliminary version of this paper appeared under the title “Explicit and Efficient Hash Functions Suffice for Cuckoo Hashing with a Stash” in Proceedings of the 20th Annual European Symposium on Algorithms, Ljubljana, Slovenia, September 2012, Lecture Notes in Computer Science 7501, Springer 2012.

Appendix: Excess, Stash Size, and Insertions

In this supplementary section, provided for the convenience of the reader, we clarify the connection between the stash size needed and the excess ex(G(S, h₁, h₂)) of the cuckoo graph G(S, h₁, h₂), as well as the role of insertion procedures. In particular, we prove Lemma 5. The central statements of this section can also be found in [13, 15].

A.1 The Excess of a Graph

For G a graph, ζ(G) denotes the number of connected components of G. The cyclomatic number γ(G), technically defined as “the dimension of the cycle space of G”, can be characterized by the following basic formula [4]:

$$ \gamma(G) = m - n + \zeta(G), \tag{7} $$

for n the number of nodes and m the number of edges of G. Note that acyclic graphs are characterized by the equation n=m+ζ(G) and hence by the equation γ(G)=0. The following lemma gives two helpful ways of viewing γ(G).

Lemma 13

(a) Assume G′ is obtained from G by removing an edge e. If e is a cycle edge then γ(G′)=γ(G)−1, otherwise γ(G′)=γ(G).
(b) If we remove edges from G sequentially, in an arbitrary order, and the resulting graph is acyclic, then γ(G) is the number of removed cycle edges, that is, edges that are on a cycle when removed.
(c) γ(G) is the minimum number of edges one has to remove from G such that the resulting graph is acyclic.

Proof

(a) We have, using (7) twice:

$$\gamma\bigl(G'\bigr) = (m-1) - n + \zeta\bigl(G'\bigr) = \gamma(G) - \bigl(1 - \bigl(\zeta \bigl(G'\bigr)-\zeta(G)\bigr) \bigr). $$

We observe:

If e is a cycle edge in G, then ζ(G′)=ζ(G), and hence γ(G′)=γ(G)−1.
If e is not a cycle edge, then ζ(G′)=ζ(G)+1, and hence γ(G′)=γ(G).

(b) By what we just observed, to reduce the cyclomatic number from γ(G) to 0 the number of rounds in which an edge is removed that is on a cycle must be γ(G).

(c) By (b), if we start with G and iterate removing cycle edges, we obtain an acyclic graph, and the number of steps is γ(G). If we remove fewer than γ(G) edges (in any order), by (b) the resulting graph cannot be acyclic. □
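As an illustration of formula (7) and Lemma 13(b), the following minimal Python sketch computes γ(G) with a union-find structure and, separately, counts the cycle edges encountered while removing all edges one by one. The function names and the representation of G (nodes 0..n−1, a list of edge pairs, multi-edges allowed) are assumptions made for this example only; they do not come from the paper or its implementation.

```python
def count_components(n, edges):
    """zeta(G): number of connected components of a multigraph on nodes 0..n-1."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


def cyclomatic_number(n, edges):
    """gamma(G) = m - n + zeta(G), as in formula (7)."""
    return len(edges) - n + count_components(n, edges)


def count_removed_cycle_edges(n, edges):
    """Remove the edges in list order until none is left (an acyclic graph)
    and count those that were on a cycle when removed. By the proof of
    Lemma 13(a), an edge is a cycle edge exactly if its removal leaves
    the number of components unchanged."""
    remaining = list(edges)
    cycle_edges = 0
    while remaining:
        before = count_components(n, remaining)
        remaining.pop(0)
        if count_components(n, remaining) == before:
            cycle_edges += 1
    return cycle_edges


# Triangle 0-1-2 with a pendant edge 2-3 and an isolated node 4:
# m = 4, n = 5, zeta = 2, hence gamma = 4 - 5 + 2 = 1.
example = [(0, 1), (1, 2), (2, 0), (2, 3)]
gamma = cyclomatic_number(5, example)
removed = count_removed_cycle_edges(5, example)
```

Recomputing the components after every removal is quadratic and only meant to mirror the statement of the lemma; the count is the same for any removal order, as Lemma 13(b) asserts.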

We have defined the excess ex(G) of a graph G as the minimum number of edges one has to remove from G so that the remaining subgraph has only acyclic and unicyclic components. In [15] the characterization of this quantity given next was used as a definition; the same idea was used in [13] (without giving it a name).

For G a graph, let ζ_cyc(G) denote the number of cyclic components of G.

Lemma 14

In all graphs G the equation ex(G)=γ(G)−ζ_cyc(G) is satisfied.

Proof

Assume G has n nodes and m edges.

“≤”: Starting with G, we iteratively remove cycle edges until each cyclic component has only one cycle left. The number of edges removed is at least ex(G). Call the resulting graph G′. Removing one cycle edge from each of the ζ_cyc(G) cyclic components of G′ will yield an acyclic graph. Lemma 13(b) tells us that together exactly γ(G) edges have been removed; hence γ(G)≥ex(G)+ζ_cyc(G).

“≥”: Choose a set E⁺ of ex(G) edges in G such that removing these edges leaves a graph G′ with only acyclic and unicyclic components. Now imagine that the edges in E⁺ are removed one by one in an arbitrary order. Let β denote the number of edges in E⁺ that are on a cycle when removed; the other ex(G)−β many were non-cycle edges when removed. Removing one cycle edge from each cyclic component of G′ will leave an acyclic graph. Counting the number of cycle edges we removed altogether, and applying Lemma 13(b) again, we see that γ(G)=β+ζ_cyc(G′). Since removing a non-cycle edge from a graph can increase the number of cyclic components by at most 1, we have that ζ_cyc(G′)≤ζ_cyc(G)+(ex(G)−β). Combining the inequalities yields γ(G)≤ζ_cyc(G)+ex(G). □
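The characterization in Lemma 14 makes the excess easy to compute. The sketch below is an illustration under assumptions of our own (the function name `excess` and the edge-list representation on nodes 0..n−1 are hypothetical): it evaluates γ(G) via formula (7) and counts as cyclic every component that has at least as many edges as nodes.

```python
from collections import Counter


def excess(n, edges):
    """ex(G) = gamma(G) - zeta_cyc(G) (Lemma 14) for a multigraph on nodes 0..n-1."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    # Nodes and edges per component, indexed by the component's root.
    nodes_in = Counter(find(v) for v in range(n))
    edges_in = Counter(find(u) for u, _ in edges)

    zeta = len(nodes_in)
    # A connected component is cyclic iff it has at least as many edges as nodes.
    zeta_cyc = sum(1 for r in nodes_in if edges_in[r] >= nodes_in[r])
    gamma = len(edges) - n + zeta
    return gamma - zeta_cyc


# Complete graph on 4 nodes: gamma = 6 - 4 + 1 = 3, one cyclic component, ex = 2.
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
# Two triangles joined by one edge: gamma = 7 - 6 + 1 = 2, one cyclic component, ex = 1.
dumbbell = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
```

Equivalently, each cyclic component C contributes γ(C)−1 to the sum, i.e. its number of edges minus its number of nodes, while acyclic components contribute 0.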

A.2 The Excess of the Cuckoo Graph and the Stash Size

The purpose of this section is to prove Lemma 5, which we recall here. We assume that h₁ and h₂ are given, and write G(S) for G(S, h₁, h₂), for S⊆U.

Lemma 5 [13]

The keys from S can be stored in the two tables and a stash of size s using h₁, h₂ if and only if ex(G(S))≤s.

Proof

“⇒”: Assume T is a subset of S of size at most s such that all keys from S′=S−T can be stored in the two tables. Then all components of G(S′) must be acyclic or unicyclic. (Assume C is a component with γ(C)>1. Then by (7) the number of edges (keys) in C would be strictly larger than the number of nodes (table positions), which is impossible.) Since G(S′) is obtained from G(S) by removing the edges (h₁(x), h₂(x)), x∈T, we get ex(G(S))≤s.

“⇐”: Assume ex(G(S))≤s. Choose a subset T of S of size ex(G(S)) such that G(S−T) has only acyclic and unicyclic components. From what is known about the behavior of standard cuckoo hashing, we can store S′=S−T in the two tables using h₁ and h₂ (e.g., see [3, Sect. 4]). (This can also be proved directly. If one of the nodes touched by an edge (h₁(x), h₂(x)), x∈S′, has degree 1, we place x in the corresponding cell and remove the edge. Iterating this, we can place all keys except those that belong to cycle edges. Since G(S′) has only acyclic and unicyclic components, the remaining cycle edges form isolated simple cycles, and clearly the keys that belong to such a cycle can be placed in the corresponding cells.) By assumption, the keys from T fit into the stash. □
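The direct argument in the “⇐” part can be turned into a small offline placement routine. The following Python sketch is an illustration only and is not the insertion procedure analyzed in the paper: the function `place_with_stash`, its input format (a dict mapping each key to its precomputed pair (h₁(x), h₂(x)) with values in 0..m−1), and the greedy way of choosing T (a key goes to the stash when its edge would give a component a second cycle) are all assumptions of this example. The kept keys are then placed by the degree-1 peeling from the proof, followed by a walk around each remaining simple cycle.

```python
from collections import deque


def place_with_stash(positions, m):
    """positions: key -> (h1(key), h2(key)), both in range(m).
    Returns (table1, table2, stash)."""
    # Node i is cell i of table 1, node m + j is cell j of table 2.
    ends = {x: (i, m + j) for x, (i, j) in positions.items()}

    # Step 1 (assumed greedy choice of T): keep an edge unless it would
    # create a component with more than one cycle.
    parent = list(range(2 * m))
    cyclic = [False] * (2 * m)  # meaningful at component roots only

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    stash, kept = [], []
    for x, (u, v) in ends.items():
        ru, rv = find(u), find(v)
        if ru == rv:
            if cyclic[ru]:
                stash.append(x)
                continue
            cyclic[ru] = True  # x closes the first cycle of its component
        else:
            if cyclic[ru] and cyclic[rv]:
                stash.append(x)
                continue
            parent[ru] = rv
            cyclic[rv] = cyclic[ru] or cyclic[rv]
        kept.append(x)

    # Step 2: peel nodes of degree 1, as in the proof.
    adj = {}
    for x in kept:
        for node in ends[x]:
            adj.setdefault(node, set()).add(x)
    cell = {}  # node -> key stored there
    queue = deque(v for v in adj if len(adj[v]) == 1)

    def peel():
        while queue:
            v = queue.popleft()
            if v in cell or len(adj[v]) != 1:
                continue
            x = next(iter(adj[v]))
            cell[v] = x
            a, b = ends[x]
            adj[a].discard(x)
            adj[b].discard(x)
            other = b if v == a else a
            if len(adj[other]) == 1:
                queue.append(other)

    peel()
    # Step 3: every unplaced key now lies on an isolated simple cycle.
    # Fix one key of the cycle at one endpoint, then peel the rest.
    for x in kept:
        u, v = ends[x]
        if x in adj[u]:
            cell[u] = x
            adj[u].discard(x)
            adj[v].discard(x)
            queue.append(v)
            peel()

    table1, table2 = [None] * m, [None] * m
    for node, x in cell.items():
        if node < m:
            table1[node] = x
        else:
            table2[node - m] = x
    return table1, table2, stash


# Keys a-d form a 4-cycle in the cuckoo graph, e adds a second cycle to
# the same component, and f is an isolated edge.
positions = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1),
             "e": (0, 0), "f": (2, 2)}
table1, table2, stash = place_with_stash(positions, 3)
```

In this example only the key e has to go to the stash, in agreement with ex(G(S))=1 for this key set; every other key ends up in one of its two admissible cells.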

About this article

Cite this article

Aumüller, M., Dietzfelbinger, M. & Woelfel, P. Explicit and Efficient Hash Families Suffice for Cuckoo Hashing with a Stash. Algorithmica 70, 428–456 (2014). https://doi.org/10.1007/s00453-013-9840-x

Received: 14 December 2012
Accepted: 23 September 2013
Published: 08 October 2013
Issue Date: November 2014
DOI: https://doi.org/10.1007/s00453-013-9840-x
