Cache-Oblivious Hashing

Algorithmica

Abstract

The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size $b$, searching for a particular key takes only an expected average of $t_q = 1 + 1/2^{\Omega(b)}$ disk accesses for any load factor $\alpha$ bounded away from 1. However, such near-perfect performance is achieved only when $b$ is known and the hash table is specifically tuned to that blocking. In this paper we study whether it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table automatically performs well across all levels of the memory hierarchy and needs no hardware-specific tuning, an important feature in autonomous databases.
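To make "works well with any blocking" concrete, the following is a minimal simulation sketch, not code from the paper. It builds one fixed in-memory layout and then counts how many aligned blocks a successful search reads for several block sizes $b$. Linear probing is used purely as an example scheme; Python's `random` module stands in for a truly random hash function, and the names `build_table`, `blocks_touched`, and `average_cost` are hypothetical.

```python
import random

def build_table(keys, capacity, seed=0):
    """Insert keys with linear probing (requires len(keys) < capacity).

    Assumption: a seeded PRNG stands in for a truly random hash function.
    Returns the slot array and each key's home position.
    """
    rng = random.Random(seed)
    home = {k: rng.randrange(capacity) for k in keys}
    table = [None] * capacity
    for k in keys:
        i = home[k]
        while table[i] is not None:      # scan right until a free slot
            i = (i + 1) % capacity
        table[i] = k
    return table, home

def blocks_touched(table, home, key, b):
    """Distinct aligned blocks of size b read by a successful search."""
    i = home[key]
    seen = {i // b}                      # block holding the home slot
    while table[i] != key:
        i = (i + 1) % len(table)
        seen.add(i // b)
    return len(seen)

def average_cost(table, home, keys, b):
    """Average number of block reads over all stored keys."""
    return sum(blocks_touched(table, home, k, b) for k in keys) / len(keys)

if __name__ == "__main__":
    keys = list(range(512))              # load factor 0.5
    table, home = build_table(keys, capacity=1024)
    for b in (1, 4, 16, 64):             # same layout, different blockings
        print(b, average_cost(table, home, keys, b))
```

The layout is built once without reference to $b$; only the cost measurement depends on it, which is the sense in which the structure is oblivious to the blocking.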

We first show that linear probing, a classical collision resolution strategy for hash tables, can easily be made cache-oblivious, but that it achieves only $t_q = 1 + \Theta(\alpha/b)$ even if a truly random hash function is used. We then demonstrate that the block probing algorithm (Pagh et al. in SIAM Rev. 53(3):547–558, 2011) achieves $t_q = 1 + 1/2^{\Omega(b)}$, thus matching the cache-aware bound, if the following two conditions hold: (a) $b$ is a power of 2; and (b) every block starts at a memory address divisible by $b$. Note that both conditions hold on a real machine, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is $t_q = 1 + O(\alpha/b)$, which is exactly what linear probing achieves.
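The role of conditions (a) and (b) can be illustrated with a small sketch. This is not claimed to be the block probing algorithm of Pagh et al.; it uses a hypothetical probe order, `home ^ j` for `j = 0, 1, 2, ...`, chosen only because its first $2^i$ probes always stay inside the aligned block of size $2^i$ containing the home slot. The `shift` parameter is likewise an assumption, introduced to model blocks that do not start at an address divisible by $b$.

```python
def xor_probe(home, j):
    """j-th probed slot in the hypothetical XOR probe order."""
    return home ^ j

def blocks_in_prefix(home, probes, b, shift=0):
    """Distinct blocks of size b read by the first `probes` probes.

    shift=0 models blocks aligned at multiples of b (condition (b));
    a nonzero shift models misaligned block boundaries.
    """
    return len({(xor_probe(home, j) + shift) // b for j in range(probes)})

if __name__ == "__main__":
    # Power-of-two b with aligned blocks: b probes stay in one block.
    print(blocks_in_prefix(home=37, probes=8, b=8))
    # Misaligned blocks: the same kind of prefix can straddle a boundary.
    print(blocks_in_prefix(home=0, probes=4, b=4, shift=1))
    # b not a power of two: a prefix of b probes can also straddle.
    print(blocks_in_prefix(home=2, probes=3, b=3))
```

Under this probe order, a search that ends within the first $b$ probes costs a single block read when both conditions hold, and can cost more when either is dropped. The sketch shows only this geometric effect; the bounds themselves are the results stated in the abstract.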

Notes

  1. Strictly speaking, the structure should be unaware of both m and b. For most data structure problems, however, the operations on the structure are always oblivious to m, so we only need to require that the layout works for all b.

  2. Chaining would perform worse cache-obliviously because the list associated with each position is not laid out consecutively.

  3. Here we do not allow keys to be stored in internal memory: since the memory holds at most m keys, it does not affect the average search cost as long as n is sufficiently larger than m.

References

  1. Afshani, P., Hamilton, C., Zeh, N.: Cache-oblivious range reporting with optimal queries requires superlinear space. Discrete Comput. Geom. 45(4), 824–850 (2011)

  2. Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)

  3. Bender, M.A., Demaine, E.D., Farach-Colton, M.: Cache-oblivious B-trees. SIAM J. Comput. 35(2), 341–358 (2005)

  4. Bender, M.A., Brodal, G.S., Fagerberg, R., Ge, D., He, S., Hu, H., Iacono, J., López-Ortiz, A.: The cost of cache-oblivious searching. Algorithmica 61(2), 463–505 (2010)

  5. Brodal, G.S., Fagerberg, R.: On the limits of cache-obliviousness. In: Proc. ACM Symposium on Theory of Computing, pp. 307–315 (2003)

  6. Carter, J., Wegman, M.: Universal classes of hash functions. J. Comput. Syst. Sci. 18, 143–154 (1979)

  7. Demaine, E.: Cache-oblivious algorithms and data structures. In: EEF Summer School on Massive Datasets. Springer, Berlin (2002)

  8. Fagin, R., Nievergelt, J., Pippenger, N., Strong, H.: Extendible hashing—a fast access method for dynamic files. ACM Trans. Database Syst. 4(3), 315–344 (1979)

  9. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with O(1) worst case access time. In: Proc. IEEE Symposium on Foundations of Computer Science, pp. 165–170 (1982)

  10. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: Proc. IEEE Symposium on Foundations of Computer Science, pp. 285–298 (1999)

  11. Gonnet, G.H., Larson, P.-Å.: External hashing with limited internal storage. J. ACM 35(1), 161–184 (1988)

  12. He, B., Luo, Q.: Cache-oblivious databases: limitations and opportunities. ACM Trans. Database Syst. 33(2), 8 (2008)

  13. Jensen, M.S., Pagh, R.: Optimality in external memory hashing. Algorithmica 52(3), 403–411 (2008)

  14. Knuth, D.E.: Sorting and Searching. The Art of Computer Programming, vol. 3. Addison-Wesley, Reading (1973)

  15. Larson, P.-A.: Dynamic hash tables. Commun. ACM 31(4), 446–457 (1988)

  16. Larson, P.-A.: Linear hashing with separators—a dynamic hashing scheme achieving one-access retrieval. ACM Trans. Database Syst. 13(3), 366–388 (1988)

  17. Litwin, W.: Linear hashing: a new tool for file and table addressing. In: Proc. International Conference on Very Large Data Bases, pp. 212–223 (1980)

  18. Mairson, H.G.: The effect of table expansion on the program complexity of perfect hash functions. BIT Numer. Math. 32(3), 430–440 (1992)

  19. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (1995)

  20. Pagh, R., Rodler, F.F.: Cuckoo hashing. J. Algorithms 51, 122–144 (2004)

  21. Pagh, A., Pagh, R., Ružić, M.: Linear probing with 5-wise independence. SIAM Rev. 53(3), 547–558 (2011)

  22. Qi, H., Martel, C.U.: Design and analysis of hashing algorithms with cache effects. Technical report, UC Davis (1998)

  23. Schmidt, J., Siegel, A., Srinivasan, A.: Chernoff–Hoeffding bounds for applications with limited independence. SIAM J. Discrete Math. 8, 223 (1995)

  24. Tenenbaum, G.: Introduction to analytic and probabilistic number theory. Cambridge Univ. Press, Cambridge (1995)

  25. Verbin, E., Zhang, Q.: The limits of buffering: a tight lower bound for dynamic membership in the external memory model. In: Proc. ACM Symposium on Theory of Computing, pp. 447–456 (2010)

  26. Wegman, M., Carter, J.: New hash functions and their use in authentication and set equality. J. Comput. Syst. Sci. 22(3), 265–279 (1981)

  27. Wei, Z., Yi, K., Zhang, Q.: Dynamic external hashing: the limit of buffering. In: Proc. ACM Symposium on Parallelism in Algorithms and Architectures, pp. 253–259 (2009)

Author information

Corresponding author

Correspondence to Ke Yi.

Additional information

A preliminary version of this paper was presented at the ACM Symposium on Principles of Database Systems, 2010.

The work of Rasmus Pagh was supported by the Danish National Research Foundation, as part of the project “Scalable Query Evaluation for Reliable Databases”. Most of the work was done while Z. Wei and Q. Zhang were Ph.D. students at the Hong Kong University of Science and Technology.

About this article

Cite this article

Pagh, R., Wei, Z., Yi, K. et al. Cache-Oblivious Hashing. Algorithmica 69, 864–883 (2014). https://doi.org/10.1007/s00453-013-9763-6

  • DOI: https://doi.org/10.1007/s00453-013-9763-6
