Algorithmica, Volume 69, Issue 4, pp 864–883

Cache-Oblivious Hashing

  • Rasmus Pagh
  • Zhewei Wei
  • Ke Yi
  • Qin Zhang
Article

Abstract

The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key takes only expected average $t_q = 1 + 1/2^{\Omega(b)}$ disk accesses for any load factor α bounded away from 1. However, such near-perfect performance is achieved only when b is known and the hash table is specifically tuned to work with that blocking. In this paper we study whether it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table will automatically perform well across all levels of the memory hierarchy and does not need any hardware-specific tuning, an important feature in autonomous databases.

We first show that linear probing, a classical collision resolution strategy for hash tables, can easily be made cache-oblivious, but it achieves only $t_q = 1 + \Theta(\alpha/b)$ even if a truly random hash function is used. Then we demonstrate that the block probing algorithm (Pagh et al. in SIAM Rev. 53(3):547–558, 2011) achieves $t_q = 1 + 1/2^{\Omega(b)}$, thus matching the cache-aware bound, if the following two conditions hold: (a) b is a power of 2; and (b) every block starts at a memory address divisible by b. Note that the two conditions hold on a real machine, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is $t_q = 1 + O(\alpha/b)$, which is exactly what linear probing achieves.
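
To make the role of conditions (a) and (b) concrete, the following Python sketch counts how many blocks a short probe sequence touches under different blockings. It is an illustration only and is not taken from the paper: the helper names (blocks_touched, linear_probe, xor_probe), the table parameters, and in particular the XOR-ordered probe sequence are assumptions made here. The XOR order is chosen because its first 2^k probes cover exactly one aligned block of size 2^k, which is the kind of property the two conditions make useful.

```python
def blocks_touched(positions, b, offset=0):
    """Count the distinct size-b blocks that cover the given table positions.

    Block boundaries sit at addresses congruent to `offset` modulo b;
    offset=0 models condition (b): every block starts at a multiple of b.
    """
    return len({(p - offset) // b for p in positions})


def linear_probe(h, steps, m):
    """First `steps` positions examined by linear probing from slot h."""
    return [(h + j) % m for j in range(steps)]


def xor_probe(h, steps):
    """First `steps` positions of an XOR-ordered probe sequence.

    Assumption for illustration: the j-th probe is h XOR j.
    """
    return [h ^ j for j in range(steps)]


m, b = 64, 8   # hypothetical table size and block size, both powers of 2
h = 13         # hypothetical hash value of the key being searched

# Linear probing: slots 13..16 run past the block boundary at 16.
print(blocks_touched(linear_probe(h, 4, m), b))        # 2

# XOR order: the first 8 probes are exactly slots 8..15, one aligned block.
print(sorted(xor_probe(h, 8)))                         # [8, 9, ..., 15]
print(blocks_touched(xor_probe(h, 8), b))              # 1

# Same probes with block boundaries shifted by 3 (condition (b) violated).
print(blocks_touched(xor_probe(h, 8), b, offset=3))    # 2

# Same probes with block size 6 (condition (a) violated).
print(blocks_touched(xor_probe(h, 8), 6))              # 2
```

In this toy setting the XOR-ordered probes stay inside a single block only when the block size is a power of 2 and the blocks are aligned; shifting the boundaries or using a block size of 6 splits the same eight slots across two blocks.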

Keywords

Cache-oblivious algorithms, Hashing

References

  1. Afshani, P., Hamilton, C., Zeh, N.: Cache-oblivious range reporting with optimal queries requires superlinear space. Discrete Comput. Geom. 45(4), 824–850 (2011)
  2. Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)
  3. Bender, M.A., Demaine, E.D., Farach-Colton, M.: Cache-oblivious B-trees. SIAM J. Comput. 35(2), 341–358 (2005)
  4. Bender, M.A., Brodal, G.S., Fagerberg, R., Ge, D., He, S., Hu, H., Iacono, J., López-Ortiz, A.: The cost of cache-oblivious searching. Algorithmica 61(2), 463–505 (2010)
  5. Brodal, G.S., Fagerberg, R.: On the limits of cache-obliviousness. In: Proc. ACM Symposium on Theory of Computing, pp. 307–315 (2003)
  6. Carter, J., Wegman, M.: Universal classes of hash functions. J. Comput. Syst. Sci. 18, 143–154 (1979)
  7. Demaine, E.: Cache-oblivious algorithms and data structures. In: EEF Summer School on Massive Datasets. Springer, Berlin (2002)
  8. Fagin, R., Nievergelt, J., Pippenger, N., Strong, H.: Extendible hashing—a fast access method for dynamic files. ACM Trans. Database Syst. 4(3), 315–344 (1979)
  9. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with O(1) worst case access time. In: Proc. IEEE Symposium on Foundations of Computer Science, pp. 165–170 (1982)
  10. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: Proc. IEEE Symposium on Foundations of Computer Science, pp. 285–298 (1999)
  11. Gonnet, G.H., Larson, P.-Å.: External hashing with limited internal storage. J. ACM 35(1), 161–184 (1988)
  12. He, B., Luo, Q.: Cache-oblivious databases: limitations and opportunities. ACM Trans. Database Syst. 33(2), 8 (2008)
  13. Jensen, M.S., Pagh, R.: Optimality in external memory hashing. Algorithmica 52(3), 403–411 (2008)
  14. Knuth, D.E.: Sorting and Searching. The Art of Computer Programming, vol. 3. Addison-Wesley, Reading (1973)
  15. Larson, P.-Å.: Dynamic hash tables. Commun. ACM 31(4), 446–457 (1988)
  16. Larson, P.-Å.: Linear hashing with separators—a dynamic hashing scheme achieving one-access retrieval. ACM Trans. Database Syst. 13(3), 366–388 (1988)
  17. Litwin, W.: Linear hashing: a new tool for file and table addressing. In: Proc. International Conference on Very Large Data Bases, pp. 212–223 (1980)
  18. Mairson, H.G.: The effect of table expansion on the program complexity of perfect hash functions. BIT Numer. Math. 32(3), 430–440 (1992)
  19. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (1995)
  20. Pagh, R., Rodler, F.F.: Cuckoo hashing. J. Algorithms 51, 122–144 (2004)
  21. Pagh, A., Pagh, R., Ružić, M.: Linear probing with 5-wise independence. SIAM Rev. 53(3), 547–558 (2011)
  22. Qi, H., Martel, C.U.: Design and analysis of hashing algorithms with cache effects. Technical report, UC Davis (1998)
  23. Schmidt, J., Siegel, A., Srinivasan, A.: Chernoff–Hoeffding bounds for applications with limited independence. SIAM J. Discrete Math. 8, 223 (1995)
  24. Tenenbaum, G.: Introduction to Analytic and Probabilistic Number Theory. Cambridge University Press, Cambridge (1995)
  25. Verbin, E., Zhang, Q.: The limits of buffering: a tight lower bound for dynamic membership in the external memory model. In: Proc. ACM Symposium on Theory of Computing, pp. 447–456 (2010)
  26. Wegman, M., Carter, J.: New hash functions and their use in authentication and set equality. J. Comput. Syst. Sci. 22(3), 265–279 (1981)
  27. Wei, Z., Yi, K., Zhang, Q.: Dynamic external hashing: the limit of buffering. In: Proc. ACM Symposium on Parallelism in Algorithms and Architectures, pp. 253–259 (2009)

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. IT University of Copenhagen, Copenhagen, Denmark
  2. Department of Computer Science, MADALGO (Center for Massive Data Algorithmics—A Center of the Danish National Research Foundation), Aarhus University, Aarhus, Denmark
  3. Hong Kong University of Science and Technology, Hong Kong, China
  4. Indiana University Bloomington, Bloomington, USA