The hash table, especially its external-memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size $b$, searching for a particular key takes only $t_q = 1 + 1/2^{\Omega(b)}$ expected average disk accesses for any load factor $\alpha$ bounded away from 1. However, such near-perfect performance is achieved only when $b$ is known and the hash table is specifically tuned for that blocking. In this paper we study whether it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table automatically performs well across all levels of the memory hierarchy and needs no hardware-specific tuning, an important feature in autonomous databases.
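To make the cache-aware baseline concrete, here is a minimal Python sketch of an external hash table tuned for a known block size $b$. It assumes a textbook layout (one bucket per disk block of $b$ slots, with overflow blocks chained off each bucket) and uses Python's tuple `hash` with a random salt as a stand-in for a truly random hash function. The class and method names are hypothetical, chosen for illustration only. A search costs one block read per block visited in the bucket's chain, so when $\alpha$ is bounded away from 1 almost every search costs exactly one read.

```python
import random


class ExternalHashTable:
    """Cache-aware external hash table (hypothetical sketch).

    Each of num_buckets buckets starts as one block of b slots; when a
    bucket's last block is full, an overflow block is chained after it.
    """

    def __init__(self, num_buckets, b, seed=0):
        self.b = b
        self.num_buckets = num_buckets
        # Each bucket is a list of blocks; each block holds at most b keys.
        self.buckets = [[[]] for _ in range(num_buckets)]
        # Salt mixed into the hash: an assumed stand-in for a random hash function.
        self.salt = random.Random(seed).getrandbits(64)

    def _bucket(self, key):
        return hash((self.salt, key)) % self.num_buckets

    def insert(self, key):
        chain = self.buckets[self._bucket(key)]
        if len(chain[-1]) == self.b:
            chain.append([])  # last block is full: chain a new overflow block
        chain[-1].append(key)

    def search(self, key):
        """Return (found, number of blocks read) for a lookup of key."""
        reads = 0
        for block in self.buckets[self._bucket(key)]:
            reads += 1  # each block visited counts as one disk access
            if key in block:
                return True, reads
        return False, reads


# Usage: load factor alpha = n / (num_buckets * b) = 32000 / 64000 = 0.5.
table = ExternalHashTable(num_buckets=1000, b=64, seed=1)
keys = list(range(32000))
for k in keys:
    table.insert(k)
avg_reads = sum(table.search(k)[1] for k in keys) / len(keys)
print(f"average block reads per successful search: {avg_reads:.4f}")
```

Note that the code depends on $b$ when it decides where a block ends; this dependence on the blocking is exactly what a cache-oblivious structure must avoid.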
We first show that linear probing, a classical collision resolution strategy for hash tables, can easily be made cache-oblivious, but it achieves only $t_q = 1 + \Theta(\alpha/b)$, even if a truly random hash function is used. We then demonstrate that the block probing algorithm (Pagh et al. in SIAM Rev. 53(3):547–558, 2011) achieves $t_q = 1 + 1/2^{\Omega(b)}$, thus matching the cache-aware bound, if the following two conditions hold: (a) $b$ is a power of 2; and (b) every block starts at a memory address divisible by $b$. Both conditions hold on real machines, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either is removed, the best obtainable bound is $t_q = 1 + O(\alpha/b)$, which is exactly what linear probing achieves.
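Linear probing is cache-oblivious because a search scans a contiguous run of cells and the code never needs to know $b$; the block size only enters when the cost is charged. The minimal Python sketch below illustrates this separation. It assumes a salted tuple `hash` as a stand-in for a truly random hash function, and the names `LinearProbingTable`, `probe_cells`, and `blocks_touched` are hypothetical. The `offset` parameter of `blocks_touched` models whether blocks start at addresses divisible by $b$ (condition (b) above, when `offset=0`). This sketch covers linear probing only and does not implement block probing.

```python
import random


class LinearProbingTable:
    """Linear probing over a flat array (hypothetical sketch).

    Nothing here refers to a block size, so the layout is cache-oblivious.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.slots = [None] * capacity
        # Salt mixed into the hash: an assumed stand-in for a random hash function.
        self.salt = random.Random(seed).getrandbits(64)

    def _home(self, key):
        return hash((self.salt, key)) % self.capacity

    def insert(self, key):
        i = self._home(key)
        for _ in range(self.capacity):
            if self.slots[i] is None or self.slots[i] == key:
                self.slots[i] = key
                return
            i = (i + 1) % self.capacity  # step to the next cell, wrapping around
        raise RuntimeError("table is full")

    def probe_cells(self, key):
        """Cell indices inspected by a search for key, in probe order."""
        cells = []
        i = self._home(key)
        for _ in range(self.capacity):
            cells.append(i)
            if self.slots[i] is None or self.slots[i] == key:
                break  # found the key or reached an empty cell
            i = (i + 1) % self.capacity
        return cells


def blocks_touched(cells, b, offset=0):
    """Number of distinct size-b blocks covering the given cells, when
    blocks start at addresses congruent to offset modulo b."""
    return len({(c - offset) // b for c in cells})


# Usage: one table, charged under several blockings after the fact.
table = LinearProbingTable(capacity=1 << 12, seed=7)
for k in range(1 << 11):  # load factor alpha = 0.5
    table.insert(k)
cells = table.probe_cells(123)
for b in (8, 64, 512):
    print(b, blocks_touched(cells, b), blocks_touched(cells, b, offset=b // 2))
```

Because the probe sequence is fixed before any blocking is chosen, the same search can be charged under every block size and alignment simply by calling `blocks_touched` with different arguments.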
Keywords: Cache-oblivious algorithms · Hashing