Minimal and Monotone Minimal Perfect Hash Functions
A minimal perfect hash function (MPHF) is a (data structure providing a) bijective map from a set S of n keys to the set of the first n natural numbers. In the static case (i.e., when the set S is known in advance), a wide spectrum of solutions is available, offering different trade-offs among construction time, access time, and size of the data structure. MPHFs have been shown to be useful for compressing data in several data management tasks. In particular, order-preserving minimal perfect hash functions have been used to retrieve the position of a key in a given list of keys; however, the ability to preserve any given order leads to an unavoidable \(\varOmega (n \log n)\) lower bound on the number of bits required to store the function. Recently, it was observed that the keys to be hashed are very frequently sorted in their intrinsic (i.e., lexicographical) order. This is typically the case for dictionaries of search engines, lists of URLs of web graphs, etc. MPHFs that preserve the intrinsic order of the keys are called monotone (MMPHFs). The problem of building MMPHFs is more recent and less studied (for example, no lower bounds are known), but a wide spectrum of solutions is by now available for it as well. In this paper, we survey some of the most practical techniques and tools for the construction of MPHFs and MMPHFs.
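The two definitions above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not one of the constructions surveyed in the paper: `build_mphf` finds an MPHF for a tiny key set by brute-force seed search over a salted BLAKE2b hash, and `build_monotone_reference` defines the monotone behaviour with an explicit rank table, which stores all the keys and is therefore not space-efficient. All function names are hypothetical, and "intrinsic order" is taken to be Python's default string ordering.

```python
import hashlib
from itertools import count


def _hash(key: str, seed: int, n: int) -> int:
    """Map a key to 0..n-1 using a seeded hash (illustrative choice: BLAKE2b)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % n


def build_mphf(keys):
    """Brute-force MPHF: try seeds until the keys hit n distinct values.

    Only practical for very small, non-empty key sets; it is meant to
    illustrate the definition, not an efficient construction.
    """
    keys = list(keys)
    n = len(keys)
    if n == 0 or len(set(keys)) != n:
        raise ValueError("keys must be non-empty and distinct")
    for seed in count():
        if len({_hash(k, seed, n) for k in keys}) == n:
            return lambda key: _hash(key, seed, n)


def build_monotone_reference(keys):
    """Reference monotone MPHF: each key maps to its rank in sorted order.

    This stores the keys explicitly, so it only defines the expected
    behaviour; it is not a compact data structure.
    """
    rank = {k: i for i, k in enumerate(sorted(keys))}
    return rank.__getitem__


def is_minimal_perfect(h, keys):
    """True if h is a bijection from keys onto {0, ..., n-1}."""
    return sorted(h(k) for k in keys) == list(range(len(keys)))


def is_monotone(h, keys):
    """True if h maps every key to its position in sorted order."""
    return all(h(k) == i for i, k in enumerate(sorted(keys)))


keys = ["delta", "alpha", "charlie", "bravo", "echo"]
mphf = build_mphf(keys)
mmphf = build_monotone_reference(keys)
print(is_minimal_perfect(mphf, keys), is_monotone(mmphf, keys))
```

Every monotone function of this kind is also minimal and perfect, while the brute-force MPHF generally assigns positions that bear no relation to the sorted order of the keys.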
Keywords: Hash Function, Binary Search, Binary String, Lexicographic Order, Query Time
I want to thank Sebastiano Vigna for his comments and insightful suggestions. This paper is partially funded by the Google Focused Award “Web Algorithmics for Large-Scale Data Analysis”.