Advertisement

Minimal and Monotone Minimal Perfect Hash Functions

  • Paolo BoldiEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9234)

Abstract

A minimal perfect hash function (MPHF) is a (data structure providing a) bijective map from a set S of n keys to the set of the first n natural numbers. In the static case (i.e., when the set S is known in advance), there is a wide spectrum of solutions available, offering different trade-offs in terms of construction time, access time and size of the data structure. MPHFs have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given order leads to an unavoidable \(\varOmega (n \log n)\) lower bound on the number of bits required to store the function. Recently, it was observed that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case of dictionaries of search engines, list of URLs of web graphs, etc. MPHFs that preserve the intrinsic order of the keys are called monotone (MMPHF). The problem of building MMPHFs is more recent and less studied (for example, no lower bounds are known) but once more there is a wide spectrum of solutions available, by now. In this paper, we survey some of the most practical techniques and tools for the construction of MPHFs and MMPHFs.

Keywords

Hash Function Binary Search Binary String Lexicographic Order Query Time 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

I want to thank Sebastiano Vigna for his comments and insightful suggestions. This paper is partially funded by the Google Focused Award “Web Algorithmics for Large-Scale Data Analysis”.

References

  1. 1.
    Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with \(O(1)\) worst case access time. J. Assoc. Comput. Mach. 31, 538–544 (1984)MathSciNetCrossRefzbMATHGoogle Scholar
  2. 2.
    Fredman, M.L., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM J. Algebr. Discret. Methods 5, 61–68 (1984)CrossRefzbMATHGoogle Scholar
  3. 3.
    Fox, E.A., Chen, Q.F., Daoud, A.M., Heath, L.S.: Order-preserving minimal perfect hash functions and information retrieval. ACM Trans. Inf. Sys. 9, 281–308 (1991)CrossRefGoogle Scholar
  4. 4.
    Majewski, B.S., Wormald, N.C., Havas, G., Czech, Z.J.: A family of perfect hashing methods. Comput. J. 39, 547–554 (1996)CrossRefGoogle Scholar
  5. 5.
    Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Monotone minimal perfect hashing: Searching a sorted table with \(O(1)\) accesses. In: Proceedings of the 20th Annual ACM-SIAM Symposium On Discrete Mathematics (SODA), pp. 785–794, New York, ACM Press (2009)Google Scholar
  6. 6.
    Boldi, P., Vigna, S.: The WebGraph framework i: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601, Manhattan, USA, ACM Press (2004)Google Scholar
  7. 7.
    Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Srinivasan, S., Ramamritham, K., Kumar, A., Ravindra, M.P., Bertino, E., Kumar, R. (eds.) Proceedings of the 20th International Conference on World Wide Web, pp. 587–596. ACM (2011)Google Scholar
  8. 8.
    Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co. Inc., Boston (1999)Google Scholar
  9. 9.
    Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practise of monotone minimal perfect hashing. In: Proceedings of the Tenth Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 132–144. SIAM (2009)Google Scholar
  10. 10.
    Knuth, D.E.: The Art of Computer Programming. Addison-Wesley, Boston (1973)Google Scholar
  11. 11.
    Mitzenmacher, M., Vadhan, S.: Why simple hash functions work: exploiting the entropy in a data stream. In: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, pp. 746–755. Society for Industrial and Applied Mathematics, Philadelphia (2008)Google Scholar
  12. 12.
    Jacobson, G.: Space-efficient static trees and graphs. In: 30th Annual Symposium on Foundations of Computer Science (FOCS 1989), pp. 549–554. IEEE Computer Society Press, Research Triangle Park, North Carolina (1989)Google Scholar
  13. 13.
    Patrascu, M.: Succincter. In: 49th Annual IEEE Symposium on Foundations of Computer Science, pp. 305–313. IEEE Computer Society (2008)Google Scholar
  14. 14.
    Vigna, S.: Broadword implementation of rank/select queries. In: McGeoch, C.C. (ed.) WEA 2008. LNCS, vol. 5038, pp. 154–168. Springer, Heidelberg (2008) CrossRefGoogle Scholar
  15. 15.
    Gog, S., Petri, M.: Optimized succinct data structures for massive data. Software: Practice and Experience (2014). To appearGoogle Scholar
  16. 16.
    Bloom, B.H.: Space-time trade-offs in hash coding with allowable errors. Commun. ACM 13, 422–426 (1970)CrossRefzbMATHGoogle Scholar
  17. 17.
    Mehlhorn, K.: Data Structures and Algorithms 1: Sorting and Searching. EATCS monographs on theoretical computer science, vol. 1. Springer, Heidelberg (1984) CrossRefzbMATHGoogle Scholar
  18. 18.
    Hagerup, T., Tholey, T.: Efficient minimal perfect hashing in nearly minimal space. In: Ferreira, A., Reichel, H. (eds.) STACS 2001. LNCS, vol. 2010, pp. 317–326. Springer, Heidelberg (2001) CrossRefGoogle Scholar
  19. 19.
    Chazelle, B., Kilian, J., Rubinfeld, R., Tal, A.: The Bloomier filter: an efficient data structure for static support lookup tables. In: Munro, J.I. (ed.) Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, pp. 30–39. SIAM (2004)Google Scholar
  20. 20.
    Belazzougui, D., Botelho, F.C., Dietzfelbinger, M.: Hash, displace, and compress. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 682–693. Springer, Heidelberg (2009) CrossRefGoogle Scholar
  21. 21.
    Molloy, M.: Cores in random hypergraphs and Boolean formulas. Random Struct. Algorithms 27, 124–135 (2005)MathSciNetCrossRefzbMATHGoogle Scholar
  22. 22.
    Dietzfelbinger, M.: Design strategies for minimal perfect hash functions. In: Hromkovič, J., Královič, R., Nunkesser, M., Widmayer, P. (eds.) SAGA 2007. LNCS, vol. 4665, pp. 2–17. Springer, Heidelberg (2007) CrossRefGoogle Scholar
  23. 23.
    Fredriksson, K., Nikitin, F.: Simple compression code supporting random access and fast string matching. In: Demetrescu, C. (ed.) WEA 2007. LNCS, vol. 4525, pp. 203–216. Springer, Heidelberg (2007) CrossRefGoogle Scholar
  24. 24.
    Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practice of monotone minimal perfect hashing. ACM J. Exp. Algorithmic 16, 3.2:1–3.2:26 (2011)MathSciNetGoogle Scholar
  25. 25.
    Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Fast prefix search in little space, with applications. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part I. LNCS, vol. 6346, pp. 427–438. Springer, Heidelberg (2010) CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. 1.Dipartimento di InformaticaUniversità degli Studi di MilanoMilanItaly

Personalised recommendations