Chapter

Algorithms - ESA 2009

Volume 5757 of the series Lecture Notes in Computer Science, pp. 682-693

Hash, Displace, and Compress

  • Djamal Belazzougui, Ecole Nationale Supérieure d’Informatique
  • Fabiano C. Botelho, Department of Computer Engineering, Federal Center for Technological Education of Minas Gerais
  • Martin Dietzfelbinger, Faculty of Computer Science and Automation, Technische Universität Ilmenau

Abstract

A hash function h, i.e., a function from the set U of all keys to the range [m] = {0,...,m − 1}, is called a perfect hash function (PHF) for a subset S ⊆ U of size n ≤ m if h is 1-1 on S. The important performance parameters of a PHF are representation size, evaluation time, and construction time. In this paper, we present an algorithm that yields PHFs with expected representation size very close to optimal while retaining O(n) expected construction time and O(1) worst-case evaluation time. For example, in the case m = 1.23n we obtain a PHF that uses 1.4 bits per key, and for m = 1.01n we obtain a PHF that uses 1.98 bits per key, which was not achievable with previously known methods. Our algorithm is inspired by several known algorithms; the main new feature is that we combine a modification of Pagh’s “hash-and-displace” approach with data compression on a sequence of hash function indices. Our algorithm can also be used for k-perfect hashing, where at most k keys may be mapped to the same value.
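
The abstract does not spell out the construction, so the following Python code is only a minimal sketch of the general hash-and-displace idea under stated assumptions, not the authors' algorithm. Keys are split into buckets by a first hash, buckets are processed from largest to smallest, and for each bucket the smallest hash function index is stored that places all of its keys into distinct free slots of [m]. The names `_hash`, `build_phf`, and `evaluate` are hypothetical, seeded BLAKE2b stands in for the family of hash functions, and the compression step is left out.

```python
import hashlib


def _hash(key: str, seed: int, modulus: int) -> int:
    """Seeded hash into {0, ..., modulus - 1} (assumption: BLAKE2b as the hash family)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % modulus


def build_phf(keys, m, num_buckets, max_tries=100_000):
    """Return one hash function index per bucket so that the keys map 1-1 into [m]."""
    if len(set(keys)) != len(keys) or len(keys) > m:
        raise ValueError("keys must be distinct and n <= m")

    # Step 1 (hash): split the keys into buckets with the seed-0 function.
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[_hash(key, 0, num_buckets)].append(key)

    occupied = [False] * m
    indices = [0] * num_buckets  # index 0 marks an empty bucket

    # Step 2 (displace): handle large buckets first, while the table is mostly free.
    order = sorted(range(num_buckets), key=lambda b: len(buckets[b]), reverse=True)
    for b in order:
        if not buckets[b]:
            continue
        for i in range(1, max_tries + 1):
            slots = [_hash(k, i, m) for k in buckets[b]]
            no_internal_collision = len(set(slots)) == len(slots)
            if no_internal_collision and not any(occupied[s] for s in slots):
                for s in slots:
                    occupied[s] = True
                indices[b] = i
                break
        else:
            raise RuntimeError("no suitable index found; try more buckets or a larger m")
    return indices


def evaluate(key, indices, m):
    """Evaluate the PHF: look up the bucket's index, then apply that hash function."""
    bucket = _hash(key, 0, len(indices))
    return _hash(key, indices[bucket], m)
```

In this sketch the list `indices` plays the role of the sequence of hash function indices that the abstract proposes to compress; here it is stored uncompressed. A call such as `build_phf(keys, m=246, num_buckets=50)` for 200 keys corresponds to the m = 1.23n setting, and `evaluate(key, indices, 246)` then needs two hash computations and one array lookup.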