Abstract
Recent work has shown that perfect hashing and retrieval of data values associated with a key can be done without storing the keys, using only a few bits of additional space per element. We present FiRe – a new, very simple approach to such data structures. FiRe allows very fast construction and better cache efficiency. The main idea is to substitute keys by small fingerprints. Collisions between fingerprints are resolved by recursively handling the colliding elements in an overflow data structure. FiRe is dynamizable, easily parallelizable, and allows distributed implementation without communicating keys. Depending on implementation choices, queries require close to a single cache-line access, or the data structure needs as little as 2.58 bits of additional space per element.
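The collision-handling idea can be illustrated with a minimal sketch. This is a simplification written for illustration, not the authors' implementation: the slot index produced by a per-level hash plays the role of the fingerprint, and the names `FingerprintRetrieval`, `_slot`, `slots_per_key`, and `max_levels` are hypothetical. Keys that land alone in a slot have their value stored there; keys whose slots collide are passed on to the next (overflow) level, which uses a different hash seed. No key is stored at any level.

```python
import hashlib
import math


def _slot(key: str, level: int, size: int) -> int:
    """Hash a key to a slot index; the level number seeds the hash."""
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=level.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % size


class FingerprintRetrieval:
    """Static key -> value retrieval that never stores the keys (sketch)."""

    def __init__(self, items, slots_per_key=1.0, max_levels=64):
        # Each level is a pair (occupied, values) of equal-length lists.
        self.levels = []
        pending = dict(items)
        level = 0
        while pending:
            if level >= max_levels:
                raise RuntimeError("too many overflow levels")
            size = max(1, math.ceil(len(pending) * slots_per_key))
            # Count how many pending keys hash to each slot.
            counts = [0] * size
            for key in pending:
                counts[_slot(key, level, size)] += 1
            occupied = [False] * size
            values = [None] * size
            overflow = {}
            for key, value in pending.items():
                s = _slot(key, level, size)
                if counts[s] == 1:
                    # Unique slot: store the value, not the key.
                    occupied[s] = True
                    values[s] = value
                else:
                    # Collision: defer the key to the overflow level.
                    overflow[key] = value
            self.levels.append((occupied, values))
            pending = overflow
            level += 1

    def get(self, key):
        """Return the value for a key of the original set.

        For keys outside the set the result is arbitrary (possibly None),
        because no key is kept to check membership against.
        """
        for level, (occupied, values) in enumerate(self.levels):
            s = _slot(key, level, len(values))
            if occupied[s]:
                return values[s]
        return None
```

A structure built with `FingerprintRetrieval({"apple": 1, "pear": 2})` answers `get("apple")` by probing levels in order until it reaches an occupied slot. A key placed at some level always sees unoccupied slots at every earlier level, because its slot there held a collision, so lookups for keys in the set are exact. The sketch spends a full Python list entry per slot; the space figures quoted in the abstract depend on compact encodings that are not reproduced here.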
Keywords
- Hash Function
- Hash Table
- Query Time
- Full Paper
- Construction Time
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Müller, I., Sanders, P., Schulze, R., Zhou, W. (2014). Retrieval and Perfect Hashing Using Fingerprinting. In: Gudmundsson, J., Katajainen, J. (eds) Experimental Algorithms. SEA 2014. Lecture Notes in Computer Science, vol 8504. Springer, Cham. https://doi.org/10.1007/978-3-319-07959-2_12
DOI: https://doi.org/10.1007/978-3-319-07959-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07958-5
Online ISBN: 978-3-319-07959-2
eBook Packages: Computer Science, Computer Science (R0)