Cache-Conscious Collision Resolution in String Hash Tables

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 3772)

Abstract

In-memory hash tables provide fast access to large numbers of strings, with less space overhead than sorted structures such as tries and binary trees. If chains are used for collision resolution, hash tables scale well, particularly if the pattern of access to the stored strings is skewed. However, typical implementations of string hash tables, with lists of nodes, are not cache-efficient. In this paper, we explore two alternatives to the standard representation: the simple expedient of including the string in its node, and the more drastic step of replacing each list of nodes by a contiguous array of characters. Our experiments show that, for large sets of strings, the improvement is dramatic. In all cases, the new structures give substantial savings in space at no cost in time. In the best case, the overhead space required for pointers is reduced by a factor of around 50, to less than two bits per string (with total space required, including 5.68 megabytes of strings, falling from 20.42 megabytes to 5.81 megabytes), while access times are also reduced.
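The more drastic alternative described above, replacing each chain of nodes with a contiguous array of characters, can be sketched in C as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: each slot owns one heap block of length-prefixed strings (a one-byte length, so keys of 1 to 255 bytes) ended by a zero byte, the block is resized with `realloc` to an exact fit on every insertion, and `hash_str` is an illustrative shift-add-xor hash. The names `ArrayHash`, `ah_create`, `ah_insert`, `ah_search` and `ah_free` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical array hash table sketch. Each slot is NULL or points to one
 * contiguous block laid out as [len][chars...][len][chars...][0].
 * Assumption: a one-byte length prefix, so keys must be 1..255 bytes long. */
typedef struct {
    unsigned char **slots;
    size_t nslots;
} ArrayHash;

/* Illustrative shift-add-xor string hash; the seed is arbitrary. */
static size_t hash_str(const char *s, size_t len, size_t nslots) {
    unsigned int h = 220373u;
    for (size_t i = 0; i < len; i++)
        h ^= (h << 5) + (h >> 2) + (unsigned char)s[i];
    return h % nslots;
}

ArrayHash *ah_create(size_t nslots) {
    if (nslots == 0) return NULL;
    ArrayHash *t = malloc(sizeof *t);
    if (!t) return NULL;
    t->slots = calloc(nslots, sizeof *t->slots);  /* all slots start empty */
    if (!t->slots) { free(t); return NULL; }
    t->nslots = nslots;
    return t;
}

/* Linear scan of one slot's block. Returns 1 on a match; otherwise returns 0
 * and stores in *end the offset of the terminating zero byte. */
static int scan_slot(const unsigned char *arr, const char *s, size_t len, size_t *end) {
    size_t off = 0;
    while (arr[off] != 0) {
        size_t n = arr[off];
        if (n == len && memcmp(arr + off + 1, s, len) == 0)
            return 1;
        off += 1 + n;  /* skip the length byte and the characters */
    }
    *end = off;
    return 0;
}

/* Returns 1 if s is stored, 0 otherwise. */
int ah_search(const ArrayHash *t, const char *s) {
    size_t len = strlen(s), end;
    if (len == 0 || len > 255) return 0;
    const unsigned char *arr = t->slots[hash_str(s, len, t->nslots)];
    return arr != NULL && scan_slot(arr, s, len, &end);
}

/* Inserts s if absent. Returns 1 if inserted, 0 if already present, -1 on error. */
int ah_insert(ArrayHash *t, const char *s) {
    size_t len = strlen(s), end = 0;
    if (len == 0 || len > 255) return -1;
    size_t i = hash_str(s, len, t->nslots);
    unsigned char *arr = t->slots[i];
    if (arr != NULL && scan_slot(arr, s, len, &end))
        return 0;
    /* Exact-fit growth: old bytes up to the terminator, plus length byte,
     * characters, and a new terminator. realloc(NULL, n) acts as malloc. */
    unsigned char *grown = realloc(arr, end + 1 + len + 1);
    if (!grown) return -1;
    grown[end] = (unsigned char)len;
    memcpy(grown + end + 1, s, len);
    grown[end + 1 + len] = 0;
    t->slots[i] = grown;
    return 1;
}

void ah_free(ArrayHash *t) {
    for (size_t i = 0; i < t->nslots; i++)
        free(t->slots[i]);
    free(t->slots);
    free(t);
}
```

A search walks a single contiguous block, so neighbouring strings in a slot tend to share cache lines and there are no next pointers or separately allocated nodes to follow; the price is copying the block when it grows. For example, after `ah_insert(t, "cache")` and `ah_insert(t, "hash")`, a second `ah_insert(t, "cache")` returns 0, `ah_search(t, "hash")` returns 1 and `ah_search(t, "trie")` returns 0.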

Keywords

  • Hash Function
  • Load Average
  • Hash Table
  • Collision Resolution
  • Space Overhead

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Askitis, N., Zobel, J. (2005). Cache-Conscious Collision Resolution in String Hash Tables. In: Consens, M., Navarro, G. (eds) String Processing and Information Retrieval. SPIRE 2005. Lecture Notes in Computer Science, vol 3772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575832_11

  • DOI: https://doi.org/10.1007/11575832_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29740-6

  • Online ISBN: 978-3-540-32241-2

  • eBook Packages: Computer Science, Computer Science (R0)