Cache-Conscious Collision Resolution in String Hash Tables

  • Nikolas Askitis
  • Justin Zobel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3772)

Abstract

In-memory hash tables provide fast access to large numbers of strings, with less space overhead than sorted structures such as tries and binary trees. If chains are used for collision resolution, hash tables scale well, particularly if the pattern of access to the stored strings is skew. However, typical implementations of string hash tables, with lists of nodes, are not cache-efficient. In this paper we explore two alternatives to the standard representation: the simple expedient of including the string in its node, and the more drastic step of replacing each list of nodes by a contiguous array of characters. Our experiments show that, for large sets of strings, the improvement is dramatic. In all cases, the new structures give substantial savings in space at no cost in time. In the best case, the overhead space required for pointers is reduced by a factor of around 50, to less than two bits per string (with total space required, including 5.68 megabytes of strings, falling from 20.42 megabytes to 5.81 megabytes), while access times are also reduced.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading (1974)MATHGoogle Scholar
  2. 2.
    Baer, J., Chen, T.: Effective hardware-based data prefetching for high-performance processors. IEEE Transactions on Computers 44(5), 609–623 (1995)MATHCrossRefGoogle Scholar
  3. 3.
    Callahan, D., Kennedy, K., Porterfield, A.: Software prefetching. In: Proc. ASPLOS Int. Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 40–52. ACM Press, New York (1991)CrossRefGoogle Scholar
  4. 4.
    Chilimbi, T.M., Hill, M.D., Larus, J.R.: Cache-conscious structure layout. In: Proc. ACM SIGPLAN conf. on Programming Language Design and Implementation, pp. 1–12. ACM Press, New York (1999)Google Scholar
  5. 5.
    Collins, J., Sair, S., Calder, B., Tullsen, D.M.: Pointer cache assisted prefetching. In: Proc. Annual ACM/IEEE MICRO Int. Symp. on Microarchitecture, pp. 62–73. IEEE Computer Society Press, Los Alamitos (2002)Google Scholar
  6. 6.
    Fu, J.W.C., Patel, J.H., Janssens, B.L.: Stride directed prefetching in scalar processors. SIGMICRO Newsletter 23(1-2), 102–110 (1992)CrossRefGoogle Scholar
  7. 7.
    Halatsis, C., Philokyprou, G.: Pseudochaining in hash tables. Communications of the ACM 21(7), 554–557 (1978)MATHCrossRefMathSciNetGoogle Scholar
  8. 8.
    Harman, D.: Overview of the second text retrieval conference (TREC-2). In: Information Processing & Management, pp. 271–289. Pergamon Press, Inc., Oxford (1995)Google Scholar
  9. 9.
    Heileman, G.L., Luo, W.: How caching affects hashing. In: Proc. ALENEX Workshop on Algorithm Engineering and Experiments (January 2005)Google Scholar
  10. 10.
    Heinz, S., Zobel, J., Williams, H.E.: Self-adjusting trees in practice for large text collections. Software—Practice and Experience 31(10), 925–939 (2001)MATHCrossRefGoogle Scholar
  11. 11.
    Karlsson, M., Dahlgren, F., Stenstrom, P.: A prefetching technique for irregular accesses to linked data structures. In: Proc. Symp. on High-Performance Computer Architecture, January 2000, pp. 206–217 (2000)Google Scholar
  12. 12.
    Knuth, D.E.: The Art of Computer Programming: Sorting and Searching, 2nd edn., vol. 3. Addison-Wesley Longman, Amsterdam (1998)Google Scholar
  13. 13.
    Larson, P.: Performance analysis of linear hashing with partial expansions. ACM Transactions on Database Systems 7(4), 566–587 (1982)MATHCrossRefGoogle Scholar
  14. 14.
    Lebeck, A.R.: Cache conscious programming in undergraduate computer science. In: Proc. SIGCSE Technical Symp. on Computer Science Education, pp. 247–251. ACM Press, New York (1999)Google Scholar
  15. 15.
    McCabe, J.: On serial files with relocatable records. Operations Research 13, 609–618 (1965)MATHCrossRefMathSciNetGoogle Scholar
  16. 16.
    Munro, J.I., Celis, P.: Techniques for collision resolution in hash tables with open addressing. In: Proc. ACM Fall Joint Computer Conf., pp. 601–610. IEEE Computer Society Press, Los Alamitos (1986)Google Scholar
  17. 17.
    Peterson, W.W.: Open addressing. IBM J. Research & Development 1, 130–146 (1957)CrossRefGoogle Scholar
  18. 18.
    Ramakrishna, M.V., Zobel, J.: Performance in practice of string hashing functions. In: Proc. DASFAA Symp. on Databases Systems for Advanced Applications, vol. 6, pp. 215–224. World Scientific, Singapore (1997)Google Scholar
  19. 19.
    Rathi, A., Lu, H., Hedrick, G.E.: Performance comparison of extendible hashing and linear hashing techniques. In: Proc. ACM SIGSMALL/PC Symp. on Small Systems, pp. 178–185. ACM Press, New York (1990)CrossRefGoogle Scholar
  20. 20.
    Rosenberg, A.L., Stockmeyer, L.J.: Hashing schemes for extendible arrays. Jour. of the ACM 24(2), 199–221 (1977)MATHCrossRefMathSciNetGoogle Scholar
  21. 21.
    Roth, A., Sohi, G.S.: Effective jump-pointer prefetching for linked data structures. In: Proc. Int. Symp. on Computer Architecture, pp. 111–121. IEEE Computer Society Press, Los Alamitos (1999)Google Scholar
  22. 22.
    Sarwate, D.V.: A note on universal classes of hash functions. Information Processing Letters 10(1), 41–45 (1980)MATHCrossRefMathSciNetGoogle Scholar
  23. 23.
    Sinha, R., Ring, D., Zobel, J.: Cache-efficient string sorting using copying. In: submissionGoogle Scholar
  24. 24.
    Sinha, R., Zobel, J.: Cache-conscious sorting of large sets of strings with dynamic tries. ACM Jour. of Exp. Algorithmics 9 (2005)Google Scholar
  25. 25.
    Vitter, J.S.: Analysis of the search performance of coalesced hashing. Jour. of the ACM 30(2), 231–258 (1983)MATHCrossRefMathSciNetGoogle Scholar
  26. 26.
    Yang, C., Lebeck, A.R., Tseng, H., Lee, C.: Tolerating memory latency through push prefetching for pointer-intensive applications. ACM Trans. Architecture Code Optimisation 1(4), 445–475 (2004)CrossRefGoogle Scholar
  27. 27.
    Zobel, J., Williams, H.E., Heinz, S.: In-memory hash tables for accumulating text vocabularies. Information Processing Letters 80(6), 271–277 (2001)MATHCrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Nikolas Askitis
    • 1
  • Justin Zobel
    • 1
  1. 1.School of Computer Science and Information TechnologyRMIT UniversityMelbourneAustralia

Personalised recommendations