ISAAC 2007: Algorithms and Computation, pp 739-750

Fast Evaluation of Union-Intersection Expressions

  • Philip Bille
  • Anna Pagh
  • Rasmus Pagh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4835)

Abstract

We show how to represent sets in a linear space data structure such that expressions involving unions and intersections of sets can be computed in a worst-case efficient way. This problem has applications in e.g. information retrieval and database systems. We mainly consider the RAM model of computation, and sets of machine words, but also state our results in the I/O model. On a RAM with word size \(w\), a special case of our result is that the intersection of \(m\) (preprocessed) sets, containing \(n\) elements in total, can be computed in expected time \(O(n (\log w)^2 / w + km)\), where \(k\) is the number of elements in the intersection. If the first of the two terms dominates, this is a factor \(w^{1-o(1)}\) faster than the standard solution of merging sorted lists. We show a cell probe lower bound of time \(\Omega(n/(w m \log m)+ (1-\tfrac{\log k}{w}) k)\), meaning that our upper bound is nearly optimal for small \(m\). Our algorithm uses a novel combination of approximate set representations and word-level parallelism.
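
For orientation, the "standard solution of merging sorted lists" that the abstract uses as its baseline can be sketched as follows. This is only an illustrative Python sketch of that baseline, not the algorithm of the paper; the function names `intersect_sorted` and `intersect_all` are hypothetical, and the inputs are assumed to be sorted, duplicate-free lists of integers.

```python
from typing import List, Sequence


def intersect_sorted(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Intersect two sorted, duplicate-free lists with a linear merge."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])  # element present in both lists
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1            # a[i] cannot occur later in b
        else:
            j += 1            # b[j] cannot occur later in a
    return out


def intersect_all(sets: Sequence[Sequence[int]]) -> List[int]:
    """Baseline intersection of m sorted lists by repeated merging.

    Each intermediate result is no larger than the list just merged,
    so the total work is O(n) for n elements overall.
    """
    if not sets:
        return []
    result = list(sets[0])
    for s in sets[1:]:
        result = intersect_sorted(result, s)
    return result


if __name__ == "__main__":
    print(intersect_all([[1, 3, 5, 7], [3, 4, 5, 8], [0, 3, 5]]))
```

The baseline touches every element at least once, which is the \(O(n)\) cost that the bound stated in the abstract improves on when its first term dominates.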

Keywords

Hash Function, Hash Table, Space Usage, Expression Tree, Packed Array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Philip Bille (1)
  • Anna Pagh (1)
  • Rasmus Pagh (1)
  1. Computational Logic and Algorithms Group, IT University of Copenhagen, Denmark
