ISAAC 2007: Algorithms and Computation, pp. 739–750
Fast Evaluation of Union-Intersection Expressions
Abstract
We show how to represent sets in a linear-space data structure such that expressions involving unions and intersections of sets can be computed in a worst-case efficient way. This problem has applications in, e.g., information retrieval and database systems. We mainly consider the RAM model of computation and sets of machine words, but also state our results in the I/O model. On a RAM with word size \(w\), a special case of our result is that the intersection of \(m\) (preprocessed) sets, containing \(n\) elements in total, can be computed in expected time \(O(n(\log w)^2/w + km)\), where \(k\) is the number of elements in the intersection. If the first of the two terms dominates, this is a factor \(w^{1-o(1)}\) faster than the standard \(O(n)\)-time solution of merging sorted lists, since \(w/(\log w)^2 = w^{1-o(1)}\). We show a cell probe lower bound of time \(\Omega(n/(w m \log m)+ (1-\tfrac{\log k}{w}) k)\), meaning that our upper bound is nearly optimal for small \(m\). Our algorithm uses a novel combination of approximate set representations and word-level parallelism.
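The abstract contrasts the standard merge of sorted lists with an approach built on approximate set representations and word-level parallelism. The sketch below is a simplified illustration of that general idea only; it is not the authors' data structure and does not achieve their time bound. Each set is preprocessed into a sorted list, an exact hash set, and a bitmap of multiply-shift hash values packed into 64-bit words (in effect a single-hash Bloom filter). Intersection ANDs the bitmaps word by word, so one operation filters 64 hash values at once, and then verifies the surviving candidates exactly. The constants `A` and `R_BITS`, the class `PreprocessedSet`, and the functions `merge_intersect` and `filter_intersect` are hypothetical; keys are assumed to be integers in \([0, 2^{64})\).

```python
from typing import Iterable, List

WORD = 64                    # simulated machine word size w
MASK = (1 << WORD) - 1
A = 0x9E3779B97F4A7C15       # hypothetical odd multiplier for multiply-shift hashing
R_BITS = 12                  # hypothetical hash range: 2**12 bits = 64 words per bitmap


def mult_hash(x: int) -> int:
    """Multiply-shift hash of a 64-bit key to R_BITS bits."""
    return ((A * x) & MASK) >> (WORD - R_BITS)


class PreprocessedSet:
    """A set stored as a sorted list, an exact hash set, and a packed bitmap of hash values."""

    def __init__(self, elems: Iterable[int]) -> None:
        self.sorted = sorted(set(elems))
        self.exact = set(self.sorted)
        # Approximate representation: bit h(x) is set for every element x.
        self.bitmap = [0] * ((1 << R_BITS) // WORD)
        for x in self.sorted:
            h = mult_hash(x)
            self.bitmap[h // WORD] |= 1 << (h % WORD)


def merge_intersect(sets: List[PreprocessedSet]) -> List[int]:
    """Baseline: merge sorted lists pairwise; O(n) comparisons since results only shrink."""
    result = sets[0].sorted
    for s in sets[1:]:
        other = s.sorted
        out: List[int] = []
        i = j = 0
        while i < len(result) and j < len(other):
            if result[i] < other[j]:
                i += 1
            elif result[i] > other[j]:
                j += 1
            else:
                out.append(result[i])
                i += 1
                j += 1
        result = out
    return result


def filter_intersect(sets: List[PreprocessedSet]) -> List[int]:
    """AND the packed bitmaps (64 hash values per AND), then verify candidates exactly."""
    common = list(sets[0].bitmap)
    for s in sets[1:]:
        for i, word in enumerate(s.bitmap):
            common[i] &= word
    # Candidates come from the smallest set, so the output stays in sorted order.
    smallest = min(sets, key=lambda s: len(s.sorted))
    result = []
    for x in smallest.sorted:
        h = mult_hash(x)
        if (common[h // WORD] >> (h % WORD)) & 1:      # survives the approximate filter
            if all(x in s.exact for s in sets):        # discard hash collisions
                result.append(x)
    return result
```

For example, intersecting `PreprocessedSet(range(0, 1000, 2))` with `PreprocessedSet(range(0, 1000, 3))` gives the multiples of 6 below 1000 with either function. The filter never drops a true member, because an element of every set sets its bit in every bitmap; the exact check only removes false positives. CPython integers do not model machine-word costs, so this shows the structure of the idea rather than its running time.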
Keywords
Hash Function, Hash Table, Space Usage, Expression Tree, Packed Array