ICDCIT 2008: Distributed Computing and Internet Technology pp 145-156
Optimizing Distributed Joins with Bloom Filters
Conference paper
Abstract
Distributed joins have gained importance in the past decade, mainly due to the increased number of data sources available on the Internet. In this work we extend Bloomjoin, the state-of-the-art algorithm for distributed joins, so that it uses database statistics to minimize the network usage of query execution. We present four extensions of the algorithm and construct a query optimizer that selects the best extension for each query. Our theoretical analysis and experimental evaluation show significant network cost savings compared to the original Bloomjoin algorithm.
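For readers unfamiliar with the baseline, the following is a minimal single-process sketch of a basic Bloomjoin, not of the extensions or the optimizer this paper proposes. It assumes two relations held as lists of (key, value) pairs, and it simulates shipping data between sites with ordinary function-local variables. The names `BloomFilter`, `bloomjoin`, `num_bits` and `num_hashes`, as well as the SHA-256 double-hashing scheme, are illustrative assumptions and are not taken from the paper.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hashing scheme are assumptions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def bloomjoin(r_rows, s_rows, num_bits=1024, num_hashes=3):
    """Join r_rows and s_rows on their first field.

    Returns the join result and the number of S tuples that would have
    been shipped over the network after filtering.
    """
    # Site R: summarize its join keys in a Bloom filter.
    bf = BloomFilter(num_bits, num_hashes)
    for key, _ in r_rows:
        bf.add(key)

    # Site S (after receiving the filter): keep only tuples that may match.
    candidates = [(k, v) for k, v in s_rows if k in bf]

    # Site R (after receiving the candidates): exact join, which also
    # discards any false positives the filter let through.
    r_index = {}
    for k, v in r_rows:
        r_index.setdefault(k, []).append(v)
    result = [(k, rv, sv) for k, sv in candidates for rv in r_index.get(k, [])]
    return result, len(candidates)


r_rows = [(1, "a"), (2, "b")]
s_rows = [(2, "x"), (3, "y"), (2, "z")]
result, shipped = bloomjoin(r_rows, s_rows)
print(result, shipped)
```

In this sketch the network cost has two parts: the filter itself (`num_bits` bits) and the candidate tuples sent back. A larger filter lowers the false-positive rate and so ships fewer useless tuples, but costs more to send, which is the kind of trade-off a statistics-based choice of parameters would address.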
Keywords
Hash Function; Query Processing; Coordinator Site; Query Result; Bloom Filter
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© Springer-Verlag Berlin Heidelberg 2008