DBJ — A dynamic balancing hash join algorithm in multiprocessor database systems
The Dynamic Balancing Hash Join (DBJ) algorithm is proposed to handle skewed data in the join operation in multiprocessor database systems. Its objective is to avoid the high preprocessing cost inherent in existing algorithms. DBJ redistributes only a small portion of the partitioned data and thereby achieves a balanced output at little extra cost. The balancing is done dynamically, without prior knowledge of the input distribution and without a coordinating processor. A performance analysis shows that DBJ outperforms existing balancing hash join algorithms over a wide range of skew.
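To make the skew problem concrete, the following is a minimal sketch of how hash partitioning can leave one processor overloaded and how moving a few whole buckets can even out the load. It is not the DBJ procedure, which the abstract does not describe in detail: the modulo hash, the round-robin bucket assignment, the greedy rule, and all function names are assumptions chosen for illustration. Unlike DBJ as described above, this sketch computes the loads centrally.

```python
def bucket_sizes(keys, num_buckets):
    """Count tuples per hash bucket (assumed hash: integer key mod num_buckets)."""
    sizes = [0] * num_buckets
    for k in keys:
        sizes[k % num_buckets] += 1
    return sizes


def initial_owner(num_buckets, num_procs):
    """Assign buckets to processors round-robin (an illustrative choice)."""
    return [b % num_procs for b in range(num_buckets)]


def proc_loads(sizes, owner, num_procs):
    """Total number of tuples held by each processor."""
    loads = [0] * num_procs
    for b, s in enumerate(sizes):
        loads[owner[b]] += s
    return loads


def rebalance(sizes, owner, num_procs):
    """Greedy sketch: move whole buckets from the most loaded processor
    to the least loaded one while a move still narrows the gap.
    Returns the new bucket ownership and the number of tuples moved."""
    owner = list(owner)
    moved = 0
    while True:
        loads = proc_loads(sizes, owner, num_procs)
        hi = max(range(num_procs), key=loads.__getitem__)
        lo = min(range(num_procs), key=loads.__getitem__)
        gap = loads[hi] - loads[lo]
        # Only buckets smaller than the gap reduce the imbalance when moved.
        candidates = [b for b in range(len(sizes))
                      if owner[b] == hi and 0 < sizes[b] < gap]
        if not candidates:
            return owner, moved
        # Prefer the bucket whose size is closest to half the gap.
        best = min(candidates, key=lambda b: abs(sizes[b] - gap / 2))
        owner[best] = lo
        moved += sizes[best]


if __name__ == "__main__":
    # Uniform keys plus three hot keys that all hash to processor 0's buckets.
    keys = list(range(1000)) + [0] * 100 + [4] * 100 + [8] * 100
    sizes = bucket_sizes(keys, num_buckets=32)
    owner = initial_owner(32, num_procs=4)
    print("before:", proc_loads(sizes, owner, 4))
    new_owner, moved = rebalance(sizes, owner, 4)
    print("after: ", proc_loads(sizes, new_owner, 4), "tuples moved:", moved)
```

Moving whole buckets rather than individual tuples keeps all tuples with the same hash value on one processor, so the matching tuples of the other join relation can follow the same bucket-to-processor mapping.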