A Skew-Insensitive Algorithm for Join and Multi-join Operations on Shared Nothing Machines

  • M. Bamha
  • G. Hains
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1873)

Abstract

Join is an expensive and frequently used operation whose parallelization is highly desirable. However, the effectiveness of parallel joins depends on the ability to divide the load evenly among processors, and data skew can have a disastrous effect on performance. Although many skew-handling algorithms have been proposed, they remain generally inefficient for multi-joins because of join-product skew and costly, unnecessary redistribution and communication. A parallel join algorithm called fa_join, with deterministic and near-perfect balancing properties, was introduced in an earlier paper. Despite its advantages, fa_join is sensitive to the correlation of the attribute value distributions in both relations. We present here an improved version of the algorithm, called Sfa_join, which treats both relations symmetrically. Its predictably low join-product and attribute-value skew makes it suitable for repeated use in multi-join operations. Its performance is analyzed theoretically and experimentally, confirming its linear speed-up and its superiority over fa_join.
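
To make the notion of join-product skew concrete, here is a minimal Python sketch. It is not fa_join or Sfa_join, whose details the abstract does not give. It only assumes a naive scheme in which both relations are partitioned on an integer join key by `key % p`, and counts the output tuples each processor would produce. The names `partition_loads` and `imbalance` are hypothetical, chosen for illustration.

```python
from collections import Counter

def partition_loads(r_keys, s_keys, p):
    """Join output size per processor under naive key-modulo partitioning (assumed scheme)."""
    r_freq = Counter(r_keys)
    s_freq = Counter(s_keys)
    loads = [0] * p
    for key, r_count in r_freq.items():
        # A key with r_count matches in R and s_count in S yields r_count * s_count
        # output tuples, all produced on the one processor that owns the key.
        loads[key % p] += r_count * s_freq.get(key, 0)
    return loads

def imbalance(loads):
    """Heaviest load divided by mean load; 1.0 means perfect balance."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0

# Uniform case: 8 distinct keys, each appearing 4 times (32 tuples).
uniform = [k for k in range(8) for _ in range(4)]
# Skewed case: also 32 tuples, but key 0 accounts for 25 of them.
skewed = [0] * 25 + list(range(1, 8))

uniform_loads = partition_loads(uniform, uniform, 4)
skewed_loads = partition_loads(skewed, skewed, 4)
print(uniform_loads, imbalance(uniform_loads))
print(skewed_loads, imbalance(skewed_loads))
```

Both inputs have the same size, yet in the skewed case almost all of the join output lands on the processor that owns key 0. This is the kind of imbalance a skew-insensitive algorithm aims to avoid.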

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • M. Bamha
    • 1
  • G. Hains
    • 1
  1. LIFO, Université d’Orléans, Orléans Cedex 2, France
