An Efficient Equi-semi-join Algorithm for Distributed Architectures

  • M. Bamha
  • G. Hains
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3515)

Abstract

Semi-joins are the most widely used technique for optimizing complex relational queries on distributed architectures. However, the overhead of computing semi-joins can be very high because of data skew and the high cost of communication in distributed architectures. In this paper, we present a parallel equi-semi-join algorithm for shared-nothing machines. Its performance is analyzed using the BSP cost model, and it is proved to have asymptotically optimal complexity and perfect load balancing even for highly skewed data. This guarantees unlimited scalability in all situations for this key algorithm.
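The abstract does not spell out the algorithm's steps. As a rough illustration only, the Python sketch below shows one generic way to compute an equi-semi-join R ⋉ S (the tuples of R whose join value also occurs in S) on p simulated shared-nothing processors, organized as BSP-style supersteps. It is not the authors' algorithm. The tuple layout (join attribute in field 0), the hash-based `owner` placement and the function name `parallel_semi_join` are assumptions made for this example. The property it illustrates is that only distinct join values are exchanged and R tuples never leave their processor, so a heavily repeated value in R costs each processor at most one outgoing message instead of one per tuple.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Row = Tuple  # assumption: field 0 of every tuple holds the join attribute


def parallel_semi_join(r_frags: Sequence[List[Row]],
                       s_frags: Sequence[List[Row]]) -> List[List[Row]]:
    """Sketch: compute R semi-join S fragment by fragment on p simulated processors.

    r_frags[i] and s_frags[i] are the fragments of R and S stored on processor i.
    Returns, for each processor, its local fragment of the result.
    """
    p = len(r_frags)
    assert len(s_frags) == p, "R and S must be spread over the same processors"

    def owner(key: Hashable) -> int:
        # Hypothetical placement: hash each join value to a home processor.
        return hash(key) % p

    # Superstep 1: every processor removes duplicates from its local join values
    # and sends each distinct value to that value's home processor.
    r_inbox = [defaultdict(set) for _ in range(p)]  # home -> {value: {senders}}
    s_inbox = [set() for _ in range(p)]             # home -> {values present in S}
    for i in range(p):
        for k in {row[0] for row in r_frags[i]}:
            r_inbox[owner(k)][k].add(i)
        for k in {row[0] for row in s_frags[i]}:
            s_inbox[owner(k)].add(k)

    # Superstep 2: each home processor intersects the R and S values it owns
    # and replies to every sender with the values that have a match in S.
    matched = [set() for _ in range(p)]
    for j in range(p):
        for k, senders in r_inbox[j].items():
            if k in s_inbox[j]:
                for i in senders:
                    matched[i].add(k)

    # Superstep 3 (local work only): R tuples never move; each processor
    # filters its own fragment against the matching values it received.
    return [[row for row in r_frags[i] if row[0] in matched[i]] for i in range(p)]


r = [[(1, "a"), (2, "b"), (1, "c")], [(3, "d"), (1, "e")]]
s = [[(1, "x")], [(4, "y"), (1, "z")]]
print(parallel_semi_join(r, s))  # [[(1, 'a'), (1, 'c')], [(1, 'e')]]
```

In this sketch the cost of supersteps 1 and 2 depends on the number of distinct join values per processor, not on how often a value repeats, which is why duplicated keys in R do not unbalance the exchange. How the paper achieves its stated load-balancing and complexity bounds is not described in this abstract.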

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • M. Bamha (1)
  • G. Hains (1)
  1. LIFO, Université d’Orléans, Orléans Cedex 2, France