A New Framework for Join Product Skew

  • Victor Kyritsis
  • Paraskevas V. Lekeas
  • Dora Souliou
  • Foto Afrati
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6799)

Abstract

Different types of data skew can cause load imbalance in parallel joins under the shared-nothing architecture. We study one important type of skew, join product skew (JPS). We propose a static approach based on frequency classes, which assumes that the data distribution of the join attribute values is known. The approach stems from the observation that the join selectivity can be expressed as a sum of products of the frequencies of the join attribute values. Consequently, an assignment of join sub-tasks that takes the magnitude of these frequency products into account can alleviate join product skew. Motivated by this observation, we propose an algorithm, called Handling Join Product Skew (HJPS), to handle join product skew.
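
The observation in the abstract can be illustrated with a small sketch. For an equi-join of relations R and S on one attribute, the number of result tuples equals the sum, over the shared attribute values v, of f_R(v) · f_S(v), where f_R and f_S are the frequencies of v in each relation. The code below computes that sum and then distributes the join values over nodes with a simple largest-first greedy heuristic. This heuristic is an assumption chosen for illustration, not the HJPS algorithm, which is not specified in the abstract; the function names and the sample data are hypothetical.

```python
from collections import Counter
import heapq


def frequency_products(r_values, s_values):
    """Map each shared join value v to f_R(v) * f_S(v)."""
    f_r, f_s = Counter(r_values), Counter(s_values)
    return {v: f_r[v] * f_s[v] for v in f_r.keys() & f_s.keys()}


def join_result_size(r_values, s_values):
    """Size of the equi-join result as a sum of frequency products."""
    return sum(frequency_products(r_values, s_values).values())


def assign_by_frequency_product(r_values, s_values, num_nodes):
    """Greedy largest-first assignment of join values to nodes.

    Illustrative heuristic only (not HJPS): each value is given to the
    node that currently has the smallest total frequency product.
    Assumes join values are mutually comparable (used for tie-breaking).
    """
    weights = frequency_products(r_values, s_values)
    heap = [(0, node) for node in range(num_nodes)]  # (load, node id)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    for value, weight in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)  # least-loaded node so far
        assignment[node].append(value)
        heapq.heappush(heap, (load + weight, node))
    loads = {n: sum(weights[v] for v in vals) for n, vals in assignment.items()}
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical join attribute columns of R and S.
    R = [1, 1, 1, 2, 2, 3]
    S = [1, 1, 2, 3, 3, 3]
    print(join_result_size(R, S))
    print(assign_by_frequency_product(R, S, num_nodes=2))
```

A plain hash partitioning balances the number of input tuples per node, whereas the weights used here reflect the number of output tuples each join value produces, which is the quantity that join product skew concerns.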

Keywords

Parallel DBMS; join operation; data distribution; data skew; load balance; shared nothing architecture



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Victor Kyritsis (1)
  • Paraskevas V. Lekeas (2)
  • Dora Souliou (1)
  • Foto Afrati (1)
  1. National Technical University of Athens, Athens, Greece
  2. Department of Applied Mathematics, University of Crete, Herakleio, Greece
