A New Framework for Join Product Skew
Several types of data skew can cause load imbalance in parallel joins under the shared-nothing architecture. We study one important type, join product skew (JPS). We propose a static approach based on frequency classes, which assumes that the distribution of the join attribute values is known in advance. The approach rests on the observation that the join selectivity can be expressed as a sum of products of the frequencies of the join attribute values. Consequently, an assignment of join sub-tasks that accounts for the magnitude of these frequency products can alleviate join product skew. Building on this observation, we propose an algorithm, Handling Join Product Skew (HJPS).
Keywords: Parallel DBMS, join operation, data distribution, data skew, load balance, shared nothing architecture
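The observation in the abstract can be illustrated with a small sketch. For an equi-join, the result size is the sum over join attribute values v of f_R(v) * f_S(v), and the selectivity is that sum divided by |R| * |S|. The code below is not the HJPS algorithm, whose details are not given here; it is a minimal illustration that swaps in a simple greedy largest-product-first heuristic to spread values across nodes. The function names `join_size` and `assign_by_product`, and the sample relations, are hypothetical.

```python
from collections import Counter
import heapq


def join_size(r_keys, s_keys):
    """Equi-join result size: sum over values v of f_R(v) * f_S(v)."""
    f_r, f_s = Counter(r_keys), Counter(s_keys)
    return sum(f_r[v] * f_s[v] for v in f_r.keys() & f_s.keys())


def assign_by_product(r_keys, s_keys, num_nodes):
    """Greedy sketch (an assumption, not HJPS): hand each join value to the
    node with the smallest accumulated frequency product so far, visiting
    values in order of decreasing product."""
    f_r, f_s = Counter(r_keys), Counter(s_keys)
    # Only values present in both relations produce output tuples.
    products = {v: f_r[v] * f_s[v] for v in f_r.keys() & f_s.keys()}

    # Min-heap of (current load, node id) so the lightest node pops first.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)

    assignment = {}
    loads = [0] * num_nodes
    for value, product in sorted(products.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        assignment[value] = node
        loads[node] = load + product
        heapq.heappush(heap, (loads[node], node))
    return assignment, loads


# Hypothetical join attribute columns of two relations R and S.
r = [1, 1, 1, 1, 2, 2, 3, 4]
s = [1, 1, 1, 2, 2, 3, 3, 5]

total = join_size(r, s)                 # 4*3 + 2*2 + 1*2 = 18
assignment, loads = assign_by_product(r, s, num_nodes=2)
print(total, assignment, loads)
```

In this example the value 1 alone accounts for 12 of the 18 output tuples, so it gets a node to itself while the remaining values share the other node. A single very frequent value still bounds how balanced the loads can be, since this sketch never splits one value across nodes.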