Abstract
Different types of data skew can cause load imbalance in parallel joins under the shared-nothing architecture. We study one important type of skew, join product skew (JPS). We propose a static approach based on frequency classes, which assumes that the data distribution of the join attribute values is known. It stems from the observation that the join selectivity can be expressed as a sum of products of the frequencies of the join attribute values. Consequently, an assignment of join sub-tasks that takes into account the magnitude of the frequency products can alleviate join product skew. Motivated by this observation, we propose an algorithm, called Handling Join Product Skew (HJPS), to handle join product skew.
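The observation about frequency products can be made concrete with a small sketch. The Python below is not the HJPS algorithm, whose details are not given in the abstract. It only illustrates, under assumed names (`join_size`, `assign_by_product`), that the size of an equi-join is the sum over join values of the product of their frequencies in the two relations, and that a generic greedy heuristic (largest product first, to the least-loaded processor) can use those products to spread the output work.

```python
from collections import Counter
import heapq


def join_size(r_keys, s_keys):
    """Equi-join result size: sum over join values v of f_R(v) * f_S(v)."""
    f_r, f_s = Counter(r_keys), Counter(s_keys)
    return sum(f_r[v] * f_s[v] for v in f_r.keys() & f_s.keys())


def assign_by_product(r_keys, s_keys, n_procs):
    """Illustrative greedy assignment (an assumption, not the paper's method).

    Each join value is one sub-task whose weight is its frequency product.
    Sub-tasks are taken in decreasing weight order and given to the
    processor with the smallest accumulated weight so far.
    """
    f_r, f_s = Counter(r_keys), Counter(s_keys)
    products = {v: f_r[v] * f_s[v] for v in f_r.keys() & f_s.keys()}

    # Min-heap of (current load, processor id).
    heap = [(0, p) for p in range(n_procs)]
    heapq.heapify(heap)

    assignment = {}
    loads = [0] * n_procs
    # Sort by descending product, then by value for a deterministic order.
    for v, weight in sorted(products.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        assignment[v] = p
        loads[p] = load + weight
        heapq.heappush(heap, (loads[p], p))
    return assignment, loads
```

For example, with join attribute values `R = ["a"]*4 + ["b"]*2 + ["c"]*2 + ["d"]` and `S = ["a"]*3 + ["b"]*3 + ["c"]*2 + ["e"]`, the frequency products are 12, 6 and 4 for `a`, `b` and `c`, so the join produces 22 tuples. With two processors the sketch places `a` on one and `b`, `c` on the other, giving output loads of 12 and 10.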
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kyritsis, V., Lekeas, P.V., Souliou, D., Afrati, F. (2012). A New Framework for Join Product Skew. In: Lacroix, Z., Vidal, M.E. (eds) Resource Discovery. RED 2010. Lecture Notes in Computer Science, vol 6799. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27392-6_1
DOI: https://doi.org/10.1007/978-3-642-27392-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27391-9
Online ISBN: 978-3-642-27392-6
eBook Packages: Computer Science, Computer Science (R0)