Holistic Shuffler for the Parallel Processing of SQL Window Functions
Window functions are a sub-class of analytical operators that allow each tuple to be processed within a derived view of a given relation, taking its neighboring tuples into account. Currently, systems bypass parallelization opportunities, which become especially relevant when considering Big Data, as data is naturally partitioned. We present a shuffling technique to improve the parallel execution of window functions when data is naturally partitioned and the query holds a partitioning clause that does not match the natural partitioning of the relation. We evaluated this technique with a non-cumulative ranking function and were able to reduce data transfer among parallel workers by 85% when compared to a naive approach.
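To make the setting concrete, the following is a minimal Python sketch of the problem described above; it is not the shuffler proposed in this work. It assumes rows are `(key, value)` pairs with integer keys already spread across workers, and it compares a naive key-based shuffle with a simple greedy, locality-aware assignment that sends each partition to the worker already holding most of its rows. All function names (`naive_destination`, `locality_destinations`, `shuffle`, `rank_partitions`) are hypothetical and introduced only for illustration.

```python
from collections import Counter, defaultdict


def naive_destination(key, n_workers):
    # Naive shuffle: the destination depends only on the key,
    # ignoring where the rows currently reside.
    return key % n_workers


def locality_destinations(workers):
    # Illustrative assumption: for every partition key, pick the worker
    # that already holds the most rows of that key (ties go to the
    # lowest worker index).
    counts = defaultdict(Counter)
    for w, rows in enumerate(workers):
        for key, _ in rows:
            counts[key][w] += 1
    return {key: max(c, key=lambda w: (c[w], -w)) for key, c in counts.items()}


def shuffle(workers, dest_of):
    # Redistribute rows so that each partition key ends up on one worker.
    # Returns the new layout and the number of rows that changed worker.
    out = [[] for _ in workers]
    moved = 0
    for w, rows in enumerate(workers):
        for row in rows:
            d = dest_of(row[0])
            moved += d != w
            out[d].append(row)
    return out, moved


def rank_partitions(rows):
    # Local equivalent of RANK() OVER (PARTITION BY key ORDER BY value DESC).
    by_key = defaultdict(list)
    for key, value in rows:
        by_key[key].append(value)
    result = []
    for key, values in by_key.items():
        ordered = sorted(values, reverse=True)
        for v in values:
            # Position of the first occurrence gives tied values the same rank.
            result.append((key, v, ordered.index(v) + 1))
    return result
```

Calling `shuffle(workers, lambda k: naive_destination(k, len(workers)))` and `shuffle(workers, locality_destinations(workers).__getitem__)` on the same input yields the same ranking once `rank_partitions` is applied on each worker, while the second return value of `shuffle` shows how many rows each strategy had to transfer.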
This work was part-funded by project LeanBigData: Ultra-Scalable and Ultra-Efficient Integrated and Visual Big Data Analytics (FP7-619606), and by the ERDF – European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 Programme within project «POCI-01-0145-FEDER-006961», and by National Funds through the FCT – Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) as part of project UID/EEA/50014/2013.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.