
Flexible partitioning for selective binary theta-joins in a massively parallel setting

Published in Distributed and Parallel Databases

Abstract

Efficient join processing plays an important role in big data analysis. In this work, we focus on generic theta joins in a massively parallel environment, such as MapReduce and Spark. Theta joins are notoriously slow due to their inherent quadratic complexity, even when their selectivity is low, e.g., 1%. The main performance bottleneck differs between cases and is due to any of the following factors or a combination thereof: the amount of data being shuffled, the memory load on reducers, or the computation load on reducers. We propose an ensemble-based partitioning approach that tackles all three aspects: it saves communication cost, better respects the memory and computation limitations of reducers, and overall reduces the total execution time. The key idea behind our partitioning is to cluster join key values using two techniques, namely matrix re-arrangement and agglomerative clustering, which can run either in isolation or in combination. We present thorough experimental results using both band queries on real data and arbitrary synthetic predicates. We show that we can save up to 45% of the communication cost and reduce the computation load of a single reducer by up to 50% in band queries, whereas the savings are up to 74% and 80%, respectively, in queries with arbitrary theta predicates. Apart from being effective, the potential benefits of our approach can be estimated from metadata before execution, which allows for informed partitioning decisions. Finally, our solutions are flexible in that they can account for any weighted combination of the three bottleneck factors.
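The quadratic cost noted above can be seen in a minimal band-join sketch (hypothetical data and band width, not the paper's partitioning algorithm): a naive evaluation compares every pair of tuples even when very few pairs actually join.

```python
# Minimal illustration of a binary band theta-join: every (r, s) pair
# must be checked, so the cost is quadratic in the input size even
# when selectivity is low.
def band_join(R, S, w):
    """Return all pairs (r, s) with |r - s| <= w."""
    return [(r, s) for r in R for s in S if abs(r - s) <= w]

R = [1, 5, 9, 20]
S = [2, 6, 30]
print(band_join(R, S, 1))  # 12 comparisons, only 2 matching pairs
```

The partitioning problem the paper addresses is how to distribute this comparison workload across reducers while limiting shuffled data, memory load, and computation load.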



Notes

  1. In our 4-page abstract [16], we provide a preliminary version of the material in Sect. 4. All the remaining material in this work is novel.

  2. In the remainder of this work, we use the terms region, partition, and group interchangeably. We also use the term reducer for the worker node where local join processing takes place, but this does not imply that our approach is tailored to a MapReduce setting only.

  3. It is also trivial to express imb as a function of mri and rep through simple algebraic manipulation.

  4. http://www.math.uwaterloo.ca/tsp/concorde/.

  5. TSPk is implemented according to [9]; the corresponding code has been integrated into our codebase in the https://github.com/JohnKoumarelas/binarythetajoins/tree/master/btj/tspk directory.

  6. Available from http://cdiac.ornl.gov/ftp/ndp026c/.
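As a rough illustration of the rearrangement-clustering idea behind TSPk (note 5), the sketch below reorders the rows of a binary join matrix so that similar rows become adjacent, which tends to gather candidate cells into compact regions. This is not the authors' implementation: [9] formulates the problem as a travelling salesman instance solved exactly (cf. the Concorde solver in note 4), whereas here a greedy nearest-neighbor heuristic on Hamming distance stands in for illustration only.

```python
# Greedy stand-in for TSP-based rearrangement clustering: order rows so
# that consecutive rows differ in as few positions as possible.
def hamming(a, b):
    """Number of positions in which two equal-length rows differ."""
    return sum(x != y for x, y in zip(a, b))

def rearrange_rows(matrix):
    """Greedy nearest-neighbor ordering of rows by Hamming distance."""
    remaining = list(range(len(matrix)))
    order = [remaining.pop(0)]  # start the tour from the first row
    while remaining:
        last = matrix[order[-1]]
        nxt = min(remaining, key=lambda i: hamming(matrix[i], last))
        remaining.remove(nxt)
        order.append(nxt)
    return [matrix[i] for i in order]

jm = [[1, 1, 0, 0],   # hypothetical 4x4 binary join matrix
      [0, 0, 1, 1],
      [1, 0, 0, 0],
      [0, 0, 1, 0]]
print(rearrange_rows(jm))  # similar rows end up adjacent
```

After such a reordering, contiguous blocks of candidate cells can be assigned to the same partition, which is the starting point for the partitioning policies evaluated in the paper.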

References

  1. Afrati, F., Ullman, J.: Matching bounds for the all-pairs mapreduce problem. In: Proceedings of the 17th International Database Engineering & Applications Symposium, pp. 3–4. ACM (2013)

  2. Afrati, F.N., Sarma, A.D., Salihoglu, S., Ullman, J.D.: Upper and lower bounds on the cost of a map-reduce computation. PVLDB 6(4), 277–288 (2013)


  3. Afrati, F.N., Ullman, J.D.: Optimizing multiway joins in a map-reduce environment. IEEE Trans. Knowl. Data Eng. 23(9), 1282–1298 (2011)


  4. Beame, P., Koutris, P., Suciu, D.: Skew in parallel query processing. In: PODS, pp. 212–223 (2014)

  5. Chan, H.M., Milner, D.A.: Direct clustering algorithm for group formation in cellular manufacture. J. Manuf. Syst. 1(1), 65–75 (1982)


  6. Chen, S.-Y., Chang, T.-P., Chang, Z.-H.: An efficient theta-join query processing algorithm on mapreduce framework. In: Proceedings of the 2012 International Symposium on Computer, Consumer and Control (IS3C), pp. 686–689. IEEE (2012)

  7. Chu, S., Balazinska, M., Suciu, D.: From theory to practice: efficient join query evaluation in a parallel database system. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31–June 4, 2015, pp. 63–78 (2015)

  8. Chu, X., Ilyas, I.F., Koutris, P.: Distributed data deduplication. PVLDB 9(11), 864–875 (2016)


  9. Climer, S., Zhang, W.: Rearrangement clustering: pitfalls, remedies, and applications. J. Mach. Learn. Res. 7, 919–943 (2006)


  10. Crotty, A., Galakatos, A., Dursun, K., Kraska, T., Binnig, C., Çetintemel, U., Zdonik, S.: An architecture for compiling udf-centric workflows. PVLDB 8(12), 1466–1477 (2015)


  11. Doulkeridis, C., Nørvåg, K.: A survey of large-scale analytical query processing in mapreduce. VLDB J. 23(3), 1–26 (2013)


  12. Elseidy, M., Elguindy, A., Vitorovic, A., Koch, C.: Scalable and adaptive online joins. PVLDB 7(6), 441–452 (2014)


  13. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, Burlington (2000)


  14. Khayyat, Z., Lucia, W., Singh, M., Ouzzani, M., Papotti, P., Quiané-Ruiz, J.-A., Tang, N., Kalnis, P.: Lightning fast and space efficient inequality joins. PVLDB 8(13), 2074–2085 (2015)


  15. King, J.R.: Machine-component grouping in production flow analysis: an approach using a rank order clustering algorithm. Int. J. Prod. Res. 18(2), 213–232 (1980)


  16. Koumarelas, I., Naskos, A., Gounaris, A.: Binary theta-joins using mapreduce: efficiency analysis and improvements. In: Proceedings of the International Workshop on Algorithms for MapReduce and Beyond (BMR) (in conjunction with EDBT/ICDT’2014), Athens, Greece (2014)

  17. Lenstra, J.K., Rinnooy Kan, A.H.G.: Some simple applications of the travelling salesman problem. Oper. Res. Q. 26(4), 717–733 (1975)


  18. Lenstra, J.K.: Technical note: clustering a data array and the traveling-salesman problem. Oper. Res. 22(2), 413–414 (1974)


  19. Li, F., Ooi, B.C., Tamer Özsu, M., Wu, S.: Distributed data management using mapreduce. ACM Comput. Surv. 46(3), 31 (2014)


  20. McCormick, W.T., Schweitzer, P.J., White, T.W.: Problem decomposition and data reorganization by a clustering technique. Oper. Res. 20(5), 993–1009 (1972)


  21. Okcan, A., Riedewald, M.: Processing theta-joins using mapreduce. In: SIGMOD Conference, pp. 949–960 (2011)

  22. Okcan, A., Riedewald, M.: Anti-combining for mapreduce. In: SIGMOD Conference, pp. 839–850 (2014)

  23. Ren, K., Kwon, Y.C., Balazinska, M., Howe, B.: Hadoop’s adolescence. PVLDB 6(10), 853–864 (2013)


  24. Sarma, A.D., He, Y., Chaudhuri, S.: Clusterjoin: a similarity joins framework using map-reduce. PVLDB 7(12), 1059–1070 (2014)


  25. Tao, Y., Lin, W., Xiao, X.: Minimal mapreduce algorithms. In: SIGMOD Conference, pp. 529–540 (2013)

  26. Tous, R., Gounaris, A., Tripiana, C., Torres, J., Girona, S., Ayguade, E., Labarta, J., Becerra, Y., Carrera, D., Valero, M.: Spark deployment and performance evaluation on the marenostrum supercomputer. In: IEEE BigData (2015)

  27. Vitorovic, A., Elseidy, M., Koch, C.: Load balancing and skew resilience for parallel joins. In: Proceedings of the ICDE (2016)

  28. Yan, K., Zhu, H.: Two MRJs for multi-way theta-join in mapreduce. In: Yan, K., Zhu, H. (eds.) Internet and Distributed Computing Systems, pp. 321–332. Springer, New York (2013)


  29. Zhang, C., Li, J., Wu, L.: Optimizing theta-joins in a mapreduce environment. Int. J. Database Theory Appl. 6(4), 91–107 (2013)


  30. Zhang, X., Chen, L., Wang, M.: Efficient multi-way theta-join processing using mapreduce. PVLDB 5(11), 1184–1195 (2012)



Acknowledgements

We would like to thank Jordi Torres, Rubèn Tous and Carlos Tripiana from the Barcelona Supercomputing Center for their help in running the Spark experiments.

Corresponding author

Correspondence to Anastasios Gounaris.

Appendix: Additional evaluation results

Table 7 Summary improvements due to the re-arrangement policies grouped by the number of reducers (for band queries on solar altitude)
Table 8 Summary improvements due to the re-arrangement policies grouped by the number of bands (for band queries on solar altitude)
Table 9 Summary improvements due to the re-arrangement policies grouped by the number of reducers (for band queries on longitude)
Table 10 Summary improvements due to the re-arrangement policies grouped by the number of reducers (for random queries)

Tables 7 and 8 refer to the experiments in Sect. 6.3 for the band queries on solar altitude. Table 8 presents the same results as Table 7 but groups the experiments by the number of bands to show the impact of selectivity. Although the behavior varies with the number of bands, the overall impact of selectivity is small. The second column (coverage) shows the percentage of cases in which any technique achieves an improvement over M-Bucket-I, i.e., it answers the question "How frequently does matrix re-arrangement lead to improvements?", whereas the other columns answer the question "How large are the improvements when they occur?". Further observations can be drawn: (i) the higher the number of reducers, the less frequently matrix re-arrangement yields improvements; (ii) the benefits on OF values due to the re-arrangement techniques may come at the expense of a small degradation of imbalance, as shown in the last column, but in general imb is not affected much; and (iii) there are several cases where the improvement is very small or negligible. Table 9 shows the corresponding details for the band queries on longitude, where the best improvements on mrcl reach 44%. Table 10 shows the impact of the re-arrangement techniques on the OFs for random queries with a \(100 \times 100\) JM. The main observation is that, compared to Table 7, both the coverage and the improvements are higher; e.g., we have observed reductions in rep of 74% (i.e., nearly 4 times less) and in mrcl of 56%. For random queries with \(200 \times 200\) JMs, the improvements are of lower magnitude, but the coverage is 88% (no detailed results are presented).


Cite this article

Koumarelas, I., Naskos, A. & Gounaris, A. Flexible partitioning for selective binary theta-joins in a massively parallel setting. Distrib Parallel Databases 36, 301–337 (2018). https://doi.org/10.1007/s10619-017-7214-0
