Robust and Efficient Large-Large Table Outer Joins on Distributed Infrastructures

  • Long Cheng
  • Spyros Kotoulas
  • Tomas E Ward
  • Georgios Theodoropoulos
Conference paper

DOI: 10.1007/978-3-319-09873-9_22

Volume 8632 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Cheng L., Kotoulas S., Ward T.E., Theodoropoulos G. (2014) Robust and Efficient Large-Large Table Outer Joins on Distributed Infrastructures. In: Silva F., Dutra I., Santos Costa V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham

Abstract

Outer joins are ubiquitous in many workloads but are sensitive to load-balancing problems. Current approaches mitigate such problems, which are caused by data skew, by using (partial) replication. However, contemporary replication-based approaches (1) introduce overhead, since they usually result in redundant data movement, (2) are sensitive to parameter tuning and to the degree of data skew, and (3) typically require that one side of the join be small. In this paper, we propose a novel parallel algorithm, Redistribution and Efficient Query with Counters (REQC), aimed at robustness with respect to the size of the join sides, variation in skew, and parameter tuning. Experimental results demonstrate that our algorithm is faster, more robust, and less demanding in terms of network bandwidth than the state of the art.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Long Cheng (1, 2, 3)
  • Spyros Kotoulas (2)
  • Tomas E Ward (1)
  • Georgios Theodoropoulos (4)

  1. National University of Ireland Maynooth, Ireland
  2. IBM Research, Ireland
  3. Technische Universität Dresden, Germany
  4. Durham University, UK