
Efficient Processing of Multi-way Joins Using MapReduce

  • Conference paper
Intelligent Computation in Big Data Era (ICYCSEE 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 503)

Abstract

Multi-way joins are critical for many big data applications such as data mining and knowledge discovery. Although much research has been devoted to processing multi-way joins using MapReduce, several general problems remain, such as the transfer of large amounts of unpromising intermediate data and the lack of effective coordination mechanisms. This work proposes an efficient model for processing multi-way joins using MapReduce, named Sharing-Coordination-MapReduce (SC-MapReduce), which provides sharing and coordination functions. The SC-MapReduce model uses its sharing mechanism to filter out much of the unpromising intermediate data, and it optimizes the coordination of the multiple tasks involved in a multi-way join. Extensive experiments show that the proposed model is efficient, robust and scalable.
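The abstract does not describe how the sharing mechanism works, so the following is only an illustrative sketch of the general idea of filtering unpromising intermediate data before the shuffle. It is not the authors' SC-MapReduce implementation. It assumes a simplified star join in which every relation is joined on one common key, and all names (`map_phase`, `shuffle`, `reduce_phase`, `multiway_join`, `shared_keys`) are hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(relations, shared_keys):
    """Emit (key, (relation_name, payload)) pairs.

    Assumption: `shared_keys` is a summary visible to every mapper. Rows whose
    key is not in it cannot contribute to the join result, so they are dropped
    before the shuffle.
    """
    for name, rows in relations.items():
        for key, payload in rows:
            if key in shared_keys:
                yield key, (name, payload)


def shuffle(pairs):
    """Group the emitted pairs by join key, then by source relation."""
    groups = defaultdict(lambda: defaultdict(list))
    for key, (name, payload) in pairs:
        groups[key][name].append(payload)
    return groups


def reduce_phase(groups, names):
    """For each key, combine one row from every relation."""
    for key, by_relation in groups.items():
        if all(name in by_relation for name in names):
            for combo in product(*(by_relation[name] for name in names)):
                yield (key, *combo)


def multiway_join(relations):
    """Join all relations on their common key.

    Returns the sorted join result and the number of pairs that were shuffled.
    """
    names = list(relations)
    # Hypothetical shared summary: keys that occur in every relation.
    shared_keys = set.intersection(
        *({key for key, _ in rows} for rows in relations.values())
    )
    pairs = list(map_phase(relations, shared_keys))
    return sorted(reduce_phase(shuffle(pairs), names)), len(pairs)


relations = {
    "R": [(1, "r1"), (2, "r2")],
    "S": [(1, "s1"), (3, "s3")],
    "T": [(1, "t1"), (2, "t2")],
}
result, shuffled = multiway_join(relations)
```

In this toy input, only the rows with key 1 pass the filter, so three of the six input rows are shuffled. A real system would have to build the shared summary in a distributed way (for example with a compact, approximate structure), which this sketch does not attempt.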




Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ding, L., Liu, S., Liu, Y., Liu, A., Song, B. (2015). Efficient Processing of Multi-way Joins Using MapReduce. In: Wang, H., et al. Intelligent Computation in Big Data Era. ICYCSEE 2015. Communications in Computer and Information Science, vol 503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46248-5_10


  • DOI: https://doi.org/10.1007/978-3-662-46248-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46247-8

  • Online ISBN: 978-3-662-46248-5

  • eBook Packages: Computer Science, Computer Science (R0)
