Abstract

In recent years, Map-Reduce systems have grown into a leading solution for processing large volumes of data. Often, in order to minimize execution time, developers express their programs in a procedural language instead of a high-level query language. In such cases one has full control over program execution, which can lead to several problems, especially where join operations are concerned. A wide range of join techniques has been proposed in the literature, although many of them cannot be easily classified using the traditional Map-Side/Reduce-Side distinction. The main goal of this paper is to propose a taxonomy of the existing join algorithms and to provide their evaluation.

Author information

Correspondence to Maciej Penar or Artur Wilczek.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Penar, M., Wilczek, A. (2016). The Evaluation of Map-Reduce Join Algorithms. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_14

  • DOI: https://doi.org/10.1007/978-3-319-34099-9_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34098-2

  • Online ISBN: 978-3-319-34099-9

  • eBook Packages: Computer Science, Computer Science (R0)
