The Evaluation of Map-Reduce Join Algorithms

Penar, Maciej; Wilczek, Artur

doi:10.1007/978-3-319-34099-9_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 613)

Abstract

In recent years, Map-Reduce systems have become a leading solution for processing large volumes of data. To minimize execution time, developers often express their programs in a procedural language rather than a high-level query language. This gives them full control over program execution, which can lead to several problems, especially where the join operation is concerned. A wide range of join techniques has been proposed in the literature, yet many of them cannot be easily classified using the old Map-Side/Reduce-Side distinction. The main goal of this paper is to propose a taxonomy of the existing join algorithms and to provide their evaluation.
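
For readers unfamiliar with the Map-Side/Reduce-Side distinction mentioned above, the following is a minimal single-process sketch of the two classic equi-join styles. It is not taken from the paper and does not reproduce its taxonomy; the function names, the `Row` type, and the sample relations are all hypothetical, and the "shuffle" is simulated with an in-memory dictionary rather than a real Map-Reduce framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (join key, payload) -- hypothetical record layout


def reduce_side_join(left: Iterable[Row], right: Iterable[Row]) -> List[Tuple[str, str, str]]:
    """Reduce-side join: tag records in the map phase, join them in the reduce phase."""
    # Map phase: tag every record with the relation it came from.
    tagged = [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]

    # Shuffle phase (simulated): group all tagged records by join key.
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, tagged_value in tagged:
        groups[key].append(tagged_value)

    # Reduce phase: per key, pair every left record with every right record.
    out: List[Tuple[str, str, str]] = []
    for key in sorted(groups):
        lefts = [v for tag, v in groups[key] if tag == "L"]
        rights = [v for tag, v in groups[key] if tag == "R"]
        out.extend((key, l, r) for l in lefts for r in rights)
    return out


def map_side_join(big: Iterable[Row], small: Iterable[Row]) -> List[Tuple[str, str, str]]:
    """Map-side join: load the small relation into memory and probe it while scanning the big one."""
    # Build an in-memory lookup table from the small relation.
    lookup: Dict[str, List[str]] = defaultdict(list)
    for k, v in small:
        lookup[k].append(v)

    # Map phase only: no shuffle or reduce step is needed.
    return [(k, v, s) for k, v in big for s in lookup.get(k, [])]


orders = [("u1", "order-a"), ("u2", "order-b"), ("u1", "order-c"), ("u3", "order-d")]
users = [("u1", "Alice"), ("u2", "Bob")]

reduce_result = reduce_side_join(orders, users)
map_result = map_side_join(orders, users)
```

Both functions return the same set of joined rows for these inputs. The difference lies in where the work happens: the reduce-side variant moves both relations through the shuffle, whereas the map-side variant assumes one relation is small enough to fit in memory on every mapper.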

Author information

Authors and Affiliations

Faculty of Computer Science and Management, Wroclaw University of Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland
Maciej Penar & Artur Wilczek

Corresponding authors

Correspondence to Maciej Penar or Artur Wilczek.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Stanisław Kozielski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Dariusz Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Paweł Kasprowski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Bożena Małysiak-Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Daniel Kostrzewa

About this paper

Cite this paper

Penar, M., Wilczek, A. (2016). The Evaluation of Map-Reduce Join Algorithms. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_14

DOI: https://doi.org/10.1007/978-3-319-34099-9_14
Published: 28 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science, Computer Science (R0)
