Improving MapReduce Performance through Complexity and Performance Based Data Placement in Heterogeneous Hadoop Clusters

Arasanal, Rajashekhar M.; Rumani, Daanish U.

doi:10.1007/978-3-642-36071-8_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7753)

Included in the following conference series:

International Conference on Distributed Computing and Internet Technology

Abstract

MapReduce has emerged as an important programming model for clusters with tens of thousands of nodes. Hadoop, an open-source implementation of MapReduce, may run on nodes that differ in computing capacity for various reasons. Data placement algorithms should therefore partition the input and intermediate data according to the computing capacity of each node in the cluster. We propose several enhancements to the data placement algorithms in Hadoop so that the load is distributed evenly across the nodes. First, we propose two techniques to measure the computing capacities of the nodes. Second, we propose improvements to the input data distribution algorithm based on the complexities of the map and reduce functions and the measured heterogeneity of the nodes. Finally, we evaluate the resulting improvement in MapReduce performance.
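
The abstract does not spell out how capacity is measured or how blocks are distributed. As a rough illustration only, the sketch below assumes that each node's capacity is estimated from the wall-clock time it needs to run a common benchmark job, and that input blocks are then assigned in proportion to the resulting relative speeds. The function names, the benchmark-time input, and the largest-remainder rounding are hypothetical choices, not the authors' method. The sketch also leaves out the map/reduce complexity weighting that the paper proposes.

```python
from math import floor


def computing_ratios(bench_seconds: dict[str, float]) -> dict[str, float]:
    """Turn per-node benchmark times into normalized speed ratios.

    Assumption (hypothetical): capacity is the inverse of the time a node
    takes to run the same benchmark, so a faster node gets a larger share.
    """
    speeds = {node: 1.0 / t for node, t in bench_seconds.items()}
    total = sum(speeds.values())
    return {node: s / total for node, s in speeds.items()}


def allocate_blocks(num_blocks: int, ratios: dict[str, float]) -> dict[str, int]:
    """Split num_blocks across nodes in proportion to ratios.

    Uses largest-remainder rounding so the counts always sum to num_blocks.
    """
    quotas = {node: num_blocks * r for node, r in ratios.items()}
    alloc = {node: floor(q) for node, q in quotas.items()}
    leftover = num_blocks - sum(alloc.values())
    # Hand the remaining blocks to the nodes with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical cluster: node-a is twice as fast as node-b, four times node-c.
    ratios = computing_ratios({"node-a": 10.0, "node-b": 20.0, "node-c": 40.0})
    print(allocate_blocks(70, ratios))
    # {'node-a': 40, 'node-b': 20, 'node-c': 10}
```

Largest-remainder rounding is used here only because it keeps the total block count exact even when the proportional quotas are fractional. A real placement policy would also have to account for replication and data locality.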

Authors

Rajashekhar M. Arasanal
Daanish U. Rumani

Editor information

Editors and Affiliations

Chittaranjan Hota
Faculty In-charge, Information Processing and Business Intelligence Unit, Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad 500078, Andhra Pradesh, India
Pradip K. Srimani
Department of Computer Science, Clemson University, Clemson, SC 29634, USA

About this paper

Cite this paper

Arasanal, R.M., Rumani, D.U. (2013). Improving MapReduce Performance through Complexity and Performance Based Data Placement in Heterogeneous Hadoop Clusters. In: Hota, C., Srimani, P.K. (eds) Distributed Computing and Internet Technology. ICDCIT 2013. Lecture Notes in Computer Science, vol 7753. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36071-8_8

DOI: https://doi.org/10.1007/978-3-642-36071-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36070-1
Online ISBN: 978-3-642-36071-8
eBook Packages: Computer Science, Computer Science (R0)