Abstract
MapReduce has emerged as an important programming model for clusters with tens of thousands of nodes. A Hadoop cluster (Hadoop being an open-source implementation of MapReduce) may contain nodes that are heterogeneous in computing capacity for various reasons. Data placement algorithms should therefore partition the input and intermediate data according to the computing capacities of the nodes in the cluster. We propose several enhancements to the data placement algorithms in Hadoop so that the load is distributed evenly across the nodes. First, we propose two techniques to measure the computing capacities of the nodes. Second, we propose improvements to the input data distribution algorithm based on the complexities of the map and reduce functions and the measured heterogeneity of the nodes. Finally, we evaluate the resulting improvement in MapReduce performance.
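To make the idea of capacity-based partitioning concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes that each node already has a single positive capacity score (how that score is measured is outside the sketch) and that input is divided into equally sized blocks. The function name `proportional_block_counts`, the node names, and the largest-remainder rounding rule are all illustrative assumptions.

```python
from fractions import Fraction


def proportional_block_counts(total_blocks, capacities):
    """Split total_blocks across nodes in proportion to their capacity scores.

    capacities maps a (hypothetical) node name to a positive capacity score.
    Rounding uses the largest-remainder rule so the counts sum to total_blocks.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if not capacities or any(c <= 0 for c in capacities.values()):
        raise ValueError("capacities must be non-empty and strictly positive")

    total_capacity = sum(Fraction(c) for c in capacities.values())

    # Exact (possibly fractional) share of blocks for each node.
    exact = {
        node: Fraction(c) * total_blocks / total_capacity
        for node, c in capacities.items()
    }

    # Start from the integer part of each share (shares are non-negative).
    counts = {node: int(share) for node, share in exact.items()}
    leftover = total_blocks - sum(counts.values())

    # Hand the remaining blocks to the nodes with the largest fractional
    # remainders; ties are broken by node name to keep the result stable.
    by_remainder = sorted(exact, key=lambda n: (-(exact[n] - counts[n]), n))
    for node in by_remainder[:leftover]:
        counts[node] += 1

    return counts


# Assumed example: three nodes with capacity ratio 1:2:3 and 12 input blocks.
print(proportional_block_counts(12, {"node-a": 1, "node-b": 2, "node-c": 3}))
# prints {'node-a': 2, 'node-b': 4, 'node-c': 6}
```

With equal capacities the sketch degenerates to an even split, which is the behaviour one would expect on a homogeneous cluster. A fuller treatment would also have to account for the complexity of the map and reduce functions, which this sketch ignores.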
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arasanal, R.M., Rumani, D.U. (2013). Improving MapReduce Performance through Complexity and Performance Based Data Placement in Heterogeneous Hadoop Clusters. In: Hota, C., Srimani, P.K. (eds) Distributed Computing and Internet Technology. ICDCIT 2013. Lecture Notes in Computer Science, vol 7753. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36071-8_8
DOI: https://doi.org/10.1007/978-3-642-36071-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36070-1
Online ISBN: 978-3-642-36071-8
eBook Packages: Computer Science (R0)