Improving MapReduce Performance through Complexity and Performance Based Data Placement in Heterogeneous Hadoop Clusters

  • Conference paper
Distributed Computing and Internet Technology (ICDCIT 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7753)

Abstract

MapReduce has emerged as an important programming model for clusters with tens of thousands of nodes. Hadoop, an open-source implementation of MapReduce, may run on nodes that are heterogeneous in computing capacity for various reasons. It is therefore important for data placement algorithms to partition the input and intermediate data according to the computing capacities of the nodes in the cluster. We propose several enhancements to the data placement algorithms in Hadoop so that the load is distributed evenly across the nodes. First, we propose two techniques to measure the computing capacities of the nodes. Second, we propose improvements to the input data distribution algorithm based on the map and reduce function complexities and the measured heterogeneity of the nodes. Finally, we evaluate the resulting improvement in MapReduce performance.
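The idea of partitioning input data according to node computing capacity can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes each node already has a positive capacity score (how that score is measured is left open), and the function name `proportional_block_counts` and the node names are hypothetical. The sketch splits a number of input blocks across nodes in proportion to those scores and uses largest-remainder rounding so that the counts add up exactly.

```python
from fractions import Fraction


def proportional_block_counts(total_blocks, capacities):
    """Split total_blocks across nodes in proportion to capacity scores.

    capacities maps a node name to a positive score (assumed to be measured
    elsewhere). Returns a dict mapping each node name to a block count.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if not capacities or any(c <= 0 for c in capacities.values()):
        raise ValueError("capacities must be non-empty and positive")

    # Exact rational arithmetic avoids floating-point rounding surprises.
    total_capacity = sum(Fraction(c) for c in capacities.values())
    exact = {
        node: Fraction(c) * total_blocks / total_capacity
        for node, c in capacities.items()
    }

    # Start from the floor of each exact share.
    counts = {node: int(share) for node, share in exact.items()}

    # Hand the leftover blocks to the nodes with the largest fractional parts.
    leftover = total_blocks - sum(counts.values())
    by_remainder = sorted(
        capacities, key=lambda node: exact[node] - counts[node], reverse=True
    )
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical cluster: "fast" is rated four times as capable as "slow".
    print(proportional_block_counts(70, {"fast": 4, "medium": 2, "slow": 1}))
```

With these hypothetical scores, a node rated 4 receives four times as many blocks as a node rated 1. A placement scheme of the kind the abstract describes would also weigh the complexity of the map and reduce functions, which this sketch leaves out.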


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Arasanal, R.M., Rumani, D.U. (2013). Improving MapReduce Performance through Complexity and Performance Based Data Placement in Heterogeneous Hadoop Clusters. In: Hota, C., Srimani, P.K. (eds) Distributed Computing and Internet Technology. ICDCIT 2013. Lecture Notes in Computer Science, vol 7753. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36071-8_8

  • DOI: https://doi.org/10.1007/978-3-642-36071-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36070-1

  • Online ISBN: 978-3-642-36071-8

  • eBook Packages: Computer Science, Computer Science (R0)
