MrHeter: improving MapReduce performance in heterogeneous environments

Zhang, Xiao; Wu, Yanjun; Zhao, Chen

doi:10.1007/s10586-016-0625-2

Published: 01 September 2016

Volume 19, pages 1691–1701, (2016)
Cluster Computing

Abstract

As GPUs, ARM CPUs, and even FPGAs become widely used in modern computing, data centers are gradually evolving into heterogeneous clusters. However, many well-known programming models such as MapReduce were designed for homogeneous clusters and perform poorly in heterogeneous environments. In this paper, we revisit the problem and make four contributions: (1) We analyse the causes of MapReduce's poor performance in heterogeneous clusters; the most important is unreasonable task allocation between nodes with different computing ability. (2) Based on this analysis, we propose MrHeter, which separates the MapReduce process into a map-shuffle stage and a reduce stage, constructs a separate optimization model for each stage, and derives different task allocations \(ml_{ij}, mr_{ij}, r_{ij}\) for heterogeneous nodes based on their computing ability. (3) To make the approach suitable for dynamic execution, we propose D-MrHeter, which adds a monitoring and feedback mechanism. (4) Finally, we show that MrHeter and D-MrHeter can reduce the total execution time of MapReduce by 30 to 70% in heterogeneous clusters compared with original Hadoop, and that they perform especially well under heavy workloads and large differences in computing ability between nodes.
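
To illustrate why allocating tasks by computing ability matters, the following minimal Python sketch splits a fixed number of equal-sized tasks across nodes in proportion to an assumed per-node speed, then compares the finish time with an even split. It is not the optimization model of MrHeter, which the abstract does not specify. The function names `allocate_tasks` and `makespan`, the speed values, and the equal-task-size assumption are all hypothetical.

```python
from typing import List


def allocate_tasks(total_tasks: int, speeds: List[float]) -> List[int]:
    """Split total_tasks across nodes in proportion to their speeds.

    Hypothetical helper: 'speeds' are assumed tasks-per-unit-time rates.
    Fractional shares are rounded with the largest-remainder method so
    the allocation always sums to total_tasks.
    """
    if total_tasks < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need total_tasks >= 0 and positive speeds")
    total_speed = sum(speeds)
    shares = [total_tasks * s / total_speed for s in speeds]
    alloc = [int(x) for x in shares]  # floor of each ideal share
    leftover = total_tasks - sum(alloc)
    # Hand the remaining tasks to the nodes with the largest remainders.
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - alloc[i],
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


def makespan(alloc: List[int], speeds: List[float]) -> float:
    """Finish time of the slowest node, assuming equal-sized tasks."""
    return max(a / s for a, s in zip(alloc, speeds))


if __name__ == "__main__":
    speeds = [4.0, 1.0]  # assumed: node 0 is four times faster than node 1
    proportional = allocate_tasks(100, speeds)
    even = [50, 50]
    print("proportional:", proportional, makespan(proportional, speeds))
    print("even split:  ", even, makespan(even, speeds))
```

In this toy setting the even split leaves the fast node idle while the slow node finishes, which is the kind of imbalance the abstract attributes to unreasonable task allocation.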

Author information

Authors and Affiliations

No. 4, Zhongguancun South 4th Street, Haidian District, Beijing, China
Xiao Zhang, Yanjun Wu & Chen Zhao

Corresponding author

Correspondence to Xiao Zhang.

About this article

Cite this article

Zhang, X., Wu, Y. & Zhao, C. MrHeter: improving MapReduce performance in heterogeneous environments. Cluster Comput 19, 1691–1701 (2016). https://doi.org/10.1007/s10586-016-0625-2

Received: 19 April 2016
Revised: 23 June 2016
Accepted: 22 August 2016
Published: 01 September 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10586-016-0625-2
