MrHeter: improving MapReduce performance in heterogeneous environments

Cluster Computing

Abstract

As GPUs, ARM CPUs, and even FPGAs become widely used in modern computing, data centers are gradually evolving into heterogeneous clusters. However, many well-known programming models such as MapReduce are designed for homogeneous clusters and perform poorly in heterogeneous environments. In this paper, we reconsider the problem and make four contributions: (1) We analyse the causes of MapReduce's poor performance in heterogeneous clusters; the most important is unreasonable task allocation between nodes with different computing ability. (2) Based on this analysis, we propose MrHeter, which separates the MapReduce process into a map-shuffle stage and a reduce stage, constructs an optimization model for each stage, and derives task allocations \(ml_{ij}, mr_{ij}, r_{ij}\) for heterogeneous nodes according to their computing ability. (3) To make the approach suitable for dynamic execution, we propose D-MrHeter, which adds a monitoring and feedback mechanism. (4) Finally, we show that MrHeter and D-MrHeter reduce the total execution time of MapReduce by 30 to 70 % in heterogeneous clusters compared with original Hadoop, with the largest gains under heavy workloads and large differences in computing ability between nodes.
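To make the intuition behind capability-aware task allocation concrete, the following is a minimal sketch and not the optimization model from the paper. It assumes each node has a single, known relative speed (tasks per unit time) and simply splits tasks in proportion to that speed. The function names `allocate_tasks` and `makespan` and the example speeds are hypothetical and chosen only for illustration.

```python
def allocate_tasks(total_tasks, speeds):
    """Split total_tasks across nodes in proportion to their relative speeds.

    Illustrative assumption: one scalar speed per node. Largest-remainder
    rounding keeps the integer counts summing exactly to total_tasks.
    """
    if total_tasks < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need total_tasks >= 0 and strictly positive speeds")
    total_speed = sum(speeds)
    shares = [total_tasks * s / total_speed for s in speeds]
    counts = [int(x) for x in shares]  # round every share down first
    leftover = total_tasks - sum(counts)
    # Give the remaining tasks to the nodes with the largest fractional parts.
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def makespan(counts, speeds):
    """Finish time of the slowest node when each node runs its own tasks."""
    return max(c / s for c, s in zip(counts, speeds))


if __name__ == "__main__":
    speeds = [4, 2, 1, 1]           # hypothetical: one fast, one medium, two slow nodes
    even = [20, 20, 20, 20]         # homogeneous-style even split of 80 tasks
    weighted = allocate_tasks(80, speeds)
    print(weighted, makespan(weighted, speeds))  # [40, 20, 10, 10] 10.0
    print(even, makespan(even, speeds))          # [20, 20, 20, 20] 20.0
```

In this toy setting the even split is held back by the slowest nodes, while the speed-proportional split lets all nodes finish together. A real scheduler would also have to account for data locality, shuffle cost, and changing node load, which this sketch ignores.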

Author information

Corresponding author

Correspondence to Xiao Zhang.

About this article

Cite this article

Zhang, X., Wu, Y. & Zhao, C. MrHeter: improving MapReduce performance in heterogeneous environments. Cluster Comput 19, 1691–1701 (2016). https://doi.org/10.1007/s10586-016-0625-2

  • DOI: https://doi.org/10.1007/s10586-016-0625-2