Cluster Computing, Volume 19, Issue 4, pp 1691–1701

MrHeter: improving MapReduce performance in heterogeneous environments

  • Xiao Zhang
  • Yanjun Wu
  • Chen Zhao


As GPUs, ARM CPUs and even FPGAs become widely used in modern computing, data centers are gradually evolving into heterogeneous clusters. However, many well-known programming models such as MapReduce were designed for homogeneous clusters and perform poorly in heterogeneous environments. In this paper, we reconsider the problem and make four contributions: (1) We analyse the causes of MapReduce's poor performance in heterogeneous clusters; the most important one is unreasonable task allocation between nodes with different computing ability. (2) Based on this analysis, we propose MrHeter, which separates the MapReduce process into a map-shuffle stage and a reduce stage, constructs a separate optimization model for each stage, and derives task allocations \(ml_{ij}, mr_{ij}, r_{ij}\) for heterogeneous nodes based on their computing ability. (3) To make the approach suitable for dynamic execution, we propose D-MrHeter, which adds a monitoring and feedback mechanism. (4) Finally, we show that MrHeter and D-MrHeter can reduce the total execution time of MapReduce in heterogeneous clusters by 30 to 70% compared with original Hadoop, with the largest gains under heavy workloads and large differences in computing ability between nodes.
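To illustrate why allocation by computing ability matters, the following is a minimal sketch, not the MrHeter optimization model itself. It assumes each node has a single known relative speed (a hypothetical simplification), splits a fixed number of equal-sized tasks in proportion to those speeds, and compares the resulting finish time with an even split. The function names `allocate_tasks` and `makespan` are illustrative only.

```python
def allocate_tasks(total_tasks, speeds):
    """Split total_tasks across nodes in proportion to their relative speeds.

    Assumption: speeds[i] is a known, positive estimate of node i's
    throughput (tasks per unit time). Rounding uses the largest-remainder
    rule so the allocation always sums to total_tasks.
    """
    if total_tasks < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need total_tasks >= 0 and positive speeds")
    total_speed = sum(speeds)
    shares = [total_tasks * s / total_speed for s in speeds]
    alloc = [int(x) for x in shares]  # round every share down first
    leftover = total_tasks - sum(alloc)
    # Give the remaining tasks to the nodes with the largest fractional part.
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - alloc[i],
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


def makespan(alloc, speeds):
    """Time at which the slowest-finishing node completes its tasks."""
    return max(a / s for a, s in zip(alloc, speeds))


if __name__ == "__main__":
    speeds = [1.0, 1.0, 2.0, 4.0]      # hypothetical relative node speeds
    even = [25, 25, 25, 25]            # homogeneous-style even split
    weighted = allocate_tasks(100, speeds)
    print("even split:    ", even, "finish time", makespan(even, speeds))
    print("weighted split:", weighted, "finish time", makespan(weighted, speeds))
```

In this toy setting the even split is held back by the slowest nodes, while the weighted split lets all nodes finish at roughly the same time. A real scheduler would also need to account for data locality, shuffle cost, and speeds that change at run time, which is where a monitoring and feedback loop would come in.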


MapReduce · Heterogeneous cluster · Scheduling · Performance



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Haidian District, China
