Abstract
As GPUs, ARM CPUs and even FPGAs become widely used in modern computing, data centers are gradually evolving into heterogeneous clusters. However, many well-known programming models such as MapReduce were designed for homogeneous clusters and perform poorly in heterogeneous environments. In this paper, we reconsider the problem and make four contributions: (1) We analyse the causes of MapReduce's poor performance in heterogeneous clusters; the most important one is unreasonable task allocation between nodes with different computing ability. (2) Based on this analysis, we propose MrHeter, which separates the MapReduce process into a map-shuffle stage and a reduce stage, constructs a separate optimization model for each stage, and derives different task allocations \(ml_{ij}, mr_{ij}, r_{ij}\) for heterogeneous nodes based on their computing ability. (3) To make the approach suitable for dynamic execution, we propose D-MrHeter, which adds a monitoring and feedback mechanism. (4) Finally, we demonstrate that MrHeter and D-MrHeter decrease the total execution time of MapReduce by 30 to 70 % in a heterogeneous cluster compared with the original Hadoop, and that they perform especially well under heavy workloads and when nodes differ greatly in computing ability.
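To illustrate why task allocation matters in a heterogeneous cluster, the following minimal sketch compares an even split of tasks with a split proportional to node speed. It is not the optimization model of MrHeter, which the abstract does not specify; the function names `allocate_tasks` and `makespan`, the relative speeds, and the task count are all hypothetical and chosen only for illustration.

```python
def allocate_tasks(total_tasks, speeds):
    """Split total_tasks across nodes in proportion to their relative speed.

    Uses the largest-remainder method so the counts sum to total_tasks.
    This is a simple proportional heuristic, assumed for illustration only.
    """
    total_speed = sum(speeds)
    shares = [total_tasks * s / total_speed for s in speeds]
    counts = [int(share) for share in shares]
    leftover = total_tasks - sum(counts)
    # Hand the remaining tasks to the nodes with the largest fractional share.
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def makespan(counts, speeds):
    """Finish time of the slowest node, assuming each task costs 1/speed."""
    return max(c / s for c, s in zip(counts, speeds))


if __name__ == "__main__":
    speeds = [4.0, 2.0, 1.0, 1.0]   # hypothetical relative computing ability
    total = 80                      # hypothetical number of equal-sized tasks

    even = [total // len(speeds)] * len(speeds)
    proportional = allocate_tasks(total, speeds)

    print("even split:        ", even, "makespan", makespan(even, speeds))
    print("proportional split:", proportional,
          "makespan", makespan(proportional, speeds))
```

Under these assumed numbers, the even split finishes when the slowest nodes complete 20 tasks each, while the proportional split lets all nodes finish at the same time, halving the makespan. A real allocation would also need to account for data locality and shuffle cost, which this sketch ignores.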
Cite this article
Zhang, X., Wu, Y. & Zhao, C. MrHeter: improving MapReduce performance in heterogeneous environments. Cluster Comput 19, 1691–1701 (2016). https://doi.org/10.1007/s10586-016-0625-2
DOI: https://doi.org/10.1007/s10586-016-0625-2