Boosting MapReduce with Network-Aware Task Assignment

Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 133)

Abstract

Running MapReduce in a shared cluster is a recent trend for processing large-scale data analytics applications while improving cluster utilization. However, network sharing among the various applications can make the bandwidth available to MapReduce applications both constrained and heterogeneous. This aggravates network hotspots in racks and makes existing task assignment policies, which focus on data locality, no longer effective. To address this issue, this paper develops a model that analyzes the relationship between job completion time and the assignment of both map and reduce tasks across racks. We further design a network-aware task assignment strategy to shorten the completion time of MapReduce jobs in shared clusters. The strategy integrates two simple yet effective greedy heuristics that minimize the completion time of the map phase and the reduce phase, respectively. Through large-scale simulations driven by Facebook job traces, we demonstrate that the network-aware strategy can shorten the average completion time of MapReduce jobs compared with state-of-the-art task assignment strategies, with an acceptable computational overhead.
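To make the idea of network-aware assignment concrete, the following is a minimal sketch of a greedy map-task placement. It is not the paper's heuristic: the cost model (a fixed compute time plus a transfer time bounded by the slower of the two rack links), the one-task-at-a-time rack model, and all names (`MapTask`, `transfer_time`, `greedy_assign`, `compute_s`) are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class MapTask:
    name: str
    input_mb: float   # size of the input block (hypothetical unit: MB)
    data_rack: int    # rack that stores the input block


def transfer_time(task, rack, bandwidth):
    """Seconds to fetch the input; zero when the task runs where its data lives."""
    if rack == task.data_rack:
        return 0.0
    # Assumption: a cross-rack fetch is limited by the slower of the two rack links.
    return task.input_mb / min(bandwidth[task.data_rack], bandwidth[rack])


def greedy_assign(tasks, bandwidth, compute_s=1.0):
    """Place each task on the rack that would finish it earliest.

    Toy model: every rack runs its assigned tasks one after another.
    """
    finish = [0.0] * len(bandwidth)   # current finish time per rack
    placement = {}
    # Handle the largest inputs first, since they are the costliest to misplace.
    for task in sorted(tasks, key=lambda t: t.input_mb, reverse=True):
        best = min(
            range(len(bandwidth)),
            key=lambda r: finish[r] + compute_s + transfer_time(task, r, bandwidth),
        )
        finish[best] += compute_s + transfer_time(task, best, bandwidth)
        placement[task.name] = best
    return placement, max(finish)


tasks = [MapTask("m1", 128, 0), MapTask("m2", 128, 0), MapTask("m3", 64, 0)]
bandwidth = [100.0, 50.0, 10.0]   # MB/s available to this job on each rack (made-up values)
placement, makespan = greedy_assign(tasks, bandwidth)
```

With these made-up inputs, all three blocks live on rack 0. The sketch keeps `m1` and `m2` local but sends `m3` to rack 1, because fetching 64 MB over the 50 MB/s link finishes sooner than queueing behind the two local tasks. A purely locality-driven policy would not make that trade-off.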

Keywords

MapReduce, Task assignment, Network hotspots

Copyright information

© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2014

Authors and Affiliations

  1. Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
