Boosting MapReduce with Network-Aware Task Assignment

  • Conference paper

Abstract

Running MapReduce in a shared cluster has become a recent trend for processing large-scale data analytics applications while improving cluster utilization. However, network sharing among applications can leave MapReduce applications with constrained and heterogeneous network bandwidth. This aggravates network hotspots in racks and makes existing task assignment policies, which focus on data locality, no longer effective. To address this issue, this paper develops a model to analyze the relationship between job completion time and the assignment of both map and reduce tasks across racks. We further design a network-aware task assignment strategy to shorten the completion time of MapReduce jobs in shared clusters. It integrates two simple yet effective greedy heuristics that minimize the completion time of the map phase and the reduce phase, respectively. Using large-scale simulations driven by Facebook job traces, we demonstrate that the network-aware strategy shortens the average completion time of MapReduce jobs compared with state-of-the-art task assignment strategies, with an acceptable computational overhead.

This research was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61133006.

Notes

  1. Note that we fill up the available slots in racks before starting the next wave. Hence, the task computation time (\(w_{i}^{m}\tau _{m}\), \(w_{i}^{r}\tau _{r}\)) in Eq. (4) is fixed as \(\lceil p / \sum _{i \in \mathcal {R}}s_{i}^{m}\rceil \tau _{m}\), \(\lceil q / \sum _{i \in \mathcal {R}}s_{i}^{r}\rceil \tau _{r}\). It is omitted when calculating the phase makespan for simplicity.
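As a rough illustration of the fixed computation term in this note, the sketch below evaluates \(\lceil p / \sum _{i}s_{i}\rceil \tau\) for a single phase. The function name `phase_compute_time` and the sample numbers are hypothetical. Reading p (or q) as the number of map (or reduce) tasks, s_i as the slots available in rack i, and tau as the per-task computation time is an assumption inferred from the formula, since those definitions are not part of this excerpt.

```python
def phase_compute_time(num_tasks, slots_per_rack, task_time):
    """Return ceil(num_tasks / total_slots) * task_time.

    Assumed meaning (not confirmed by the excerpt): tasks run in waves,
    and every wave fills all slots in all racks before the next begins.
    """
    total_slots = sum(slots_per_rack)
    if total_slots <= 0:
        raise ValueError("at least one slot is required")
    # Integer ceiling division avoids floating-point rounding.
    waves = -(-num_tasks // total_slots)
    return waves * task_time


# Hypothetical numbers: 10 tasks, two racks with 2 slots each, 3.0 s per task.
print(phase_compute_time(10, [2, 2], 3.0))
```

Because the result depends only on the task count and the total number of slots, it is the same for every assignment of tasks to racks, which is consistent with the note dropping it from the makespan calculation.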

Author information

Corresponding author

Correspondence to Fangming Liu.

Copyright information

© 2014 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Xu, F., Liu, F., Zhu, D., Jin, H. (2014). Boosting MapReduce with Network-Aware Task Assignment. In: Leung, V., Chen, M. (eds) Cloud Computing. CloudComp 2013. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 133. Springer, Cham. https://doi.org/10.1007/978-3-319-05506-0_8

  • DOI: https://doi.org/10.1007/978-3-319-05506-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05505-3

  • Online ISBN: 978-3-319-05506-0

  • eBook Packages: Computer Science, Computer Science (R0)