OFScheduler: A Dynamic Network Optimizer for MapReduce in Heterogeneous Cluster

Abstract

MapReduce is a popular programming paradigm in cloud computing because it scales well when processing large-scale data. However, MapReduce performs poorly in heterogeneous clusters. One reason is that Hadoop's built-in load balancing algorithm for the Map function generates excessive network traffic. We propose a new dynamic network optimizer for heterogeneous clusters, called OFScheduler, that relieves network traffic during the execution of MapReduce jobs. The optimizer focuses on reducing bandwidth competition, balancing the workload of network links, and increasing bandwidth utilization. It tags different types of traffic and uses OpenFlow to adjust the transfer of flows dynamically. We build a simulator and an OpenFlow testbed for evaluation. The simulation results show that the proposed optimizer significantly increases bandwidth utilization and improves MapReduce performance by 24–63% for most jobs in a multi-path heterogeneous cluster. The experimental results show that the proposed optimizer can be deployed in a real environment.
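
The idea of tagging traffic and steering flows across multiple paths can be illustrated with a minimal sketch. It is not the OFScheduler algorithm, which the abstract does not specify. The tag names (`shuffle`, `load_balance`), their priority order, the greedy least-utilized-path rule, and all identifiers below are assumptions made for illustration only, and no OpenFlow controller API is involved.

```python
from dataclasses import dataclass

# Hypothetical traffic tags; a lower number is scheduled first (an assumption).
PRIORITY = {"shuffle": 0, "load_balance": 1}


@dataclass
class Flow:
    name: str
    tag: str        # hypothetical traffic-type tag
    demand: float   # bandwidth demand, e.g. in Mbit/s
    paths: list     # candidate paths, each a tuple of link names


def assign_paths(flows, capacity):
    """Greedily place each flow on the candidate path whose most-utilized
    link would be least utilized after the flow is added."""
    load = {link: 0.0 for link in capacity}
    chosen = {}
    # Higher-priority tags first; within a tag, larger flows first.
    for flow in sorted(flows, key=lambda f: (PRIORITY[f.tag], -f.demand)):
        def bottleneck(path):
            return max((load[l] + flow.demand) / capacity[l] for l in path)

        best = min(flow.paths, key=bottleneck)
        for link in best:
            load[link] += flow.demand
        chosen[flow.name] = best
    return chosen, load


# Two parallel links between the same pair of hosts (illustrative values).
capacity = {"a": 100.0, "b": 100.0}
both = [("a",), ("b",)]
flows = [
    Flow("f1", "shuffle", 60.0, both),
    Flow("f2", "load_balance", 50.0, both),
    Flow("f3", "shuffle", 30.0, both),
]
chosen, load = assign_paths(flows, capacity)
print(chosen)
print(load)
```

In this toy setup the two shuffle flows are placed first and end up on different links, and the load-balancing flow then takes whichever link is less utilized. In a real deployment the chosen paths would have to be installed as flow rules by an OpenFlow controller, which this sketch leaves out.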

Keywords

MapReduce, OpenFlow, Optimization, Heterogeneous cluster

Notes

Acknowledgments

This work was supported in part by the 863 Program of China (No. 2011AA01A202), the Doctoral Fund of Ministry of Education of China (No. 20100073120022), Natural Science Foundation of China (No. 61202025) and the STCSM (Grant No. 12ZR1414900). Yao Shen is the corresponding author.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
