Capturing Node Resource Status and Classifying Workload for Map Reduce Resource Aware Scheduler

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 309)

Abstract

The amount of digital data has grown enormously, and numerous software frameworks have been developed to process it. Hadoop MapReduce is one such popular framework, which processes large data sets on commodity hardware. The job scheduler is a key component of Hadoop that assigns tasks to nodes. Existing MapReduce schedulers assign tasks to nodes without considering node heterogeneity, workload type, or the amount of available resources. This overburdens nodes with one type of job and reduces overall throughput. In this paper, we propose a new scheduler that captures the node resource status after every heartbeat, classifies jobs into two types, CPU bound and IO bound, and assigns each task to the node with lower CPU/IO utilization. The experimental results show an improvement of 15–20% on a heterogeneous cluster and around 10% on a homogeneous cluster relative to the native Hadoop scheduler.
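
The scheduling idea summarized above can be illustrated with a minimal Python sketch. The abstract does not say how jobs are classified or how utilization is measured, so everything here is an assumption made for illustration and not the authors' implementation: the `NodeStatus` fields, the "dominant resource" rule in `classify_job`, and the `ResourceAwareScheduler` class are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NodeStatus:
    """Resource snapshot a node might report in a heartbeat (hypothetical fields)."""
    name: str
    cpu_util: float  # assumed fraction in [0, 1]
    io_util: float   # assumed fraction in [0, 1]


def classify_job(cpu_seconds: float, io_seconds: float) -> str:
    """Assumed rule: label a job by whichever resource dominates its profile."""
    return "cpu" if cpu_seconds >= io_seconds else "io"


class ResourceAwareScheduler:
    """Keeps the latest status per node and picks the least-loaded one."""

    def __init__(self) -> None:
        self.status: Dict[str, NodeStatus] = {}

    def on_heartbeat(self, status: NodeStatus) -> None:
        # Overwrite the previous snapshot so decisions use fresh data.
        self.status[status.name] = status

    def assign(self, job_type: str) -> Optional[str]:
        # Return the node with the lowest utilization of the resource
        # this job type stresses, or None if no node has reported yet.
        if not self.status:
            return None
        if job_type == "cpu":
            best = min(self.status.values(), key=lambda n: n.cpu_util)
        else:
            best = min(self.status.values(), key=lambda n: n.io_util)
        return best.name


if __name__ == "__main__":
    scheduler = ResourceAwareScheduler()
    scheduler.on_heartbeat(NodeStatus("node-a", cpu_util=0.9, io_util=0.2))
    scheduler.on_heartbeat(NodeStatus("node-b", cpu_util=0.3, io_util=0.7))
    print(scheduler.assign(classify_job(120.0, 30.0)))  # node-b
    print(scheduler.assign(classify_job(10.0, 80.0)))   # node-a
```

In this sketch a CPU-bound task goes to the node reporting the lowest CPU utilization, and an IO-bound task to the node reporting the lowest IO utilization. A real scheduler would also need to account for free task slots and data locality, which the abstract does not discuss.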

Keywords

MapReduce, Homogeneous cluster, Heterogeneous cluster, Hadoop, Scheduler

Copyright information

© Springer India 2015

Authors and Affiliations

  • Ravi G. Mude (1)
  • Annappa Betta (1)
  • Akashdeep Debbarma (1)
  1. Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India