Improved Resource Provisioning in Hadoop

Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 44)

Abstract

Extensive use of the Internet is generating large amounts of data, and the mechanisms needed to handle and analyze these data are becoming more complicated day by day. The Hadoop platform provides a solution for processing huge data sets on large clusters of nodes. The scheduler plays a vital role in improving the performance of Hadoop. In this paper, MRPPR (MapReduce Performance Parameter based Resource aware Hadoop Scheduler) is proposed. In MRPPR, performance parameters of the Map task, such as the time required to parse the data, map, sort, and merge the result, and of the Reduce task, such as the time to merge, parse, and reduce, are considered in order to categorize a job as CPU bound, Disk I/O bound, or Network I/O bound. Based on the node status obtained from the TaskTracker’s response, nodes in the cluster are classified as CPU busy, Disk I/O busy, or Network I/O busy. A cost model is proposed to schedule a job to a node based on these classifications, so as to minimize the makespan and attain effective resource utilization. A performance improvement of 25–30% is achieved with the proposed scheduler.
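
The classification idea in the abstract can be illustrated with a small Python sketch. It is not the paper's cost model, which the abstract does not specify. The phase-to-resource mapping (`PHASE_RESOURCE`), the function names, and the stand-in cost (the node's current utilization of the job's bottleneck resource) are all assumptions made for illustration only.

```python
# Hypothetical mapping from a measured phase to the resource it mostly stresses.
# The abstract names the phases but not this mapping; it is an assumption.
PHASE_RESOURCE = {
    "map_parse": "cpu",
    "map_map": "cpu",
    "map_sort": "disk",
    "map_merge": "disk",
    "reduce_merge": "network",
    "reduce_parse": "cpu",
    "reduce_reduce": "cpu",
}


def classify_job(phase_times):
    """Return the resource ('cpu', 'disk' or 'network') with the largest total time."""
    totals = {"cpu": 0.0, "disk": 0.0, "network": 0.0}
    for phase, seconds in phase_times.items():
        totals[PHASE_RESOURCE[phase]] += seconds
    return max(totals, key=totals.get)


def classify_node(utilization):
    """Return the busiest resource on a node, given utilization fractions in [0, 1]."""
    return max(utilization, key=utilization.get)


def pick_node(job_class, nodes):
    """Pick the node with the lowest utilization of the job's bottleneck resource.

    This stand-in cost is an assumption, not the cost model proposed in the paper.
    """
    return min(nodes, key=lambda name: nodes[name][job_class])


# Illustrative (made-up) measurements: phase times in seconds, node utilization fractions.
phase_times = {
    "map_parse": 2, "map_map": 3, "map_sort": 10, "map_merge": 6,
    "reduce_merge": 4, "reduce_parse": 1, "reduce_reduce": 2,
}
nodes = {
    "n1": {"cpu": 0.2, "disk": 0.9, "network": 0.1},
    "n2": {"cpu": 0.8, "disk": 0.3, "network": 0.2},
}

job_class = classify_job(phase_times)
chosen = pick_node(job_class, nodes)
```

With these made-up numbers the job is dominated by sort and merge time, so it is treated as Disk I/O bound and placed on the node whose disk is least busy. A real scheduler would obtain node status from TaskTracker heartbeats rather than from a static dictionary.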

Keywords

Hadoop, Job scheduling, Resource awareness

Copyright information

© Springer India 2016

Authors and Affiliations

  1. Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India
