Enhancing the Performance of MapReduce Default Scheduler by Detecting Prolonged TaskTrackers in Heterogeneous Environments

  • Nenavath Srinivas Naik
  • Atul Negi
  • V. N. Sastry
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 380)

Abstract

MapReduce is now a significant parallel processing model for large-scale, data-intensive applications running on clusters of commodity hardware. Scheduling of jobs and tasks, and identification of slow TaskTrackers in Hadoop clusters, have been a focus of research in recent years. MapReduce performance is currently limited by its default scheduler, which does not adapt well to heterogeneous environments. In this paper, we propose a scheduling method that identifies the TaskTrackers running slowly in the map and reduce phases of the MapReduce framework in a heterogeneous Hadoop cluster. The proposed method is integrated with the MapReduce default scheduling algorithm, and its performance is compared with that of the unmodified default scheduler. We observe that the proposed approach improves on the default scheduler in heterogeneous environments: overall job execution times were reduced for different workloads from the HiBench benchmark suite.
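
The abstract does not state the rule used to detect slow TaskTrackers, so the following is only an illustrative sketch of the general idea and not the authors' algorithm. It assumes each TaskTracker reports per-task progress rates for one phase (map or reduce) and flags a tracker when its mean rate falls below a fraction of the cluster-wide mean. The function name, the `threshold` value of 0.5, and the sample data are all hypothetical.

```python
from statistics import mean


def find_slow_tasktrackers(progress_rates, threshold=0.5):
    """Return the names of TaskTrackers considered slow in one phase.

    progress_rates: dict mapping a TaskTracker name to a list of per-task
    progress rates (fraction of a task completed per second).
    threshold: assumed cutoff; a tracker is flagged when its mean rate is
    below threshold * (mean of all trackers' mean rates).
    """
    # Mean rate per tracker; trackers without samples are skipped.
    per_tracker = {
        name: mean(rates) for name, rates in progress_rates.items() if rates
    }
    if not per_tracker:
        return []
    cluster_mean = mean(list(per_tracker.values()))
    cutoff = threshold * cluster_mean
    return sorted(name for name, rate in per_tracker.items() if rate < cutoff)


# Hypothetical map-phase measurements for a three-node cluster.
map_rates = {
    "tt1": [0.10, 0.12],
    "tt2": [0.11, 0.09],
    "tt3": [0.02, 0.03],
}
slow_map_trackers = find_slow_tasktrackers(map_rates)
```

In a scheduler, the same check would be run separately on map-phase and reduce-phase measurements, since a node can be slow in one phase and not in the other; a relative cutoff is used here so the rule does not depend on absolute hardware speed.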

Keywords

MapReduce, Task scheduler, TaskTrackers, Heterogeneous environment

Notes

Acknowledgments

Nenavath Srinivas Naik expresses his gratitude to Prof. P.A. Sastry (Principal), Prof. J. Prasanna Kumar (Head of the CSE Department), and Dr. B. Sandhya, MVSR Engineering College, Hyderabad, India for hosting the experimental test bed.

Copyright information

© Springer India 2016

Authors and Affiliations

  • Nenavath Srinivas Naik (1)
  • Atul Negi (1)
  • V. N. Sastry (2)
  1. School of Computer and Information Sciences, University of Hyderabad, Hyderabad, India
  2. Institute for Development and Research in Banking Technology, Hyderabad, India