Performance Improvement of MapReduce Framework by Identifying Slow TaskTrackers in Heterogeneous Hadoop Cluster

Naik, Nenavath Srinivas; Negi, Atul; Sastry, V. N.

doi:10.1007/978-81-322-2529-4_49

Nenavath Srinivas Naik⁶,
Atul Negi⁶ &
V. N. Sastry⁷

Part of the book series: Smart Innovation, Systems and Technologies ((SIST,volume 44))

887 Accesses

Abstract

MapReduce is presently recognized as a significant parallel and distributed programming model with wide acclaim for large scale computing. MapReduce framework divides a job into map, reduce tasks and schedules these tasks in a distributed manner across the cluster. Scheduling of tasks and identification of “slow TaskTrackers” in heterogeneous Hadoop clusters is the focus of recent research. MapReduce performance is currently limited by its default scheduler, which does not adapt well in heterogeneous environments. In this paper, we propose a scheduling method to identify “slow TaskTrackers” in a heterogeneous Hadoop cluster and implement the proposed method by integrating it with the Hadoop default scheduling algorithm. The performance of this method is compared with the Hadoop default scheduler. We observe that the proposed approach shows modest but consistent improvement against the default Hadoop scheduler in heterogeneous environments. We see that it improves by minimizing the overall job execution time.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 169.00; Price excludes VAT (USA)

Softcover Book: USD 219.99; Price excludes VAT (USA)

Hardcover Book: USD 219.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51, 107–113 (2008)
Article Google Scholar
Dean, J., Ghemawat, S.: MapReduce: a flexible data processing tool. Commun. ACM 53(1), 72–77 (2010)
Article Google Scholar
Rasooli, A., Down, D.G.: An adaptive scheduling algorithm for dynamic heterogeneous hadoop systems. In: Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research, pp. 30–44. Canada (2011)
Google Scholar
Zaharia, M., Borthakur, D., Sarma, J.S., Elmeleegy, K., Shenker, S., Stoica, I.: Job Scheduling for Multi-User MapReduce Clusters. Technical Report, University of California, Berkeley (2009)
Google Scholar
Dawei, J., Beng, C.O., Lei, S., Sai, W.: The Performance of MapReduce: An In-depth Study. VLDB (2010)
Google Scholar
Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R., Stoica, I.: Improving mapreduce performance in heterogeneous environments. In: 8th Usenix Symposium on Operating Systems Design and Implementation, pp. 29–42. ACM Press, New York (2008)
Google Scholar
Tan, J., Meng, X., Zhang, L.: Delay Tails in Mapreduce Scheduling. Technical Report, IBM T. J. Watson Research Center, New York (2011)
Google Scholar
Ekanayake, J., Pallickara, S., Fox, G.: MapReduce for data intensive scientific analyses. In: Proceedings of the 2008 IEEE Fourth International Conference on eScience, pp. 277–284 (2008)
Google Scholar
Rasooli, A., Down, D.G.: A hybrid scheduling approach for scalable heterogeneous Hadoop systems. In: Proceeding of the 5th Workshop on Many-Task Computing on Grids and Supercomputers, pp. 1284–1291 (2012)
Google Scholar
Nanduri, R., Maheshwari, N., Reddyraja, A., Varma, V.: Job aware scheduling algorithm for mapreduce framework. In: Proceedings of the 3rd International Conference on Cloud Computing Technology and Science, pp. 724–729, Washington, USA (2011)
Google Scholar
Zhenhua, G., Geo, R.F., Zhou, M., Yang, R.: Improving resource utilization in MapReduce. In: IEEE International Conference on Cluster Computing, pp. 402–410 (2012)
Google Scholar
Rasooli, A., Down, D.G.: COSHH: a classification and optimization based scheduler for heterogeneous Hadoop systems. J. Future Gener. Comput. Syst. 1–15 (2014)
Google Scholar
Naik, N.S., Negi, A., Sastry, V.N.: A review of adaptive approaches to MapReduce scheduling in heterogeneous environments. In: IEEE International Conference on Advances in Computing, Communications and Informatics, pp. 677–683, Delhi, India (2014)
Google Scholar
Shengsheng, H., Jie, H., Jinquan, D., Tao, X., Huang, B.: The HiBench benchmark suite: characterization of the MapReduce-based data analysis. In: IEEE 26th International Conference on Data Engineering Workshops, pp. 41–51 (2010)
Google Scholar

Download references

Acknowledgments

Nenavath Srinivas Naik express his gratitude to Prof. P.A. Sastry (Principal), Prof. J. Prasanna Kumar (Head of the CSE Department) and Dr. B. Sandhya, MVSR Engineering College, Hyderabad, India for hosting the experimental test bed.

Author information

Authors and Affiliations

School of Computer and Information Sciences, University of Hyderabad, Hyderabad, 500046, India
Nenavath Srinivas Naik & Atul Negi
Institute for Development and Research in Banking Technology, Hyderabad, 500057, India
V. N. Sastry

Authors

Nenavath Srinivas Naik
View author publications
You can also search for this author in PubMed Google Scholar
Atul Negi
View author publications
You can also search for this author in PubMed Google Scholar
V. N. Sastry
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Nenavath Srinivas Naik .

Editor information

Editors and Affiliations

Department of Computer Science, Liverpool Hope University, Liverpool, United Kingdom
Atulya Nagar
Department of Computer Science and Engineering, National Institute of Technology Rourkela, Rourkela, India
Durga Prasad Mohapatra
Computer Science & Engineering, University of Calcutta, Kolkata, West Bengal, India
Nabendu Chaki

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Naik, N.S., Negi, A., Sastry, V.N. (2016). Performance Improvement of MapReduce Framework by Identifying Slow TaskTrackers in Heterogeneous Hadoop Cluster. In: Nagar, A., Mohapatra, D., Chaki, N. (eds) Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics. Smart Innovation, Systems and Technologies, vol 44. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2529-4_49

Download citation

DOI: https://doi.org/10.1007/978-81-322-2529-4_49
Published: 03 September 2015
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2528-7
Online ISBN: 978-81-322-2529-4
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics