A Review of Scheduling Algorithms in Hadoop

Sharma, Anil; Singh, Gurwinder

doi:10.1007/978-3-030-29407-6_11

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 597)

Abstract

In the current era of rapid data growth, big data is one of the most significant areas of research in the computer science community, and Hadoop is a widely used tool for storing and processing it. Hadoop is designed to work effectively on clusters with a homogeneous environment; when the cluster environment is heterogeneous, its performance decreases, which gives rise to challenges in areas such as query execution time, data movement cost, selection of the best clusters and racks for data placement, privacy preservation, load distribution (imbalance in input splits, computations, partition sizes and heterogeneous hardware), and scheduling. Scheduling is at the core of Hadoop: the schedulers multiplex all incoming jobs onto the existing resources, so improving the performance of Hadoop schedulers is vital. With this motivation, this paper introduces the concept of big data, the market share of popular big data vendors and the various tools in the Hadoop ecosystem, with an emphasis on studying the scheduling algorithms for the MapReduce model in Hadoop and comparing them on a range of parameters.
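The idea that schedulers multiplex incoming jobs onto existing resources can be illustrated with a toy model. The sketch below is not code from the paper or from Hadoop's FIFO, Fair or Capacity schedulers; it assumes a fixed pool of identical task slots, jobs submitted together at time zero in list order, unique job names and unit-length tasks. The `Job` class and the `fifo_assign`, `fair_assign` and `simulate` functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job model: a unique name and a count of unit-length tasks.
    name: str
    tasks: int


def fifo_assign(jobs, slots):
    """FIFO: hand free slots to jobs strictly in submission order."""
    alloc = {}
    free = slots
    for job in jobs:
        if free == 0:
            break
        take = min(job.tasks, free)
        alloc[job.name] = take
        free -= take
    return alloc


def fair_assign(jobs, slots):
    """Fair sharing: split slots as evenly as possible among jobs with pending tasks."""
    pending = {job.name: job.tasks for job in jobs if job.tasks > 0}
    alloc = {name: 0 for name in pending}
    free = slots
    while free > 0 and pending:
        # Equal share per job; at least one slot so the loop always makes progress.
        share = max(free // len(pending), 1)
        for name in list(pending):
            if free == 0:
                break
            take = min(share, pending[name], free)
            alloc[name] += take
            pending[name] -= take
            free -= take
            if pending[name] == 0:
                del pending[name]  # job fully served; leftover slots go to the others
    return alloc


def simulate(jobs, slots, policy):
    """Run unit-time steps until all jobs finish; return each job's completion step."""
    if slots < 1:
        raise ValueError("need at least one slot")
    order = [job.name for job in jobs]
    remaining = {job.name: job.tasks for job in jobs if job.tasks > 0}
    finish = {job.name: 0 for job in jobs if job.tasks <= 0}
    step = 0
    while remaining:
        step += 1
        # Present the still-running jobs to the policy in submission order.
        running = [Job(name, remaining[name]) for name in order if name in remaining]
        for name, n in policy(running, slots).items():
            remaining[name] -= n
            if remaining[name] == 0:
                del remaining[name]
                finish[name] = step
    return finish


if __name__ == "__main__":
    jobs = [Job("big", 8), Job("small", 2)]
    print("FIFO:", simulate(jobs, 4, fifo_assign))
    print("Fair:", simulate(jobs, 4, fair_assign))
```

Under these assumptions, with a job `big` of 8 tasks submitted ahead of a job `small` of 2 tasks on 4 slots, FIFO completes them at steps 2 and 3, while fair sharing completes `small` at step 1 and `big` at step 3: the short job no longer waits behind the long one, at the cost of delaying the long job by one step. Real Hadoop schedulers also have to handle concerns this toy ignores, such as data locality, queues and heterogeneous nodes.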

Author information

Authors and Affiliations

School of Computer Applications, Lovely Professional University, Punjab, India
Anil Sharma & Gurwinder Singh

Corresponding author

Correspondence to Gurwinder Singh.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Jaypee University of Information Technology, Waknaghat, Himachal Pradesh, India
Pradeep Kumar Singh
Indian Institute of Technology Delhi, New Delhi, Delhi, India
Arpan Kumar Kar
Central University of Jammu, Jammu, Jammu and Kashmir, India
Yashwant Singh
Indian Institute of Technology Patna, Patna, Bihar, India
Maheshkumar H. Kolekar
Institute of Technology, Nirma University, Ahmedabad, Gujarat, India
Sudeep Tanwar

About this paper

Cite this paper

Sharma, A., Singh, G. (2020). A Review of Scheduling Algorithms in Hadoop. In: Singh, P., Kar, A., Singh, Y., Kolekar, M., Tanwar, S. (eds) Proceedings of ICRIC 2019. Lecture Notes in Electrical Engineering, vol 597. Springer, Cham. https://doi.org/10.1007/978-3-030-29407-6_11

DOI: https://doi.org/10.1007/978-3-030-29407-6_11
Published: 22 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29406-9
Online ISBN: 978-3-030-29407-6
eBook Packages: Engineering, Engineering (R0)