A classification of Hadoop job schedulers based on performance optimization approaches

Cluster Computing

Abstract

Job scheduling in MapReduce plays a vital role in Hadoop performance. In recent years, many researchers have presented job scheduling algorithms to improve Hadoop performance. Designing a job scheduler that minimizes job execution time while maximizing resource utilization is not a straightforward task. The primary purpose of this paper is to investigate the factors affecting job scheduler efficiency and to present a novel classification of job schedulers based on these factors. We provide a comprehensive overview of the existing job schedulers in each group, evaluating their approaches and their effects on Hadoop performance and comparing their advantages and disadvantages. Finally, we provide recommendations on choosing a suitable job scheduler in different environments to improve Hadoop performance.

Corresponding author

Correspondence to Sahar Adabi.

About this article

Cite this article

Ghazali, R., Adabi, S., Down, D.G. et al. A classification of hadoop job schedulers based on performance optimization approaches. Cluster Comput 24, 3381–3403 (2021). https://doi.org/10.1007/s10586-021-03339-8

  • DOI: https://doi.org/10.1007/s10586-021-03339-8
