Abstract
Job scheduling in MapReduce plays a vital role in Hadoop performance. In recent years, many researchers have proposed job scheduling algorithms to improve Hadoop performance. Designing a job scheduler that minimizes job execution time while maximizing resource utilization is not a straightforward task. The primary purpose of this paper is to investigate the factors affecting job scheduler efficiency and to present a novel classification of job schedulers based on these factors. We provide a comprehensive overview of the existing job schedulers in each group, evaluate their approaches and their effects on Hadoop performance, and compare their advantages and disadvantages. Finally, we provide recommendations for choosing a suitable job scheduler in different environments to improve Hadoop performance.
About this article
Cite this article
Ghazali, R., Adabi, S., Down, D.G. et al. A classification of hadoop job schedulers based on performance optimization approaches. Cluster Comput 24, 3381–3403 (2021). https://doi.org/10.1007/s10586-021-03339-8