Abstract
Big Data applications consume substantial energy when processing massive volumes of data in a heterogeneous environment, so reducing that energy consumption is an important research topic. Conserving energy while meeting a deadline constraint in such an environment is particularly challenging. In this paper, we formulate the scheduling of MapReduce jobs as a minimization problem over the decision variables, subject to a user-specified deadline constraint. We further propose a Learning Automata-based MapReduce Scheduling (LA-MRS) algorithm that determines the resource allocation and reduces the energy consumption of MapReduce tasks in a heterogeneous environment. We evaluate the proposed LA-MRS algorithm on HiBench benchmark workloads such as Enhanced DFSIO, Nutch Indexing, k-means Clustering, and Hive Join. The experiments show that LA-MRS schedules MapReduce tasks while consuming around 25% less energy than the existing algorithms.
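To illustrate the general idea of a learning automaton choosing a node under a deadline, the following is a minimal sketch and not the LA-MRS algorithm itself, whose details are not given in this abstract. It assumes a standard linear reward-inaction update, a hypothetical reward rule (a node is rewarded only if it meets the deadline, with probability equal to the lowest feasible energy divided by its own energy), and made-up per-node energy and time estimates. The names `lri_update` and `schedule_task` are illustrative only.

```python
import random


def lri_update(probs, chosen, rewarded, rate=0.1):
    """Linear reward-inaction: on reward, move probability mass toward
    the chosen action; on penalty, leave the vector unchanged."""
    if not rewarded:
        return list(probs)
    return [
        p + rate * (1.0 - p) if i == chosen else p * (1.0 - rate)
        for i, p in enumerate(probs)
    ]


def schedule_task(energy, time, deadline, rounds=2000, rate=0.05, seed=0):
    """Learn a node-selection distribution for one task.

    energy[i], time[i]: assumed (hypothetical) energy and run-time
    estimates for running the task on node i.
    """
    feasible = [e for e, t in zip(energy, time) if t <= deadline]
    if not feasible:
        raise ValueError("no node can meet the deadline")
    e_min = min(feasible)  # lowest energy among deadline-feasible nodes

    rng = random.Random(seed)
    n = len(energy)
    probs = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(rounds):
        node = rng.choices(range(n), weights=probs)[0]
        if time[node] > deadline:
            rewarded = False  # deadline violated: never rewarded
        else:
            # Assumed reward rule: cheaper nodes are rewarded more often.
            rewarded = rng.random() < e_min / energy[node]
        probs = lri_update(probs, node, rewarded, rate)
    return probs


if __name__ == "__main__":
    # Node 2 uses the least energy but misses the deadline of 10.
    probs = schedule_task(energy=[8.0, 5.0, 3.0], time=[6.0, 9.0, 14.0],
                          deadline=10.0)
    print([round(p, 3) for p in probs])
```

Under these assumptions, a node that misses the deadline is never rewarded, so its selection probability can only shrink, while probability mass drifts toward feasible nodes that use less energy.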
Data availability
None
Funding
Not applicable
Author information
Contributions
The author contributed completely to this work
Ethics declarations
Conflict of interest
This manuscript has not been submitted to, nor is under review at, another journal or other publishing venue.
Ethical approval
Yes
Consent for publication
Yes
About this article
Cite this article
Lingam, G. Reinforcement learning based energy efficient resource allocation strategy of MapReduce jobs with deadline constraint. Cluster Comput 26, 2719–2735 (2023). https://doi.org/10.1007/s10586-022-03761-6
DOI: https://doi.org/10.1007/s10586-022-03761-6