Abstract
Big data refers to large volumes of diverse data types generated by heterogeneous sources such as mobile devices, the web, social media, and the Internet of Things (IoT). The cloud offers a wide variety of tools for handling big data on demand, on a pay-per-service basis, through clusters of virtual machines (VMs) hosted on heterogeneous physical machines across cloud datacenters. The primary goal is to analyze big data at the point of creation and to scale the analysis through data-intensive computing. Hadoop MapReduce addresses scalability and complexity by running more jobs on a virtual cluster that spans the racks of a distributed cloud datacenter. By default, however, MapReduce schedulers do not account for heterogeneity when executing computational jobs: every VM processes an equal number of data blocks regardless of its capacity, which dynamically degrades performance. In addition, the VMs in a virtual cluster are not energy-aware, although energy efficiency is highly important in a heterogeneous environment. Hence, we propose the dynamic performance heuristic-based bin packing (DP-HBP) MapReduce scheduler, which increases resource utilization across heterogeneous VMs. In our experiments, DP-HBP improves makespan and latency by 49% and 39% over the roulette wheel scheme and heuristic-based MapReduce job schedulers. DP-HBP also achieves an average non-data-local execution rate of 27%, which is lower than that of the existing schedulers. Resource utilization, measured by the average number of unused vCPUs and the average unused memory, improves by 34% and 41%, respectively, which enhances workload performance when handling big data in a heterogeneous environment.
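The abstract does not spell out how DP-HBP packs work onto VMs, so the following is only a minimal sketch of the general contrast it draws with the default scheduler, under stated assumptions. Each VM is treated as a bin whose capacity is its (hypothetical) processing rate multiplied by a target finish time, and input blocks are placed with first-fit decreasing (FFD), a standard bin-packing heuristic; this is not the authors' algorithm. The names `VM`, `equal_split`, `ffd_pack`, and `makespan`, the `rate` values, the 128 MB block size, and the `growth` factor are illustrative assumptions. Energy awareness and data locality are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# Hypothetical VM model: only processing rate is considered (no energy, no data locality).
@dataclass
class VM:
    name: str
    rate: float  # assumed processing rate in MB/s (illustrative, not from the paper)
    blocks: List[float] = field(default_factory=list)  # sizes (MB) of assigned blocks

    def load(self) -> float:
        return sum(self.blocks)

    def finish_time(self) -> float:
        return self.load() / self.rate


def makespan(vms: Sequence[VM]) -> float:
    """Time at which the last VM finishes its assigned blocks."""
    return max(vm.finish_time() for vm in vms)


def equal_split(spec: Sequence[Tuple[str, float]], block_sizes: Sequence[float]) -> List[VM]:
    """Capacity-oblivious baseline: round-robin, so every VM gets (almost) the same block count."""
    vms = [VM(name, rate) for name, rate in spec]
    for i, size in enumerate(block_sizes):
        vms[i % len(vms)].blocks.append(size)
    return vms


def ffd_pack(spec: Sequence[Tuple[str, float]], block_sizes: Sequence[float],
             growth: float = 1.05) -> List[VM]:
    """First-fit decreasing into variable-size bins with capacity = rate * target time.

    Starts from the perfectly balanced target time and enlarges it by `growth` (> 1)
    until every block fits; this terminates because the fastest bin eventually holds all blocks.
    """
    total_rate = sum(rate for _, rate in spec)
    target = sum(block_sizes) / total_rate
    while True:
        # Largest bins (fastest VMs) first.
        vms = sorted((VM(name, rate) for name, rate in spec),
                     key=lambda vm: vm.rate, reverse=True)
        placed_all = True
        for size in sorted(block_sizes, reverse=True):  # "decreasing" block order
            for vm in vms:  # "first fit": first bin with enough remaining capacity
                if vm.load() + size <= vm.rate * target + 1e-9:
                    vm.blocks.append(size)
                    break
            else:  # no bin could take this block: relax the target and retry
                placed_all = False
                break
        if placed_all:
            return vms
        target *= growth
```

For three hypothetical VMs with rates of 4, 2, and 1 MB/s and fourteen 128 MB blocks, `equal_split` yields a makespan of 512 s because the slowest VM still receives four blocks, whereas `ffd_pack` assigns 8, 4, and 2 blocks and finishes in 256 s. These figures come only from this toy input, not from the paper's experiments.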
Cite this article
Aarthee, S., Prabakaran, R. Energy-Aware Heuristic Scheduling Using Bin Packing MapReduce Scheduler for Heterogeneous Workloads Performance in Big Data. Arab J Sci Eng 48, 1891–1905 (2023). https://doi.org/10.1007/s13369-022-06963-7