Dynamic ranking-based MapReduce job scheduler to exploit heterogeneous performance in a virtualized environment

Rathinaraja, J.; Ananthanarayana, V. S.; Paul, Anand

doi:10.1007/s11227-019-02960-0

Published: 01 August 2019

Volume 75, pages 7520–7549 (2019)

The Journal of Supercomputing

Abstract

“More data, more information.” Big data helps businesses and research communities gain insights and increase productivity. Many public cloud service providers offer Hadoop MapReduce as a pay-per-use service, delivered through infrastructure as a service on clusters of virtual machines that promise on-demand horizontal scaling. These virtual machines are launched on various physical machines across racks in cloud data centers. Such multi-tenancy introduces performance heterogeneity among Hadoop virtual machines, caused by hardware heterogeneity and by interference from co-located virtual machines. Performance heterogeneity strongly affects MapReduce job latency and the resource utilization of rented Hadoop virtual clusters. Default MapReduce schedulers assign map and reduce tasks on the assumption that the hardware is homogeneous. Interference-aware schedulers act only on the interference pattern generated by co-located virtual machines. These schedulers do not consider the heterogeneous performance of virtual machines. Therefore, we propose a dynamic ranking-based MapReduce job scheduler that places map and reduce tasks based on each virtual machine’s performance rank to minimize job latency and improve resource utilization. Our approach calculates a performance score for each virtual machine based on hardware heterogeneity and co-located virtual machine interference. It then ranks the virtual machines separately by map performance and by reduce performance, and places map and reduce tasks accordingly. To demonstrate our ideas, we set up a test bed of 29 virtual machines on eight physical machines with different configurations and capacities. We modify the default fair scheduler in Hadoop 2.x to incorporate our ideas and evaluate them with different workloads on the PUMA dataset. The proposed method is then compared against the default fair scheduler (resource-aware) and an interference-aware scheduler in terms of job latency and resource utilization. Finally, we argue in favor of our approach, as it improves resource utilization by 30–65% and overall job latency by up to 30%.
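
The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no scoring formula, so the `score` function (throughput discounted by an interference fraction), the `VmSample` fields, and the slot-based `place_tasks` helper are all hypothetical names and assumptions chosen for illustration. The sketch only shows how separate map and reduce rankings can lead to different placement orders for the same set of virtual machines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSample:
    """One performance observation for a VM (all fields are hypothetical)."""
    name: str
    map_throughput: float      # assumed: map work completed per second
    reduce_throughput: float   # assumed: reduce work completed per second
    interference: float        # assumed: 0.0 (none) to 1.0 (fully degraded)


def score(throughput: float, interference: float) -> float:
    """Assumed scoring rule: raw throughput discounted by interference."""
    return throughput * (1.0 - interference)


def rank_vms(samples, phase):
    """Return VM names ordered best-first for the given phase."""
    if phase not in ("map", "reduce"):
        raise ValueError("phase must be 'map' or 'reduce'")
    attr = f"{phase}_throughput"
    ordered = sorted(
        samples,
        key=lambda s: score(getattr(s, attr), s.interference),
        reverse=True,
    )
    return [s.name for s in ordered]


def place_tasks(ranked, free_slots, n_tasks):
    """Fill free slots on the highest-ranked VMs first."""
    placement = []
    for vm in ranked:
        take = min(free_slots.get(vm, 0), n_tasks - len(placement))
        placement.extend([vm] * take)
        if len(placement) == n_tasks:
            break
    return placement


samples = [
    VmSample("vm-a", map_throughput=140.0, reduce_throughput=40.0, interference=0.5),
    VmSample("vm-b", map_throughput=80.0, reduce_throughput=60.0, interference=0.25),
    VmSample("vm-c", map_throughput=50.0, reduce_throughput=90.0, interference=0.0),
]

map_rank = rank_vms(samples, "map")
reduce_rank = rank_vms(samples, "reduce")
map_placement = place_tasks(map_rank, {"vm-a": 2, "vm-b": 1, "vm-c": 4}, 4)
```

In this toy data the map ranking and the reduce ranking come out in opposite orders, which is the reason for ranking the two phases separately. A dynamic scheduler would recompute the rankings whenever fresh samples arrive; how often that happens, and how the real scores are measured, is not stated in the abstract.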

Acknowledgements

This study was supported by a National Research Foundation of Korea (NRF) Grant funded by the Korean government (NRF-2017R1C1B5017464).

Author information

Authors and Affiliations

Department of Information Technology, National Institute of Technology Karnataka, Mangalore, India
J. Rathinaraja & V. S. Ananthanarayana
Department of Computer Science and Engineering, Kyungpook National University, Daegu, South Korea
Anand Paul

Corresponding author

Correspondence to Anand Paul.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rathinaraja, J., Ananthanarayana, V.S. & Paul, A. Dynamic ranking-based MapReduce job scheduler to exploit heterogeneous performance in a virtualized environment. J Supercomput 75, 7520–7549 (2019). https://doi.org/10.1007/s11227-019-02960-0

Published: 01 August 2019
Issue Date: November 2019
DOI: https://doi.org/10.1007/s11227-019-02960-0
