EMM: Extended matching market based scheduling for big data platform hadoop

Singh, Balraj; Verma, Harsh K

doi:10.1007/s11042-021-11283-3

1174: Futuristic Trends and Innovations in Multimedia Systems Using Big Data, IoT and Cloud Technologies (FTIMS)
Published: 11 August 2021

Volume 81, pages 34823–34847, (2022)
Multimedia Tools and Applications

Abstract

Hadoop has emerged as a popular choice for processing Big Data, and its clusters are used to process large-scale jobs. The performance of a cluster depends largely on the scheduling policies employed for job processing. However, a single type of scheduling policy may not suit different kinds of jobs, and inappropriate scheduling policies lead to inefficient cluster performance. Existing policies are either too complex or too elementary to capture the diverse jobs and their needs. Most of them follow a fixed pattern, which cannot serve as a common solution for different jobs. The effect of such an ill-fitting mechanism is lower resource utilization and poor cluster performance. In this paper, a pluggable scheduling mechanism is proposed for efficient and adaptive processing of jobs. It uses the Matching Market concept for allocation and adaptively accommodates the diverse needs of multiple jobs by taking into account the varying requirements of their tasks. The experimental results show enhanced resource utilization and improved cluster performance, with an overall reduction in makespan. In certain instances, resource utilization improved up to 80% and performance improved up to 60% with the proposed technique. Cluster efficiency is increased up to 31%. The evaluation and comparisons were conducted against various scheduling policies using different Hadoop benchmarks with the same data and identical configurations. The proposed system has shown significant improvement in cluster efficiency.
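
The abstract does not describe the allocation algorithm itself, so the following is only a minimal sketch of the general matching-market idea, not the authors' method. In a matching market, market-clearing prices lead to an assignment that maximizes the total valuation, so the sketch searches for that assignment directly. The function name `best_assignment`, the task-to-node framing, and the valuation numbers are all hypothetical and chosen for illustration.

```python
from itertools import permutations


def best_assignment(valuations):
    """Return (total, assignment) maximizing the summed valuations.

    valuations[t][n] is a hypothetical score for running task t on node n.
    assignment[t] is the node index chosen for task t; each node is used
    at most once. Brute force over permutations: toy sizes only.
    """
    n_tasks = len(valuations)
    n_nodes = len(valuations[0])
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n_nodes), n_tasks):
        total = sum(valuations[t][perm[t]] for t in range(n_tasks))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Hypothetical valuations: three tasks (rows) over three nodes (columns).
valuations = [
    [12, 4, 2],
    [8, 7, 6],
    [7, 5, 2],
]
total, assignment = best_assignment(valuations)
print(total, assignment)
```

A real scheduler would derive the valuations from task requirements and node state and would use a polynomial-time matching algorithm rather than enumeration; both of those choices are outside what the abstract specifies.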

Acknowledgements

The authors are thankful to Yahoo! for providing access to the cluster computing data.

Author information

Authors and Affiliations

Department of Computer Science, NIT, Jalandhar, India
Balraj Singh
School of Computer Science and Engineering, Lovely Professional University, Phagwara, India
Balraj Singh & Harsh K Verma

Corresponding author

Correspondence to Balraj Singh.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Singh, B., Verma, H.K. EMM: Extended matching market based scheduling for big data platform hadoop. Multimed Tools Appl 81, 34823–34847 (2022). https://doi.org/10.1007/s11042-021-11283-3

Received: 10 July 2020
Revised: 07 February 2021
Accepted: 08 July 2021
Published: 11 August 2021
Issue Date: October 2022
DOI: https://doi.org/10.1007/s11042-021-11283-3
