EMM: Extended matching market based scheduling for big data platform hadoop

  • 1174: Futuristic Trends and Innovations in Multimedia Systems Using Big Data, IoT and Cloud Technologies (FTIMS)
Multimedia Tools and Applications

Abstract

Hadoop has emerged as a popular choice for processing big data, and its clusters are used to process large-scale jobs. The performance of a cluster depends largely on the scheduling policy employed for job processing. However, a single scheduling policy may not suit different kinds of jobs, and an inappropriate policy leads to inefficient cluster performance. Existing policies are either too complex or too elementary to account for diverse jobs and their needs. Most of them follow a fixed pattern, which cannot serve as a common solution for different jobs. The effect of such an ill-fitting mechanism is lower resource utilization and poor cluster performance. In this paper, a pluggable scheduling mechanism is proposed for efficient and adaptive processing of jobs. It uses the Matching Market concept for allocation and adaptively accommodates the diverse needs of multiple jobs by taking into account the varying requirements of their tasks. The experimental results reveal enhanced resource utilization and improved cluster performance, with an overall reduction in makespan. In certain instances, resource utilization improved by up to 80% and performance by up to 60% with the proposed technique. Cluster efficiency increased by up to 31%. The evaluation and comparisons were conducted against various scheduling policies using different Hadoop benchmarks with the same data and identical configurations. The proposed system has shown significant improvement in cluster efficiency.
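To make the Matching Market idea concrete, the sketch below pairs tasks with cluster nodes so that the summed valuation is as large as possible. It is only an illustration under stated assumptions: the abstract does not give the valuation function or the matching algorithm used by EMM, so the task names, node names, scores, and the brute-force search in `best_matching` are all hypothetical.

```python
from itertools import permutations


def best_matching(valuations):
    """Return (assignment, total) maximizing the summed valuation.

    valuations[task][node] is a hypothetical score for running `task`
    on `node`. Every one-to-one assignment is tried, so this is only
    practical for very small inputs. It assumes there are at least as
    many nodes as tasks; otherwise no assignment is found.
    """
    tasks = list(valuations)
    nodes = list(next(iter(valuations.values())))
    best_total, best_assignment = float("-inf"), {}
    for perm in permutations(nodes, len(tasks)):
        total = sum(valuations[t][n] for t, n in zip(tasks, perm))
        if total > best_total:
            best_total, best_assignment = total, dict(zip(tasks, perm))
    return best_assignment, best_total


# Hypothetical valuations: a higher score means the node fits the task's
# needs better (for example, free memory or data locality).
valuations = {
    "map_task_1": {"node_a": 12, "node_b": 4, "node_c": 2},
    "map_task_2": {"node_a": 8, "node_b": 7, "node_c": 6},
    "reduce_task_1": {"node_a": 7, "node_b": 5, "node_c": 2},
}

assignment, total = best_matching(valuations)
for task, node in assignment.items():
    print(f"{task} -> {node}")
print("total valuation:", total)
```

Here `map_task_1` values `node_a` far more than the other tasks do, so the best overall outcome gives it that node and moves `map_task_2` to `node_c`, even though `node_c` is that task's lowest-valued option. A production scheduler would replace the exhaustive search with a polynomial-time assignment or auction procedure.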

Acknowledgements

The authors are thankful to Yahoo! for providing access to the cluster's computing data.

Author information

Corresponding author

Correspondence to Balraj Singh.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Singh, B., Verma, H. EMM: Extended matching market based scheduling for big data platform hadoop. Multimed Tools Appl 81, 34823–34847 (2022). https://doi.org/10.1007/s11042-021-11283-3

  • DOI: https://doi.org/10.1007/s11042-021-11283-3
