
Cluster Computing, Volume 18, Issue 3, pp. 1011–1024

MOMTH: multi-objective scheduling algorithm of many tasks in Hadoop

  • Mihaela-Catalina Nita
  • Florin Pop
  • Cristiana Voicu
  • Ciprian Dobre
  • Fatos Xhafa

Abstract

Businesses today face a real challenge in the context of the large amounts of data generated by complex software applications: using limited resources efficiently to accomplish specific operations and tasks. Depending on the type of application, when a service must be delivered within a given time and a limited budget, a sequential application may be redesigned so that it becomes scalable and able to run on multiple resources. The many-task computing model brings together loosely coupled applications, composed of many dependent or independent tasks, that work together toward a common result. When requesting a service, the constraints users most frequently specify are deadline and budget. This paper presents MOMTH, a multi-objective scheduling algorithm of many tasks in Hadoop for big data processing. We consider objective functions related to both users and resources, together with constraints such as deadline (scheduling in due time) and budget. The algorithm was evaluated with the Scheduler Load Simulator (SLS), a tool integrated in Hadoop. MobiWay, a collaboration platform that exposes interoperability between a large number of mobile sensing devices and a wide range of mobility applications, was chosen for the performance analysis of MOMTH. We compared the proposed algorithm with the first-in first-out (FIFO) and Fair schedulers and obtained similar performance for our approach.
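The abstract does not spell out MOMTH's objective functions, so the following is only a hypothetical sketch of the general idea it describes: placing many tasks on resources while honoring a deadline and a budget as hard constraints and trading off time against cost. It is not the authors' algorithm. It assumes a greedy list-scheduling heuristic (longest task first) and weighted-sum scalarization of two objectives, finish time and monetary cost, each normalized by its constraint. The names `Resource`, `Task`, `schedule`, `w_time` and `w_cost`, and all numbers, are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    speed: float  # hypothetical: work units processed per second
    price: float  # hypothetical: cost per second of use


@dataclass
class Task:
    name: str
    work: float  # hypothetical: work units required by the task


def schedule(tasks, resources, deadline, budget, w_time=0.5, w_cost=0.5):
    """Hypothetical greedy list scheduler with deadline and budget constraints.

    Each task is placed on the feasible resource that minimizes a weighted sum
    of its normalized finish time and normalized cost. Raises ValueError if
    some task cannot be placed without violating a constraint.
    Returns (assignment, makespan, total_cost).
    """
    ready = {r.name: 0.0 for r in resources}  # time each resource becomes free
    spent = 0.0
    assignment = {}
    # Longest task first: a common list-scheduling ordering (an assumption here).
    for task in sorted(tasks, key=lambda t: t.work, reverse=True):
        best = None
        for r in resources:
            duration = task.work / r.speed
            finish = ready[r.name] + duration
            cost = duration * r.price
            if finish > deadline or spent + cost > budget:
                continue  # this placement would violate a hard constraint
            # Normalize by the constraints so both objectives are dimensionless.
            score = w_time * finish / deadline + w_cost * cost / budget
            if best is None or score < best[0]:
                best = (score, r, finish, cost)
        if best is None:
            raise ValueError(f"no feasible resource for task {task.name}")
        _, r, finish, cost = best
        ready[r.name] = finish
        spent += cost
        assignment[task.name] = r.name
    makespan = max(ready.values(), default=0.0)
    return assignment, makespan, spent


if __name__ == "__main__":
    resources = [Resource("fast", speed=2.0, price=3.0),
                 Resource("slow", speed=1.0, price=1.0)]
    tasks = [Task("t1", 4.0), Task("t2", 2.0), Task("t3", 2.0)]
    assignment, makespan, cost = schedule(tasks, resources, deadline=5.0, budget=10.0)
    print(assignment, makespan, cost)
    # expected: {'t1': 'fast', 't2': 'slow', 't3': 'slow'} 4.0 10.0
```

In this toy run the large task goes to the fast, expensive resource, while the budget pushes the remaining tasks onto the cheaper one. Shifting weight from `w_cost` to `w_time` moves the balance toward shorter makespan at higher cost; a FIFO scheduler, by contrast, would simply serve tasks in arrival order without looking at either constraint.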

Keywords

Task scheduling, Hadoop, MapReduce, Many task computing, Big data, Cloud computing

Mathematics Subject Classification

68M20, 68M14, 68U20

Notes

Acknowledgments

The research presented in this paper is supported by the following projects: CyberWater, grant of the Romanian National Authority for Scientific Research, CNDI-UEFISCDI, project number 47/2012; MobiWay: Mobility Beyond Individualism: an Integrated Platform for Intelligent Transportation Systems of Tomorrow (PN-II-PT-PCCA-2013-4-0321); clueFarm: Information system based on cloud services accessible through mobile devices, to increase product quality and business development farms (PN-II-PT-PCCA-2013-4-0870). This work was also partially supported by the COMMAS project “Computational Models and Methods for Massive Structured Data” (TIN2013-46181-C2-1-R). We would like to thank the reviewers for their time and expertise, constructive comments and valuable insight.


Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Computer Science Department, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Bucharest, Romania
  2. Universitat Politècnica de Catalunya, Barcelona, Spain
