Skip to main content
Log in

Allocating MapReduce workflows with deadlines to heterogeneous servers in a cloud data center

  • Special Issue Paper
  • Published:
Service Oriented Computing and Applications Aims and scope Submit manuscript

Abstract

Total profit is one of the most important factors to be considered from the perspective of resource providers. In this paper, an original MapReduce workflow scheduling with deadline and data locality is proposed to maximize total profit of resource providers. A new workflow conversion based on dynamic programming and ChainMap/ChainReduce is designed to decrease transmission times among MapReduce jobs of workflows. A new deadline division considering execution time, float time and job level is proposed to obtain better deadlines of MapReduce jobs in workflows. With the adapted replica strategy in MapReduce workflow, a new task scheduling is proposed to improve data locality which assigns tasks to servers with the earliest completion time in order to ensure resource providers obtain more profit. Experimental results show that the proposed heuristic results in larger total profit than other adopted algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

References

  1. Zaharia M, Chowdhury M, Franklin M et al (2010) Spark: cluster computing with working sets. In: Usenix conference on hot topics in cloud computing, pp 1765–1773

  2. Li L, Ma Z, Liu L et al (2013) Hadoop-based ARIMA algorithm and its application in weather forecast. Int J Database Theory Appl 6(5):119–132

    Google Scholar 

  3. Xun Y, Zhang J, Qin X (2017) FiDoop: parallel mining of frequent itemsets using MapReduce. IEEE Trans Syst Man Cybern Syst 46(3):313–325

    Google Scholar 

  4. Wang Y, Shi W (2014) Budget-driven scheduling algorithms for batches of MapReduce jobs in heterogeneous clouds. IEEE Trans Cloud Comput 2(3):306–319

    Google Scholar 

  5. Tiwari N, Sarkar S, Bellur U et al (2015) Classification framework of MapReduce scheduling algorithms. ACM Comput Surv 47(3):1–49

    Google Scholar 

  6. Bu Y, Howe B, Balazinska M et al (2012) The HaLoop approach to large-scale iterative data analysis. VLDB J 21(2):169–190

    Google Scholar 

  7. Gunarathne T, Zhang B, Wu T et al (2013) Scalable parallel computing on clouds using Twister4Azure iterative MapReduce. Future Gener Comput Syst 29(4):1035–1048

    Google Scholar 

  8. Zhang Y, Gao Q, Gao L et al (2012) iMapReduce: a distributed computing framework for iterative computation. J Grid Comput 10(1):47–68

    Google Scholar 

  9. Dong X, Wang Y, Liao H (2011) Scheduling mixed real-time and non-real-time applications in MapReduce environment. In: International conference on parallel and distributed systems, pp 9–16

  10. Tang Z, Zhou J, Li K et al (2013) A MapReduce task scheduling algorithm for deadline constraints. Clust Comput 16(4):651–662

    Google Scholar 

  11. Zhang W, Rajasekaran S, Wood T et al (2014) MIMP: deadline and interference aware scheduling of Hadoop virtual machines. In: International symposium on cluster, cloud and grid computing, pp 394–403

  12. Teng F, Magoulès F, Yu L et al (2014) A novel real-time scheduling algorithm and performance analysis of a MapReduce-based cloud. J Supercomput 69(2):739–765

    Google Scholar 

  13. Palanisamy B, Singh A, Liu L (2015) Cost-effective resource provisioning for MapReduce in a cloud. IEEE Trans Parallel Distrib Syst 26(5):1265–1279

    Google Scholar 

  14. Hashem I, Anuar N, Marjani M et al (2018) Multi-objective scheduling of MapReduce jobs in big data processing. Multimed Tools Appl 77(8):9979–9994

    Google Scholar 

  15. Xu X, Tang M, Tian Y (2017) QoS-guaranteed resource provisioning for cloud-based MapReduce in dynamical environments. Future Gener Comput Syst 78(1):18–30

    Google Scholar 

  16. Li H, Wei X, Fu Q et al (2014) MapReduce delay scheduling with deadline constraint. Concurr Comput Pract Exp 26(3):766–778

    Google Scholar 

  17. Polo J, Becerra Y, Carrera D et al (2013) Deadline-based MapReduce workload management. IEEE Trans Netw Serv Manag 10(2):231–244

    Google Scholar 

  18. Chen C, Lin J, Kuo S (2018) MapReduce scheduling for deadline-constrained jobs in heterogeneous cloud computing systems. IEEE Trans Cloud Comput 6(1):127–140

    Google Scholar 

  19. Kao Y, Chen Y (2016) Data-locality-aware MapReduce real-time scheduling framework. J Syst Softw 112:65–77

    Google Scholar 

  20. Bok K, Hwang J, Lim J et al (2017) An efficient MapReduce scheduling scheme for processing large multimedia data. Multimed Tools Appl 76(16):1–24

    Google Scholar 

  21. Chen Y, Borthakur D, Borthakur D et al (2012) Energy efficiency for large-scale MapReduce workloads with significant interactive analysis. In: ACM european conference on computer systems, pp 43–56

  22. Mashayekhy L, Nejad M, Grosu D et al (2015) Energy-aware scheduling of MapReduce jobs for big data applications. IEEE Trans Parallel Distrib Syst 26(10):2720–2733

    Google Scholar 

  23. Lei H, Zhang T, Liu Y et al (2015) SGEESS: smart green energy-efficient scheduling strategy with dynamic electricity price for data center. J Syst Softw 108:23–38

    Google Scholar 

  24. Oliveira D, Ocana K, Baiao F et al (2012) A provenance-based adaptive scheduling heuristic for parallel scientific workflows in clouds. J Grid Comput 10(3):521–552

    Google Scholar 

  25. Li S, Hu S, Abdelzaher T (2015) The packing server for real-time scheduling of MapReduce workflows. In: IEEE real-time and embedded technology and applications symposium, pp 51–62

  26. Cai Z, Li X, Ruiz R et al (2017) A delay-based dynamic scheduling algorithm for bag-of-task workflows with stochastic task execution times in clouds. Future Gener Comput Syst 71:57–72

    Google Scholar 

  27. Cai Z, Li X, Ruiz R (2017) Resource provisioning for task-batch based workflows with deadlines in public clouds. IEEE Trans Cloud Comput. https://doi.org/10.1109/TCC.2017.2663426

    Article  Google Scholar 

  28. Cai Z, Li X, Gupta J (2016) Heuristics for provisioning services to workflows in XaaS clouds. IEEE Trans Serv Comput 9(2):250–263

    Google Scholar 

  29. Li X, Cai Z (2017) Elastic resource provisioning for cloud workflow applications. IEEE Trans Autom Sci Eng 14(2):1195–1210

    Google Scholar 

  30. Tang Z, Liu M, Ammar A et al (2014) An optimized MapReduce workflow scheduling algorithm for heterogeneous computing. J Supercomput 72(6):1–21

    Google Scholar 

  31. Xu C, Yang J, Yin K et al (2017) Optimal construction of virtual networks for cloud-based MapReduce workflows. Comput Netw 112:194–207

    Google Scholar 

  32. Chiara S, Danilo A, Gianpaolo C et al (2013) Optimizing service selection and allocation in situational computing applications. IEEE Trans Serv Comput 6(3):414–428

    Google Scholar 

  33. Baresi L, Elisabetta D, Carlo G et al (2007) A framework for the deployment of adaptable web service compositions. Serv Oriented Comput Appl 1(1):75–91

    Google Scholar 

  34. Lim H, Herodotou H, Babu S (2012) Stubby: a transformation-based optimizer for MapReduce workflows. VLDB Endow 5(11):1196–1207

    Google Scholar 

  35. Ke H, Li P, Guo S et al (2016) On traffic-aware partition and aggregation in MapReduce for big data applications. IEEE Trans Parallel Distrib Syst 27(3):818–828

    Google Scholar 

  36. Yu W, Wang Y, Que X et al (2015) Virtual shuffling for efficient data movement in MapReduce. IEEE Trans Comput 64(2):556–568

    MathSciNet  MATH  Google Scholar 

  37. Chowdhury M, Zaharia M, Ma J et al (2011) Managing data transfers in computer clusters with orchestra. ACM SIGCOMM Comput Commun 41(4):98–109

    Google Scholar 

  38. Guo D, Xie J, Zhou X et al (2015) Exploiting efficient and scalable shuffle transfers in future data center network. IEEE Trans Parallel Distrib Syst 26(4):997–1009

    Google Scholar 

  39. Li D, Yu Y, He W et al (2015) Willow: saving data center network energy for network-limited flows. IEEE Trans Parallel Distrib Syst 26(9):2610–2620

    Google Scholar 

  40. Tan J, Meng X, Zhang L (2013) Coupling task progress for MapReduce resource-aware scheduling. In: IEEE INFOCOM, pp 1618–1626

  41. Hammoud M, Rehman M, Sakr M (2012) Center-of-gravity reduce task scheduling to lower MapReduce network traffic. In: International conference on cloud computing, pp 49–58

  42. Guo Z, Fox G, Zhou M et al (2012) Improving resource utilization in MapReduce. In: International conference on cluster computing, pp 402–410

  43. Fischer M, Su X, Yin Y (2010) Assigning tasks for efficiency in Hadoop. In: Proceedings of the 22nd ACM symposium on parallelism in algorithms and architectures, pp 30–39

  44. Zhu Y, Jiang Y, Wu W et al (2014) Minimizing makespan and total completion time in MapReduce-like systems. In: IEEE INFOCOM, pp 2166–2174

  45. Kavulya S, Tan J, Gandhi R et al (2010) An analysis of traces from a production MapReduce cluster. In: IEEE/ACM international conference on cluster, cloud and grid computing, pp 94–103

  46. Abrishami S, Naghibzadeh M, Epema D (2013) Deadline-constrained workflow scheduling algorithms for Infrastructure as a Service clouds. Future Gener Comput Syst 29(1):158–169

    Google Scholar 

  47. Fernando B, Edmundo R (2010) Towards the scheduling of multiple workflows on computational grids. J Grid Comput 8(3):419–441

    Google Scholar 

  48. Tiwari N, Sarkar S, Bellur U et al (2015) Classification framework of MapReduce scheduling algorithms. ACM Comput Surv 47(3):1–38

    Google Scholar 

  49. Verma A, Cherkasova L, Campbell R (2013) Orchestrating an ensemble of MapReduce jobs for minimizing their makespan. IEEE Trans Dependable Secur Comput 10(5):314–327

    Google Scholar 

  50. Heintz B, Chandra A, Sitaraman R et al (2017) End-to-end optimization for geo-distributed MapReduce. IEEE Trans Cloud Comput 4(3):293–306

    Google Scholar 

  51. Chen L, Li X (2018) Cloud workflow scheduling with hybrid resource provisioning. J Supercomput 74(12):6529–6553

    Google Scholar 

  52. Li X, Jiang T, Ruiz R (2016) Heuristics for periodical batch job scheduling in a MapReduce computing framework. Inf Sci 326:119–133

    MATH  Google Scholar 

  53. Vanhoucheabcd M, Maenhout B, Tavares L (2008) An evaluation of the adequacy of project network generators with systematically sampled networks. Eur J Oper Res 187(2):511–524

    MATH  Google Scholar 

Download references

Acknowledgements

This work is supported by the National Key Research and Development Program of China (No. 2017YFB1400801), the National Natural Science Foundation of China (Nos. 61872077, 61832004) and Collaborative Innovation Center of Wireless Communications Technology. Rubén Ruiz is partly supported by the Spanish Ministry of Science, Innovation, and Universities, under the project “OPTEP-Port Terminal Operations Optimization” (No. RTI2018-094940-B-I00) financed with FEDER funds”.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xiaoping Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, J., Li, X., Ruiz, R. et al. Allocating MapReduce workflows with deadlines to heterogeneous servers in a cloud data center. SOCA 14, 101–118 (2020). https://doi.org/10.1007/s11761-020-00290-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11761-020-00290-1

Keywords

Navigation