
Replication and data management-based workflow scheduling algorithm for multi-cloud data centre platform

The Journal of Supercomputing

Abstract

Scientific workflow applications comprise a large number of tasks and data sets that must be processed in a systematic manner. These applications benefit from cloud computing platforms that offer access to virtually limitless resources provisioned elastically and on demand. Running data-intensive scientific workflows on geographically distributed data centres involves massive data transfers, which increase both the overall execution time and the monetary cost of the workflows. Existing work on workflow scheduling concentrates on reducing makespan and budget; little attention has been paid to the dependencies between tasks and data sets. In this paper, we introduce a workflow scheduling technique that reduces data transfer and executes workflow tasks within deadline and budget constraints. The proposed technique consists of an initial data placement stage, which clusters and distributes data sets based on their dependencies, and a replication-based partial critical path (R-PCP) technique, which schedules tasks with data locality and dynamically maintains a dependency matrix for the placement of generated data sets. To reduce run-time data set movement, we use inter-data-centre task replication and data set replication to ensure data set availability. Simulation results with four workflow applications show that our strategy efficiently reduces data movement and executes all chosen workflows within the user-specified budget and deadline. The results show that R-PCP incurs 44.93% and 31.37% less data movement than the random and adaptive data-aware scheduling (ADAS) techniques, respectively, and consumes 26.48% less energy than ADAS.
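The abstract does not define how data set dependence or data movement is measured, so the following is only a minimal sketch under stated assumptions. It assumes that the dependence between two data sets is the number of tasks that read both, and that data movement can be approximated by counting (task, data set) pairs located in different data centres. The function names `dependency_matrix` and `data_movement`, the toy workflow, and the site names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def dependency_matrix(task_inputs):
    """Count, for every pair of data sets, how many tasks read both.

    Assumption: dependence = number of shared consumer tasks. The diagonal
    holds the number of tasks that read each data set.
    """
    datasets = sorted({d for used in task_inputs.values() for d in used})
    index = {d: i for i, d in enumerate(datasets)}
    n = len(datasets)
    matrix = [[0] * n for _ in range(n)]
    for used in task_inputs.values():
        unique = sorted(set(used))
        for d in unique:
            matrix[index[d]][index[d]] += 1
        for a, b in combinations(unique, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return datasets, matrix


def data_movement(task_inputs, task_site, dataset_site):
    """Count (task, data set) pairs whose data set lives in another data centre."""
    return sum(
        1
        for task, used in task_inputs.items()
        for d in set(used)
        if dataset_site[d] != task_site[task]
    )


# Hypothetical toy workflow: three tasks reading four data sets.
task_inputs = {
    "t1": ["d1", "d2"],
    "t2": ["d2", "d3"],
    "t3": ["d1", "d2", "d4"],
}
datasets, matrix = dependency_matrix(task_inputs)

# Hypothetical placement of tasks and data sets on two data centres.
task_site = {"t1": "dcA", "t2": "dcB", "t3": "dcA"}
dataset_site = {"d1": "dcA", "d2": "dcA", "d3": "dcB", "d4": "dcB"}
moved = data_movement(task_inputs, task_site, dataset_site)
```

Under these assumptions, data sets with a high pairwise count (here `d1` and `d2`) are natural candidates for placement in the same data centre, and a replica of a widely shared data set can remove cross-centre reads for tasks scheduled elsewhere.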



Author information


Corresponding author

Correspondence to Babar Nazir.


About this article


Cite this article

Ulabedin, Z., Nazir, B. Replication and data management-based workflow scheduling algorithm for multi-cloud data centre platform. J Supercomput 77, 10743–10772 (2021). https://doi.org/10.1007/s11227-020-03541-2


  • DOI: https://doi.org/10.1007/s11227-020-03541-2
