Distributed workflow mapping algorithm for maximized reliability under end-to-end delay constraint

Abstract

A distributed scientific workflow mapping algorithm that maximizes reliability under a given end-to-end delay (EED) bound is proposed. It is studied in a heterogeneous distributed computing environment, where failures of computing nodes and communication links are inevitable. The mapping decisions and the stored table information are distributed among the nodes to achieve scalability and robustness, which are especially important for large-scale distributed systems. This Distributed Reliability Maximization workflow mapping algorithm under End-to-end Delay constraint (dis-DRMED) addresses both objectives, maximum reliability and minimum EED, in a two-step procedure. In the first step, a mapping algorithm combining iterative Critical Path search and Layer-based priority assignment (CPL) minimizes the EED by focusing on the optimal allocation of the tasks on the critical path. In the second step, tasks on noncritical paths are remapped to improve the overall execution reliability. Simulation results under various system setups show that dis-DRMED achieves considerably higher reliability than representative workflow mapping algorithms under the same EED constraint.
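The abstract does not state the failure model behind the reliability objective. As a minimal sketch only, the Python below assumes independent, exponentially distributed node and link failures, a common choice in the reliability-aware scheduling literature, and scores a mapping by the probability that no resource fails while it is in use. The function name `mapping_reliability` and all parameter names are hypothetical and are not taken from the paper.

```python
import math


def mapping_reliability(exec_time, node_rate, mapping, transfers=(), link_rate=None):
    """Probability that a mapped workflow runs without any failure.

    ASSUMPTION (not from the abstract): node and link failures are
    independent and exponentially distributed with constant rates.

    exec_time: (task, node) -> execution time of the task on that node
    node_rate: node -> failure rate per unit time
    mapping:   task -> node the task is mapped to
    transfers: iterable of (src_task, dst_task, transfer_time)
    link_rate: (node_a, node_b) -> failure rate of the directed link
    """
    link_rate = link_rate or {}
    exponent = 0.0
    # Each task exposes its node to failure for the task's execution time.
    for task, node in mapping.items():
        exponent += node_rate[node] * exec_time[(task, node)]
    # A transfer only uses a link when the two tasks sit on different nodes.
    for src, dst, seconds in transfers:
        a, b = mapping[src], mapping[dst]
        if a != b:
            exponent += link_rate[(a, b)] * seconds
    return math.exp(-exponent)


# Hypothetical two-task workflow: node B is slower but fails less often.
exec_time = {("t1", "A"): 10.0, ("t2", "A"): 5.0, ("t2", "B"): 8.0}
node_rate = {"A": 0.01, "B": 0.001}
link_rate = {("A", "B"): 0.002}
transfers = [("t1", "t2", 2.0)]

all_on_a = mapping_reliability(exec_time, node_rate, {"t1": "A", "t2": "A"}, transfers, link_rate)
t2_on_b = mapping_reliability(exec_time, node_rate, {"t1": "A", "t2": "B"}, transfers, link_rate)
```

Under this assumed model, moving a task to a slower but more dependable node can raise reliability at the cost of extra delay, which is the kind of trade-off a remapping step would have to weigh against the EED bound.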

Notes

  1. Partial EED of each individual task u_i is the end-to-end delay of a path from the starting task u_1 to u_i.
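The note does not say which path is meant when several paths lead from the starting task to a given task. The sketch below assumes the longest-delay (critical) path and computes a partial EED for every task in one pass over a topological order. The function name `partial_eed`, the input layout, and the example numbers are hypothetical; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def partial_eed(exec_time, edges):
    """Partial end-to-end delay of every task in a workflow DAG.

    exec_time: task -> execution time on the node it is mapped to
    edges:     (pred, succ) -> data transfer time between the two tasks

    ASSUMPTION: the partial EED of a task is the delay of the slowest
    path from the entry task up to and including that task.
    """
    # Collect, for each task, its predecessors and the transfer delays.
    preds = {task: {} for task in exec_time}
    for (u, v), delay in edges.items():
        preds[v][u] = delay

    # static_order() yields every task after all of its predecessors.
    order = TopologicalSorter({t: set(p) for t, p in preds.items()}).static_order()

    eed = {}
    for task in order:
        # A task can start only when its slowest input has arrived.
        ready = max((eed[u] + d for u, d in preds[task].items()), default=0.0)
        eed[task] = ready + exec_time[task]
    return eed


# Hypothetical four-task workflow with entry task u1 and exit task u4.
exec_time = {"u1": 2, "u2": 3, "u3": 1, "u4": 2}
edges = {("u1", "u2"): 1, ("u1", "u3"): 4, ("u2", "u4"): 1, ("u3", "u4"): 1}
eed = partial_eed(exec_time, edges)
```

In this example the partial EED of the exit task `u4` is the EED of the whole workflow, and the path through `u3` determines it because that path has the larger accumulated delay.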

Author information

Correspondence to Fei Cao.

About this article

Cite this article

Cao, F., Zhu, M.M. Distributed workflow mapping algorithm for maximized reliability under end-to-end delay constraint. J Supercomput 66, 1462–1488 (2013). https://doi.org/10.1007/s11227-013-0938-3

Keywords

  • Workflow mapping
  • Minimum end-to-end delay
  • Maximized reliability
  • Distributed computing