A low redundancy and high time efficiency large-scale task assignment strategy for heterogeneous service-oriented cloud computing systems

The Journal of Supercomputing

Abstract

As large numbers of heterogeneous processors are deployed in service-oriented cloud computing systems, random hardware failures of processors are becoming an increasingly prominent issue. Replication-based fault-tolerant task assignment is a common approach to satisfying an application's reliability requirement. However, state-of-the-art algorithms suffer from either high redundancy or low time efficiency. In this work, we propose a fast task assignment for minimizing redundancy (FTAMR) algorithm that satisfies the reliability requirement of a directed acyclic graph (DAG)-based parallel application on heterogeneous service-oriented cloud computing systems. First, FTAMR quickly identifies the tasks that need to be replicated. Second, it quickly maps the selected tasks to their respective most suitable processors. It then repeats these steps until the application's reliability satisfies the specified reliability requirement. Experimental results on real and synthetically generated parallel applications of different scales, degrees of parallelism, and heterogeneity show that FTAMR achieves the lowest redundancy and the highest time efficiency among the compared state-of-the-art fault-tolerance algorithms.
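The abstract describes FTAMR only at a high level. As a rough illustration of the replicate-until-reliable loop it outlines, the Python sketch below assumes a reliability model common in this line of work, not one confirmed by the abstract: a replica of task i on processor k succeeds with probability exp(-λ_k · w_{i,k}), a task succeeds if at least one of its replicas succeeds, and the application succeeds only if every task does. The selection rule (least reliable task first) and the mapping rule (most reliable free processor) are hypothetical stand-ins, not FTAMR's actual heuristics, and the sketch ignores DAG precedence, communication, and scheduling time entirely. All function names and the input numbers are made up for illustration.

```python
import math


def replica_reliability(lam, w):
    """One replica's reliability under an assumed exponential failure model: exp(-lam * w)."""
    return math.exp(-lam * w)


def task_reliability(i, procs, lam, w):
    """Task i fails only if every replica (one per processor in procs) fails."""
    fail = 1.0
    for k in procs:
        fail *= 1.0 - replica_reliability(lam[k], w[i][k])
    return 1.0 - fail


def app_reliability(assign, lam, w):
    """Application reliability: product of all task reliabilities."""
    r = 1.0
    for i, procs in enumerate(assign):
        r *= task_reliability(i, procs, lam, w)
    return r


def greedy_replicate(lam, w, r_req):
    """Hypothetical greedy replication loop; NOT the authors' FTAMR.

    lam[k]  -- failure rate of processor k
    w[i][k] -- execution time of task i on processor k
    Returns one set of processor indices per task, or None if r_req cannot be met.
    """
    n, m = len(w), len(lam)
    # Start with a single copy of each task on its most reliable processor.
    assign = [{max(range(m), key=lambda k: replica_reliability(lam[k], w[i][k]))}
              for i in range(n)]
    while app_reliability(assign, lam, w) < r_req:
        # Selection step (assumed rule): least reliable task that still has a free processor.
        candidates = [i for i in range(n) if len(assign[i]) < m]
        if not candidates:
            return None  # every task already runs on every processor
        i = min(candidates, key=lambda t: task_reliability(t, assign[t], lam, w))
        # Mapping step (assumed rule): free processor giving the most reliable replica.
        free = [k for k in range(m) if k not in assign[i]]
        assign[i].add(max(free, key=lambda k: replica_reliability(lam[k], w[i][k])))
    return assign


def redundancy(assign):
    """Number of extra replicas beyond one copy per task."""
    return sum(len(procs) for procs in assign) - len(assign)


if __name__ == "__main__":
    lam = [0.001, 0.002, 0.0005]                # made-up failure rates for 3 processors
    w = [[10, 8, 12], [5, 6, 4], [20, 15, 25]]  # made-up execution times for 3 tasks
    plan = greedy_replicate(lam, w, 0.99)
    print(plan, redundancy(plan), app_reliability(plan, lam, w))
```

With these made-up inputs, every task starts on processor 2, and the loop adds a single replica (task 2 on processor 0) to lift application reliability from about 0.980 to about 0.992. Counting replicas is only one possible redundancy measure; the paper's own definition of redundancy may differ.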

Acknowledgements

This work was supported in part by the Natural Science Foundation of Hunan Province, China, under Grant 2020JJ6063 and Grant 2019JJ50592, in part by the National Key Research and Development Program of China under Grant 2018YFB1003702, in part by the National Natural Science Foundation of China under Grant 61902336 and Grant 61703157, in part by the Hunan Province Science and Technology Project Funds under Grant 2018TP1036, and in part by the CERNET Innovation Project under Grant NGII20160310.

Author information

Correspondence to Tingrui Pei.

About this article

Cite this article

Zhu, J., Wang, L., Xie, G. et al. A low redundancy and high time efficiency large-scale task assignment strategy for heterogeneous service-oriented cloud computing systems. J Supercomput 77, 3450–3483 (2021). https://doi.org/10.1007/s11227-020-03403-x

  • DOI: https://doi.org/10.1007/s11227-020-03403-x