DFARM: a deadline-aware fault-tolerant scheduler for cloud computing

Cluster Computing

Abstract

Cloud computing has become popular among small businesses because it is cost-effective and provides on-demand access to services, including software, hardware, and networking, at any time and from anywhere in the world. Efficient job scheduling in the Cloud is essential to optimize operational costs in data centers. Scheduling should therefore assign tasks to Virtual Machines (VMs) in a manner that speeds up execution, maximizes resource utilization, and meets users’ Service Level Agreements (SLAs) and other constraints such as deadlines. For this purpose, tasks can be prioritized based on their deadlines and lengths, and resources can be provisioned and released as needed. Moreover, to cope with unexpected execution situations or hardware failures, a fault-tolerance mechanism based on a hybrid of replication and resubmission can be employed. Most existing techniques improve performance, but they have shortcomings: they prioritize tasks based on a single value (usually the deadline), use only a single fault-tolerance mechanism, or release resources immediately, which causes more overhead. This research work proposes a new scheduler called the Deadline and fault-aware task Adjusting and Resource Managing (DFARM) scheduler. DFARM dynamically acquires resources and schedules deadline-constrained tasks by considering both their lengths and deadlines, while providing fault tolerance through the hybrid replication–resubmission method. Besides acquiring resources, it also releases resources based on their boot time to lessen costs due to reboots. The performance of the DFARM scheduler is compared to other scheduling algorithms, such as Random Selection, Round Robin, Minimum Completion Time, RALBA, and OG-RADL. With comparable execution performance, the proposed DFARM scheduler reduces task-rejection rates by 2.34–9.53 times compared to the state-of-the-art schedulers on two benchmark datasets.
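
To make the idea of deadline- and length-aware scheduling and the task-rejection rate more concrete, the following is a minimal Python sketch. It is not the DFARM algorithm, whose details are not given in the abstract. The priority rule (earliest deadline first, shorter task breaking ties), the greedy earliest-completion VM choice, and all names (`Task`, `VM`, `schedule`, `rejection_rate`) and sample values are assumptions made purely for illustration. Fault tolerance and dynamic resource provisioning are left out.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    length: float    # work in million instructions (MI); illustrative unit
    deadline: float  # seconds from time zero


@dataclass
class VM:
    name: str
    mips: float            # processing speed in MI per second
    ready_at: float = 0.0  # time at which the VM becomes free


def schedule(tasks, vms):
    """Greedy deadline-aware mapping; returns (assignments, rejected)."""
    assignments, rejected = [], []
    # Assumed priority rule: earliest deadline first, shorter task breaks ties.
    for task in sorted(tasks, key=lambda t: (t.deadline, t.length)):
        # Pick the VM that would finish this task soonest.
        best = min(vms, key=lambda vm: vm.ready_at + task.length / vm.mips)
        finish = best.ready_at + task.length / best.mips
        if finish > task.deadline:
            rejected.append(task.name)  # no VM can meet the deadline
            continue
        best.ready_at = finish
        assignments.append((task.name, best.name, finish))
    return assignments, rejected


def rejection_rate(tasks, rejected):
    """Fraction of submitted tasks that were rejected."""
    return len(rejected) / len(tasks) if tasks else 0.0


# Hypothetical workload: five tasks on two VMs of different speeds.
tasks = [
    Task("t1", 4000, 5),
    Task("t2", 2000, 3),
    Task("t3", 6000, 4),
    Task("t4", 1000, 10),
    Task("t5", 9000, 5),
]
vms = [VM("vm1", 1000), VM("vm2", 2000)]

assignments, rejected = schedule(tasks, vms)
rate = rejection_rate(tasks, rejected)
print(assignments)
print(rejected, rate)
```

In this toy workload, `t5` cannot finish before its deadline on either VM once the higher-priority tasks are placed, so it is the only rejected task. A scheduler that also provisions extra VMs on demand, as the abstract describes for DFARM, could in principle accept such a task instead of rejecting it.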

Data availability

No datasets were generated or analysed during the current study.

References

  1. Hussain, A., Aleem, M., Khan, A., Iqbal, M., Islam, A.: RALBA: a computation-aware load balancing scheduler for cloud computing. Clust. Comput. 21, 09 (2018)

  2. Hussain, A., Aleem, M., Iqbal, M., Islam, A.: SLA-RALBA: cost-efficient and resource-aware load balancing algorithm for cloud computing. J. Supercomput. 75, 10 (2019)

  3. Marahatta, A., Wang, Y., Zhang, F., Kumar, A., Sah Tyagi, S., Liu, Z.: Energy-aware fault-tolerant dynamic task scheduling scheme for virtualized cloud data centers. Mob. Netw. Appl. 24, 1–15 (2019)

  4. Zhou, A., Wang, S., Cheng, B., Zheng, Z., Yang, F., Chang, R.N., Lyu, M.R., Buyya, R.: Cloud service reliability enhancement via virtual machine placement optimization. IEEE Trans. Serv. Comput. 10(6), 902–913 (2017)

  5. Saidi, K., Bardou, D.: Task scheduling and VM placement to resource allocation in cloud computing: challenges and opportunities. Clust. Comput. 26(5), 3069–3087 (2023)

  6. AbdElfattah, E., Elkawkagy, M., El-Sisi, A.: A reactive fault tolerance approach for cloud computing. In: 2017 13th International Computer Engineering Conference (ICENCO), pp. 190–194 (2017)

  7. Arabnejad, V., Bubendorfer, K., Ng, B.: Scheduling deadline constrained scientific workflows on dynamically provisioned cloud resources. Future Gener. Comput. Syst. 75, 348–364 (2017)

  8. Kumar, M., Sharma, S.: Deadline constrained based dynamic load balancing algorithm with elasticity in cloud environment. Comput. Electr. Eng. 69, 395–411 (2018)

  9. Nabi, S., Ahmed, M.: OG-RADL: overall performance-based resource-aware dynamic load-balancer for deadline constrained cloud tasks. J. Supercomput. 77, 07 (2021)

  10. Garg, N., Singh, D., Singh Goraya, M.: Deadline aware energy-efficient task scheduling model for a virtualized server. SN Comput. Sci. 2(3), 169 (2021). https://doi.org/10.1007/s42979-021-00571-2

  11. Chinnathambi, S., Santhanam, A., Rajarathinam, J., Maruthamuthu, S.: Scheduling and checkpointing optimization algorithm for byzantine fault tolerance in cloud clusters. Clust. Comput. 22, 11 (2019)

  12. Adhikari, M., Amgoth, T.: Heuristic-based load-balancing algorithm for IaaS cloud. Future Gener. Comput. Syst. 81, 156–165 (2018)

  13. Iftikhar, S., Ahmad, M.M.M., Tuli, S., Chowdhury, D., Xu, M., Gill, S.S., Uhlig, S.: HunterPlus: AI based energy-efficient task scheduling for cloud-fog computing environments. Internet Things 21, 100667 (2023)

  14. Nazeri, M., Khorsand, R.: Energy aware resource provisioning for multi-criteria scheduling in cloud computing. Cybern. Syst. (2022). https://doi.org/10.1080/01969722.2022.2071409

  15. Alaei, M., Khorsand, R., Ramezanpour, M.: An adaptive fault detector strategy for scientific workflow scheduling based on improved differential evolution algorithm in cloud. Appl. Soft Comput. 99, 106895 (2021)

  16. Hussain, A., Aleem, M.: GoCJ: Google cloud jobs dataset for distributed and cloud computing infrastructures. Data 3(4), 38 (2018)

  17. Braun, T.D., Siegel, H.J., Beck, N., Bölöni, L.L., Maheswaran, M., Reuther, A.I., Robertson, J.P., Theys, M.D., Yao, B., Hensgen, D., Freund, R.F.: A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. J. Parallel Distrib. Comput. 61(6), 810–837 (2001)

  18. Zhu, X., Yang, L.T., Chen, H., Wang, J., Yin, S., Liu, X.: Real-time tasks oriented energy-aware scheduling in virtualized clouds. IEEE Trans. Cloud Comput. 2(2), 168–180 (2014)

  19. Calheiros, R.N., Ranjan, R., Beloglazov, A., De Rose, C.A.F., Buyya, R.: CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Softw.: Pract. Exp. 41(1), 23–50 (2011). https://doi.org/10.1002/spe.995

Acknowledgements

The European Union (Horizon Europe Graph-Massivizer, 101093202) and the Austrian Research Promotion Agency (FFG Kärntner Fog, 888098) funded this work.

Author information

Contributions

Ahmad Awan: literature review, implementation, detailed solution design, experimentation. Muhammad Aleem: idea formulation, supervision, detailed solution design. Altaf Hussain: proposed architecture, draft review, validation of experiments. Radu Prodan: proposed architecture, experimental plan and design, write-up review.

Corresponding author

Correspondence to Muhammad Aleem.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Awan, A., Aleem, M., Hussain, A. et al. DFARM: a deadline-aware fault-tolerant scheduler for cloud computing. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04419-1

  • DOI: https://doi.org/10.1007/s10586-024-04419-1
