Aggressive Fault Tolerance in Cloud Computing Using Smart Decision Agent

  • Conference paper
Proceedings of the International Conference on Big Data, IoT, and Machine Learning

Abstract

The use of cloud computing is growing steadily. It is a useful model for delivering a collection of configurable computing resources, such as data centers, servers, data storage, and application services, in real time. With the emergence of cloud computing, providing reliable service has become a vital issue. Transient faults may cause temporary unavailability of services and response timeouts. Such faults can be catastrophic in cloud applications such as scientific research, financial, and safety-critical applications. To reduce the effect of such errors, a fault-tolerant mechanism is required. We propose an aggressive fault-tolerant (AFT) technique to detect and recover from faults in a cloud environment. An aggressive fault detection and recovery module detects faults and recovers from them using a smart decision agent. The smart decision agent makes decisions on different types of hardware, software, and communication faults. It reduces complexity and improves the performance of fault-tolerant schemes compared with existing techniques such as checkpointing, resubmission, and replication. The proposed scheme achieves 98.7% error coverage while being 1.5 times faster than checkpointing, 2.0 times faster than resubmission, and 2.5 times faster than replication.
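
The abstract does not spell out how the smart decision agent chooses a recovery action, so the following is only an illustrative sketch of the general idea of dispatching on fault type. Everything in it is an assumption made for illustration: the class names (`FaultType`, `Fault`, `DecisionAgent`), the `transient` flag, and the action labels (`retry`, `migrate`, `restart`, `reroute`) do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class FaultType(Enum):
    """Fault categories named in the abstract."""
    HARDWARE = "hardware"
    SOFTWARE = "software"
    COMMUNICATION = "communication"


@dataclass(frozen=True)
class Fault:
    kind: FaultType
    transient: bool  # hypothetical flag: True if the fault is expected to clear on its own


class DecisionAgent:
    """Hypothetical agent that maps a detected fault to a recovery action label."""

    def __init__(self) -> None:
        # Assumed policy for non-transient faults; the paper's real policy is not shown here.
        self._policy = {
            FaultType.HARDWARE: "migrate",       # move the task to another host
            FaultType.SOFTWARE: "restart",       # restart the failed component
            FaultType.COMMUNICATION: "reroute",  # use an alternative network path
        }

    def decide(self, fault: Fault) -> str:
        # Transient faults are assumed to be cheapest to handle with a plain retry.
        if fault.transient:
            return "retry"
        return self._policy[fault.kind]


if __name__ == "__main__":
    agent = DecisionAgent()
    print(agent.decide(Fault(FaultType.COMMUNICATION, transient=True)))
    print(agent.decide(Fault(FaultType.HARDWARE, transient=False)))
```

Keeping the policy in a single lookup table is one plausible way to keep decision logic simple; whether the authors' agent works this way cannot be determined from the abstract alone.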



Corresponding author

Correspondence to Mohammad Abdur Rouf.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Rahman, M.M., Rouf, M.A. (2022). Aggressive Fault Tolerance in Cloud Computing Using Smart Decision Agent. In: Arefin, M.S., Kaiser, M.S., Bandyopadhyay, A., Ahad, M.A.R., Ray, K. (eds) Proceedings of the International Conference on Big Data, IoT, and Machine Learning. Lecture Notes on Data Engineering and Communications Technologies, vol 95. Springer, Singapore. https://doi.org/10.1007/978-981-16-6636-0_26

  • DOI: https://doi.org/10.1007/978-981-16-6636-0_26

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-6635-3

  • Online ISBN: 978-981-16-6636-0

  • eBook Packages: Engineering, Engineering (R0)
