Abstract
The use of cloud computing is growing steadily. It is a useful model for providing a pool of configurable computing resources, such as data centers, servers, data storage, and application services, in real time. With the emergence of cloud computing, providing reliable service has become a vital issue. Transient faults may cause temporary unavailability of services and response timeouts. Such faults can be catastrophic in cloud applications such as scientific research, financial, and safety-critical applications. To reduce the effect of such errors, a fault-tolerant mechanism is required. We propose an aggressive fault-tolerant (AFT) technique to detect and recover from faults in a cloud environment. Its aggressive fault detection and recovery module detects faults and recovers from them using a smart decision agent, which decides how to handle different types of hardware, software, and communication faults. The technique reduces complexity and improves performance compared with existing fault-tolerant techniques such as checkpointing, resubmission, and replication. The proposed scheme achieves 98.7% error coverage and is 1.5 times faster than checkpointing, 2.0 times faster than resubmission, and 2.5 times faster than replication.
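As a rough illustration of the detect-then-decide flow described in the abstract, the following minimal Python sketch shows a decision agent that maps a detected fault class to a recovery action and retries the task. It is not the authors' implementation: the abstract does not say which action the agent takes for each fault class, so the policy table, the action names, the `FaultDetected` exception, and the `run_with_recovery` helper are all hypothetical.

```python
from enum import Enum


class FaultType(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    COMMUNICATION = "communication"


class FaultDetected(Exception):
    """Raised by a task when the (hypothetical) detector observes a fault."""

    def __init__(self, fault_type: FaultType):
        super().__init__(fault_type.value)
        self.fault_type = fault_type


# Assumed policy table: the mapping from fault class to recovery action
# is illustrative only and is not taken from the paper.
RECOVERY_POLICY = {
    FaultType.HARDWARE: "migrate_to_spare_host",
    FaultType.SOFTWARE: "restart_task",
    FaultType.COMMUNICATION: "retry_request",
}


def decide(fault_type: FaultType) -> str:
    """Return the recovery action the agent selects for a fault class."""
    return RECOVERY_POLICY[fault_type]


def run_with_recovery(task, max_attempts: int = 3):
    """Run `task`; on each detected fault, record the chosen action and retry.

    Returns a tuple (result, actions_taken). Raises RuntimeError if the
    task still fails after `max_attempts` attempts.
    """
    actions = []
    for _ in range(max_attempts):
        try:
            return task(), actions
        except FaultDetected as fault:
            actions.append(decide(fault.fault_type))
    raise RuntimeError(f"unrecovered after {max_attempts} attempts: {actions}")
```

In this sketch a task that hits one transient communication fault and then succeeds would return its result together with a one-element action log. A real agent would execute the chosen action rather than only record it.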
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Rahman, M.M., Rouf, M.A. (2022). Aggressive Fault Tolerance in Cloud Computing Using Smart Decision Agent. In: Arefin, M.S., Kaiser, M.S., Bandyopadhyay, A., Ahad, M.A.R., Ray, K. (eds) Proceedings of the International Conference on Big Data, IoT, and Machine Learning. Lecture Notes on Data Engineering and Communications Technologies, vol 95. Springer, Singapore. https://doi.org/10.1007/978-981-16-6636-0_26
DOI: https://doi.org/10.1007/978-981-16-6636-0_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6635-3
Online ISBN: 978-981-16-6636-0
eBook Packages: Engineering (R0)