Abstract
The last of the three GAFT processes is recovery and recovery monitoring. After an error has been detected and the hardware possibly reconfigured, the final step is to recover the software, that is, to eliminate the effect of the error on the software. In line with the previous chapters and [1,2,3,4,5,6], recovery consists of restoring the last recovery point and continuing the processing. But is this really sufficient? What if latent faults exist in the system and manifest themselves, yet trigger a detection scheme only an arbitrary time later? Under this plausible and unpleasant sequence of events, restoring data and program from the last stored recovery point is clearly not enough. We have no guarantee that the fault has been eliminated: even when the hardware is restored or reconfigured, erroneous software states may already be recorded in the recovery points. We therefore have to consider the recovery process itself and analyze which classic algorithms are applicable and fit the purpose of efficient recovery. We introduce and analyze three recovery algorithms that are able to ensure successful recovery by iteratively going through all stored recovery points.
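To make the idea of iterating over stored recovery points concrete, the following is a minimal sketch of a newest-to-oldest search. It is not one of the three algorithms of the chapter, which the abstract does not specify; the function name `find_correct_recovery_point` and the `is_correct` check are hypothetical stand-ins for whatever checking scheme a real system provides.

```python
from typing import Callable, Optional, Sequence, TypeVar

State = TypeVar("State")


def find_correct_recovery_point(
    recovery_points: Sequence[State],
    is_correct: Callable[[State], bool],
) -> Optional[int]:
    """Return the index of the newest recovery point that passes the check.

    recovery_points is assumed to be ordered oldest to newest.
    is_correct is a hypothetical check; a latent fault may have corrupted
    several of the most recent points before it was detected.
    """
    # Walk backwards so that as little completed work as possible is lost.
    for index in range(len(recovery_points) - 1, -1, -1):
        if is_correct(recovery_points[index]):
            return index
    # Every stored recovery point failed the check.
    return None


# Illustrative data: the two newest points were recorded after the fault.
points = [{"x": 1}, {"x": 2}, {"x": -1}, {"x": -1}]
restart_index = find_correct_recovery_point(points, lambda s: s["x"] >= 0)
```

In this illustration the search skips the two newest points and selects the point at index 1, from which processing would be resumed.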
References
Sogomonian E, Schagaev I (1988) Hardware and software fault tolerance of computer systems. Avtom I Telemekhanika, 3–39
Schagaev I (1989) Computing process recovery algorithms. Avtomat Telemekh (4)
Schagaev I (1990) Using software recovery methods for determining the type of hardware faults. Autom Remote Control 51(3)
Schagaev I (2008) Reliability of malfunction tolerance. In: International multi-conference on computer science and information technology, 2008. IMCSIT 2008, October 2008, pp 733–737
Schagaev I et al (2010) ERA: evolving reconfigurable architecture. In: 11th ACIS International Conference, June 2010, pp 215–220
Castano V, Schagaev I (2015) Resilient computer system design. Springer. ISBN 978-3-319150-68-0
Schagaev I (1986) Algorithms of computation recovery. Autom Remote Control 7:26, 36, 65, 122
Schagaev I (1987) Algorithms for restoring a computing process. Autom Remote Control 48(4):26, 65, 122, 141, 149
Schagaev I (1989) Instructions retry in microprocessor recovery algorithms. In: IMEKO—FTSD symposium
Schagaev I (1990) Yet another approach to classification of redundancy. In: IBID
Schagaev I (1986) Relationship between the formation of program recovery points and equipment reliability indices. Autom Remote Control 47
Kowalk W (2006) CRC cyclic redundancy check. Technical report. Universität Oldenburg Fachbereich Informatik 05.09.06
Hamming R (1950) Error detecting and error correcting codes. Bell Syst Tech J 29(2):147–160
Moon T (2005) Error correction coding. Wiley, New Jersey
Schagaev I (1986, December) Using data redundancy for program rollback. Autom Remote Control 47(7), Part 2:1009–1016
Schagaev I, Viktorova V (1990) Comparative analysis of the efficiency of computation-process recovery algorithms. Autom Remote Control 51(1)
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Schagaev, I., Zouev, E., Thomas, K. (2020). Recovery: Searching and Monitoring of Correct Software States. In: Software Design for Resilient Computer Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-21244-5_9
DOI: https://doi.org/10.1007/978-3-030-21244-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21243-8
Online ISBN: 978-3-030-21244-5
eBook Packages: Engineering, Engineering (R0)