Recovery: Searching and Monitoring of Correct Software States

Software Design for Resilient Computer Systems

Abstract

The last of the three GAFT processes is recovery and recovery monitoring. After an error has been detected and the system possibly reconfigured, the final step is to recover the software, that is, to eliminate the effect of the error on the software. In line with the previous chapters and [1,2,3,4,5,6], recovery consists of restoring the last recovery point and continuing processing. But is this really sufficient? What if latent faults exist in the system and manifest themselves, but trigger the detection schemes only an arbitrary time later? Given this plausible and unpleasant sequence of events, it becomes clear that just restoring data and program from the last stored recovery point is not enough. We have no guarantee that the fault has been eliminated: even when the hardware is restored or reconfigured, erroneous software states may already have been recorded in recovery points. Thus, we have to consider the recovery process itself and analyze which classic algorithms are applicable and fit the purpose of efficient recovery. We introduce and analyze three recovery algorithms that are able to ensure successful recovery by iteratively going through all stored recovery points.
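
The idea of going back through stored recovery points, rather than trusting only the newest one, can be illustrated with a minimal sketch. This is not one of the three algorithms analyzed in the chapter, which the abstract does not specify; it is a plain newest-to-oldest scan. The `RecoveryPoint` type, the `find_latest_correct_point` function, and the `is_correct` check are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class RecoveryPoint:
    """Hypothetical snapshot of program state taken at a known step."""
    step: int
    state: dict


def find_latest_correct_point(
    points: Sequence[RecoveryPoint],
    is_correct: Callable[[RecoveryPoint], bool],
) -> Optional[RecoveryPoint]:
    """Scan recovery points from newest to oldest and return the first one
    that passes the check, or None if no stored point passes."""
    for point in reversed(points):
        if is_correct(point):
            return point
    return None


# Illustrative data (assumed): a latent fault corrupted the state after
# step 2 but was detected only after step 5, so the two newest recovery
# points already contain an erroneous state.
points = [
    RecoveryPoint(step=1, state={"total": 10}),
    RecoveryPoint(step=2, state={"total": 20}),
    RecoveryPoint(step=4, state={"total": -1}),
    RecoveryPoint(step=5, state={"total": -1}),
]

# Assumed correctness check: the running total must never be negative.
restored = find_latest_correct_point(points, lambda p: p.state["total"] >= 0)
print(restored.step if restored else "no correct recovery point")
```

In this example, restoring only the last recovery point (step 5) would reinstate the erroneous state; the scan continues backward until it reaches a point that passes the check.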

References

  1. Sogomonian E, Schagaev I (1988) Hardware and software fault tolerance of computer systems. Avtom I Telemekhanika, 3–39

  2. Schagaev I (1989) Computing process recovery algorithms. Avtomat Telemekh (4)

  3. Schagaev I (1990) Using software recovery methods for determining the type of hardware faults. Autom Remote Control 51(3)

  4. Schagaev I (2008) Reliability of malfunction tolerance. In: International multi-conference on computer science and information technology, 2008. IMCSIT 2008, October 2008, pp 733–737

  5. Schagaev I et al (2010) ERA: evolving reconfigurable architecture. In: 11th ACIS International Conference, June 2010, pp 215–220

  6. Castano V, Schagaev I (2015) Resilient computer system design. Springer. ISBN 978-3-319150-68-0

  7. Schagaev I (1986) Algorithms of computation recovery. Autom Remote Control 7:26, 36, 65, 122

  8. Schagaev I (1987) Algorithms for restoring a computing process. Autom Remote Control 48(4):26, 65, 122, 141, 149

  9. Schagaev I (1989) Instructions retry in microprocessor recovery algorithms. In: IMEKO—FTSD symposium

  10. Schagaev I (1990) Yet another approach to classification of redundancy. In: IBID

  11. Schagaev I (1986) Relationship between the formation of program recovery points and equipment reliability indices. Autom Remote Control 47

  12. Kowalk W (2006) CRC cyclic redundancy check. Technical report. Universität Oldenburg Fachbereich Informatik 05.09.06

  13. Hamming R (1950) Error detecting and error correcting codes. Bell Syst Tech J 29(2):147–160

  14. Moon T (2005) Error correction coding. Wiley, New Jersey

  15. Schagaev I (1986, December) Using data redundancy for program rollback. Autom Remote Control 47(7), Part 2:1009–1016

  16. Schagaev I, Viktorova V (1990) Comparative analysis of the efficiency of computation-process recovery algorithms. Autom Remote Control 51(1)


Author information

Corresponding author

Correspondence to Igor Schagaev.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Schagaev, I., Zouev, E., Thomas, K. (2020). Recovery: Searching and Monitoring of Correct Software States. In: Software Design for Resilient Computer Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-21244-5_9

  • DOI: https://doi.org/10.1007/978-3-030-21244-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21243-8

  • Online ISBN: 978-3-030-21244-5

  • eBook Packages: Engineering, Engineering (R0)
