Error Recovery


Part of the book series: Dependable Computing and Fault-Tolerant Systems (DEPENDABLECOMP, volume 3)

Abstract

The previous two chapters have discussed in some detail the first two phases in the provision of fault tolerance in a system, namely, the detection of errors and the subsequent assessment of the extent of damage to the system state. These two phases are passive in the sense that they are not intended to effect any changes to the system. In contrast, the two remaining phases are active, since they do change the system and thereby enable faults and their consequences to be tolerated. This chapter addresses the topic of error recovery, the aim of which is to eliminate errors from the system state. Chapter 8 discusses the fault treatment phase of fault tolerance, which attempts to clear faults from a system so that further errors are not generated, thereby ensuring that continued service can be provided.

Copyright information

© 1990 Springer-Verlag/Wien

About this chapter

Cite this chapter

Lee, P.A., Anderson, T. (1990). Error Recovery. In: Fault Tolerance. Dependable Computing and Fault-Tolerant Systems, vol 3. Springer, Vienna. https://doi.org/10.1007/978-3-7091-8990-0_7

  • DOI: https://doi.org/10.1007/978-3-7091-8990-0_7

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-7091-8992-4

  • Online ISBN: 978-3-7091-8990-0

  • eBook Packages: Springer Book Archive
