Error Recovery

Lee, Peter Alan; Anderson, Thomas

doi:10.1007/978-3-7091-8990-0_7

Part of the book series: Dependable Computing and Fault-Tolerant Systems (DEPENDABLECOMP, volume 3)

Abstract

The previous two chapters have discussed in some detail the first two phases in the provision of fault tolerance in a system, namely, the detection of errors and the subsequent assessment of the extent of damage to the system state. These two phases are passive in the sense that they are not intended to effect any changes to the system. In contrast, the two remaining phases are active, since they do change the system and thereby enable faults and their consequences to be tolerated. This chapter addresses the topic of error recovery, the aim of which is to eliminate errors from the system state. Chapter 8 discusses the fault treatment phase of fault tolerance, which attempts to clear faults from a system so that further errors are not generated, thereby ensuring that continued service can be provided.
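The distinction drawn above between the passive phases (error detection, damage assessment) and the active phase of error recovery can be illustrated with a minimal sketch. It assumes a hypothetical `SystemState` whose only invariant is a non-negative balance, and it uses backward error recovery (replacing the erroneous state with a copy recorded before the operation ran) as one possible recovery technique; it is not a reproduction of the chapter's own treatment. All names are illustrative, and fault treatment, the fourth phase, is deliberately left out.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SystemState:
    # Hypothetical state with one invariant: balance must never be negative.
    balance: int
    history: list[str] = field(default_factory=list)


def detect_error(state: SystemState) -> bool:
    """Error detection (passive): inspects the state without changing it."""
    return state.balance < 0


def assess_damage(state: SystemState) -> list[str]:
    """Damage assessment (passive): names the suspect parts of the state."""
    return ["balance"] if state.balance < 0 else []


def recover(checkpoint: SystemState) -> SystemState:
    """Error recovery (active): backward recovery, returning a copy of the
    state recorded before the operation ran in place of the erroneous one."""
    return copy.deepcopy(checkpoint)


def run_with_recovery(
    state: SystemState, operation: Callable[[SystemState], None]
) -> tuple[SystemState, list[str]]:
    checkpoint = copy.deepcopy(state)  # record a prior, assumed-correct state
    operation(state)                   # may leave the state erroneous
    if detect_error(state):
        damaged = assess_damage(state)
        # Fault treatment (the fourth phase) is not modelled in this sketch.
        return recover(checkpoint), damaged
    return state, []


def withdraw(amount: int) -> Callable[[SystemState], None]:
    """Hypothetical operation that omits a bounds check (the 'fault')."""
    def op(state: SystemState) -> None:
        state.balance -= amount
        state.history.append(f"withdraw {amount}")
    return op


if __name__ == "__main__":
    s = SystemState(balance=100)
    s, damaged = run_with_recovery(s, withdraw(30))
    print(s.balance, damaged)   # 70 []
    s, damaged = run_with_recovery(s, withdraw(500))
    print(s.balance, damaged)   # 70 ['balance']
```

The second call drives the balance negative, the passive checks report the damage without touching the state, and only the recovery step changes it, restoring the state recorded before the faulty withdrawal. Forward recovery, which would instead correct the damaged state in place, is an alternative not shown here.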

Author information

Authors and Affiliations

Computing Laboratory, University of Newcastle upon Tyne, UK
Peter Alan Lee & Thomas Anderson

About this chapter

Cite this chapter

Lee, P.A., Anderson, T. (1990). Error Recovery. In: Fault Tolerance. Dependable Computing and Fault-Tolerant Systems, vol 3. Springer, Vienna. https://doi.org/10.1007/978-3-7091-8990-0_7

DOI: https://doi.org/10.1007/978-3-7091-8990-0_7
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-8992-4
Online ISBN: 978-3-7091-8990-0
eBook Packages: Springer Book Archive