Abstract
The previous two chapters have discussed in some detail the first two phases in the provision of fault tolerance in a system, namely, the detection of errors and the subsequent assessment of the extent of damage to the system state. These two phases are passive in the sense that they are not intended to effect any changes to the system. In contrast, the two remaining phases are active, since they do change the system and thereby enable faults and their consequences to be tolerated. This chapter addresses the topic of error recovery, the aim of which is to eliminate errors from the system state. Chapter 8 discusses the fault treatment phase of fault tolerance, which attempts to clear faults from a system so that further errors are not generated, and thus to ensure that continued service can be provided.
Copyright information
© 1990 Springer-Verlag/Wien
About this chapter
Cite this chapter
Lee, P.A., Anderson, T. (1990). Error Recovery. In: Fault Tolerance. Dependable Computing and Fault-Tolerant Systems, vol 3. Springer, Vienna. https://doi.org/10.1007/978-3-7091-8990-0_7
DOI: https://doi.org/10.1007/978-3-7091-8990-0_7
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-8992-4
Online ISBN: 978-3-7091-8990-0
eBook Packages: Springer Book Archive