Using relative refinement for fault tolerance
A general refinement methodology based on ideas of Stark is presented, and it is explained how it can be used for the systematic development of fault-tolerant systems. Highlights are: (1) A detailed and comprehensive exposition of Stark's temporal logic and development methodology. (2) A formalization of a general systematic approach to the development of fault-tolerant systems, achieving an increasing degree of coverage with each successive refinement stage. Faults are identified and modeled already at the first implementation level, which is shown to be a relative refinement, i.e., correct for all computations in which faults do not occur. The second implementation is a fail-stop implementation, i.e., one that stops on the first detected occurrence of a fault. This implementation is also a relative refinement, i.e., correct in all computations in which the program never stops. The final implementation is correct in all computations except those that display severe faults violating the fault-tolerance assumptions, such as all n components of an n-way redundant system failing, as in the case of stable storage. (3) A detailed example of a multi-disk system providing stable storage, illustrating this general methodology.
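The staged idea described in the abstract can be illustrated informally with a small Python sketch. It is not the paper's formalism (which is stated in Stark's temporal logic), and every name in it (`Disk`, `FailStopStore`, `RedundantStore`, `Stopped`, `SevereFault`) is a hypothetical one chosen for illustration. The sketch assumes that disk faults are permanent and always detected: a fail-stop store halts on the first detected fault, while an n-way redundant store stays correct unless all n disks have failed.

```python
class DiskFault(Exception):
    """A (hypothetical) disk detected that it is damaged."""


class Stopped(Exception):
    """The fail-stop store has halted after a detected fault."""


class SevereFault(Exception):
    """All n disks failed: the fault-tolerance assumption is violated."""


class Disk:
    """A disk whose faults are modeled explicitly via the `failed` flag."""

    def __init__(self):
        self.blocks = {}
        self.failed = False  # assumption: once set, a fault is permanent

    def write(self, addr, data):
        if self.failed:
            raise DiskFault()
        self.blocks[addr] = data

    def read(self, addr):
        if self.failed:
            raise DiskFault()
        return self.blocks[addr]


class FailStopStore:
    """Correct in every run in which it never stops."""

    def __init__(self, disk):
        self.disk = disk
        self.stopped = False

    def _guard(self, operation, *args):
        if self.stopped:
            raise Stopped()
        try:
            return operation(*args)
        except DiskFault:
            self.stopped = True  # halt on the first detected fault
            raise Stopped()

    def write(self, addr, data):
        self._guard(self.disk.write, addr, data)

    def read(self, addr):
        return self._guard(self.disk.read, addr)


class RedundantStore:
    """Correct in every run in which at least one of the n disks survives."""

    def __init__(self, disks):
        self.disks = list(disks)

    def write(self, addr, data):
        written = 0
        for disk in self.disks:
            try:
                disk.write(addr, data)
                written += 1
            except DiskFault:
                continue  # skip failed disks, keep writing to the rest
        if written == 0:
            raise SevereFault()

    def read(self, addr):
        for disk in self.disks:
            try:
                return disk.read(addr)
            except DiskFault:
                continue  # fall back to the next replica
        raise SevereFault()
```

Because every surviving disk has received every write, a read from any surviving disk returns the latest value; only the simultaneous failure of all n disks, the severe fault excluded by the assumptions, makes the store fail.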
- 1. M. Abadi and L. Lamport. The existence of refinement mappings. In Third Annual Symposium on Logic in Computer Science, pages 165–175, July 1988.
- 3. A. Cau, R. Kuiper, and W.-P. de Roever. Formalising Dijkstra's development strategy within Stark's formalism. In R. C. Shaw, C. B. Jones, and Tim Denvir, editors, Proc. 5th BCS-FACS Refinement Workshop, 1992.
- 4. E. Diepstraten and R. Kuiper. Abadi & Lamport and Stark: towards a proof theory for stuttering, dense domains and refinement mappings. In LNCS 430: Proc. of the REX Workshop on Stepwise Refinement of Distributed Systems, Models, Formalisms, Correctness, pages 208–238. Springer-Verlag, 1990.
- 5. E.W. Dijkstra. A tutorial on the split binary semaphore, 1979. EWD 703.
- 7. P.A. Lee and T. Anderson. Fault Tolerance Principles and Practice, volume 3 of Dependable Computing and Fault-Tolerant Systems. Springer-Verlag, second, revised edition, 1990.
- 8. S. Lee, S. Gerhart, and W.-P. de Roever. The evolution of list-copying algorithms and the need for structured program verification. In Proc. of 6th POPL, 1979.
- 9. P.R.H. Place, W.G. Wood, and M. Tudball. Survey of formal specification techniques for reactive systems. Technical Report, 1990.
- 10. H. Schepers. Terminology and Paradigms for Fault Tolerance. Computing Science Notes 91/08 of the Department of Mathematics and Computing Science, Eindhoven University of Technology, 1991.
- 11. E.W. Stark. Foundations of a Theory of Specification for Distributed Systems. PhD thesis, Massachusetts Inst. of Technology, 1984. Available as Report No. MIT/LCS/TR-342.
- 12. E.W. Stark. A Proof Technique for Rely/Guarantee Properties. In LNCS 206: Fifth Conference on Foundations of Software Technology and Theoretical Computer Science, pages 369–391. Springer-Verlag, 1985.