
Transformation of programs for fault-tolerance

Formal Aspects of Computing

Abstract

In this paper we describe how a program constructed for a fault-free system can be transformed into a fault-tolerant program for execution on a system which is susceptible to failures. A program is described by a set of atomic actions which perform transformations from states to states. We assume that a fault environment is represented by a program F. Interference by the fault environment F on the execution of a program P can then be described as a fault-transformation ℱ which transforms P into a program ℱ(P). This is proved to be equivalent to the program P □ P_F, where P_F is derived from P and F, and □ defines the union of the sets of actions of P and P_F. A recovery transformation ℛ transforms P into a program ℛ(P) = P □ R by adding a set of recovery actions R, called a recovery program. If the system is failstop and faults do not affect recovery actions, we have ℱ(ℛ(P)) = ℱ(P) □ R = P □ P_F □ R. We illustrate this approach to fault-tolerant programming by considering the problem of designing a protocol that guarantees reliable communication from a sender to a receiver in spite of faults in the communication channel between them.
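
The composition described in the abstract can be pictured with a small sketch. The Python below is only an illustration under stated assumptions, not the paper's formal model: a program is a set of guarded atomic actions, `union` plays the role of □, and the action sets `P`, `P_F` and `R` are hypothetical toy examples. In particular, `P_F` is written by hand here, since the abstract does not say how it is derived from P and F, and the one-message channel is not the protocol developed in the paper.

```python
from typing import Callable, Dict, FrozenSet, List, NamedTuple

State = Dict[str, int]


class Action(NamedTuple):
    """An atomic action: a guard plus a state-to-state update."""
    name: str
    guard: Callable[[State], bool]
    update: Callable[[State], State]


Program = FrozenSet[Action]


def union(*programs: Program) -> Program:
    """The □ operator: the union of the action sets of its operands."""
    return frozenset().union(*programs)


def enabled(program: Program, state: State) -> List[str]:
    """Names of the actions whose guards hold in `state`, sorted."""
    return sorted(a.name for a in program if a.guard(state))


# P: a toy fault-free program (hypothetical, not the paper's protocol).
P: Program = frozenset({
    Action("send", lambda s: s["sent"] == 0,
           lambda s: {**s, "sent": 1, "chan": 1}),
    Action("receive", lambda s: s["chan"] == 1,
           lambda s: {**s, "chan": 0, "delivered": 1}),
})

# P_F: hand-written fault actions (assumed; the paper derives P_F from P and F).
P_F: Program = frozenset({
    Action("lose", lambda s: s["chan"] == 1, lambda s: {**s, "chan": 0}),
})

# R: a hypothetical recovery program that retransmits a lost message.
R: Program = frozenset({
    Action("resend",
           lambda s: s["sent"] == 1 and s["chan"] == 0 and s["delivered"] == 0,
           lambda s: {**s, "chan": 1}),
})

# State reached after "send" followed by the fault action "lose".
lost: State = {"sent": 1, "chan": 0, "delivered": 0}

print(enabled(union(P, P_F), lost))     # prints []
print(enabled(union(P, P_F, R), lost))  # prints ['resend']
```

In the state `lost`, the faulty program P □ P_F has no enabled action, while P □ P_F □ R can still make progress through `resend`. The guard of `resend` reads `delivered` directly, which stands in for an acknowledgement mechanism that a real sender would need.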




About this article

Cite this article

Liu, Z., Joseph, M. Transformation of programs for fault-tolerance. Formal Aspects of Computing 4, 442–469 (1992). https://doi.org/10.1007/BF01211393


  • DOI: https://doi.org/10.1007/BF01211393
