Formalizing inductive proofs of message recovery in distributed systems
If a process fails in a distributed system, proper recovery requires that the messages sent to that process also be recovered. We present sufficient conditions for recovering the messages of a distributed application. For a general-purpose recovery technique, these also become necessary conditions. From the conditions it is clear that requiring messages to be recovered in the same order in which the process received them before the failure is a stricter requirement than necessary.
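The final claim can be illustrated with a toy example. The sketch below is not the paper's formal conditions, which this abstract does not state. It assumes a hypothetical application in which each message only touches state belonging to its sender, so only the relative order of messages from the same sender matters. The names `apply_messages`, `original_order`, and `recovered_order` are invented for illustration.

```python
from collections import defaultdict


def apply_messages(messages):
    """Replay (sender, payload) messages and return the resulting state.

    Assumption: the state is one log per sender, so a message from one
    sender never reads or writes state touched by another sender.
    """
    state = defaultdict(list)
    for sender, payload in messages:
        state[sender].append(payload)
    return dict(state)


# Order in which the process received the messages before the failure.
original_order = [("A", 1), ("B", 10), ("A", 2), ("B", 20)]

# A different interleaving after recovery that keeps each sender's
# messages in their original relative order.
recovered_order = [("B", 10), ("B", 20), ("A", 1), ("A", 2)]

# An interleaving that swaps two messages from the same sender.
broken_order = [("A", 2), ("B", 10), ("A", 1), ("B", 20)]

same_state = apply_messages(original_order) == apply_messages(recovered_order)
diverged = apply_messages(original_order) != apply_messages(broken_order)
```

In this toy setting `same_state` is true even though the recovered order differs from the original receive order, while `diverged` is true because reordering messages from a single sender changes that sender's log. Whether a real application tolerates such reordering depends on its own semantics.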