Formalizing inductive proofs of message recovery in distributed systems

  • Pankaj Jalote
Concurrency and Networking
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1023)

Abstract

If a process fails in a distributed system, the messages sent to that process must be recovered for the process to recover properly. We present sufficient conditions for recovering the messages of a distributed application. For a general-purpose recovery technique, these conditions also become necessary. From the conditions it is clear that requiring messages to be recovered in the same order in which the process received them before the failure is a stricter requirement than necessary.

Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Pankaj Jalote
    Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India