Significant checkpoint in distributed system

  • Katsuya Tanaka
  • Hiroaki Higaki
  • Makoto Takizawa
Parallel and Distributed Systems
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1134)

Abstract

In distributed applications, a group of objects cooperates to achieve some objective. The objects may suffer from various kinds of faults. If an object o is faulty, o is rolled back to its checkpoint, and the objects that have received messages from o must also be rolled back. In this paper, on the basis of message semantics, we define influential messages: messages whose receivers must, from the application point of view, be rolled back if the senders are rolled back. Using influential messages, we define a significant checkpoint, which denotes a consistent global state of the system even though that state may be inconsistent under the traditional definition. We present protocols for taking significant checkpoints and for rolling back the objects.
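
The effect of restricting rollback propagation to influential messages can be illustrated with a minimal sketch. It is not the protocol from the paper: the function name `rollback_set`, the boolean `influential` flag on each message, and the example objects o1 to o5 are all hypothetical, and every listed message is assumed to have been sent after its sender's latest checkpoint.

```python
from collections import deque

def rollback_set(faulty, messages, influential_only=True):
    """Return the set of objects that roll back when `faulty` rolls back.

    `messages` holds (sender, receiver, influential) triples. The
    `influential` flag is an assumption of this sketch; how a message is
    classified is decided by the application's message semantics.
    """
    # Build sender -> receivers edges, skipping non-influential
    # messages when influential_only is set.
    edges = {}
    for sender, receiver, influential in messages:
        if influential or not influential_only:
            edges.setdefault(sender, set()).add(receiver)

    # Propagate the rollback breadth-first from the faulty object.
    to_roll_back = {faulty}
    queue = deque([faulty])
    while queue:
        obj = queue.popleft()
        for receiver in edges.get(obj, ()):
            if receiver not in to_roll_back:
                to_roll_back.add(receiver)
                queue.append(receiver)
    return to_roll_back

# Hypothetical message history after the latest checkpoints.
messages = [
    ("o1", "o2", True),   # influential
    ("o1", "o3", False),  # not influential
    ("o2", "o4", True),
    ("o3", "o5", True),
]

traditional = rollback_set("o1", messages, influential_only=False)
semantic = rollback_set("o1", messages)
print(sorted(traditional))  # every object reachable through any message
print(sorted(semantic))     # only objects reachable through influential messages
```

In this example the traditional rule rolls back all five objects, whereas following only influential messages rolls back o1, o2, and o4, because the message from o1 to o3 is treated as non-influential.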

Keywords

Global State; Significant Message; Response Message; Data Message; Significant Checkpoint
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Bernstein, P. A., Hadzilacos, V., and Goodman, N., “Concurrency Control and Recovery in Database Systems,” Addison-Wesley Publishing Company, 1987.
  2. Bhargava, B. and Lian, S. R., “Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems: An Optimistic Approach,” Proc. of the 7th Symp. on Reliable Distributed Systems, 1988, pp. 3–12.
  3. Birman, K. P. and Joseph, T. A., “Reliable Communication in the Presence of Failures,” ACM Trans. on Computer Systems, Vol. 5, No. 1, 1987, pp. 47–76.
  4. Chandy, K. M. and Lamport, L., “Distributed Snapshots: Determining Global States of Distributed Systems,” ACM Trans. on Computer Systems, Vol. 3, No. 1, 1985, pp. 63–75.
  5. Fischer, M. J., Griffeth, N. D., and Lynch, N. A., “Global States of a Distributed System,” IEEE Trans. on Software Engineering, Vol. 8, No. 3, 1982.
  6. Higaki, H. and Soneoka, T., “Group-to-Group Communications for Fault-Tolerance in Distributed Systems,” IEICE Trans. on Information and Systems, Vol. E76-D, No. 11, 1993, pp. 1348–1357.
  7. Higaki, H. and Hirakawa, Y., “Group Communications for Upgrading Distributed Programs,” Proc. of IEEE ICDCS-16, 1996, pp. 420–427.
  8. Johnson, D. and Zwaenepoel, W., “Recovery in Distributed Systems using Optimistic Message Logging and Checkpointing,” Proc. of ACM Symp. on Principles of Distributed Computing, 1988, pp. 171–180.
  9. Koo, R. and Toueg, S., “Checkpointing and Rollback-Recovery for Distributed Systems,” IEEE Trans. on Software Engineering, Vol. SE-13, No. 1, 1987, pp. 23–31.
  10. Lamport, L., “Time, Clocks, and the Ordering of Events in a Distributed System,” Comm. ACM, Vol. 21, No. 7, 1978, pp. 558–565.
  11. Leong, H. V. and Agrawal, D., “Using Message Semantics to Reduce Rollback in Optimistic Message Logging Recovery Schemes,” Proc. of IEEE ICDCS-14, 1994, pp. 227–234.
  12. Manivannan, D. and Singhal, M., “A Low-Overhead Recovery Technique Using Quasi-Synchronous Checkpointing,” Proc. of IEEE ICDCS-16, 1996, pp. 100–107.
  13. Nakamura, A. and Takizawa, M., “Causally Ordering Broadcast Protocol,” Proc. of IEEE ICDCS-14, 1994, pp. 48–55.
  14. Ramanathan, P. and Shin, K. G., “Checkpointing and Rollback Recovery in a Distributed System Using Common Time Base,” Proc. of the 7th IEEE Symp. on Reliable Distributed Systems, 1988, pp. 13–21.
  15. Tachikawa, T. and Takizawa, M., “Communication Protocol for Group of Distributed Objects,” to appear in Proc. of IEEE ICPADS'96, 1996.
  16. Tanaka, K. and Takizawa, M., “Distributed Checkpointing Based on Influential Messages,” to appear in Proc. of IEEE ICPADS'96, 1996.

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Katsuya Tanaka (1)
  • Hiroaki Higaki (1)
  • Makoto Takizawa (1)
  1. Dept. of Computers and Systems Engineering, Tokyo Denki University, Saitama, Japan