Using Computing Checkpoints Implement Consistent Low-Cost Non-blocking Coordinated Checkpointing

  • Chaoguang Men
  • Xiaozong Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3320)

Abstract

Two approaches are used to reduce the overhead of coordinated checkpointing: one is to reduce the number of synchronization messages and the number of checkpoints; the other is to make the checkpointing process non-blocking. In this paper, we introduce the concept of a “computing checkpoint” to design an efficient, consistent, non-blocking coordinated checkpointing algorithm that combines these two approaches. By piggybacking on the broadcast commit message the information about which processes have taken new checkpoints, the checkpoint sequence number of every process is kept consistent across all processes, so that unnecessary checkpoints and orphan messages are avoided in subsequent execution. The algorithm does not block any process and has lower overhead than other proposed consistent coordinated checkpointing algorithms.
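
The piggybacking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given on this page. It only shows, under stated assumptions, how a broadcast commit message carrying the set of processes that took new checkpoints could keep every process's view of the checkpoint sequence numbers identical. The names `Process`, `csn`, `on_commit`, `is_from_newer_interval`, and `broadcast_commit` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process model; all names here are illustrative assumptions."""
    pid: int
    n: int  # total number of processes in the system
    # csn[j] is this process's view of process j's checkpoint sequence number
    csn: list = field(default_factory=list)

    def __post_init__(self):
        if not self.csn:
            self.csn = [0] * self.n

    def on_commit(self, checkpointed):
        """Apply a commit message that piggybacks the set of process ids
        that took a new checkpoint in this round."""
        for j in checkpointed:
            self.csn[j] += 1

    def is_from_newer_interval(self, sender, sender_csn):
        """Return True if a message tagged with sender_csn comes from a
        checkpoint interval of the sender that this process has not seen."""
        return sender_csn > self.csn[sender]


def broadcast_commit(processes, checkpointed):
    """Deliver the same commit message to every process (assumed reliable)."""
    for p in processes:
        p.on_commit(checkpointed)
```

For example, with three processes and a round in which processes 0 and 2 take checkpoints, calling `broadcast_commit(procs, {0, 2})` leaves every process with the same `csn` vector, so a later message tagged with the sender's sequence number can be checked against a consistent view.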

Keywords

Request Message; Reply Message; Synchronization Message; Checkpoint Interval; Mutable Checkpoint
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Chaoguang Men (1, 2)
  • Xiaozong Yang (1)
  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, P.R. China
  2. School of Computer Science and Technology, Harbin Engineering University, Harbin, P.R. China