An Efficient Computing-Checkpoint Based Coordinated Checkpoint Algorithm

  • Men Chaoguang
  • Wang Dongsheng
  • Zhao Yunlong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4096)


This paper introduces the concept of a “computing checkpoint” and proposes an efficient coordinated checkpoint algorithm based on it. The algorithm combines the two main approaches to reducing the overhead of coordinated checkpointing: minimizing the number of processes that take checkpoints, and making the checkpointing process non-blocking. The broadcast commit message piggybacks information about which processes have taken a new checkpoint, so every process keeps a consistent view of the checkpoint sequence numbers of all processes, which avoids unnecessary checkpoints and orphan messages in subsequent execution. Evaluation results show that the number of redundant computing checkpoints is less than 1/10 of the number of tentative checkpoints. Analysis and experiments show that the overhead of our algorithm is lower than that of other coordinated checkpoint algorithms.
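The piggybacking step can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details the abstract does not give. It only shows how a commit broadcast that lists the processes with new checkpoints lets every process keep the same vector of checkpoint sequence numbers. The names `CommitMessage`, `Process`, `on_commit` and `sender_is_ahead` are hypothetical, and the comparison in `sender_is_ahead` is an assumed, conventional use of sequence numbers rather than something stated in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class CommitMessage:
    """Hypothetical commit broadcast sent when a checkpointing round ends."""
    initiator: int
    checkpointed: frozenset  # ids of processes that took a new checkpoint


@dataclass
class Process:
    """Hypothetical process holding its view of every checkpoint sequence number."""
    pid: int
    n: int  # total number of processes in the system
    csn: list = field(default_factory=list)  # csn[j]: known sequence number of process j

    def __post_init__(self):
        # Start with every sequence number at zero if no view was supplied.
        if not self.csn:
            self.csn = [0] * self.n

    def on_commit(self, msg):
        # Advance the sequence number of each process named in the commit
        # message, so all receivers end up with the same view.
        for j in msg.checkpointed:
            self.csn[j] += 1

    def sender_is_ahead(self, sender, sender_csn):
        # Assumed check: a message tagged with a sequence number newer than
        # the receiver's view signals a checkpoint the receiver has not seen.
        return sender_csn > self.csn[sender]
```

With four processes and a commit message naming processes 0 and 2, every process that handles the message holds the view `[1, 0, 1, 0]`. A later message from process 2 tagged with sequence number 1 is then not treated as newer, which is the kind of situation in which a forced checkpoint could be skipped.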


Dependent Relation; Request Message; Residuary Weight; Checkpoint Interval; Mutable Checkpoint
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Men Chaoguang (1, 2)
  • Wang Dongsheng (1, 2)
  • Zhao Yunlong (2)
  1. Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China
  2. Research Center of High Dependability Computing Technology, Harbin Engineering University, Harbin, Heilongjiang, P.R. China
