Equidistant Checkpoint Placement for Checkpointing and Rollback Recovery
To derive an appropriate equidistant checkpoint interval for log-based checkpointing and rollback recovery mechanisms, a directed state transition model of system execution is presented under the assumption that inter-failure times follow an exponential distribution. The model jointly accounts for the essential related factors. Using the Laplace transform, the fault-tolerance overhead ratio is derived by evaluating the expected total execution overhead of a single checkpoint interval, from which the optimal equidistant checkpoint interval is obtained. The evaluation shows that the derived formula is more practical for determining checkpoint placement in log-based fault-tolerant performance optimization, and that its degenerate form agrees with the previous model.
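The chapter's own state transition model and closed-form interval are not reproduced here. As a hedged illustration of the same kind of optimization, the sketch below uses a simpler textbook renewal model instead of the authors' log-based model. It assumes exponential failures with rate lam, a checkpoint cost ckpt (C), and a fixed, failure-free restart cost restart (R); after a failure, the interval is re-executed from its start. Under these assumptions the expected time to finish w units of work is E(w) = (1/λ + R)(e^{λw} - 1), so the overhead ratio of an interval T is E(T + C)/T - 1. Setting its derivative to zero gives e^{λ(T+C)}(1 - λT) = 1, which has a unique root in (0, 1/λ) that is found here by bisection. All function and parameter names are hypothetical.

```python
import math


def expected_segment_time(work: float, lam: float, restart: float) -> float:
    """Expected wall-clock time to finish `work` units of failure-prone execution.

    Simplified renewal model (an assumption, not the chapter's model):
    exponential inter-failure times with rate `lam`; after each failure a fixed,
    failure-free `restart` cost is paid and the segment reruns from its start.
    """
    return (1.0 / lam + restart) * math.expm1(lam * work)


def overhead_ratio(T: float, lam: float, ckpt: float, restart: float) -> float:
    """Fault-tolerance overhead per unit of useful work for compute interval T.

    Each interval does T units of work followed by a checkpoint of cost `ckpt`;
    both can be hit by failures in this hypothetical model.
    """
    return expected_segment_time(T + ckpt, lam, restart) / T - 1.0


def optimal_interval(lam: float, ckpt: float, tol: float = 1e-12) -> float:
    """Bisection on the stationarity condition exp(lam*(T+C)) * (1 - lam*T) = 1.

    The restart cost only scales the objective by (1/lam + R), so it does not
    move the minimizer in this simplified model and is not a parameter here.
    """

    def g(T: float) -> float:
        # Numerator of d/dT [(exp(lam*(T+C)) - 1) / T]; negative below the
        # optimum, positive above it, and strictly increasing in T.
        return math.exp(lam * (T + ckpt)) * (lam * T - 1.0) + 1.0

    lo, hi = 0.0, 1.0 / lam  # g(0) < 0 for ckpt > 0, and g(1/lam) = 1 > 0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, with a mean time between failures of one day (lam = 1/86400 per second) and a 60 s checkpoint, optimal_interval(1/86400, 60) lands slightly below Young's first-order approximation sqrt(2C/λ) ≈ 3220 s. Young's formula is quoted only as a familiar reference point; it is not necessarily the "previous model" the chapter compares against.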
The authors would like to thank the anonymous reviewers and the editor for their careful reading of the chapter and their valuable help in improving it.