A generalized forward recovery checkpointing scheme
We propose a generalized forward recovery checkpointing scheme, with lookahead execution and rollback validation. This method takes advantage of voting and comparison on multiple versions of the executing task. The proposed scheme is evaluated and compared with other existing checkpointing techniques. The processor assignment problem is studied and an optimal processor assignment is identified. Details on how to use this approach for tolerating both software and hardware faults are also discussed.
Index terms: checkpointing, fault tolerance, forward recovery, reliability analysis