A generalized forward recovery checkpointing scheme

  • Ke Huang
  • Jie Wu
  • Eduardo B. Fernandez
Workshop on Fault-Tolerant Parallel and Distributed Systems (Dimiter Avresky, Boston University; David R. Kaeli, Northeastern University)
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1388)


We propose a generalized forward recovery checkpointing scheme, with lookahead execution and rollback validation. This method takes advantage of voting and comparison on multiple versions of the executing task. The proposed scheme is evaluated and compared with other existing checkpointing techniques. The processor assignment problem is studied and an optimal processor assignment is identified. Details on how to use this approach for tolerating both software and hardware faults are also discussed.
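As a rough illustration of the general idea named in the abstract (comparison between versions, lookahead execution, and rollback validation), the following minimal sketch models one recovery step for a duplicated task. It is not the authors' algorithm: the function name `run_with_forward_recovery`, the integer task state, the `step` function, the fault injectors, and the fault-free spare used for validation are all assumptions made for this example.

```python
def run_with_forward_recovery(checkpoint, step, fault_a=None, fault_b=None):
    """Toy model of comparison-based forward recovery for a duplicated task.

    checkpoint : last state both versions agreed on (hashable, e.g. an int)
    step       : function advancing a state by one checkpoint interval
    fault_a/b  : optional functions that corrupt a version's result
                 (hypothetical fault injectors, for illustration only)

    Returns (state, intervals_completed, outcome).
    """
    # Both versions execute the same interval from the agreed checkpoint.
    a = step(checkpoint)
    b = step(checkpoint)
    if fault_a:
        a = fault_a(a)
    if fault_b:
        b = fault_b(b)

    # Comparison: matching states become the new checkpoint.
    if a == b:
        return a, 1, "match"

    # Lookahead execution: on a mismatch, both versions keep running the
    # next interval from their own (possibly faulty) states.
    lookahead = {a: step(a), b: step(b)}

    # Rollback validation: a spare re-executes the disputed interval from
    # the old checkpoint. ASSUMPTION: the spare is fault-free here.
    validated = step(checkpoint)

    # If the validation agrees with one version, that version's lookahead
    # work is kept, so no interval has to be repeated.
    if validated in lookahead:
        return lookahead[validated], 2, "forward"

    # Neither version agrees with the validation: fall back to rollback.
    return checkpoint, 0, "rollback"
```

With `step = lambda s: s + 1` and a fault injected into only one version, the call returns the "forward" outcome with two intervals completed. When both versions are corrupted differently, it returns the original checkpoint with the "rollback" outcome.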

Index terms

Checkpointing, fault tolerance, forward recovery, reliability analysis





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Ke Huang (1)
  • Jie Wu (1)
  • Eduardo B. Fernandez (1)
  1. Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton
