Performance Evaluation of Parallel Systems Employing Roll-Forward Checkpoint Schemes
High performance and reliability are the main goals of parallel and distributed computing systems. To improve both, various checkpoint schemes have been proposed in the literature over several decades. However, the lack of a general analytical model has made it difficult to compare the performance of systems employing different checkpoint schemes. This paper develops an analytical model to evaluate the relative response time of systems employing checkpoint schemes. The model is applied to systems employing the RFC (Roll-Forward Checkpoint), DMR-F (Double Modular Redundancy for Forward recovery), and DST (Duplex with Self-Test) schemes. The results show the feasibility of the model developed in the paper.
Keywords: High Performance Computing, Consecutive Interval, Task Migration, Duplex System, Validation Task