Performance Evaluation of Parallel Systems Employing Roll-Forward Checkpoint Schemes

  • Gyung-Leen Park
  • Hee Yong Youn
  • Junghoon Lee
  • Chul Soo Kim
  • Bongkyu Lee
  • Sang Joon Lee
  • Wang-Cheol Song
  • Yung-Cheol Byun
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3984)

Abstract

High performance and reliability are the main goals of parallel and distributed computing systems. To increase the performance and reliability of such systems, various checkpoint schemes have been proposed in the literature over several decades. However, the lack of general analytical models has been an obstacle to comparing the performance of systems that employ different checkpoint schemes. This paper develops an analytical model to evaluate the relative response time of systems employing checkpoint schemes. The model is applied to evaluate the relative response time of systems employing the RFC (Roll-Forward Checkpoint), DMR-F (Double Modular Redundancy for Forward recovery), and DST (Duplex with Self-Test) schemes. The results demonstrate the feasibility of the model developed in the paper.

Keywords

High Performance Computing · Consecutive Interval · Task Migration · Duplex System · Validation Task
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Gyung-Leen Park (1)
  • Hee Yong Youn (2)
  • Junghoon Lee (1)
  • Chul Soo Kim (1)
  • Bongkyu Lee (1)
  • Sang Joon Lee (3)
  • Wang-Cheol Song (3)
  • Yung-Cheol Byun (3)
  1. Dept. of Computer Science and Statistics, Cheju National University, Cheju, Korea
  2. School of Information and Communications Engineering, Sungkyunkwan University, Suwon, Korea
  3. Faculty of Telecommunication and Computer Engineering, Cheju National University, Cheju, Korea