Computation algorithms for workload-dependent optimal checkpoint placement

  • Original article
International Journal of System Assurance Engineering and Management

Abstract

In this paper we revisit a checkpoint/restart model by Slim et al. (A new flexible checkpoint/restart model, INRIA Technical Report 6751, Centre de recherche INRIA Grenoble, 2008) and derive the workload-dependent optimal checkpoint placement policies. Two cases are considered: the system overhead parameters are either independent of or dependent on the cumulative workload. It is shown that, in terms of minimizing the expected total processing time, the periodic checkpoint placement policy is always optimal in the independent case and the aperiodic policy is always optimal in the dependent case. We provide Lagrange algorithms to determine the optimal checkpoint sequences in the respective cases. Numerical examples are presented to investigate the sensitivity of the optimal checkpoint placement to the system-failure parameter.
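
As a point of orientation only, the sketch below shows what a periodic checkpoint sequence looks like in code. It does not reproduce the Lagrange algorithms of this paper, whose model is not available in this preview. Instead it uses the classical first-order approximation of Young (1974), listed in the references, which assumes a constant checkpoint cost and exponentially distributed failures. The function names `young_interval` and `periodic_checkpoints` and their parameters are hypothetical, chosen for illustration.

```python
import math
from typing import List


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order approximation of the periodic checkpoint interval.

    Assumption (not from this paper): the interval is sqrt(2 * C * M), where
    C is the time to take one checkpoint and M is the mean time between failures.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def periodic_checkpoints(total_work: float, interval: float) -> List[float]:
    """Return equally spaced checkpoint positions strictly inside (0, total_work).

    No checkpoint is placed at the completion point itself.
    """
    if total_work <= 0 or interval <= 0:
        raise ValueError("total_work and interval must be positive")
    # Number of interior checkpoints: one fewer than the number of segments.
    count = math.ceil(total_work / interval) - 1
    return [k * interval for k in range(1, count + 1)]
```

With a checkpoint cost of 2 time units and a mean time between failures of 100, `young_interval(2.0, 100.0)` gives an interval of 20, and `periodic_checkpoints(100.0, 20.0)` places checkpoints at 20, 40, 60 and 80. An aperiodic policy, as in the workload-dependent case of the abstract, would instead return unequally spaced positions.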

References

  • Ansel J, Arya K, Cooperman G (2009) DMTCP: transparent checkpointing for cluster computations and the desktop. Proceedings of the 2009 IEEE international symposium on parallel & distributed processing, IEEE CPS

  • Bajunaid N, Menasce DA (2018) Efficient modeling and optimizing of checkpointing in concurrent component-based software systems. J Syst Softw 139:1–13

  • Bent J, Gibson G, Grider G, McClelland B, Nowoczynski P, Nunez J, Polte M, Wingate M (2009) PLFS: a checkpoint filesystem for parallel applications. Proceedings of the conference on high performance computing networking, storage and analysis (SC 2009), pp 1–12

  • Chandy KM (1975) A survey of analytic models of roll-back and recovery strategies. IEEE Comput 8:40–47

  • Chandy KM, Browne JC, Dissly CW, Uhrig WR (1975) Analytic models for rollback and recovery strategies in database systems. IEEE Transac Softw Eng SE-1:100–110

  • Daly JT (2006) A higher order estimate of the optimum checkpoint interval for restart dumps. Future Gener Comput Syst 22:303–312

  • Dohi T, Aoki T, Kaio N, Osaki S (1997) Computational aspects of optimal checkpoint strategy in fault-tolerant database management. IEICE Transac Fundam Electr Commun Comput Sci (A) E80–A(10):2006–2015

  • Dohi T, Kaio N, Osaki S (2000) The optimal age-dependent checkpoint strategy for a stochastic system subject to general failure mode. J Math Analy Appl 249:80–94

  • Dohi T, Kaio N, Trivedi KS (2002a) Availability models with age-dependent checkpointing. Proceedings of the 21st symposium on reliable distributed systems (SRDS-2002), pp 130–139, IEEE CPS

  • Dohi T, Okamura H, Kaio N (2002b) Optimal age-dependent checkpoint strategy with retry of rollback recovery. Proceedings of the 2nd international workshop on autonomous decentralized systems (IWADS-2002), pp 113–118, IEEE CPS

  • Dohi T (2011) Environmental diversity techniques of software systems - from checkpoint restart to software rejuvenation -. Future Generation Information Technology (FGIT-2011) (Kim TH, Adeli H, Slezak D, Sandnes FE, Song X, Chung KI, Arnett KP, eds), Lecture notes in computer science, vol 7105, pp 37–38, Springer-Verlag

  • Dohi T, Trivedi K, Avritzer A (eds) (2020) Handbook of Software Aging and Rejuvenation. World Scientific, Singapore

  • Hiroyama S, Dohi T, Okamura H (2013) Aperiodic checkpoint placement algorithms - survey and comparison. J Softw Eng Appl 6(4A):41–53

  • Hussain Z, Znati T, Melhem R (2019) Optimal placement of in-memory checkpoints under heterogeneous failure likelihoods. Proceedings of the 2019 IEEE international parallel and distributed processing symposium (IPDPS)

  • Javaid U, Sikdar B (2021) A checkpoint enabled scalable blockchain architecture for industrial internet of things. IEEE Transac Indus Inf 17(11):7679–7687

  • Jayasekara S, Harwood A, Karunasekera S (2020) A utilization model for optimization of checkpoint intervals in distributed stream processing systems. Future Gener Comput Syst 110:68–79

  • Levitin G, Xing L, Dai Y, Vokkarane VM (2017) Dynamic checkpointing policy in heterogeneous real-time standby systems. IEEE Transac Comput 66(8):1449–1456

  • Levitin G, Xing L, Luo L (2019) Joint optimal checkpointing and rejuvenation policy for real-time computing tasks. Reliab Eng Syst Safety 182:63–72

  • Okamura H, Iwamoto K, Dohi T (2006) A DP-based optimal checkpointing algorithm for real-time applications. Int J Reliab Qual Saf Eng 13:323–340

  • Okamura H, Dohi T (2010) Comprehensive evaluation of aperiodic checkpointing and rejuvenation schemes in operational software system. J Syst Softw 83:1591–1604

  • Ozaki T, Dohi T, Okamura H, Kaio N (2006) Distribution-free checkpoint placement algorithms based on min-max principle. IEEE Transac Dependable Secure Comput 3:130–140

  • Ozaki T, Dohi T, Kaio N (2009) Numerical computation algorithms for sequential checkpoint placement. Perform Eval 66:311–326

  • Ranganathan A, Upadhyaya SJ (1993) Performance evaluation of rollback-recovery techniques in computer programs. IEEE Transac Reliab 42:220–226

  • Sigdel P, Tzeng NF (2018) Coalescing and deduplicating incremental checkpoint files for restore-express multi-level checkpointing. IEEE Transac Paral Distrib Syst 29:2713–2727

  • Slim BM, Gautier T, Trystram D, Vincent JM (2008) A new flexible checkpoint/restart model. INRIA Technical Report 6751, Centre de recherche INRIA Grenoble

  • Zhang Y, Chakrabarty K (2003) Fault recovery based on checkpointing for hard real-time embedded systems. Proceedings of the 18th IEEE symposium on defect and fault tolerance in VLSI systems (DFT-2003), pp 320–327

  • Zheng J, Okamura H, Dohi T (2020) A phase expansion for non-Markovian availability models with time-based aperiodic rejuvenation and checkpointing. Commun Statist Theory and Methods 49:3712–3729

  • Zheng J, Okamura H, Dohi T (2020) Optimal rejuvenation policies for non-Markovian availability models with aperiodic checkpointing. IEICE Transac Inform Syst E103–D:2133–2142

  • Zheng J, Okamura H, Dohi T (2021) Availability analysis of software systems with rejuvenation and checkpointing. Mathematics 9(8):846–861

  • Young JW (1974) A first order approximation to the optimum checkpoint interval. Commun ACM 17:530–531

Funding

This work was partially supported by JSPS KAKENHI Grant Number JP19K04905.

Author information

Corresponding author

Correspondence to Tadashi Dohi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dohi, T., Okamura, H. & Qian, CH. Computation algorithms for workload-dependent optimal checkpoint placement. Int J Syst Assur Eng Manag 13 (Suppl 2), 788–796 (2022). https://doi.org/10.1007/s13198-021-01522-z

  • DOI: https://doi.org/10.1007/s13198-021-01522-z
