Abstract
In this paper we revisit a checkpoint/restart model by Slim et al. (A new flexible checkpoint/restart model, INRIA Technical Report 6751, Centre de recherche INRIA Grenoble, 2008) and derive workload-dependent optimal checkpoint placement policies. Two cases are considered: one in which the system overhead parameters are independent of the cumulative workload, and one in which they depend on it. It is shown that, in terms of minimizing the expected total processing time, the periodic checkpoint placement policy is always optimal in the independent case and the aperiodic policy is always optimal in the dependent case. We provide Lagrange algorithms to determine the optimal checkpoint sequences in the respective cases. Numerical examples are presented to investigate the sensitivity of the optimal checkpoint placement to the system-failure parameter.
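As a rough illustration of what minimizing the expected total processing time over periodic checkpoint placements means, the following sketch uses a textbook model, not the model of Slim et al. or the Lagrange algorithms of this paper. It assumes exponentially distributed failures with a constant rate, a constant checkpoint overhead paid after every segment (including the last), and a failure-free restart of constant length; under those assumptions the expected time to complete one segment of length t is (e^{λt} − 1)(1/λ + r). All function and parameter names are hypothetical.

```python
import math


def expected_total_time(workload, n, overhead, restart, rate):
    """Expected completion time when the workload is split into n equal segments.

    Assumptions (illustrative only): exponential failures with the given rate,
    a checkpoint of length `overhead` after every segment, and a failure-free
    restart of length `restart` after each failure.
    """
    segment = workload / n + overhead  # work plus checkpoint per segment
    per_segment = (math.exp(rate * segment) - 1.0) * (1.0 / rate + restart)
    return n * per_segment


def best_periodic_count(workload, overhead, restart, rate, n_max=1000):
    """Scan n = 1..n_max and return (n, expected time) with the smallest expected time."""
    best_n = min(
        range(1, n_max + 1),
        key=lambda n: expected_total_time(workload, n, overhead, restart, rate),
    )
    return best_n, expected_total_time(workload, best_n, overhead, restart, rate)


if __name__ == "__main__":
    n, t = best_periodic_count(workload=100.0, overhead=1.0, restart=2.0, rate=0.01)
    print(f"checkpoints: {n}, expected total time: {t:.2f}")
```

A brute-force scan over the number of checkpoints is used here only for transparency. When the overhead parameters depend on the cumulative workload, equal spacing is no longer guaranteed to be optimal and a sequence of unequal intervals has to be optimized instead.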
References
Ansel J, Arya K, Cooperman G (2009) DMTCP: transparent checkpointing for cluster computations and the desktop. Proceedings of the 2009 IEEE international symposium on parallel & distributed processing, IEEE CPS
Bajunaid N, Menasce DA (2018) Efficient modeling and optimizing of checkpointing in concurrent component-based software systems. J Syst Softw 139:1–13
Bent J, Gibson G, Grider G, McClelland B, Nowoczynski P, Nunez J, Polte M, Wingate M (2009) PLFS: a checkpoint filesystem for parallel applications. Proceedings of the conference on high performance computing networking, storage and analysis (SC 2009), pp 1–12
Chandy KM (1975) A survey of analytic models of roll-back and recovery strategies. IEEE Comput 8:40–47
Chandy KM, Browne JC, Dissly CW, Uhrig WR (1975) Analytic models for rollback and recovery strategies in database systems. IEEE Transac Softw Eng SE-1:100–110
Daly JT (2006) A higher order estimate of the optimum checkpoint interval for restart dumps. Future Gener Comput Syst 22:303–312
Dohi T, Aoki T, Kaio N, Osaki S (1997) Computational aspects of optimal checkpoint strategy in fault-tolerant database management. IEICE Transac Fundam Electr Commun Comput Sci (A) E80–A(10):2006–2015
Dohi T, Kaio N, Osaki S (2000) The optimal age-dependent checkpoint strategy for a stochastic system subject to general failure mode. J Math Analy Appl 249:80–94
Dohi T, Kaio N, Trivedi KS (2002a) Availability models with age dependent-checkpointing. Proceedings of the 21st symposium on reliable distributed systems (SRDS-2002), pp 130–139, IEEE CPS
Dohi T, Okamura H, Kaio N (2002b) Optimal age-dependent checkpoint strategy with retry of rollback recovery. Proceedings of the 2nd international workshop on autonomous decentralized systems (IWADS-2002), pp 113–118, IEEE CPS
Dohi T (2011) Environmental diversity techniques of software systems - from checkpoint restart to software rejuvenation -. In: Kim TH, Adeli H, Slezak D, Sandnes FE, Song X, Chung KI, Arnett KP (eds) Future Generation Information Technology (FGIT-2011), Lecture Notes in Computer Science, vol 7105, pp 37–38, Springer-Verlag
Dohi T, Trivedi K, Avritzer A (eds) (2020) Handbook of Software Aging and Rejuvenation. World Scientific, Singapore
Hiroyama S, Dohi T, Okamura H (2013) Aperiodic checkpoint placement algorithms - survey and comparison. J Softw Eng Appl 6(4A):41–53
Hussain Z, Znati T, Melhem R (2019) Optimal placement of in-memory checkpoints under heterogeneous failure likelihoods. Proceedings of the 2019 IEEE international parallel and distributed processing symposium (IPDPS)
Javaid U, Sikdar B (2021) A checkpoint enabled scalable blockchain architecture for industrial internet of things. IEEE Transac Indus Inf 17(11):7679–7687
Jayasekara S, Harwood A, Karunasekera S (2020) A utilization model for optimization of checkpoint intervals in distributed stream processing systems. Future Gener Comput Syst 110:68–79
Levitin G, Xing L, Dai Y, Vokkarane VM (2017) Dynamic checkpointing policy in heterogeneous real-time standby systems. IEEE Transac Comput 66(8):1449–1456
Levitin G, Xing L, Luo L (2019) Joint optimal checkpointing and rejuvenation policy for real-time computing tasks. Reliab Eng Syst Safety 182:63–72
Okamura H, Iwamoto K, Dohi T (2006) A DP-based optimal checkpointing algorithm for real-time applications. Int J Reliab Qual Saf Eng 13:323–340
Okamura H, Dohi T (2010) Comprehensive evaluation of aperiodic checkpointing and rejuvenation schemes in operational software system. J Syst Softw 83:1591–1604
Ozaki T, Dohi T, Okamura H, Kaio N (2006) Distribution-free checkpoint placement algorithms based on min-max principle. IEEE Transac Dependable Secure Comput 3:130–140
Ozaki T, Dohi T, Kaio N (2009) Numerical computation algorithms for sequential checkpoint placement. Perform Eval 66:311–326
Ranganathan A, Upadhyaya SJ (1993) Performance evaluation of rollback-recovery techniques in computer programs. IEEE Transac Reliab 42:220–226
Sigdel P, Tzeng NF (2018) Coalescing and deduplicating incremental checkpoint files for restore-express multi-level checkpointing. IEEE Transac Paral Distrib Syst 29:2713–2727
Slim BM, Gautier T, Trystram D, Vincent JM (2008) A new flexible checkpoint/restart model. INRIA Technical Report 6751, Centre de recherche INRIA Grenoble
Zhang Y, Chakrabarty K (2003) Fault recovery based on checkpointing for hard real-time embedded systems. Proceedings of the 18th IEEE symposium on defect and fault tolerance in VLSI systems (DFT-2003), pp 320–327
Zheng J, Okamura H, Dohi T (2020) A phase expansion for non-Markovian availability models with time-based aperiodic rejuvenation and checkpointing. Commun Statist Theory and Methods 49:3712–3729
Zheng J, Okamura H, Dohi T (2020) Optimal rejuvenation policies for non-Markovian availability models with aperiodic checkpointing. IEICE Transac Inform Syst E103–D:2133–2142
Zheng J, Okamura H, Dohi T (2021) Availability analysis of software systems with rejuvenation and checkpointing. Mathematics 9(8):846–861
Young JW (1974) A first order approximation to the optimum checkpoint interval. Commun ACM 17:530–531
Funding
This work was partially supported by JSPS KAKENHI Grant Number JP19K04905.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Dohi, T., Okamura, H. & Qian, CH. Computation algorithms for workload-dependent optimal checkpoint placement. Int J Syst Assur Eng Manag 13 (Suppl 2), 788–796 (2022). https://doi.org/10.1007/s13198-021-01522-z
DOI: https://doi.org/10.1007/s13198-021-01522-z