Abstract
We present a formal characterization of fault-tolerant behaviors of computing systems via simulation relations. This formalization makes use of variations of standard simulation relations in order to compare the executions of a system that exhibits faults with executions where no faults occur; intuitively, the latter can be understood as a specification of the system and the former as a fault-tolerant implementation. By employing variations of standard simulation algorithms, our characterization enables us to algorithmically check fault-tolerance in polynomial time, i.e., to verify that a system behaves in an acceptable way even subject to the occurrence of faults. Furthermore, the use of simulation relations in this setting allows us to distinguish between the different levels of fault-tolerance exhibited by systems during their execution. We prove that each kind of simulation relation preserves a corresponding class of temporal properties expressed in CTL; more precisely, masking fault-tolerance preserves liveness and safety properties, nonmasking fault-tolerance preserves liveness properties, while failsafe fault-tolerance guarantees the preservation of safety properties. We illustrate the suitability of this formal framework through its application to standard examples of fault-tolerance.
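The polynomial-time claim rests on the fact that a largest simulation relation can be computed by fixpoint refinement. The following is a minimal sketch of that idea for the standard simulation relation only, not the masking, nonmasking, or failsafe variants the paper defines. The function name `largest_simulation`, the dictionary-based encoding of states, and the toy "ok"/"fault" models are all assumptions made for illustration. It starts from every pair of states with equal labels and repeatedly discards pairs in which some move of the first system cannot be matched by the second.

```python
from itertools import product


def largest_simulation(trans_a, label_a, trans_b, label_b):
    """Largest relation R where (s, t) in R implies label_a[s] == label_b[t]
    and every move s -> s2 in A is matched by some move t -> t2 in B
    with (s2, t2) in R. Naive fixpoint refinement, polynomial in the
    number of states and transitions (not an optimized algorithm)."""
    # Start from all label-compatible pairs.
    rel = {(s, t) for s, t in product(label_a, label_b)
           if label_a[s] == label_b[t]}
    changed = True
    while changed:
        changed = False
        for s, t in list(rel):
            matched = all(
                any((s2, t2) in rel for t2 in trans_b.get(t, ()))
                for s2 in trans_a.get(s, ())
            )
            if not matched:
                rel.discard((s, t))
                changed = True
    return rel


# Hypothetical fault-free specification: one state that always shows "v".
spec_trans = {"ok": {"ok"}}
spec_label = {"ok": frozenset({"v"})}

# Hypothetical implementation: a fault may occur, but the observable
# label is unchanged, so the fault is hidden from the environment.
impl_trans = {"ok": {"ok", "fault"}, "fault": {"ok"}}
impl_label = {"ok": frozenset({"v"}), "fault": frozenset({"v"})}

hidden = largest_simulation(impl_trans, impl_label, spec_trans, spec_label)

# Variant in which the fault destroys the observable value.
lossy_label = {"ok": frozenset({"v"}), "fault": frozenset()}
lossy = largest_simulation(impl_trans, lossy_label, spec_trans, spec_label)
```

In this toy setting the first implementation is related to the specification at its initial state, while the lossy variant is not, because the move into the faulty state has no label-preserving match. The relations studied in the paper additionally distinguish normal from faulty transitions, which this sketch does not attempt.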
Additional information
Michael J. Butler
Cite this article
Demasi, R., Castro, P.F., Maibaum, T.S.E. et al. Simulation relations for fault-tolerance. Form Asp Comp 29, 1013–1050 (2017). https://doi.org/10.1007/s00165-017-0426-2
DOI: https://doi.org/10.1007/s00165-017-0426-2