Skip to main content
Log in

Simulation relations for fault-tolerance

  • Original Article
  • Published:
Formal Aspects of Computing

Abstract

We present a formal characterization of fault-tolerant behaviors of computing systems via simulation relations. This formalization makes use of variations of standard simulation relations in order to compare the executions of a system that exhibits faults with executions where no faults occur; intuitively, the latter can be understood as a specification of the system and the former as a fault-tolerant implementation. By employing variations of standard simulation algorithms, our characterization enables us to algorithmically check fault-tolerance in polynomial time, i.e., to verify that a system behaves in an acceptable way even subject to the occurrence of faults. Furthermore, the use of simulation relations in this setting allows us to distinguish between the different levels of fault-tolerance exhibited by systems during their execution. We prove that each kind of simulation relation preserves a corresponding class of temporal properties expressed in CTL; more precisely, masking fault-tolerance preserves liveness and safety properties, nonmasking fault-tolerance preserves liveness properties, while failsafe fault-tolerance guarantees the preservation of safety properties. We illustrate the suitability of this formal framework through its application to standard examples of fault-tolerance.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Attie PC, Arora A, Emerson EA (2004) Synthesis of fault-tolerant concurrent programs. ACM Trans Program Lang Syst (TOPLAS) 26(1): 125–185

    Article  MATH  Google Scholar 

  2. Abdelouahab Z, Braga RI (2008) An adaptive train traffic controller. In: An adaptive train traffic controller. Springer, Netherlands, pp 550–555

  3. Aminof B, Ball T, Kupferman O (2004) Reasoning about systems with transition fairness. In: Baader F, Voronkov A (eds) Logic for programming, artificial intelligence, and reasoning, 11th international conference, LPAR 2004, volume 3452 of Lecture Notes in Computer Science, pp 194–208. Springer

  4. Abrial J-R (2006). Train systems. In: Butler MJ, Jones CB, Romanovsky A, Troubitsynalena (eds) Rigorous development of complex fault-tolerant systems [FP6 IST-511599 RODIN project], RODIN Book, volume 4157 of Lecture Notes in Computer Science, pp 1–36. Springer

  5. Abrial J-R (2010) Modeling in event-B—system and software engineering. Cambridge University Press, Cambridge

    Book  MATH  Google Scholar 

  6. Arora A, Gouda MG (1993) Closure and convergence: a foundation of fault-tolerant computing. IEEE Trans Softw Eng 19(11): 1015–1027

    Article  Google Scholar 

  7. Arora A, Kulkarni SS (1998) Component based design of multitolerant systems. IEEE Trans Softw Eng 24(1): 63–78

    Article  Google Scholar 

  8. Arora A, Kulkarni SS (1998) Detectors and correctors: a theory of fault-tolerance components. In: 18th international conference on distributed computing systems, ICDCS 1998, pp 436–443. IEEE Computer Society

  9. Avizienis AA (1995) Software fault tolerance, volume 2, chapter the methodology of N-version programming, pp 22–45. Wiley

  10. Banach R, Bozzano M (2006) Retrenchment, and the generation of fault trees for static, dynamic and cyclic systems. In: Computer safety, reliability, and security, 25th international conference, SAFECOMP 2006, Gdansk, Poland, 27–29 Sept 2006, Proceedings, pp 127–141

  11. Banach R, Cross R (2004) Safety requirements and fault trees using retrenchment. In: Computer safety, reliability, and security, 23rd international conference, SAFECOMP 2004, Potsdam, Germany, 21–24 Sept, 2004, Proceedings, pp 210–223

  12. Bernardeschi C, Fantechi A, Gnesi S (2002) Model checking fault tolerant systems. Softw Test Verif Reliab 12(4): 251–275

    Article  Google Scholar 

  13. Back R-J, Kurki-Suonio R (1988) Distributed cooperation with action systems. ACM Trans Program Lang Syst 10(4): 513–554

    Article  MATH  Google Scholar 

  14. Baier C, Katoen J-P (2008) Principles of model checking. MIT Press, Cambridge

    MATH  Google Scholar 

  15. Bonakdarpour B (2008) Automated revision of distributed and real-time programs. PhD thesis, Michigan State University

  16. Banach R, Poppleton M (1998) Retrenchment: an engineering variation on refinement. In: B’98: recent advances in the development and use of the B method, second International B conference, Montpellier, France, 22–24 April 1998, Proceedings, pp 129–147

  17. Banach R, Poppleton M (1999) Retrenchment and punctured simulation. In: Integrated formal methods, proceedings of the 1st international conference on integrated formal methods, IFM 99, York, UK, 28–29 June 1999, pp 457–476

  18. Banach R, Poppleton M, Jeske C, Stepney S (2007) Engineering and theoretical underpinnings of retrenchment. Sci Comput Program 67(2-3): 301–329

    Article  MATH  MathSciNet  Google Scholar 

  19. Braun B (2006) Implementing automatic addition and verification of fault tolerance. Master’s thesis, RWTH Aachen University

  20. Bradfield J, Stirling C (2007) 12 modal mu-calculi. In: Van Benthem J, Blackburn P, Wolter F (eds) Handbook of modal logic, volume 3 of studies in logic and practical reasoning, pp 721–756. Elsevier

  21. Castro PF, Kilmurray C, Acosta A, Aguirre N (2011) dCTL: a branching time temporal logic for fault-tolerant system verification. In: Barthe G, Pardo A, Schneider G (eds) Software engineering and formal methods—9th international conference, SEFM 2011, volume 7041 of Lecture Notes in Computer Science, pp 106–121. Springer

  22. Clarke EM (1999) Model checking. MIT Press, Cambridge

    Google Scholar 

  23. Chandy KM, Misra J (1989) Parallel program design—a foundation. Addison-Wesley, Reading

  24. Castro PF, Maibaum TSE (2009) Deontic logic, contrary to duty reasoning and fault tolerance. Electr Notes Theor Comput Sci 258(2): 17–34

    Article  MATH  Google Scholar 

  25. Cristian F (1985) A rigorous approach to fault-tolerant programming. IEEE Trans Softw Eng 11(1): 23–31

    Article  Google Scholar 

  26. Demasi R, Castro PF, Maibaum TSE, Aguirre N (2013) Characterizing fault-tolerant systems by means of simulation relations. In: Johnsen EB, Petre L (eds) Integrated formal methods, 10th international conference, IFM 2013, volume 7940 of Lecture Notes in Computer Science, pp 428–442. Springer

  27. Demasi R, Castro PF, Maibaum TSE, Aguirre N (2013) Synthesizing masking fault-tolerant systems from deontic specifications. In: Van Hung D, Ogawa M (eds) Automated technology for verification and analysis—11th international symposium, ATVA 2013, volume 8172 of Lecture Notes in Computer Science, pp 163–177. Springer

  28. Demasi R, Castro PF, Ricci N, Maibaum TSE, Aguirre N (2015) syntmaskft: a tool for synthesizing masking fault-tolerant programs from deontic specifications. In: Tools and algorithms for the construction and analysis of systems—21st international conference, TACAS 2015, Held as part of the European joint conferences on theory and practice of software, ETAPS 2015, London, UK, 11–18 April, 2015. Proceedings, pp 188–193

  29. Dijkstra EW (1976) A discipline of programming. Prentice-Hall, Englewood Cliffs

    MATH  Google Scholar 

  30. Emerson EA, Clarke EM (1980) Characterizing correctness properties of parallel programs using fixpoints. In: de Bakker JW, van Leeuwen J (eds) Automata, languages and programming, 7th colloquium, ICALP 1980, volume 85 of Lecture Notes in Computer Science, pp 169–181. Springer

  31. Emerson EA, Halpern JY (1986) “sometimes” and “not never” revisited: on branching versus linear time temporal logic. J ACM 33(1): 151–178

    Article  MATH  MathSciNet  Google Scholar 

  32. Ezekiel J, Lomuscio A (2009) Combining fault injection and model checking to verify fault tolerance in multi-agent systems. In: Sierra C, Castelfranchi C, Decker KS, Sichman JS (eds) 8th international joint conference on autonomous agents and multiagent systems, AAMAS 2009, pp 113–120. IFAAMAS

  33. Francalanza A, Hennessy M (2007) A theory for observational fault tolerance. J Log Algebr Program 73(1-2): 22–50

    Article  MATH  MathSciNet  Google Scholar 

  34. French T, McCabe-Dansted JC, Reynolds M (2007) A temporal logic of robustness. In: Frontiers of combining systems, 6th international symposium, FroCoS 2007, Liverpool, UK, 10–12 Sept 2007, Proceedings, pp 193–205

  35. Gärtner FC (1999) Fundamentals of fault-tolerant distributed computing in asynchronous environments. ACM Comput Surv 31: 1–26

    Article  Google Scholar 

  36. Guiho GD, Hennebert C (1990) SACEM software validation (experience report). In: Valette F-R, Freeman PA, Gaudel M-C (eds) Proceedings of the 12th international conference on software engineering, ICSE 1990, pp 186–191. IEEE Computer Society

  37. Ghezzi C, Jazayeri M, Mandrioli D (2003) Fundamentals of software engineering, 2nd edn. Prentice Hall, Englewood Cliffs

  38. Henzinger MR, Henzinger TA, Kopke PW (1995) Computing simulations on finite and infinite graphs. In: 36th annual symposium on foundations of computer science, FOCS 1995, pp 453–462. IEEE Computer Society

  39. Janowski T (1995) Bisimulation and fault-tolerance. PhD thesis, University of Warwick, United Kingdom

  40. Janowski T (1997) On bisimulation, fault-monotonicity and provable fault-tolerance. In: Johnson M (ed) Algebraic methodology and software technology, 6th international conference, AMAST 1997, volume 1349 of Lecture Notes in Computer Science, pp 292–306. Springer

  41. Jeffords RD, Heitmeyer CL, Archer M, Leonard EI (2009) A formal method for developing provably correct fault-tolerant systems using partial refinement and composition. In: Cavalcanti A, Dams D (eds) Formal methods, second world congress, FM 2009, volume 5850 of Lecture Notes in Computer Science, pp 173–189. Springer

  42. Jeffords RD, Heitmeyer CL, Archer M, Leonard EI (2010) Model-based construction and verification of critical systems using composition and partial refinement. Formal Methods Syst Des 37(2-3): 265–294

    Article  MATH  Google Scholar 

  43. Lee PA, Anderson T (1990) Fault tolerance: principles and practice, 2nd edn. Springer, Secaucus

  44. Lamport L (1985) Solved problems, unsolved problems and non-problems in concurrency. Oper Syst Rev 19(4): 34–44

    Article  Google Scholar 

  45. Lamport L (1994) The temporal logic of actions. ACM Trans Program Lang Syst 16(3): 872–923

    Article  Google Scholar 

  46. Lamport L, Shostak RE, Pease MC (1982) The Byzantine generals problem. ACM Trans Program Lang Syst 4(3): 382–401

    Article  MATH  Google Scholar 

  47. Mead C, Conway L (1980) Introduction to VLSI systems. Addison-Wesley, Reading

  48. Milner R (1989) Communication and concurrency. PHI series in computer science. Prentice Hall, Upper Saddle River

  49. Manolios P, Trefler RJ (2001) Safety and liveness in branching time. In: 16th annual IEEE symposium on logic in computer science, Boston, Massachusetts, USA, 16–19 June 2001, Proceedings, pp 366–374

  50. Pnueli A, Rosner R (1989) On the synthesis of a reactive module. In: Sixteenth annual ACM symposium on principles of programming languages, POPL 1989, pp 179–190. ACM Press

  51. Prasetya ISWB, Swierstra SD (2005) Formal design of self-stabilizing programs. J High Speed Netw 14(1): 59–83

    Google Scholar 

  52. Schneider F, Easterbrook SM, Callahan JR, Holzmann GJ (1998) Validating requirements for fault tolerant systems using model checking. In: 3rd international conference on requirements engineering, ICRE 1998, pp 4–13. IEEE Computer Society

  53. Siewiorek DP, Swarz RS (1998) Reliable computer systems, 3rd edn: Design and evaluation. A. K. Peters, Ltd., Natick

  54. Torres-Pomales W (2000) Software fault tolerance: a tutorial. Technical report, NASA Technical Memorandum TM-2000-210616

  55. Yokogawa T, Tsuchiya T, Kikuno T (2001) Automatic verification of fault tolerance using model checking. In: 8th Pacific rim international symposium on dependable computing, PRDC 2001, pp 95–102. IEEE Computer Society

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Pablo F. Castro.

Additional information

Michael J. Butler

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Demasi, R., Castro, P.F., Maibaum, T.S.E. et al. Simulation relations for fault-tolerance. Form Asp Comp 29, 1013–1050 (2017). https://doi.org/10.1007/s00165-017-0426-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00165-017-0426-2

Keywords

Navigation