Automating the addition of fault tolerance with discrete controller synthesis

  • Alain Girault
  • Éric Rutten


Discrete controller synthesis (DCS) is a formal approach based on the same state-space exploration algorithms as model checking. Its appeal lies in the ability to automatically obtain systems that satisfy, by construction, formal properties specified a priori. In this paper, we aim to demonstrate the feasibility of this approach for fault tolerance. We start with a fault intolerant program, modeled as the synchronous parallel composition of finite labeled transition systems; we formally specify a fault hypothesis; we state fault tolerance requirements; and we use DCS to automatically obtain a program that has the same behavior as the initial fault intolerant one in the absence of faults and that satisfies the fault tolerance requirements under the fault hypothesis. Our original contribution is to demonstrate that DCS can be used elegantly to design fault tolerant systems, with guarantees on key properties of the resulting system, such as the fault tolerance level and the satisfaction of quantitative constraints. Using numerous examples taken from case studies, we show that our method can address different kinds of failures (crash, value, or Byzantine) affecting different kinds of hardware components (processors, communication links, actuators, or sensors). We also show that our method offers an optimality criterion that is very useful for synthesizing fault tolerant systems that comply with the constraints of embedded systems, such as power consumption.
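The workflow summarized above (model, fault hypothesis, requirement, synthesis) can be illustrated with a toy example. The following is a minimal sketch only, not the authors' tool or one of their case studies: it assumes a hypothetical task that runs on one of two processors, a fault hypothesis of at most one crash, and the requirement that the task never sits on a crashed processor. All names (`step`, `enabled`, `safe`, `synthesize`) are illustrative assumptions. The synthesis is a plain greatest-fixpoint computation over an explicit state set, whereas practical DCS tools work symbolically.

```python
from itertools import product

# Hypothetical toy model: state = (loc, ok1, ok2), where loc is the
# processor hosting the task and ok_i tells whether processor i is alive.
UNCONTROLLABLE = ("none", "fail1", "fail2")  # environment: fault events
CONTROLLABLE = ("stay", "migrate")           # controller: task placement


def enabled(state, u):
    """Assumed fault hypothesis: at most one processor ever crashes."""
    _, ok1, ok2 = state
    return u == "none" or (ok1 and ok2)


def step(state, u, c):
    """One synchronous reaction: apply the fault, then the placement."""
    loc, ok1, ok2 = state
    ok1 = ok1 and u != "fail1"
    ok2 = ok2 and u != "fail2"
    if c == "migrate":
        loc = 3 - loc  # switch between processor 1 and processor 2
    return (loc, ok1, ok2)


def safe(state):
    """Requirement: the task is hosted on a live processor."""
    loc, ok1, ok2 = state
    return ok1 if loc == 1 else ok2


def synthesize(states):
    """Greatest fixpoint: keep the safe states from which, for every
    enabled fault event, some controllable choice stays in the set."""
    win = {s for s in states if safe(s)}
    while True:
        keep = {
            s for s in win
            if all(any(step(s, u, c) in win for c in CONTROLLABLE)
                   for u in UNCONTROLLABLE if enabled(s, u))
        }
        if keep == win:
            break
        win = keep
    # Maximally permissive controller: every choice that stays in `win`.
    controller = {
        (s, u): [c for c in CONTROLLABLE if step(s, u, c) in win]
        for s in win for u in UNCONTROLLABLE if enabled(s, u)
    }
    return win, controller


STATES = list(product((1, 2), (True, False), (True, False)))
win, controller = synthesize(STATES)
```

In this sketch, the controller leaves both placements open while no fault occurs, and forces a migration only when the processor hosting the task crashes, which mirrors the idea of preserving the fault-free behavior while enforcing the requirement under the fault hypothesis.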


Fault tolerant systems, Discrete controller synthesis, Automatic fault tolerance



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. POP ART project-team and LIG laboratory, INRIA Grenoble Rhône-Alpes and Grenoble University, Saint-Ismier cedex, France
  2. SARDES project-team and LIG laboratory, INRIA Grenoble Rhône-Alpes and Grenoble University, Saint-Ismier cedex, France
