
Tolerance of Design Faults

  • David Powell
  • Jean Arlat
  • Yves Deswarte
  • Karama Kanoun
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6875)

Abstract

The idea that diverse or dissimilar computations could be used to detect errors can be traced back to Dionysius Lardner’s analysis of Babbage’s mechanical computers in the early 19th century. In the modern era of electronic computers, diverse redundancy techniques were pioneered in the 1970s by Elmendorf, Randell, Avižienis and Chen. Since then, the tolerance of design faults has been a very active research topic, and it has had a practical impact on real critical applications. In this paper, we present a brief history of the topic and then describe two contemporary studies on the application of diversity in the fields of robotics and security.

Keywords

design fault, software fault, vulnerability, fault tolerance, recovery blocks, N-version programming, N-self-checking components

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • David Powell (1, 2)
  • Jean Arlat (1, 2)
  • Yves Deswarte (1, 2)
  • Karama Kanoun (1, 2)
  1. CNRS ; LAAS, Toulouse Cedex 4, France
  2. Université de Toulouse ; UPS, INSA, INP, ISAE ; UT1, UTM, LAAS, Toulouse Cedex 4, France
