Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6875)

Abstract

The idea that diverse or dissimilar computations could be used to detect errors can be traced back to Dionysius Lardner’s analysis of Babbage’s mechanical computers in the early 19th century. In the modern era of electronic computers, diverse redundancy techniques were pioneered in the 1970s by Elmendorf, Randell, Avižienis and Chen. Since then, the tolerance of design faults has been a very active research topic, which has had a practical impact on real critical applications. In this paper, we present a brief history of the topic and then describe two contemporary studies on the application of diversity in the fields of robotics and security.

References

  1. Abrial, J.: The B-Book - Assigning Programs to Meanings. Cambridge University Press, Cambridge (1996)

  2. Alami, R., Chatila, R., Fleury, S., Ghallab, M., Ingrand, F.: An architecture for autonomy. International Journal of Robotic Research 17(4), 315–337 (1998)

  3. Ammann, P.E., Knight, J.C.: Data diversity: An approach to software fault tolerance. IEEE Trans. on Computers 37(4), 418–425 (1988)

  4. Anderson, T., Barrett, P., Halliwell, D., Moulding, M.: Software fault tolerance: an evaluation. IEEE Trans. on Software Engineering SE-11(12), 1502–1510 (1985)

  5. Anderson, T., Lee, P.: Fault Tolerance - Principles and Practice. Prentice-Hall, Englewood Cliffs (1981)

  6. Arlat, J., Kanekawa, N., Amendola, A., Dufour, J.L., Hirao, Y., Profeta III, J.: Dependability of railway control systems. In: 16th IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-16), pp. 150–155. IEEE CS Press, Vienna (1996)

  7. Arlat, J., Kanoun, K., Laprie, J.C.: Dependability modeling and evaluation of software fault-tolerant systems. IEEE Trans. on Computers 39(4), 504–513 (1990)

  8. Avižienis, A.: The N-version approach to fault-tolerant systems. IEEE Trans. on Software Engineering 11(12), 1491–1501 (1985)

  9. Avižienis, A., Chen, L.: On the implementation of N-version programming for software fault tolerance during execution. In: 1st IEEE-CS Int. Computer Software and Applications Conference (COMPSAC 1977), pp. 149–155. IEEE CS Press, Chicago (1977)

  10. Avižienis, A., Kelly, J.: Fault-tolerance by design diversity: Concepts and experiments. Computer 17(8), 67–80 (1984)

  11. Avižienis, A., Laprie, J.C., Randell, B., Landwehr, C.: Basic concepts and terminology of dependable and secure computing. IEEE Trans. on Dependable and Secure Computing 1(1), 11–33 (2004)

  12. Avižienis, A., Lyu, M., Schutz, W., Tso, K., Voges, U.: DEDIX 87 - a supervisory system for design diversity experiments at UCLA. In: Voges, U. (ed.) Software Diversity in Computerized Control Systems, pp. 129–168. Springer, Wien (1988)

  13. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., Warfield, A.: Xen and the art of virtualization. In: 19th ACM Symp. on Operating Systems Principles (SOSP), pp. 164–177. ACM, New York (2003)

  14. Bartlett, J., Gray, J., Horst, B.: Fault tolerance in Tandem computer systems. In: Avižienis, A., Kopetz, H., Laprie, J.C. (eds.) The Evolution of Fault-Tolerant Systems, pp. 55–76. Springer, Vienna (1987)

  15. Beder, D., Randell, B., Romanovsky, A., Rubira, C.: On applying atomic actions and dependable software architectures for developing complex systems. In: 4th IEEE Int. Symp. on Object-Oriented Real-Time Distributed Computing, pp. 103–112. IEEE CS Press, Magdeburg (2001)

  16. Bishop, P., Esp, D., Barnes, M., Humphreys, P., Dahl, G., Lahti, J., Yoshimura, S.: Project on diverse software - an experiment in software reliability. In: Safety of Computer Control Systems (SAFECOMP), pp. 153–158 (1985)

  17. Brière, D., Traverse, P.: Airbus A320/A330/A340 electrical flight controls - a family of fault-tolerant systems. In: 23rd IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-23), pp. 616–623. IEEE CS Press, Toulouse (1993)

  18. Castelli, V., Harper, R., Heidelberger, P., Hunter, S., Trivedi, K., Vaidyanathan, K., Zeggert, W.: Proactive management of software aging. IBM Journal of Research and Development 45(2), 311–332 (2001)

  19. Crosby, P.B.: Cutting the cost of quality; the defect prevention workbook for managers. Industrial Education Institute, Boston (1967)

  20. Crouzet, Y., Waeselynck, H., Lussier, B., Powell, D.: The SESAME experience: from assembly languages to declarative models. In: Mutation 2006 - The Second Workshop on Mutation Analysis, 17th IEEE Int. Symp. on Software Reliability Engineering (ISSRE 2006). IEEE, Raleigh (2006)

  21. Deswarte, Y., Kanoun, K., Laprie, J.C.: Diversity against accidental and deliberate faults. In: Ammann, P., Barnes, B., Jajodia, S., Sibley, E. (eds.) Computer Security, Dependability and Assurance: From Needs to Solutions, pp. 171–182. IEEE CS Press, Los Alamitos (1999)

  22. Duflot, L., Levillain, O., Morin, B.: ACPI: Design principles and concerns. In: Chen, L., Mitchell, C., Martin, A. (eds.) Trust 2009. LNCS, vol. 5471, pp. 14–28. Springer, Heidelberg (2009)

  23. Dugan, J., Lyu, M.: Dependability modeling for fault-tolerant software and systems. In: Lyu, M. (ed.) Software Fault Tolerance, pp. 109–138. Wiley & Sons, Chichester (1995)

  24. Eckhardt, D., Caglayan, A., Knight, J., Lee, L., McAllister, D., Vouk, M., Kelly, J.: An experimental evaluation of software redundancy as a strategy for improving reliability. IEEE Trans. on Software Engineering 17(7), 692–702 (1991)

  25. Eckhardt, D., Lee, L.: A theoretical basis of multiversion software subject to coincident errors. IEEE Trans. on Software Engineering SE-11, 1511–1517 (1985)

  26. Elmendorf, W.: Fault-tolerant programming. In: 2nd IEEE Int. Symp. on Fault Tolerant Computing (FTCS-2), pp. 79–83. IEEE CS Press, Newton (1972)

  27. Ghallab, M., Laruelle, H.: Representation and control in IxTeT, a temporal planner. In: 2nd Int. Conf. on Artificial Intelligence Planning Systems (AIPS 1994), pp. 61–67. AIAA Press, Chicago (1994)

  28. Goldberg, A., Havelund, K., McGann, C.: Runtime verification for autonomous spacecraft software. In: IEEE Aerospace Conference, pp. 507–516 (2005)

  29. Gray, J.: Why do computers stop and what can be done about it? In: 5th Symp. on Reliability in Distributed Software and Database Systems, pp. 3–12. IEEE CS Press, Los Angeles (1986)

  30. Grnarov, A., Arlat, J., Avižienis, A.: On the performance of software fault-tolerant strategies. In: 10th IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-10), pp. 251–253. IEEE CS Press, Kyoto (1980)

  31. Hatton, L.: N-version design vs. one good version. IEEE Software 14(6), 71–76 (1997)

  32. Hennebert, C., Guiho, G.: SACEM: A fault-tolerant system for train speed control. In: 23rd IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-23), pp. 624–628. IEEE CS Press, Toulouse (1993)

  33. Howey, R., Long, D., Fox, M.: VAL: Automatic plan validation, continuous effects and mixed initiative planning using PDDL. In: 16th IEEE International Conference on Tools with Artificial Intelligence, pp. 294–301 (2004)

  34. Huang, Y., Kintala, C., Kolettis, N., Fulton, N.: Software rejuvenation: Analysis, module and applications. In: 25th IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-25), pp. 381–390. IEEE CS Press, Pasadena (1995)

  35. ISO/IEC-15408: Common criteria for information technology security evaluation

  36. Jaeger, E., Hardin, T.: A few remarks about formal development of secure systems. In: 11th IEEE High Assurance Systems Engineering Symposium (HASE), pp. 165–174 (2008)

  37. Jalote, P., Huang, Y., Kintala, C.: A framework for understanding and handling transient software failures. In: 2nd ISSAT Int. Conf. Reliability and Quality in Design, Orlando, FL, USA, pp. 231–237 (1995)

  38. Kanoun, K.: Real-world design diversity: a case study on cost. IEEE Software 18(4), 29–33 (2001)

  39. Kelly, J., Eckhardt Jr., D.E., Vouk, M., McAllister, D., Caglayan, A.: A large scale second generation experiment in multi-version software: description and early results. In: 18th IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-18), pp. 9–14. IEEE CS Press, Los Alamitos (1988)

  40. Khatib, L., Muscettola, N., Havelund, K.: Mapping temporal planning constraints into timed automata. In: 8th Int. Symp. on Temporal Representation and Reasoning (TIME 2001), pp. 21–27. IEEE, Cividale Del Friuli (2001)

  41. Kim, K., Welch, H.: Distributed execution of recovery blocks: an approach to uniform treatment of hardware and software faults in real-time applications. IEEE Trans. on Computers 38(5), 626–636 (1989)

  42. Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., Winwood, S.: seL4: Formal verification of an OS kernel. In: 22nd Symp. on Operating Systems Principles (SOSP), pp. 207–220. ACM, Big Sky (2009)

  43. Knight, J., Leveson, N.: An experimental evaluation of the assumption of independence in multi-version programming. IEEE Trans. on Software Engineering SE-12(1), 96–109 (1986)

  44. Laarouchi, Y., Deswarte, Y., Powell, D., Arlat, J., de Nadai, E.: Connecting commercial computers to avionics systems. In: IEEE/AIAA 28th Digital Avionics Systems Conference (DASC 2009), Orlando, FL, USA, pp. 6.D.1–6.D.9 (2009)

  45. Lacombe, E., Nicomette, V., Deswarte, Y.: Enforcing kernel constraints by hardware-assisted virtualization. Journal in Computer Virology 7(1), 1–21 (2011)

  46. Laprie, J.C.: Dependability: Basic concepts and associated terminology. Dependability : Basic Concepts and Terminology LAAS-CNRS, 7 Ave. Colonel Roche, 31077 Toulouse, France, p. 33 (1990)

  47. Laprie, J.C., Arlat, J., Béounes, C., Kanoun, K.: Definition and analysis of hardware-and-software fault-tolerant architectures. Computer 23(7), 39–51 (1990)

  48. Laprie, J.C., Arlat, J., Béounes, C., Kanoun, K.: Architectural issues in software fault tolerance. In: Lyu, M. (ed.) Software Fault Tolerance, pp. 47–78. Wiley & Sons, Chichester (1995)

  49. Lardner, D.: Babbage’s calculating engine. Edinburgh Review 59, 263–327 (1834)

  50. Leveson, N., Cha, S., Knight, J., Shimeall, T.: The use of self checks and voting in software error detection: an empirical study. IEEE Transactions on Software Engineering 16(4), 432–443 (1990)

  51. Littlewood, B., Popov, P., Strigini, L.: Assessment of the reliability of fault-tolerant software: A bayesian approach. In: Koornneef, F., van der Meulen, M.J.P. (eds.) SAFECOMP 2000. LNCS, vol. 1943, pp. 294–308. Springer, Heidelberg (2000)

  52. Lone Sang, F., Lacombe, É., Nicomette, V., Deswarte, Y.: Exploiting an I/OMMU vulnerability. In: 5th Int’l Conf. on Malicious and Unwanted Software (MALWARE), pp. 7–14 (2010)

  53. Lussier, B., Gallien, M., Guiochet, J., Ingrand, F., Killijian, M.O., Powell, D.: Fault tolerant planning for critical robots. In: 37th Annual IEEE/IFIP Int. Conf. on Dependable Systems and Networks (DSN 2007), pp. 144–153. IEEE CS Press, Edinburgh (2007)

  54. Lussier, B., Gallien, M., Guiochet, J., Ingrand, F., Killijian, M.O., Powell, D.: Planning with diversified models for fault-tolerant robots. In: 17th Int. Conf. on Automated Planning and Scheduling (ICAPS), pp. 216–223. AAAI, Providence (2007)

  55. Lussier, B., Lampe, A., Chatila, R., Guiochet, J., Ingrand, F., Killijian, M.O., Powell, D.: Fault tolerance in autonomous systems: How and how much? In: 4th IARP - IEEE/RAS - EURON Joint Workshop on Technical Challenges for Dependable Robots in Human Environments, Nagoya, Japan (2005)

  56. van der Meulen, M.J.P., Revilla, M.A.: The effectiveness of software diversity in a large population of programs. IEEE Trans. on Software Engineering 34(6), 753–764 (2008)

  57. Migneault, G.E.: The cost of software fault tolerance. In: AGARD Symposium on Software Avionics, The Hague, The Netherlands, pp. 37/1–37/8 (1992)

  58. Mongardi, G.: Dependable computing for railway control systems. In: 3rd IFIP Working Conf. on Dependable Computing for Critical Applications (DCCA-3), Palermo, Italy, pp. 255–273 (1993)

  59. Muscettola, N., Dorais, G., Fry, C., Levinson, R., Plaunt, C.: IDEA: Planning at the core of autonomous reactive agents. In: 3rd Int. NASA Workshop on Planning and Scheduling for Space, Houston, TX, USA (2002)

  60. Nguyen-Tuong, A., Evans, D., Knight, J., Cox, B., Davidson, J.: Security through redundant data diversity. In: IEEE/IFIP Int. Conf. on Dependable Systems and Networks, Anchorage, Alaska, USA, pp. 187–196 (2008)

  61. Nicola, V., Goyal, A.: Modeling of correlated failures and community error recovery in multiversion software. IEEE Trans. on Software Engineering 16(3), 350–359 (1990)

  62. Oh, N., Mitra, S., McCluskey, E.: ED4I: Error detection by diverse data and duplicated instructions. IEEE Trans. on Computers 51(2), 180–199 (2002)

  63. Penix, J., Pecheur, C., Havelund, K.: Using model checking to validate AI planner domain models. In: 23rd Annual Software Engineering Workshop, NASA Goddard (1998)

  64. Popov, P., Strigini, L.: Assessing asymmetric fault-tolerant software. In: 21st Int. Symp. on Software Reliability Engineering (ISSRE), pp. 41–50. IEEE CS Press, Los Alamitos (2010)

  65. Randell, B.: System structure for software fault tolerance. IEEE Trans. on Software Engineering SE-1(2), 220–232 (1975)

  66. Randell, B., Romanovsky, A., Rubira, C., Stroud, R., Wu, Z., Xu, J.: From recovery blocks to coordinated atomic actions. In: Randell, B., Laprie, J.C., Kopetz, H., Littlewood, B. (eds.) Predictably Dependable Computer Systems, pp. 87–101. Springer, Heidelberg (1995)

  67. Scott, R., Gault, J., McAllister, D.: Fault-tolerant software reliability modeling. IEEE Trans. on Software Engineering SE-13(5), 582–592 (1987)

  68. Smith, J., Nair, R.: Virtual Machines: Versatile Platforms for Systems and Processes. Morgan Kaufmann, San Francisco (2005)

  69. Sullivan, G.F., Masson, G.M.: Certification trails for data structures. In: 21st IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-21), pp. 240–247. IEEE CS Press, Montreal (1991)

  70. Tai, A., Meyer, J., Avižienis, A.: Performability enhancement of fault-tolerant software. IEEE Trans. on Reliability 42(2), 227–237 (1993)

  71. Tomek, L., Muppala, J., Trivedi, K.: Analyses using reward nets. In: Lyu, M. (ed.) Software Fault Tolerance, pp. 139–165. Wiley & Sons, Chichester (1995)

  72. Traverse, P., Lacaze, I., Souyris, J.: Airbus fly-by-wire: A total approach to dependability. In: Jacquart, J. (ed.) Building the Information Society, 18th IFIP World Computer Congress, pp. 191–212. Kluwer Academic Publishers, Dordrecht (2004)

  73. Varian, M.: VM and the VM community: Past, present, and future (1997), http://web.me.com/melinda.varian/

  74. Voges, U.: Software Diversity in Computerized Control Systems, vol. 2. Springer, Heidelberg (1988)

  75. Xia, C., Lyu, M.: An empirical study on reliability modeling for diverse software systems. In: 15th Int. Symp. on Software Reliability Engineering (ISSRE), pp. 125–136 (2004)

  76. Xia, C., Lyu, M., Vouk, M.: An experimental evaluation on reliability features of N-version programming. In: 16th Int. Symp. on Software Reliability Engineering (ISSRE), pp. 161–170 (2005)

  77. Xu, J.: The t/(n − 1)-diagnosability and its applications to fault tolerance. In: 21st IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-21), pp. 496–503. IEEE CS Press, Montreal (1991)

  78. Yeh, Y.: Dependability of the 777 primary flight control system. In: Iyer, R., Morganti, M., Fuchs, W.K., Gligor, V. (eds.) Dependable Computing for Critical Applications (DCCA-5), pp. 3–17. IEEE CS Press, Los Alamitos (1998)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Powell, D., Arlat, J., Deswarte, Y., Kanoun, K. (2011). Tolerance of Design Faults. In: Jones, C.B., Lloyd, J.L. (eds) Dependable and Historic Computing. Lecture Notes in Computer Science, vol 6875. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24541-1_32

  • DOI: https://doi.org/10.1007/978-3-642-24541-1_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24540-4

  • Online ISBN: 978-3-642-24541-1

  • eBook Packages: Computer Science, Computer Science (R0)
