Systematic and design diversity — Software techniques for hardware fault detection

  • Tomislav Lovric
Session 8: Software diversity
Part of the Lecture Notes in Computer Science book series (LNCS, volume 852)

Abstract

For the detection of hardware operational faults, most safety-critical systems use static redundancy; in the simplest case, this is the well-known Duplex System. If design faults must be detected as well, design diversity in the software is also required. Instead of the structurally redundant Duplex System, we suggest the combined use of so-called systematic diversity and design diversity in a time-redundant system. For this purpose, two diversely designed and systematically transformed variants of an application program are executed sequentially on the same processor. We call this new approach a Virtual Duplex System. In this paper we investigate the safety of a Virtual Duplex System. We propose the use of software diversity techniques (i.e. systematic diversity) to detect nearly all hardware faults in this system. Transient faults are effectively detected through time redundancy, and permanent faults through the new software diversity approach. In addition, software design faults and even compiler, library, operating system, and underlying hardware design faults can be detected. The proposed software techniques are either new or have never been considered systematically for the detection of hardware faults in a general-purpose system environment with design diversity.
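
The abstract does not say how the two runs are compared or which processor fault model is used. The following is a minimal Python sketch, assuming a toy processor whose adder can have a stuck-at-1 fault on one output bit. The class `Processor`, the function `virtual_duplex`, and the example variant `triple_plus_one` are hypothetical illustrations, not the paper's implementation. It shows the time-redundant comparison and why identical variants are not enough against permanent faults.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Processor:
    """Hypothetical toy processor whose adder may have a stuck-at-1 fault on one bit.

    `faulty_runs` lists the runs (0 = first variant, 1 = second variant)
    during which the fault is active: (0, 1) models a permanent fault,
    a single run index models a transient one. Illustrative only.
    """
    stuck_bit: Optional[int] = None
    faulty_runs: Tuple[int, ...] = ()
    run: int = 0

    def add(self, a: int, b: int) -> int:
        s = a + b
        if self.stuck_bit is not None and self.run in self.faulty_runs:
            s |= 1 << self.stuck_bit  # force the faulty bit to 1
        return s


Variant = Callable[[Processor, int], int]


def virtual_duplex(cpu: Processor, v1: Variant, v2: Variant, x: int) -> Optional[int]:
    """Run both variants one after the other on the same processor and compare.

    Returns the agreed result, or None to signal a detected fault (fail-safe stop).
    """
    cpu.run = 0
    r1 = v1(cpu, x)
    cpu.run = 1
    r2 = v2(cpu, x)
    return r1 if r1 == r2 else None


def triple_plus_one(cpu: Processor, x: int) -> int:
    # Computes 3*x + 1 using only the (possibly faulty) adder.
    return cpu.add(cpu.add(cpu.add(x, x), x), 1)
```

With fault-free hardware, `virtual_duplex(Processor(), triple_plus_one, triple_plus_one, 2)` returns 7. A transient fault on bit 4 during the first run only (`Processor(stuck_bit=4, faulty_runs=(0,))`) makes the runs disagree, so the call returns `None`. The same fault made permanent (`faulty_runs=(0, 1)`) corrupts both runs identically, and two identical copies agree on the wrong value 23; this is the case the abstract addresses with systematic diversity.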

As an example, the new systematic diversity technique ‘simple register permutation’ was applied to different application programs by means of a simple heuristic. The technique was evaluated experimentally by injecting permanent hardware faults with the fault injection tool ProFI and measuring the safety of Virtual Duplex Systems. The results are compared with systems that use no special fault detection (Simplex Systems) and with Virtual Duplex Systems that use pure design diversity. The experiments show that even simple systematic diversity detects most permanent hardware faults.
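
The abstract describes neither the paper's permutation heuristic nor ProFI's fault model. As a rough illustration only, the Python sketch below applies a fixed register permutation to a program for a hypothetical 4-register, 8-bit machine. It then injects every permanent stuck-at-0 fault on a register bit and classifies each outcome of the resulting Virtual Duplex pair as no effect, detected (the variants disagree), or undetected (the variants agree on a wrong result). The instruction set, the permutation `PERM`, and the functions `execute`, `permute`, and `campaign` are assumptions made for this sketch.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical toy machine: 4 registers, 8-bit words.
N_REGS, WIDTH = 4, 8
MASK = (1 << WIDTH) - 1

# An instruction is (opcode, dst, src1, src2_or_immediate).
Instr = Tuple[str, int, int, int]

# Example program: r2 = (x + 5) ^ (x + x); uses R0..R2, leaves R3 unused.
PROGRAM: List[Instr] = [
    ("in", 0, 0, 0),   # r0 = x
    ("li", 1, 0, 5),   # r1 = 5
    ("add", 2, 0, 1),  # r2 = x + 5
    ("add", 1, 0, 0),  # r1 = x + x
    ("xor", 2, 2, 1),  # r2 = r2 ^ r1
]
OUT = 2
PERM = [1, 2, 3, 0]  # assumed permutation: logical Ri -> physical PERM[i]


def execute(program: List[Instr], x: int, out_reg: int,
            fault: Optional[Tuple[int, int]] = None) -> int:
    """Run `program` on input x; `fault` = (register, bit) is a permanent stuck-at-0."""
    regs = [0] * N_REGS

    def write(r: int, v: int) -> None:
        v &= MASK
        if fault is not None and fault[0] == r:
            v &= ~(1 << fault[1])  # the faulty bit always reads back as 0
        regs[r] = v

    for op, d, s, t in program:
        if op == "in":
            write(d, x)
        elif op == "li":
            write(d, t)
        elif op == "add":
            write(d, regs[s] + regs[t])
        elif op == "xor":
            write(d, regs[s] ^ regs[t])
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs[out_reg]


def permute(program: List[Instr], perm: List[int]) -> List[Instr]:
    """Simple register permutation: rename every register operand, keep immediates."""
    renamed = []
    for op, d, s, t in program:
        if op in ("in", "li"):
            renamed.append((op, perm[d], s, t))
        else:
            renamed.append((op, perm[d], perm[s], perm[t]))
    return renamed


def campaign(prog_a: List[Instr], out_a: int, prog_b: List[Instr], out_b: int,
             x: int) -> Dict[str, int]:
    """Inject every single stuck-at-0 register-bit fault into both variants."""
    expected = execute(prog_a, x, out_a)
    counts = {"no_effect": 0, "detected": 0, "undetected": 0}
    for reg, bit in product(range(N_REGS), range(WIDTH)):
        ra = execute(prog_a, x, out_a, (reg, bit))
        rb = execute(prog_b, x, out_b, (reg, bit))
        if ra != rb:
            counts["detected"] += 1      # variants disagree: fail-safe stop
        elif ra == expected:
            counts["no_effect"] += 1     # fault did not change either result
        else:
            counts["undetected"] += 1    # both agree on a wrong result
    return counts


if __name__ == "__main__":
    x = 6
    for name, (prog_b, out_b) in {
        "identical": (PROGRAM, OUT),
        "permuted": (permute(PROGRAM, PERM), PERM[OUT]),
    }.items():
        c = campaign(PROGRAM, OUT, prog_b, out_b, x)
        effective = c["detected"] + c["undetected"]
        print(name, c, "coverage:", c["detected"] / effective if effective else 1.0)
```

When the unmodified program serves as both variants, no permanent register fault can ever be detected, because both runs execute the same instructions on the same faulty register. With the permuted variant, a faulty physical register holds different logical variables in the two runs, or is used by only one of them (as with R3 here), so such faults can surface as a mismatch. Coverage is then estimated as detected / (detected + undetected).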

Keywords

design faults, operational faults, fail-safe, self-checking, fault detection coverage, relative test, absolute test, software-implemented hardware fault injection, design diversity, systematic diversity, Virtual Duplex System

Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Tomislav Lovric (1)
  1. Fachbereich Informatik (Computer Science Department), Universität Dortmund, Dortmund

Personalised recommendations