New Approaches in System-Level Diagnosis
The concept of system-level diagnosis for multi-processor systems was introduced more than two decades ago. The approach relies on mutual tests conducted by the system's processors rather than on circuit-level testing by an external tester. Early research concentrated on uniquely diagnosable systems, and various characterizations for synthesizing such systems were presented under several models of test-result interpretation and fault types.
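As a rough illustration of mutual testing, the sketch below enumerates the fault sets that are consistent with a set of test outcomes. It assumes one particular interpretation model, commonly called the PMC model, in which a fault-free tester always reports the true status of the unit it tests and a faulty tester may report anything; the abstract does not single out a model, so this choice, the function names, and the five-unit example are all illustrative assumptions. The brute-force search is for exposition only and is not one of the polynomial-time algorithms the survey covers.

```python
from itertools import combinations

def consistent(faulty, syndrome):
    """Check a hypothesized fault set against the test outcomes (PMC-style assumption)."""
    for (tester, tested), outcome in syndrome.items():
        if tester in faulty:
            continue  # outcomes reported by faulty testers carry no information
        expected = 1 if tested in faulty else 0
        if outcome != expected:
            return False
    return True

def candidate_fault_sets(units, syndrome, t):
    """Return every fault set of size at most t that is consistent with the syndrome."""
    units = list(units)
    return [
        set(c)
        for k in range(t + 1)
        for c in combinations(units, k)
        if consistent(set(c), syndrome)
    ]

# Hypothetical example: 5 units, unit i tests units i+1 and i+2 (mod 5).
# Unit 2 is faulty; its own reports, (2, 3) and (2, 4), are arbitrary.
syndrome = {
    (0, 1): 0, (0, 2): 1,
    (1, 2): 1, (1, 3): 0,
    (2, 3): 1, (2, 4): 0,
    (3, 4): 0, (3, 0): 0,
    (4, 0): 0, (4, 1): 0,
}

print(candidate_fault_sets(range(5), syndrome, t=2))  # [{2}]
```

With at most two faults allowed, only one fault set explains this syndrome, which is the behavior expected of a uniquely diagnosable system under the assumed model.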
Later, new directions evolved from the classic concept of uniquely diagnosable systems. Efforts have been made to remedy some of its deficiencies, such as the limited degree of diagnosability and the large number of test links required. Researchers have proposed more practical models for diagnosable systems and have also tried to generalize and unify the characterizations of uniquely diagnosable systems across the various models of test-result interpretation. As a result, other classes of diagnosable systems (or diagnosability measures) have been introduced and characterized.
The diagnosability and diagnosis problems have also been addressed extensively in recent years. Polynomial-time algorithms have been introduced for the diagnosability problem of some classes of diagnosable systems. Many polynomial-time diagnosis algorithms, some of them optimal, have also been introduced in the last few years for several classes of diagnosable systems. These include centralized algorithms executed on a supervising processor and distributed algorithms run on the system processors themselves.
This survey begins with background on the concept of system-level diagnosis and the classic class of uniquely diagnosable systems. It then concentrates on alternative classes of diagnosable systems, emphasizing those introduced in the last few years. Finally, it describes recent developments in diagnosability and diagnosis and discusses future possibilities.