Frontiers of Computing Systems Research, pp. 317-368

# New Approaches in System-Level Diagnosis

## Abstract

The concept of system-level diagnosis for multiprocessor systems was introduced more than two decades ago. The approach relies on mutual tests conducted by the system's processors rather than on circuit-level testing by an external tester. Early research in system-level diagnosis concentrated on uniquely diagnosable systems and produced various characterizations for synthesizing such systems under several models of test-result interpretation and fault types.

Later, new directions evolved from the classic concept of uniquely diagnosable systems. Efforts have been made to remedy some of its deficiencies, such as the limited degree of diagnosability and the large number of test links required. Researchers have, on the one hand, suggested more practical models for diagnosable systems and, on the other hand, tried to generalize and unify the characterizations of uniquely diagnosable systems across various models of test-result interpretation. As a result of these new approaches, other classes of diagnosable systems (or diagnosability measures) have been introduced and characterized.

The diagnosability and diagnosis problems have also been addressed extensively in recent years. Polynomial-time algorithms have been introduced for the diagnosability problem of some classes of diagnosable systems. Many polynomial-time diagnosis algorithms, some of them optimal, have also been introduced in the last few years for several classes of diagnosable systems. These include centralized algorithms that run on a supervising processor and distributed algorithms that run on the system processors themselves.

This survey begins with background on the concept of system-level diagnosis and the classic class of uniquely diagnosable systems, then concentrates on alternative classes of diagnosable systems, emphasizing those introduced in the last few years. It then describes recent developments in the diagnosability and diagnosis areas and discusses future possibilities.

