System-Level Diagnosis: A Perspective for the Third Decade

  • Anton T. Dahbura

Abstract

This paper gives an overview of twenty years of achievement in system-level diagnosis and examines the antinomy of this flourishing theoretical research area that has yet to have any apparent practical impact in an era of unforetold technical advances. The potentially important role of system-level diagnosis is discussed relative to future multicomputer systems.

Keywords

Fault Detection; Fault Diagnosis; Diagnosable System; Multiprocessor System; Diagnosis Algorithm

Copyright information

© Plenum Press, New York 1988

Authors and Affiliations

  • Anton T. Dahbura (1)
  1. AT&T Bell Laboratories, Murray Hill, USA
