Fault Management

Chandra Verma, Dinesh

doi:10.1007/978-0-387-89009-8_6

Abstract

A fault in a computer system is the failure of a component that prevents the system from operating normally. As the system operates, it may experience faults for a variety of reasons. Each fault generates alerts or error messages that are reported to the monitoring infrastructure. These alert messages are stored in the management database used for fault management.
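
The flow described in the abstract, where a fault produces an alert that is reported and then stored for fault management, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the chapter: the `Alert` fields, the table layout, the function names, and the use of an in-memory SQLite database as a stand-in for the management database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Alert:
    """One alert message (hypothetical fields, not from the chapter)."""
    component: str  # identifier of the failing component
    severity: str   # free-form severity label
    message: str    # the reported error text


def open_management_db() -> sqlite3.Connection:
    """Create an in-memory database standing in for the management database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alerts ("
        "id INTEGER PRIMARY KEY, component TEXT, severity TEXT, message TEXT)"
    )
    return conn


def report_alert(conn: sqlite3.Connection, alert: Alert) -> None:
    """Store a reported alert so fault management can examine it later."""
    conn.execute(
        "INSERT INTO alerts (component, severity, message) VALUES (?, ?, ?)",
        (alert.component, alert.severity, alert.message),
    )
    conn.commit()


def alerts_for(conn: sqlite3.Connection, component: str) -> list[Alert]:
    """Return the stored alerts for one component, oldest first."""
    rows = conn.execute(
        "SELECT component, severity, message FROM alerts "
        "WHERE component = ? ORDER BY id",
        (component,),
    )
    return [Alert(*row) for row in rows]


# Illustrative data only: three alerts from two made-up components.
db = open_management_db()
report_alert(db, Alert("disk-0", "critical", "read error"))
report_alert(db, Alert("nic-1", "warning", "link flapping"))
report_alert(db, Alert("disk-0", "critical", "drive offline"))
```

Calling `alerts_for(db, "disk-0")` then returns the two stored disk alerts in the order they were reported, which is the kind of lookup a fault management process might start from.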

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, P.O.Box 704, Yorktown Heights, NY 10598, USA
Dr Dinesh Chandra Verma

Corresponding author

Correspondence to Dinesh Chandra Verma.

About this chapter

Cite this chapter

Chandra Verma, D. (2009). Fault Management. In: Principles of Computer Systems and Network Management. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-89009-8_6

DOI: https://doi.org/10.1007/978-0-387-89009-8_6
Published: 25 May 2009
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-89008-1
Online ISBN: 978-0-387-89009-8
eBook Packages: Engineering, Engineering (R0)
