Abstract

A fault in a computer system is the failure of a component that prevents the system from operating normally. While a computer system operates, it may experience faults for a variety of reasons. Each fault generates alerts or error messages that are reported to the monitoring infrastructure. These alert messages are then stored in the management database that is responsible for fault management.
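The flow described above (a fault raises an alert, the alert is reported to the monitoring infrastructure, and the alert is stored in a management database) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Alert` fields, the `alerts` table, and the helper functions are hypothetical and are not taken from the chapter. SQLite stands in for whatever database a real management system would use.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical alert fields; the chapter does not define a schema.
    timestamp: float
    component: str
    severity: str
    message: str


def open_management_db(path=":memory:"):
    """Create the (hypothetical) alerts table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alerts ("
        "id INTEGER PRIMARY KEY, timestamp REAL, component TEXT, "
        "severity TEXT, message TEXT)"
    )
    return conn


def store_alert(conn, alert):
    """Persist one alert reported by the monitoring infrastructure."""
    conn.execute(
        "INSERT INTO alerts (timestamp, component, severity, message) "
        "VALUES (?, ?, ?, ?)",
        (alert.timestamp, alert.component, alert.severity, alert.message),
    )
    conn.commit()


def alerts_for(conn, component):
    """Return the stored alerts for one component, oldest first."""
    rows = conn.execute(
        "SELECT timestamp, component, severity, message FROM alerts "
        "WHERE component = ? ORDER BY timestamp",
        (component,),
    ).fetchall()
    return [Alert(*row) for row in rows]
```

A fault management tool could then query the stored alerts per component, for example `alerts_for(conn, "disk0")`, as a starting point for diagnosis.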

Author information

Correspondence to Dinesh Chandra Verma.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Chandra Verma, D. (2009). Fault Management. In: Principles of Computer Systems and Network Management. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-89009-8_6

  • DOI: https://doi.org/10.1007/978-0-387-89009-8_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-89008-1

  • Online ISBN: 978-0-387-89009-8

  • eBook Packages: Engineering (R0)
