Failure Diagnosis of Complex Systems

  • Soila P. Kavulya
  • Kaustubh Joshi
  • Felicita Di Giandomenico
  • Priya Narasimhan
Chapter

Abstract

Failure diagnosis is the process of identifying the causes of impairment in a system’s function based on observable symptoms, i.e., determining which fault led to an observed failure. Since multiple faults can often lead to very similar symptoms, failure diagnosis is often the first line of defense when things go wrong and a prerequisite before any corrective actions can be undertaken. The results of diagnosis also provide data about a system’s operational fault profile for use in offline resilience evaluation. While diagnosis has historically been a largely manual process requiring significant human input, techniques to automate as much of the process as possible have grown significantly in importance in many industries, including telecommunications, Internet services, automotive systems, and aerospace. This chapter presents a survey of automated failure diagnosis techniques, including both model-based and model-free approaches. Industrial applications of these techniques in the above domains are presented, and finally, future trends and open challenges in the field are discussed.

Keywords

  • Cloud Provider
  • Fault Injection
  • Transient Fault
  • Performance Counter
  • Recurrent Problem
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Soila P. Kavulya (1)
  • Kaustubh Joshi (1)
  • Felicita Di Giandomenico (2)
  • Priya Narasimhan (1)

  1. Carnegie Mellon University, Pittsburgh, USA
  2. ISTI Department, Italian National Research Council, Pisa, Italy
