Abstract
Embedded, safety-critical systems often have requirements for incredibly small probabilities of failure, e.g. 10⁻⁹ for a one-hour exposure. One often hears designers of safety-critical systems say: "We have to tolerate all credible faults".
However, the word "credible" in this assertion contrasts starkly with the word "incredibly" in the sentence before. In fact, there are faults and failures that most designers think can’t happen which actually can and do happen, with probabilities far greater than the requirements allow. The well-known Murphy’s Law states: "If anything can go wrong, it will go wrong." When requirements limit failure probabilities to one in a million or less, this should be rewritten as: "If anything can’t go wrong, it will go wrong anyway."
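As a rough, back-of-the-envelope illustration of why such "incredible" numbers still bite (the fleet-exposure figure below is an assumption chosen for illustration, not a number from the paper): a large commercial aircraft fleet can accumulate on the order of 10⁹ operating hours over its service life, so a failure mode with probability 10⁻⁹ per hour is more likely than not to occur somewhere in that fleet:

\[ P(\text{at least one occurrence in } 10^{9}\ \text{hours}) \;=\; 1 - \bigl(1 - 10^{-9}\bigr)^{10^{9}} \;\approx\; 1 - e^{-1} \;\approx\; 0.63 . \]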
There are a couple of factors that lead designers to erroneously think that certain faults and failures are impossible when, in fact, not only are they possible, but some are actually highly probable.
One factor is that the requirements are outside any designer’s experience, even when that experience includes that of colleagues. Using the literature seems like an obvious way of expanding one’s (virtual) experience. However, there are two problems with this. The first problem is that people who actually design safety-critical systems are rarely given enough time to keep current with the literature. The second problem is that the literature on actual occurrences of rare failure modes is almost nonexistent. Reasons for this include: people and organizations don’t want to admit they had a failure; designers feel that rare failure occurrences aren’t worth reporting; and, if designers aren’t given enough time to read the literature, they certainly aren’t given enough time to write it. Takeaway: designers should fight their management for time to keep current with the literature, and they should use every report of a rare failure as an opportunity to imagine other, similar modes of failure.
The other factor that leads designers to erroneously think that certain faults and failures are impossible stems from abstraction. The complexity of modern safety-critical systems requires some form of abstraction. However, when designers limit their thinking to one level of abstraction, certain faults and failures can seem impossible, but would clearly be seen as probable if one were to examine layers below that level of abstraction. For example, a designer thinking about electrical components would not include in their FMEA the possibility that one component (e.g. a diode) could transmogrify into another component (e.g. a capacitor). But, at a lower level of abstraction, it can be seen that a crack through a diode die can create a capacitor. And a crack is one of the most highly probable failure modes at the physical-material level of abstraction.
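As a rough illustration of this point (the dimensions below are assumptions chosen for the example, not figures from the paper): treating the two faces of a hairline crack through a die as the plates of a parallel-plate capacitor, with plate area A ≈ 1 mm² and gap d ≈ 1 µm, gives

\[ C \;=\; \frac{\varepsilon_{0} A}{d} \;\approx\; \frac{(8.85\times 10^{-12}\,\mathrm{F/m})\,(10^{-6}\,\mathrm{m}^{2})}{10^{-6}\,\mathrm{m}} \;\approx\; 9\,\mathrm{pF}, \]

a capacitance comparable to ordinary parasitic and coupling capacitances, which is why, one level of abstraction down, the "diode becomes a capacitor" failure stops looking impossible.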
Examples of rare but actually occurring failures will be given. These will include a number of Byzantine faults, component transmogrification, fault mode transformation (e.g. stuck-at faults that aren’t so stuck), the dangers of self-inflicted shrapnel, component creation via emergent properties, "evaporating" software, and exhaustively tested software that still failed.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Driscoll, K.R. (2010). Murphy Was an Optimist. In: Schoitsch, E. (eds) Computer Safety, Reliability, and Security. SAFECOMP 2010. Lecture Notes in Computer Science, vol 6351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15651-9_36
DOI: https://doi.org/10.1007/978-3-642-15651-9_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15650-2
Online ISBN: 978-3-642-15651-9