How and Why Computer Systems Fail

Birman, Kenneth P.

doi:10.1007/978-1-4471-2416-0_9

Part of the book series: Texts in Computer Science (TCS)

Abstract

Before turning to the question of how to make systems reliable, it is useful to understand briefly why distributed systems fail. In this chapter we discuss some of the thinking about failure, a surprisingly rich and varied technical topic.

Author information

Authors and Affiliations

Department of Computer Science, Cornell University, Ithaca, NY, USA
Kenneth P. Birman

About this chapter

Cite this chapter

Birman, K.P. (2012). How and Why Computer Systems Fail. In: Guide to Reliable Distributed Systems. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-2416-0_9

DOI: https://doi.org/10.1007/978-1-4471-2416-0_9
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2415-3
Online ISBN: 978-1-4471-2416-0
eBook Packages: Computer Science, Computer Science (R0)
