Abstract
A distributed system consists of several independent processing components that interact with each other via an interconnecting network of communication links, which is itself built from communication components. Distributed computing refers to the algorithmic control of the distributed system’s processing components by means of a distributed program in order to reach a collective goal, that is, to provide a certain service. Unfortunately, the components of virtually every system are imperfect and therefore prone to failures that may render the system unable to provide the service. To tolerate the failure of some components, that is, to keep the service available despite these failures, the system must be equipped with redundancy in space and time. Redundancy in space refers to redundant components that take over the part played by failed components. Redundancy in time refers to the additional overhead required to manage these redundant components. Fault-tolerant distributed computing refers to the algorithmic control of the distributed system’s components to provide the desired service despite the presence of certain failures in the system by exploiting redundancy in space and time.
Copyright information
© 2012 Vieweg+Teubner Verlag | Springer Fachmedien Wiesbaden
About this chapter
Cite this chapter
Storm, C. (2012). Fault Tolerance in Distributed Computing. In: Specification and Analytical Evaluation of Heterogeneous Dynamic Quorum-Based Data Replication Schemes. Vieweg+Teubner Verlag. https://doi.org/10.1007/978-3-8348-2381-6_2
DOI: https://doi.org/10.1007/978-3-8348-2381-6_2
Publisher Name: Vieweg+Teubner Verlag
Print ISBN: 978-3-8348-2380-9
Online ISBN: 978-3-8348-2381-6
eBook Packages: Computer Science, Computer Science (R0)