Failure detection and consensus in the crash-recovery model

Aguilera, Marcos Kawazoe; Chen, Wei; Toueg, Sam

doi:10.1007/s004460050070

Original articles
Published: April 2000

Volume 13, pages 99–125 (2000)
Distributed Computing

Marcos Kawazoe Aguilera¹, Wei Chen² & Sam Toueg¹

Summary. We study the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. We first propose new failure detectors that are particularly suitable to the crash-recovery model. We next determine under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, we give two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice – those with no failures or failure detector mistakes. In such runs, consensus is achieved within \(3\delta\) time and with \(4n\) messages, where \(\delta\) is the maximum message delay and \(n\) is the number of processes in the system.

Author information

Authors and Affiliations

Department of Computer Science, Cornell University, Ithaca, NY 14853-7501, USA (e-mail: {aguilera,sam}@cs.cornell.edu)
Marcos Kawazoe Aguilera & Sam Toueg
Oracle Corporation, One Oracle Drive, Nashua, NH 03062, USA (e-mail: weichen@us.oracle.com)
Wei Chen

Additional information

Received: May 1998 / Accepted: November 1999

About this article

Cite this article

Aguilera, M., Chen, W. & Toueg, S. Failure detection and consensus in the crash-recovery model. Distrib Comput 13, 99–125 (2000). https://doi.org/10.1007/s004460050070

Key words: Fault tolerance – Failure detection – Consensus – Process crash – Process recovery – Asynchronous systems – Stable storage
