Seamless Paxos coordinators


The Paxos algorithm requires a single correct coordinator process to operate. After a failure, the replacement of the coordinator may lead to a temporary unavailability of the application implemented atop Paxos. So far, this unavailability has been addressed by reducing the coordinator replacement rate through the use of stable coordinator selection algorithms. We have observed that the cost of recovering the newly elected coordinator's state is at the core of this unavailability problem. In this paper, we present a new technique to manage coordinator replacement that allows the recovery to occur concurrently with new consensus rounds. Experimental results show that our seamless approach effectively solves the temporary unavailability problem: its adoption entails uninterrupted execution of the application. Our solution removes the restriction that coordinator replacements must be avoided, which decouples application execution from the accuracy of the mechanism used to choose a coordinator. This result increases the performance of the application even in the presence of failures, and it is of special importance to the autonomous operation of replicated applications that have to adapt to varying network conditions and partial failures.



Gustavo M.D. Vieira was partially supported by CNPq grant 142638/2005-6. Luiz E. Buzato was partially supported by CNPq grant 473340/2009-7 and FAPESP grant 2009/06859-8.

The authors thank Prof. W. Zwaenepoel and Olivier Cramieri, both from EPFL, Switzerland, for their support in the earlier stages of this research. We thank Daniel Cason for his support with cluster management at IC-UNICAMP.

Gustavo M. D. Vieira

Vieira, G.M.D., Garcia, I.C. & Buzato, L.E. Seamless Paxos coordinators. Cluster Comput 17, 463–473 (2014).

  • Consensus
  • Failure detector
  • Fault tolerance
  • Paxos
  • Replication