Skip to main content
Log in

The Failure Discovery problem

  • Published:
Mathematical systems theory Aims and scope Submit manuscript

Abstract

We define a simple variant of the Byzantine agreement (BA) problem, called the Failure Discovery (FD) problem, that roughly speaking, amounts to reaching BA provided that no failures are discovered. We show how a protocol for FD can be extended to one for BA, with no message overhead in the failure-free runs. We also show that, for so-calledbenign failures, if the FD protocol satisfies an additional property, the message-preserving extension to a BA protocol can be accomplished with minimal time overhead in the failure-free runs. Our results show that FD is a useful building block for BA; indeed, it has been used in this way in a companion paper (Hadzilacos and Halpern, 1993).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Amdur, E. S., S. M. Weber, and V. Hadzilacos. On the Message Complexity of Binary Byzantine Agreement Under Crash Failures.Distributed Computing,5:175–186, 1992.

    Article  MATH  Google Scholar 

  • Attiya, H., N. A. Lynch, and N. Shavit. Are Wait-Free Algorithms Fast? InProc. 31st Symp. on Foundations of Computer Science, pp. 55–64, October 1990.

  • Bar-Noy, A., D. Dolev, C. Dwork, and H. R. Strong. Shifting Gears: Changing Algorithms on the Fly To Expedite Byzantine Agreement. InProc. 6th ACM Symp. on Principles of Distributed Computing, pp. 42–51, August 1987.

  • Bernstein, P. A., V. Hadzilacos, and N. Goodman.Concurrency Control and Recovery in Database Systems. Addison-Wesley, Reading, MA, 1987.

    Google Scholar 

  • Chandra, T. D., and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. InProc. 10th ACM Symp. on Principles of Distributed Computing, pp. 325–340, August 1991.

  • Dolev, D. The Byzantine Generals Strike Again.Journal of Algorithms,3(1): 14–30, January 1982.

    Article  MathSciNet  MATH  Google Scholar 

  • Dwork, C. Strong Verifiable Secret Sharing. InProc. 4th Internat. Workshop on Distributed Algorithms, pp. 213–227. Lecture Notes in Computer Science, Vol. 486. Springer-Verlag, Berlin, 1990.

    Google Scholar 

  • Eager, D. L., and K. C. Sevcik. Achieving Robustness in Distributed Database Systems.A CM Transactions on Database Systems,8(3): 354–381, September 1983.

    Article  Google Scholar 

  • Fischer, M. J. The Consensus Problem in Unreliable Distributed Systems. Research Report RR-273, Department of Computer Science, Yale University, June 1983.

  • Hadzilacos, V. Issues of Fault-Tolerance in Concurrent Computations. Ph.D. dissertation, Aiken Computation Laboratory, Harvard University, June 1984.

  • Hadzilacos, V. On the Relationship Between the Atomic Commitment and Consensus Problems. Workshop on Fault Tolerant Distributed Computing, March 17–19, 1986, Pacific Grove, CA. (Proceedings published as Lecture Notes in Computer Science, Vol. 448 (B. Simons and A. Spector, eds.), Springer-Verlag, Berlin 1990.)

    Google Scholar 

  • Hadzilacos, V., and J. Y. Halpern. Message-Optimal Protocols for Byzantine Agreement.Mathematical Systems Theory, this issue, pp. 41–102, 1993.

  • Halpern, J. Y., and Y. Moses. Knowledge and Common Knowledge in a Distributed Environment.Journal of the ACM,37(3):549–587, 1990. (Preliminary version inProc. 3rd ACM Symp. on Principles of Distributed Computing, pp. 50–61, August 1984.)

    Article  MathSciNet  MATH  Google Scholar 

  • Lamport, L. The Weak Generals Problem.Journal of the ACM,30(3):668–676, July 1983.

    Article  MathSciNet  MATH  Google Scholar 

  • Lamport, L., and M. J. Fischer. Byzantine Generals and Transaction Commit Protocols. Technical Report Op. 62, SRI International, April 1982.

  • Lamport, L., R. Shostak, and M. Pease. The Byzantine Generals Problem.ACM Transactions on Programming Languages and Systems,4(3):382–401, July 1982.

    Article  MATH  Google Scholar 

  • Moses, Y., and O. Waarts. Coordinate Traversal:(t + l)-Round Byzantine Agreement in Polynomial Time. InProc. 29th Symp. on Foundations of Computer Science, pp. 246–255, October 1988.

  • Neiger, G., and S. Toueg. Automatically Increasing the Fault-Tolerance of Distributed Algorithms.Journal of Algorithms,11(3): 374–419, September 1990.

    Article  MathSciNet  MATH  Google Scholar 

  • Pease, M., R. Shostak, and L. Lamport. Reaching Agreement in the Presence of Faults.Journal of the ACM,27(2):228–234, April 1980.

    Article  MathSciNet  MATH  Google Scholar 

  • Ricciardi, A. M., and K. P. Birman. Using Process Groups To Implement Failure Detection in Asynchronous Environments. InProc. 10th ACM Symp. on Principles of Distributed Computing, pp. 341–351, August 1991.

  • Skeen, M. D. Crash Recovery in a Distributed Database System. Ph.D. dissertation, Department of Electrical Engineering and Computer Science, University of California at Berkeley, 1982.

  • Srikanth, T. K., and S. Toueg. Simulating Authenticated Broadcasts To Derive Simple Fault-Tolerant Algorithms.Distributed Computing,2:80–94, 1987.

    Article  Google Scholar 

  • Zwaenepoel, W. Protocols for Large Data Transfers over Local Area Networks. InProc. 9th Data Communications Symposium, pp. 22–32, Sept. 1985.

Download references

Author information

Authors and Affiliations

Authors

Additional information

The work of V. Hadzilacos was supported, in part, by a grant from the Natural Sciences and Engineering Research Council of Canada.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hadzilacos, V., Halpern, J.Y. The Failure Discovery problem. Math. Systems Theory 26, 103–129 (1993). https://doi.org/10.1007/BF01187075

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF01187075

Keywords

Navigation