Distributed Computing, Volume 4, Issue 2, pp 69–80

Fault-tolerant atomic computations in an object-based distributed system

  • Mustaque Ahamad
  • Partha Dasgupta
  • Richard J. LeBlanc Jr.
Article

Abstract

A distributed system can support fault-tolerant applications by replicating data and computation at nodes that have independent failure modes. We present a scheme called parallel execution threads (PET), which can be used to implement fault-tolerant computations in an object-based distributed system. In a system that replicates objects, the PET scheme replicates a computation by creating several parallel threads that execute with different replicas of the invoked objects. A computation completes successfully if at least one thread does not encounter any failed nodes and its completion preserves the consistency of the objects. The PET scheme can therefore tolerate failures that occur during the execution of the computation, as long as at least one thread is unaffected by them. We present the algorithms required to implement the PET scheme and also address some performance issues.
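The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not give; it only shows the general pattern of running one thread per replica and committing the result of a single thread that met no failed node. The names `Replica`, `NodeFailure`, and `run_pet` are hypothetical, and the consistency requirement is simplified here to "exactly one thread's result is committed to the surviving replicas".

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class NodeFailure(Exception):
    """Hypothetical error: a thread touched a replica on a failed node."""


class Replica:
    """Hypothetical object replica holding one integer of state."""

    def __init__(self, node, value, failed=False):
        self.node = node
        self.value = value
        self.failed = failed

    def invoke(self, delta):
        # Compute a tentative result without mutating the replica.
        if self.failed:
            raise NodeFailure(self.node)
        return self.value + delta


def run_pet(replicas, delta):
    """Run one thread per replica and commit the first successful result."""
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = {pool.submit(r.invoke, delta): r for r in replicas}
        for fut in as_completed(futures):
            try:
                result = fut.result()
            except NodeFailure:
                continue  # this thread met a failed node; try another
            # Simplified commit: only this thread's result is installed,
            # and it is copied to every replica that is still alive.
            for r in replicas:
                if not r.failed:
                    r.value = result
            return futures[fut].node, result
    raise RuntimeError("every thread encountered a failed node")
```

For example, with three replicas of which one sits on a failed node, `run_pet(replicas, 5)` still returns a result, because the two remaining threads are unaffected. Which of the surviving threads wins depends on scheduling.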

Key words

  • Fault-tolerant computing
  • Atomicity
  • Distributed systems and replication

Copyright information

© Springer-Verlag 1990

Authors and Affiliations

  • Mustaque Ahamad (1)
  • Partha Dasgupta (1)
  • Richard J. LeBlanc Jr. (1)
  1. School of Information and Computer Science, Georgia Institute of Technology, Atlanta, USA
