On the duality of fault tolerant system structures

Preliminary version
  • S. K. Shrivastava
  • L. V. Mancini
  • B. Randell
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 309)


An examination of the structure of fault-tolerant systems incorporating error recovery, and in particular backward error recovery, indicates a partitioning into two broad classes. Two canonical models, each representing a particular class of systems, have been constructed. The first model incorporates objects and actions as the entities for program construction, while the second model employs communicating processes. Applications in areas such as office information and database systems typically use the first model, while applications in real-time process control are usually based on the second. The paper claims that the two models are duals of each other and presents arguments and examples to substantiate this claim, which is, in effect, an extension of the earlier duality argument presented by Lauer and Needham. An interesting conclusion to be drawn from this study is that there is no inherent reason for selecting one model over the other; rather, the choice is governed by the architectural features of the layer over which the system is to be constructed. A pleasing consequence has been the recognition that the techniques developed for one model turn out to have interesting and hitherto unexplored duals in the other model.
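To make the two models concrete, the following is a minimal Python sketch of backward error recovery in each, assuming purely in-memory state and a single thread. All names (`RecoverableObject`, `AtomicAction`, `Process`, `send`, `Conversation`) are hypothetical and not taken from the paper. On the action side, an action saves each object's before-image on its first write and restores it on abort. On the process side, a conversation (in the sense of Randell [19]) makes every participant checkpoint on entry and roll back together, discarding messages received inside it, if the acceptance test fails; a failed test is modelled here simply as a raised exception.

```python
import copy
from collections import deque


# --- Model 1: objects and actions (hypothetical names) ----------------------
class RecoverableObject:
    """Passive object whose state an action may update."""

    def __init__(self, name, state):
        self.name = name
        self.state = state


class AtomicAction:
    """Backward recovery for an action: record each object's before-image
    the first time the action writes it, and restore all of them on abort."""

    def __init__(self):
        self._before_images = {}  # id(obj) -> (obj, saved copy of state)

    def write(self, obj, new_state):
        if id(obj) not in self._before_images:
            self._before_images[id(obj)] = (obj, copy.deepcopy(obj.state))
        obj.state = new_state

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Abort: undo every write made through this action.
            for obj, saved in self._before_images.values():
                obj.state = saved
        # Commit, or abort already done: the before-images are no longer needed.
        self._before_images.clear()
        return False  # re-raise the error after recovery


# --- Model 2: communicating processes (hypothetical names) ------------------
class Process:
    """Active process with private state and a message inbox."""

    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.inbox = deque()
        self._checkpoint = None

    def checkpoint(self):
        self._checkpoint = copy.deepcopy((self.state, self.inbox))

    def rollback(self):
        state, inbox = copy.deepcopy(self._checkpoint)
        self.state, self.inbox = state, inbox


def send(sender, receiver, msg):
    receiver.inbox.append((sender.name, msg))


class Conversation:
    """Conversation-style recovery line: all participants checkpoint on entry
    and roll back together if the acceptance test (any exception) fails."""

    def __init__(self, *processes):
        self.processes = processes

    def __enter__(self):
        for p in self.processes:
            p.checkpoint()
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            for p in self.processes:
                p.rollback()
        return False  # re-raise the error after recovery


if __name__ == "__main__":
    account = RecoverableObject("account", 100)
    try:
        with AtomicAction() as act:
            act.write(account, account.state - 30)
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    print(account.state)  # 100

    p, q = Process("P", {"count": 0}), Process("Q", {"count": 0})
    try:
        with Conversation(p, q):
            send(p, q, "hello")
            q.state["count"] += 1
            raise RuntimeError("acceptance test failed")
    except RuntimeError:
        pass
    print(q.state, list(q.inbox))  # {'count': 0} []
```

The symmetry is the point of the sketch: the action's before-image log plays the role that the participants' checkpoints play in the conversation. It only illustrates the abstract's duality claim in miniature; it is not the paper's own construction, and it omits concurrency control, stable storage, and node crashes.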

Index Terms

fault tolerance, reliability, distributed systems, object-based systems, real-time systems, operating systems


References

  [1] J.N. Gray, “An approach to decentralized computer systems”, IEEE Trans. on Soft. Eng., SE-12, No. 6, 1986, pp. 684–689.
  [2] H.C. Lauer and R.M. Needham, “On the duality of operating system structures”, Proc. of 2nd Int. Symp. on Operating Systems, INRIA, Oct. 1978; reprinted in ACM Operating Systems Review, Vol. 13, April 1979, pp. 3–19.
  [3] B. Liskov and R. Scheifler, “Guardians and actions: linguistic support for robust distributed programs”, ACM TOPLAS, Vol. 5, No. 3, 1983, pp. 381–404.
  [4] A.Z. Spector et al., “Support for distributed transactions in the TABS prototype”, IEEE Trans. on Soft. Eng., SE-11, No. 6, 1985, pp. 520–530.
  [5] L. Svobodova, “Resilient distributed computing”, IEEE Trans. on Soft. Eng., SE-10, No. 3, 1984, pp. 257–268.
  [6] S.K. Shrivastava, “Structuring distributed systems for recoverability and crash resistance”, IEEE Trans. on Soft. Eng., SE-7, No. 4, 1981, pp. 436–447.
  [7] K.P. Birman, “Replication and fault tolerance in the ISIS system”, Proc. of 10th Symp. on Princ. of Op. Sys., ACM Operating Systems Review, Vol. 19, No. 4, 1985, pp. 79–86.
  [8] E. Nett et al., “Profemo: design and implementation of a fault tolerant distributed system architecture”, GMD Studien, No. 100, Tech. report, GMD, St. Augustin, 1985.
  [9] K. Eswaran et al., “On the notions of consistency and predicate locks in a database system”, CACM, Vol. 19, No. 11, 1976, pp. 624–633.
  [10] E. Best and B. Randell, “A formal model of atomicity in asynchronous systems”, Acta Informatica, Vol. 16, 1981, pp. 93–124.
  [11] C.T. Davies, “Recovery semantics for a DB/DC system”, Proc. of ACM Nat. Conf., 1973, pp. 136–141.
  [12] D.J. Taylor, “Concurrency and forward recovery in atomic actions”, IEEE Trans. on Soft. Eng., SE-12, No. 1, 1986, pp. 69–78.
  [13] C.A.R. Hoare, “Communicating sequential processes”, CACM, Vol. 21, No. 8, 1978, pp. 666–677.
  [14] D.L. Russell, “State restoration in systems of communicating processes”, IEEE Trans. on Soft. Eng., SE-6, No. 2, 1980, pp. 183–193.
  [15] K.H. Kim, “Approaches to mechanization of the conversation scheme based on monitors”, IEEE Trans. on Soft. Eng., SE-8, No. 3, 1982, pp. 189–197.
  [16] S.K. Shrivastava and J.P. Banatre, “Reliable resource allocation between unreliable processes”, IEEE Trans. on Soft. Eng., SE-4, No. 3, 1978, pp. 230–241.
  [17] W.G. Wood, “A decentralized recovery control protocol”, Digest of Papers, FTCS-11, Portland, 1981, pp. 159–164.
  [18] R. Koo and S. Toueg, “Checkpointing and rollback recovery for distributed systems”, IEEE Trans. on Soft. Eng., SE-13, No. 1, 1987, pp. 23–31.
  [19] B. Randell, “System structure for software fault tolerance”, IEEE Trans. on Soft. Eng., SE-1, No. 2, 1975, pp. 220–232.
  [20] K.M. Chandy and L. Lamport, “Distributed snapshots: determining global states of distributed systems”, ACM TOCS, Vol. 3, No. 1, 1985, pp. 63–75.
  [21] B. Randell, P.A. Lee and P.C. Treleaven, “Reliability issues in computing system design”, ACM Comp. Surveys, Vol. 10, No. 2, 1978, pp. 123–166.
  [22] T. Anderson and J.C. Knight, “A framework for software fault tolerance in real time systems”, IEEE Trans. on Soft. Eng., SE-9, No. 3, 1983, pp. 355–364.
  [23] C. Mohan and B.G. Lindsay, “Efficient commit protocols for the tree of processes model of distributed transactions”, Proc. of 2nd ACM Symp. on Princ. of Dist. Comp., Montreal, 1983, pp. 76–88.
  [24] F. Cristian, “Exception handling and software fault tolerance”, IEEE Trans. on Computers, C-31, No. 6, 1982, pp. 531–540.
  [25] T. Anderson and P.A. Lee, “Fault Tolerance: Principles and Practice”, Prentice Hall, London, 1981.
  [26] R.H. Campbell and B. Randell, “Error recovery in asynchronous systems”, IEEE Trans. on Soft. Eng., SE-12, No. 8, 1986, pp. 811–826.
  [27] M. Sloman and J. Kramer, “Distributed systems and computer networks”, Prentice Hall, London, 1987.
  [28] J.E. Dobson and B. Randell, “Building reliable secure computing systems out of unreliable insecure components”, Proc. of IEEE Symp. on Security and Privacy, Oakland, CA, April 1986, pp. 187–193.

Copyright information

© Springer-Verlag Berlin Heidelberg 1988

Authors and Affiliations

  • S. K. Shrivastava, L. V. Mancini and B. Randell: Computing Laboratory, The University of Newcastle upon Tyne, Newcastle upon Tyne, U.K.
