Towards a theory of replicated processing

  • Luigi V. Mancini
  • Giuseppe Pappalardo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 331)

Abstract

In the N-Modular Redundancy (NMR) approach, a computation is made reliable by executing it on several computers, and determining its results by a decision algorithm. This paper investigates a formal approach to the use of NMR in replicated distributed systems, for which it introduces a notion of correctness based on consistency with their non-replicated counterpart, and a local correctness criterion. We discuss how a replicated system component may be implemented by N base copies, a majority of which is non-faulty. The formal approach sheds light on the necessity of coordinating the copies and on the requirements they should satisfy; in particular the difficulty of replicating synchronous communication is pointed out. A practical approach is also briefly examined and shown to be consistent with the formal model.
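
As a rough illustration of the decision step mentioned in the abstract, the sketch below takes the results returned by N copies and accepts a value only when a strict majority of them agree. The function name `majority_vote`, the use of exact equality between results, and the choice of `None` to signal "no majority" are assumptions made for illustration. This is not the paper's formal model, and it ignores the coordination and synchronous-communication issues the abstract raises.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of copies, or None.

    Hypothetical helper: results are compared by exact equality, and None
    is used to mean "no majority", so None should not be a valid result.
    """
    if not results:
        return None
    # Find the most frequently reported value and how many copies reported it.
    value, count = Counter(results).most_common(1)[0]
    # A strict majority requires more than half of the N copies to agree.
    return value if count > len(results) // 2 else None
```

With three copies of which one is faulty, `majority_vote([42, 42, 7])` yields `42`, while `majority_vote([1, 2, 3])` yields `None` because no value is backed by a majority.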

Inside every replicated system there is a non-replicated system trying to get out.

Copyright information

© Springer-Verlag 1988

Authors and Affiliations

  • Luigi V. Mancini (1, 2)
  • Giuseppe Pappalardo (1, 3)
  1. Computing Laboratory, University of Newcastle upon Tyne, UK
  2. Dipartimento di Informatica, Università di Pisa, Italy
  3. Università di Reggio Calabria, Italy
