Towards a theory of replicated processing
Part of the Lecture Notes in Computer Science book series (LNCS, volume 331)
- First Online:
In the N-Modular Redundancy (NMR) approach, a computation is made reliable by executing it on several computers, and determining its results by a decision algorithm. This paper investigates a formal approach to the use of NMR in replicated distributed systems, for which it introduces a notion of correctness based on consistency with their non-replicated counterpart, and a local correctness criterion. We discuss how a replicated system component may be implemented by N base copies, a majority of which is non-faulty. The formal approach sheds light on the necessity of coordinating the copies and on the requirements they should satisfy; in particular the difficulty of replicating synchronous communication is pointed out. A practical approach is also briefly examined and shown to be consistent with the formal model.
Inside every replicated system there is a non-replicated system trying to get out.
Unable to display preview. Download preview PDF.
- [AK]Avizienis, A., Kelly, J.K.J., “Fault tolerance by design diversity: concepts and experiments”, IEEE Computer, vol. 17, no. 8, pp. 67–80, Aug. 1984.Google Scholar
- [B]Bird, R. S., “The promotion and accumulation strategies in transformational programming”, ACM Transactions on Programming Languages and Systems, vol. 6, no. 4, Oct. 1984.Google Scholar
- [C]Cooper, E, “Replicated distributed programs”, Proc. of the 10th ACM Sym. on Operating Systems Principles, pp. 63–78, Washington, Dic. 1985.Google Scholar
- [G]Goldberg, J., “SIFT: A provable fault-tolerant computer for aircraft flight control”, Inform. Processing 80 Proc. IFIP Congr., pp. 151–156, Tokyo, Japan, Oct. 1980.Google Scholar
- [H]Hoare, C.A.R., “Communicating sequential processes”, Prentice Hall International, 1985.Google Scholar
- [KM]Koutny, M., and Mancini, L., “Synchronizing events in replicated computations”, Technical Report TR/237, Computing Laboratory, University of Newcastle upon Tyne, June 1987 (to appear in The Journal of Systems and Software).Google Scholar
- [LV]Lyons, R.E., Vanderkulk, W., “The use of triple-modular redundancy to improve computer reliability”, IBM Journal of Research and Development, pp. 200–209, vol. 6, no. 2, Apr. 1962.Google Scholar
- [M]Mancini, L., “Modular redundancy in a message passing system”, IEEE Trans. Software Eng., pp. 79–86, vol. SE-12, no. 1, Jan. 1986.Google Scholar
- [MK]Mancini, L., Koutny, M., “Formal specification of N-modular redundancy”, 1986 ACM Computer Science Conference, pp. 199–204, Cincinnati, Ohio, Feb. 1986.Google Scholar
- [MP1]Mancini, L., Pappalardo, G., “The Join algorithm: ordering messages in replicated systems”, Safecomp '86, pp. 51–55, Sarlat, France, Oct. 1986.Google Scholar
- [MP2]Mancini, L., Pappalardo G., “On resolving nondeterminism in replicated distributed systems”, IFIP Conf. on Distributed Processing, Amsterdam, The Netherlands, Oct. 1987.Google Scholar
- [MP3]Mancini, L., Pappalardo G., “Proving correctness properties of a replicated synchronous program”, to appear in The Computer Journal.Google Scholar
- [MS]Mancini, L., Shrivastava, S.K., “Exception handling in replicated systems with voting”, 16th Int. Conf. on Fault Tolerant Computing, pp. 384–389, Vienna, Austria, July 1986.Google Scholar
- [MSS]Melliar-Smith, P.M., Schwartz, R., “Formal specification and mechanical verification of SIFT: a fault-tolerant flight control system”, IEEE Trans. on Computers, vol. C-31, no. 7, pp. 616–630, July 1982.Google Scholar
- [S2]Schneider, F.B., “The state machine approach”, in Paul, M., and Siegert, H.J. (eds.), Distributed systems — methods and tools for specification, an advanced course, LNCS vol. 190, pp. 444–454, Springer-Verlag, 1985.Google Scholar
© Springer-Verlag 1988