Towards a theory of replicated processing
In the N-Modular Redundancy (NMR) approach, a computation is made reliable by executing it on several computers, and determining its results by a decision algorithm. This paper investigates a formal approach to the use of NMR in replicated distributed systems, for which it introduces a notion of correctness based on consistency with their non-replicated counterpart, and a local correctness criterion. We discuss how a replicated system component may be implemented by N base copies, a majority of which is non-faulty. The formal approach sheds light on the necessity of coordinating the copies and on the requirements they should satisfy; in particular the difficulty of replicating synchronous communication is pointed out. A practical approach is also briefly examined and shown to be consistent with the formal model.
Inside every replicated system there is a non-replicated system trying to get out.
- Avizienis, A., Kelly, J.K.J., “Fault tolerance by design diversity: concepts and experiments”, IEEE Computer, vol. 17, no. 8, pp. 67–80, Aug. 1984.
- Bird, R. S., “The promotion and accumulation strategies in transformational programming”, ACM Transactions on Programming Languages and Systems, vol. 6, no. 4, Oct. 1984.
- Cooper, E, “Replicated distributed programs”, Proc. of the 10th ACM Sym. on Operating Systems Principles, pp. 63–78, Washington, Dic. 1985.
- Goldberg, J., “SIFT: A provable fault-tolerant computer for aircraft flight control”, Inform. Processing 80 Proc. IFIP Congr., pp. 151–156, Tokyo, Japan, Oct. 1980.
- Hoare, C.A.R., “Communicating sequential processes”, Prentice Hall International, 1985.
- Koutny, M., and Mancini, L., “Synchronizing events in replicated computations”, Technical Report TR/237, Computing Laboratory, University of Newcastle upon Tyne, June 1987 (to appear in The Journal of Systems and Software).
- Lamport, L., “The implementation of reliable distributed multiprocess sustems”, Computer Networks, pp. 95–114, vol. 2, no. 2, May 1978. CrossRef
- Lamport, L., “Time, clocks and the ordering of events in a distributed system”, Comm. ACM, vol. 21, no. 7, pp. 558–565, July 1978. CrossRef
- Lamport, L., Shostak, R., Pease, M., “The Byzantine Generals problem”, ACM Transactions on Programming Languages and Systems, pp. 382–401, vol. 4, no. 3, July 1982. CrossRef
- Lyons, R.E., Vanderkulk, W., “The use of triple-modular redundancy to improve computer reliability”, IBM Journal of Research and Development, pp. 200–209, vol. 6, no. 2, Apr. 1962.
- Mancini, L., “Modular redundancy in a message passing system”, IEEE Trans. Software Eng., pp. 79–86, vol. SE-12, no. 1, Jan. 1986.
- Mancini, L., Koutny, M., “Formal specification of N-modular redundancy”, 1986 ACM Computer Science Conference, pp. 199–204, Cincinnati, Ohio, Feb. 1986.
- Mancini, L., Pappalardo, G., “The Join algorithm: ordering messages in replicated systems”, Safecomp '86, pp. 51–55, Sarlat, France, Oct. 1986.
- Mancini, L., Pappalardo G., “On resolving nondeterminism in replicated distributed systems”, IFIP Conf. on Distributed Processing, Amsterdam, The Netherlands, Oct. 1987.
- Mancini, L., Pappalardo G., “Proving correctness properties of a replicated synchronous program”, to appear in The Computer Journal.
- Mancini, L., Shrivastava, S.K., “Exception handling in replicated systems with voting”, 16th Int. Conf. on Fault Tolerant Computing, pp. 384–389, Vienna, Austria, July 1986.
- Melliar-Smith, P.M., Schwartz, R., “Formal specification and mechanical verification of SIFT: a fault-tolerant flight control system”, IEEE Trans. on Computers, vol. C-31, no. 7, pp. 616–630, July 1982.
- Schneider, F.B., “Synchronization in distributed programs”, ACM Transactions on Programming Languages and Systems, vol. 4, no. 2, pp. 125–148, Apr. 1982. CrossRef
- Schneider, F.B., “The state machine approach”, in Paul, M., and Siegert, H.J. (eds.), Distributed systems — methods and tools for specification, an advanced course, LNCS vol. 190, pp. 444–454, Springer-Verlag, 1985.
- Towards a theory of replicated processing
- Book Title
- Formal Techniques in Real-Time and Fault-Tolerant Systems
- Book Subtitle
- Proceedings of a Symposium Warwick, UK, September 22–23, 1988
- pp 175-192
- Print ISBN
- Online ISBN
- Series Title
- Lecture Notes in Computer Science
- Series Volume
- Series ISSN
- Springer Berlin Heidelberg
- Copyright Holder
- Additional Links
- Industry Sectors
- eBook Packages
To view the rest of this content please follow the download PDF link above.