Replicated distributed processing
Replicated processing with voting provides a powerful means of constructing highly reliable computing systems. We will consider a functionally distributed computing system intended for real-time applications, where each functional module (a node) has been configured in an NMR (N-modular redundant) fashion. Such a system receives processing requests from ‘actuators’ (the entities that demand services), and each request requires distributed processing at various nodes. The paper will discuss various approaches to scheduling computations to ensure that each processor of an NMR node processes input messages in an identical order. The concept of exception handling for voters will be developed to detect failures in the system.
Keywords: replicated processing, voting, exception handling, fault-tolerance, distributed systems, message passing, Byzantine agreement
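The two ideas named in the abstract, identical message ordering across the processors of an NMR node and a voter that raises an exception when it detects a failure, can be illustrated with a small sketch. This is not the paper's algorithm: the names `delivery_order`, `majority_vote` and `VoterException`, and the choice of ordering messages by a (timestamp, sender) pair, are assumptions made here purely for illustration.

```python
from collections import Counter


class VoterException(Exception):
    """Hypothetical exception: raised when the voter finds no strict majority."""


def delivery_order(messages):
    """Return messages in a deterministic order.

    Assumption: each message is a (timestamp, sender_id, payload) tuple and
    (timestamp, sender_id) is unique, so every replica that holds the same
    set of messages derives the same processing sequence.
    """
    return sorted(messages, key=lambda m: (m[0], m[1]))


def majority_vote(outputs):
    """Vote on the outputs of the N replicas of one node.

    Returns the majority value and the indices of replicas that disagreed.
    Raises VoterException when no value is held by more than half of them.
    """
    if not outputs:
        raise ValueError("no replica outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise VoterException("no majority among replica outputs")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


# Two replicas receive the same messages in different arrival orders.
arrived_at_a = [(2, "n1", "y"), (1, "n2", "x"), (2, "n0", "z")]
arrived_at_b = [(2, "n0", "z"), (2, "n1", "y"), (1, "n2", "x")]
same_order = delivery_order(arrived_at_a) == delivery_order(arrived_at_b)

# A triplicated node (N = 3) in which replica 2 produced a faulty result.
voted_value, suspects = majority_vote([7, 7, 9])
```

In this sketch a masked fault is reported through the list of dissenting replicas, while an unmaskable disagreement surfaces as an exception that the caller must handle; how a real system would react to either case is outside what the abstract states.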