Adaptable fault tolerance for distributed process control using exclusively standard components
This paper describes an adaptable fault tolerance architecture for distributed process control that uses exclusively standard hardware, standard system software and standard protocols. It offers a quick, low-cost way to provide non-safety-critical technical facilities and plants with continuous service, and it is designed to be as practical as possible for application engineers. The architecture combines well-known fault tolerance methods under real-time constraints. The latitude that non-safety-critical applications allow is used carefully to minimize the fault tolerance overhead. Because the fault tolerance is transparent, each functional part of the process control, represented by an application task, can be implemented without regard to non-determinism or to the hosts on which it executes. A control system is configured simply by naming hosts, tasks and groups in a file, in which every task must be declared together with its selected fault tolerance strategy.
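The abstract states only that hosts, tasks and groups are named in a file and that every task is declared with its fault tolerance strategy. The following Python sketch shows one way such a file could be parsed and checked. The line-oriented syntax, the binding of each task to a group of hosts, and the strategy names `none`, `passive` and `active` are all hypothetical illustrations, not the paper's actual format.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical strategy names; the abstract does not list the real ones.
KNOWN_STRATEGIES = {"none", "passive", "active"}


@dataclass
class Config:
    hosts: set[str] = field(default_factory=set)
    groups: dict[str, list[str]] = field(default_factory=dict)    # group -> member hosts
    tasks: dict[str, tuple[str, str]] = field(default_factory=dict)  # task -> (group, strategy)


def parse_config(text: str) -> Config:
    """Parse a hypothetical 'host' / 'group' / 'task' configuration file."""
    cfg = Config()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        kind, *args = line.split()
        if kind == "host" and len(args) == 1:
            cfg.hosts.add(args[0])
        elif kind == "group" and len(args) >= 2:
            name, members = args[0], args[1:]
            unknown = [h for h in members if h not in cfg.hosts]
            if unknown:
                raise ValueError(f"line {lineno}: unknown host(s) {unknown}")
            cfg.groups[name] = members
        elif kind == "task":
            # Every task must name its group AND its fault tolerance strategy.
            if len(args) != 3:
                raise ValueError(f"line {lineno}: task needs a group and a strategy")
            name, group, strategy = args
            if group not in cfg.groups:
                raise ValueError(f"line {lineno}: unknown group {group!r}")
            if strategy not in KNOWN_STRATEGIES:
                raise ValueError(f"line {lineno}: unknown strategy {strategy!r}")
            cfg.tasks[name] = (group, strategy)
        else:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
    return cfg


example = """
host h1
host h2
group ctrl h1 h2
task valve_control ctrl active  # replicated on both hosts
"""
cfg = parse_config(example)
print(cfg.tasks)  # {'valve_control': ('ctrl', 'active')}
```

Rejecting a `task` line that lacks a strategy mirrors the requirement that each task be declared with one; checking host and group names at parse time catches configuration errors before the control system starts.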
A fault-tolerant system can be expected to reconfigure automatically after a fault. The present system does more: it automatically reintegrates repaired hosts and re-establishes redundant operation while the entire system keeps working.
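The abstract does not describe how reconfiguration and reintegration are implemented. As a rough illustration of the bookkeeping involved, the Python sketch below keeps each task's replicas at a hypothetical redundancy `degree`: when a host fails, its replica is dropped (reconfiguration); when a host is repaired, it is used again to restore the missing replica (reintegration). The class names, the `degree` and `candidates` parameters, and the placement policy are assumptions; a real system would also transfer task state to the new replica before it joins.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReplicaGroup:
    """Hypothetical model: the hosts currently running replicas of one task."""
    task: str
    degree: int              # redundancy degree to maintain (assumed parameter)
    candidates: list[str]    # hosts allowed to run the task, in preference order
    replicas: list[str] = field(default_factory=list)

    def rebalance(self, up: set[str]) -> None:
        # Reconfiguration: drop replicas on hosts that are no longer up.
        self.replicas = [h for h in self.replicas if h in up]
        # Reintegration: add replicas on available hosts until the degree
        # is restored. (State transfer from a survivor is omitted here.)
        for host in self.candidates:
            if len(self.replicas) >= self.degree:
                break
            if host in up and host not in self.replicas:
                self.replicas.append(host)


@dataclass
class Controller:
    """Hypothetical coordinator reacting to host failure and repair events."""
    up: set[str]
    groups: list[ReplicaGroup]

    def __post_init__(self) -> None:
        for g in self.groups:
            g.rebalance(self.up)

    def host_failed(self, host: str) -> None:
        self.up.discard(host)
        for g in self.groups:
            g.rebalance(self.up)

    def host_repaired(self, host: str) -> None:
        self.up.add(host)
        for g in self.groups:
            g.rebalance(self.up)


ctl = Controller(up={"h1", "h2"},
                 groups=[ReplicaGroup("valve_control", degree=2, candidates=["h1", "h2"])])
print(ctl.groups[0].replicas)  # ['h1', 'h2']
ctl.host_failed("h1")
print(ctl.groups[0].replicas)  # ['h2']  (degraded, still in service)
ctl.host_repaired("h1")
print(ctl.groups[0].replicas)  # ['h2', 'h1']  (redundancy re-established)
```

The task keeps running on the surviving host throughout, which corresponds to reintegration happening while the system remains in operation.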
Keywords: Fault Tolerance, Configuration File, Transient Fault, Permanent Fault, Application Task