Abstract
Fault-tolerant systems are widely used in military, industrial, and commercial applications. Most are built with multiple-modular redundancy or error-control coding, and they rely on fault-tolerance-specific components (such as voters, switchers, encoders, or decoders) to detect or correct errors. However, the problem of detecting, locating, or correcting errors in these fault-tolerance-specific components themselves has not yet been solved properly, which greatly affects the dependability of the whole fault-tolerant system. This paper presents a theory of robust fault-masking digital circuits for characterizing fault-tolerant systems with concurrent error-locating capability, and a new scheme for dual-modular redundant systems with the partially robust fault-masking property. A basic robust fault-masking circuit is composed of a basic functional circuit and an error-locating corrector; such a circuit is capable of both concurrent error correction and concurrent error location. Following this circuit model, in a partially robust fault-masking dual-modular redundant system the basic functional circuit consists of two redundant modules based on alternating-complementary logic, and an error-correction-specific circuit called the alternating-complementary corrector serves as the error-locating corrector. The performance of the scheme (such as hardware complexity and time delay) is analyzed.
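The abstract does not give the circuit details, so the following is only a behavioural sketch of the general idea, not the paper's corrector. It assumes classical alternating logic: each module computes a self-dual function, is evaluated once on the input and once on its complement, and a fault-free module must produce complementary outputs in the two periods. The names `majority`, `stuck_at`, `alternates`, and `correct_and_locate` are hypothetical, and only output stuck-at faults are modelled.

```python
from typing import Callable, Optional, Tuple

Bits = Tuple[int, ...]
Module = Callable[[Bits], int]


def majority(x: Bits) -> int:
    """3-input majority, a self-dual function: f(~x) == ~f(x)."""
    a, b, c = x
    return (a & b) | (b & c) | (a & c)


def stuck_at(value: int) -> Module:
    """Hypothetical fault model: the module output is stuck at `value`."""
    return lambda x: value


def complement(x: Bits) -> Bits:
    return tuple(1 - bit for bit in x)


def alternates(module: Module, x: Bits) -> Tuple[int, bool]:
    """Evaluate in two periods (x, then ~x).

    Returns the first-period output and whether the two outputs
    are complementary, as a fault-free self-dual module requires.
    """
    first = module(x)
    second = module(complement(x))
    return first, second == 1 - first


def correct_and_locate(
    module_a: Module, module_b: Module, x: Bits
) -> Tuple[Optional[int], Optional[str]]:
    """Assumed corrector behaviour for a two-module system.

    Returns (corrected output, location of the faulty module).
    Location is None when no error is observed and "unresolved"
    when the check cannot single out one module.
    """
    out_a, ok_a = alternates(module_a, x)
    out_b, ok_b = alternates(module_b, x)
    if ok_a and ok_b and out_a == out_b:
        return out_a, None
    if ok_a and not ok_b:
        return out_a, "B"   # module B failed to alternate
    if ok_b and not ok_a:
        return out_b, "A"   # module A failed to alternate
    return None, "unresolved"


if __name__ == "__main__":
    print(correct_and_locate(majority, majority, (1, 0, 1)))     # (1, None)
    print(correct_and_locate(majority, stuck_at(0), (1, 0, 1)))  # (1, 'B')
    print(correct_and_locate(stuck_at(1), majority, (0, 0, 1)))  # (0, 'A')
```

In this sketch the same check that selects the trustworthy output also names the failing module, which is the sense in which correction and location happen concurrently; the cost is a second evaluation period per input.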
Additional information
This work was supported by the Shanghai Academic Young Teachers Foundation of the Shanghai Education Commission under Grant No. 95QD18 and is now supported by the National Natural Science Foundation of China under Grant Nos. 90207021, 69733010, and 69873010.
JIANG JianHui received his B.E., M.E., and Ph.D. degrees in traffic information engineering and control from Shanghai Tiedao University (which merged into Tongji University in April 2000) in 1985, 1988, and 1999, respectively. In September 2000, he joined Fudan University as a part-time Postdoctoral Research Fellow. He is currently a professor of computer science and technology at Tongji University. His research interests include fault-tolerant computing, digital system design and testing, hardware and software codesign, performance evaluation of computer systems, and distributed computing.
MIN YingHua graduated from the Department of Mathematics, Jilin University, in 1962 and later spent several years visiting US universities. He is a professor of computer science at the Institute of Computing Technology, Chinese Academy of Sciences, a guest professor at Hunan University, and the Chair of the Technical Committee on Fault-Tolerant Computing, China Computer Federation. His research interests include IC design and test, fault-tolerant computing, and software reliability. He is a fellow of IEEE and a member of ACM.
PENG ChengLian graduated from the Department of Mathematics, Fudan University, in 1964. He is currently a professor of computer science and technology at Fudan University. His research interests include CAD of digital systems, fault-tolerant computing, and embedded computing.
About this article
Cite this article
Jiang, J., Min, Y. & Peng, C. Fault-tolerant systems with concurrent error-locating capability. J. Comput. Sci. & Technol. 18, 190–200 (2003). https://doi.org/10.1007/BF02948884
DOI: https://doi.org/10.1007/BF02948884