Abstract
Designing checks to detect or locate errors in data is an important problem in fault tolerance. We assume checks of the simplest kind: a check can operate, without restriction, on any non-empty subset of the set of data elements and can reliably detect up to one error in that subset. In this paper, we show how to design the data-check (DC) relationship. For the first time, we give a general procedure for designing checks to locate s errors, for any value of s. We also consider the problem of designing checks to detect s errors in the data, and give the first optimal construction for this problem. The procedures for designing the checks are simple and novel. These constructions can also be modified to produce uniform checks, i.e. checks which are identical and check the same number of data elements. We give procedures for obtaining such checks as well.
Recently, the problem of designing the DC relationship has attracted a lot of attention because of the important role it plays in the design of algorithm-based fault tolerant (ABFT) systems. In this paper, we illustrate the problem in this context. ABFT schemes have been shown to be a natural paradigm for concurrent error detection/location in multiprocessor systems and systolic array computations. Banerjee and Abraham have shown that an ABFT scheme can be modeled as a tripartite graph consisting of processors (P), data (D) and checks (C). Our constructions can be used along with any general technique for designing fault tolerant PDC graphs, e.g. for designing unit systems [NA89] or ud-systems [VJ91].
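The check model described in the abstract can be illustrated with a small brute-force sketch. It is not one of the paper's constructions. It only tests whether a given DC relationship detects every pattern of at most s errors, under the assumption (ours, read from the phrase "reliably detect up to one error") that a check's output is trusted only when exactly one erroneous element lies in its subset. The function name `detects` and the example check sets are hypothetical.

```python
from itertools import combinations


def detects(checks, n, s):
    """Return True if every non-empty error set of size <= s is caught.

    checks: list of sets of data-element indices (the DC relationship).
    n:      number of data elements, indexed 0..n-1.
    s:      maximum number of simultaneous errors to consider.

    Assumption: a check reliably flags an error only when exactly one
    erroneous element is in its subset; with two or more, its output is
    treated as unreliable and is not counted.
    """
    for k in range(1, s + 1):
        for errors in combinations(range(n), k):
            error_set = set(errors)
            # The pattern is caught if some check sees exactly one error.
            if not any(len(error_set & c) == 1 for c in checks):
                return False
    return True


# Hypothetical DC relationships over n = 4 data elements.
n = 4
singleton_checks = [{i} for i in range(n)]   # one check per element
single_big_check = [{0, 1, 2, 3}]            # one check over everything

print(detects(singleton_checks, n, 4))   # every pattern is caught
print(detects(single_big_check, n, 1))   # any single error is caught
print(detects(single_big_check, n, 2))   # two errors can go unnoticed
```

The exhaustive search is exponential in s and is meant only for checking small examples by hand. The point of an optimal construction is to reach the same guarantee with as few checks as possible.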
This work was supported by DARPA/ONR under Contract no. N00014-88-K-0459.
This work was supported in part by ONR under Contract no. N00014-91-J-1199 and in part by AFOSR under Contract no. AFOSR-90-0144.
References
J. A. Abraham et al., “Fault tolerance techniques for systolic arrays,” IEEE Computer, pp. 65–74, July 1987.
P. Banerjee et al., “An evaluation of system-level fault tolerance on the Intel hypercube multiprocessor,” in Proc. Int. Symp. Fault Tolerant Comput., Tokyo, pp. 362–367, June 1988.
P. Banerjee and J. A. Abraham, “Bounds on algorithm-based fault tolerance in multiple processor systems,” IEEE Trans. Comput., vol. C-35, pp. 296–306, Apr. 1986.
P. Banerjee and J. A. Abraham, “A probabilistic model of algorithm-based fault tolerance in array processors for real-time systems,” in Proc. Real-Time Systems Symp., pp. 72–78, 1986.
Y-H. Choi and M. Malek, “A fault tolerant FFT processor,” IEEE Trans. Comput., vol. 37, no. 5, pp. 617–621, May 1988.
Y-H. Choi and M. Malek, “A fault tolerant systolic sorter,” IEEE Trans. Comput., vol. 37, no. 5, pp. 621–624, May 1988.
D. Gu, D. J. Rosenkrantz, and S. S. Ravi, “Design and analysis of test schemes for algorithm-based fault tolerance,” in Proc. Int. Symp. Fault Tolerant Comput., pp. 106–113, Newcastle-upon-Tyne, U.K., June 1990.
K.-H. Huang and J. A. Abraham, “Algorithm-based fault tolerance for matrix operations,” IEEE Trans. Comput., vol. C-33, pp. 518–528, June 1984.
J.-Y. Jou and J. A. Abraham, “Fault tolerant matrix arithmetic and signal processing on highly concurrent computing structures,” Proc. IEEE, vol. 74, no. 5, pp. 732–741, May 1986.
J.-Y. Jou and J. A. Abraham, “Fault tolerant FFT networks,” IEEE Trans. Comput., vol. 37, no. 5, pp. 548–561, May 1988.
F. T. Luk and H. Park, “An analysis of algorithm-based fault tolerance techniques,” in Proc. SPIE Adv. Alg. & Arch. for Signal Proc., vol. 696, pp. 222–228, Aug. 1986.
V. S. S. Nair and J. A. Abraham, “A model for the analysis of fault tolerant signal processing architectures,” in Proc. 32nd Int. Tech. Symp. of SPIE, San Diego, pp. 246–257, Aug. 1988.
V. S. S. Nair and J. A. Abraham, “A model for the analysis, design and comparison of fault-tolerant WSI architectures,” in Proc. Workshop on Wafer Scale Integration, Como, Italy, June 1989.
V. S. S. Nair and J. A. Abraham, “Hierarchical design and analysis of fault-tolerant multiprocessor systems using concurrent error detection,” in Int. Symp. Fault Tolerant Comput., Newcastle-upon-Tyne, U.K., pp. 130–137, June 1990.
A. L. N. Reddy and P. Banerjee, “Algorithm-based fault detection for signal processing applications,” IEEE Trans. Comput., vol. 39, pp. 1304–1308, Oct. 1990.
D. J. Rosenkrantz and S. S. Ravi, “Improved upper bounds for algorithm-based fault tolerance,” in Proc. 26th Allerton Conf. Comm. Cont. & Comput., Allerton, IL, pp. 388–397, Sept. 1988.
B. Vinnakota and N. K. Jha, “Diagnosability and diagnosis of algorithm-based fault tolerant systems,” in Proc. 32nd Midwest Symp. Circuits & Systems, Urbana, IL, pp. 28–31, Aug. 1989.
B. Vinnakota and N. K. Jha, “A dependence graph-based approach to the design of algorithm-based fault tolerant systems,” in Proc. Int. Symp. Fault Tolerant Comput., pp. 122–129, Newcastle-upon-Tyne, U.K., June 1990.
B. Vinnakota and N. K. Jha, “Design of multiprocessor systems for concurrent error detection and fault diagnosis,” in Proc. Int. Symp. Fault Tolerant Comput., Montreal, June 1991.
Copyright information
© 1991 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sitaraman, R., Jha, N.K. (1991). Optimal Design of Checks for Error Detection and Location in Fault Tolerant Multiprocessor Systems. In: Cin, M.D., Hohl, W. (eds) Fault-Tolerant Computing Systems. Informatik-Fachberichte, vol 283. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76930-6_33
DOI: https://doi.org/10.1007/978-3-642-76930-6_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54545-3
Online ISBN: 978-3-642-76930-6
eBook Packages: Springer Book Archive