Abstract
Distributed real-time systems are subject to stricter fault-tolerance requirements than non-real-time systems. This work presents an application of system-level diagnosis to a real-time distributed system as a first step in providing fault tolerance. An existing algorithm for distributed system-level diagnosis, Adaptive_DSD (ADSD), is converted to a real-time framework, establishing a deadline for the end-to-end diagnosis latency. Rate monotonic analysis is chosen as the framework for achieving real-time performance. The ADSD algorithm is converted into a set of independent periodic tasks running at each node, and a systematic procedure is used to assign priorities and deadlines so as to minimize the hard deadline of the diagnosis function. The resulting algorithm, Real-Time Adaptive Distributed System-Level Diagnosis (RT-ADSD), is fully compatible with a real-time environment in which both the processors and the network support fixed-priority scheduling. The RT-ADSD algorithm provides a useful first step in adding fault tolerance to distributed real-time systems by quickly and reliably diagnosing node failures. The key results presented here include a framework for specifying real-time distributed algorithms and a scheduling model for analyzing them that accounts for many requirements of distributed systems, including network I/O, task jitter, and critical sections caused by shared resources.
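As a rough illustration of the kind of check that rate monotonic analysis performs on a set of periodic tasks, the sketch below assigns fixed priorities by period, applies the Liu and Layland utilization bound, and then runs the standard response-time iteration for one task. It is not the chapter's scheduling model: it ignores network I/O, jitter, and blocking on shared resources, and it assumes each deadline equals the task period. The task names and timing values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Task:
    name: str    # hypothetical task name, for illustration only
    period: int  # period T, in arbitrary integer time units
    wcet: int    # worst-case execution time C, same units


def rate_monotonic_order(tasks: List[Task]) -> List[Task]:
    """Shorter period means higher priority; index 0 is the highest."""
    return sorted(tasks, key=lambda t: t.period)


def utilization_bound(n: int) -> float:
    """Liu and Layland sufficient bound for n periodic tasks."""
    return n * (2 ** (1 / n) - 1)


def passes_utilization_test(tasks: List[Task]) -> bool:
    """Sufficient (not necessary) schedulability test."""
    total = sum(t.wcet / t.period for t in tasks)
    return total <= utilization_bound(len(tasks))


def response_time(tasks: List[Task], index: int) -> Optional[int]:
    """Worst-case response time of tasks[index], or None if it misses
    its deadline (assumed equal to its period). `tasks` must already
    be in priority order, highest first."""
    task = tasks[index]
    higher = tasks[:index]
    r = task.wcet
    while True:
        # Own execution plus preemption by every higher-priority task;
        # -(-r // T) is integer ceiling division.
        nxt = task.wcet + sum(-(-r // h.period) * h.wcet for h in higher)
        if nxt > task.period:
            return None
        if nxt == r:
            return r
        r = nxt


if __name__ == "__main__":
    tasks = rate_monotonic_order([
        Task("update", period=50, wcet=10),
        Task("test", period=10, wcet=2),
        Task("report", period=20, wcet=4),
    ])
    print([t.name for t in tasks])
    print(passes_utilization_test(tasks))
    print([response_time(tasks, i) for i in range(len(tasks))])
```

The utilization test is only sufficient, so a task set that fails it may still be schedulable; the response-time iteration gives an exact answer under the simplifying assumptions stated above.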
This research is supported in part by the Office of Naval Research under Grant N00014-91-J-1304 and by a National Science Foundation Graduate Research Fellowship. Any opinions, findings, conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the Office of Naval Research or the National Science Foundation.
© 1994 Kluwer Academic Publishers
About this chapter
Cite this chapter
Stahl, M.E., Bianchini, R.P. (1994). Adaptive System-Level Diagnosis in Real-Time. In: Koob, G.M., Lau, C.G. (eds) Foundations of Dependable Computing. The Springer International Series in Engineering and Computer Science, vol 284. Springer, Boston, MA. https://doi.org/10.1007/978-0-585-27316-7_1
DOI: https://doi.org/10.1007/978-0-585-27316-7_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-9485-3
Online ISBN: 978-0-585-27316-7
eBook Packages: Springer Book Archive