Ein Diagnoseverfahren für Systeme mit Mehreren Verarbeitungseinheiten

Dal Cin, M.

doi:10.1007/978-3-642-45628-2_17

Part of the book series: Informatik-Fachberichte (INFORMATIK, volume 147)

Summary

This paper presents a diagnosis protocol for systems with multiple processing units. The protocol detects and localizes system-level faults, such as erroneous outputs or crashes of units. Fault localization and coordination among the processing units are based on the exchange of syndromes as signs of life. The protocol is decentralized and brings about agreement among all intact units of a so-called diagnosis group regarding the status of the defective units.

Abstract

This paper describes the design of a (high-level) diagnosis protocol for a system with multiple processing nodes. The protocol detects and localizes system-level failures such as incorrect outputs or crashes of processing nodes. Failure localization and coordination among processing nodes are based on the use of syndromes as “I am alive” messages. The protocol is decentralized and forces an agreement among all operational nodes of a so-called diagnosis group on the status of down nodes.
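
The idea of syndromes doubling as life signs can be illustrated with a minimal sketch. It is not the protocol from the chapter, whose details are not given here. It assumes crash faults only, synchronous rounds, and a reliable broadcast within the diagnosis group; the names `Node`, `run_round`, and `agreed_status` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One processing node of a hypothetical diagnosis group."""
    name: str
    alive: bool = True
    # Local view: peers this node currently considers down.
    suspected: set = field(default_factory=set)

    def syndrome(self):
        """The 'I am alive' message: sender plus its current suspicions."""
        return (self.name, frozenset(self.suspected))


def run_round(group):
    """One synchronous round over an assumed reliable broadcast."""
    # Only operational nodes send; a crashed node stays silent.
    messages = [n.syndrome() for n in group if n.alive]
    heard = {sender for sender, _ in messages}
    names = {n.name for n in group}
    for node in group:
        if not node.alive:
            continue
        # A missing life sign is treated as evidence of a crash.
        node.suspected |= names - heard
        # Merge suspicions reported by peers (assumes no node lies).
        for _, reported in messages:
            node.suspected |= reported


def agreed_status(group):
    """Return the common set of down nodes, or None if views differ."""
    views = {frozenset(n.suspected) for n in group if n.alive}
    return set(views.pop()) if len(views) == 1 else None


group = [Node(n) for n in ("A", "B", "C", "D")]
group[2].alive = False  # node C crashes
run_round(group)
print(agreed_status(group))
```

Under these assumptions every operational node sees the same set of messages, so all local views coincide after one round. Handling incorrect outputs or unreliable links would need considerably more machinery than this sketch shows.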

Author information

Authors and Affiliations

Fachbereich Informatik, J. W. Goethe-Universität Frankfurt, Germany
M. Dal Cin

Editor information

Editors and Affiliations

Fachbereich 2, Hochschule Bremerhaven, Bürgermeister-Smidt-Straße 20, D-2850, Bremerhaven, Germany
F. Belli
Institut für Rechnerentwurf und Fehlertoleranz Fakultät für Informatik, Universität Karlsruhe, Postfach 6980, D-7500, Karlsruhe 1, Germany
W. Görke

About this paper

Cite this paper

Dal Cin, M. (1987). Ein Diagnoseverfahren für Systeme mit Mehreren Verarbeitungseinheiten. In: Belli, F., Görke, W. (eds) Fehlertolerierende Rechensysteme / Fault-Tolerant Computing Systems. Informatik-Fachberichte, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45628-2_17

DOI: https://doi.org/10.1007/978-3-642-45628-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-18294-8
Online ISBN: 978-3-642-45628-2
eBook Packages: Springer Book Archive
