On-line distributed debugging on scaleable multicomputer architectures

  • Thomas Bemmerl
  • Roland Wismüller
Monitoring, Debugging, and Fault Tolerance
Part of the Lecture Notes in Computer Science book series (LNCS, volume 797)


Debugging parallel programs is one of the most tedious jobs in programming scalable multiprocessor architectures. Due to the distributed resources of these machines, programming is often architecture dependent. Most development tools still reflect this dependency even during the analysis phase of parallel programs. This paper presents the distributed debugger DETOP, which offers a global name space and hides architectural features like the mapping of processes. DETOP is part of the integrated tool environment TOPSYS implemented on iPSC hypercubes, networks of SPARCstations and partly on Transputer systems.
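The abstract says DETOP offers a global name space that hides how processes are mapped to nodes. The sketch below is a rough illustration of that idea. It is not DETOP's actual interface. The class `GlobalNameSpace`, the `Location` record, the `migrate` operation and the `set_breakpoint` helper are all hypothetical. The sketch shows a user naming a process by a machine-independent name while a lookup table resolves that name to a physical node and a node-local process id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node: int       # hypothetical physical node id (e.g. hypercube node or workstation)
    local_pid: int  # hypothetical process id that is only meaningful on that node


class GlobalNameSpace:
    """Hypothetical registry mapping user-visible process names to placements."""

    def __init__(self) -> None:
        self._table: dict[str, Location] = {}

    def register(self, name: str, node: int, local_pid: int) -> None:
        self._table[name] = Location(node, local_pid)

    def migrate(self, name: str, new_node: int, new_pid: int) -> None:
        # Placement may change, but the name the user debugs with stays the same.
        self._table[name] = Location(new_node, new_pid)

    def resolve(self, name: str) -> Location:
        return self._table[name]


def set_breakpoint(ns: GlobalNameSpace, process_name: str, function: str) -> str:
    # The user supplies only the global name; the mapping is looked up here.
    # A real debugger would forward the request to a monitor on loc.node.
    loc = ns.resolve(process_name)
    return (f"breakpoint at {function} in {process_name} "
            f"-> node {loc.node}, pid {loc.local_pid}")
```

In this sketch the user would write `set_breakpoint(ns, "worker[3]", "solve")` without knowing which node `worker[3]` runs on. If the (hypothetical) placement changes through `migrate`, the same command still resolves correctly.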






Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Thomas Bemmerl (1, 2)
  • Roland Wismüller (3)
  1. European Supercomputer Dev. Center, Intel Corp., Feldkirchen, Germany
  2. Lehrstuhl für Betriebssysteme (LfBS), RWTH Aachen, Aachen, Germany
  3. Institut für Informatik, Technische Universität München, München, Germany
