On-line distributed debugging on scaleable multicomputer architectures
Debugging parallel programs is one of the most tedious tasks in programming scalable multiprocessor architectures. Because the resources of these machines are distributed, programming is often architecture-dependent, and most development tools still reflect this dependency, even during the analysis phase of parallel programs. This paper presents the distributed debugger DETOP, which offers a global name space and hides architectural features such as the mapping of processes to processors. DETOP is part of the integrated tool environment TOPSYS, which is implemented on iPSC hypercubes, on networks of SPARCstations and, in part, on Transputer systems.
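To illustrate what a global name space means for the user, the following Python sketch maps logical process names to their physical placement, so that debugger commands refer only to names and never to nodes. It is a minimal sketch under stated assumptions, not DETOP's implementation: the names `GlobalNameSpace`, `Location`, `migrate`, `route` and the command string format are hypothetical, and it assumes that a process is physically identified by a node number and a node-local process id.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical physical placement: node number and node-local process id."""
    node: int
    local_pid: int


class GlobalNameSpace:
    """Hypothetical registry mapping logical process names to physical locations.

    The user sees only logical names; the process-to-node mapping stays
    inside the debugger and may change without invalidating those names.
    """

    def __init__(self) -> None:
        self._table: dict[str, Location] = {}

    def register(self, name: str, loc: Location) -> None:
        # Record where a newly created process was placed.
        self._table[name] = loc

    def migrate(self, name: str, new_loc: Location) -> None:
        # Only the internal table changes; user-level names remain valid.
        if name not in self._table:
            raise KeyError(name)
        self._table[name] = new_loc

    def resolve(self, name: str) -> Location:
        return self._table[name]

    def route(self, command: str, name: str) -> tuple[int, str]:
        """Translate a name-based command into (target node, node-local request)."""
        loc = self.resolve(name)
        return loc.node, f"{command} pid={loc.local_pid}"


if __name__ == "__main__":
    ns = GlobalNameSpace()
    ns.register("worker0", Location(node=2, local_pid=7))
    print(ns.route("stop", "worker0"))  # (2, 'stop pid=7')
    # The same user command still works after the process is moved.
    ns.migrate("worker0", Location(node=5, local_pid=1))
    print(ns.route("stop", "worker0"))  # (5, 'stop pid=1')
```

The design choice here is a single indirection table: because every command is resolved at the moment it is issued, a change in the placement of a process is invisible to the user, which is the property the abstract attributes to a global name space.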
Keywords: Parallel Program; Deterministic Finite Automaton; Predicate Transformation; Light Weight Process; Instant Replay