Monitoring for detecting bugs and blocking communication
Writing parallel programs is more difficult than writing serial ones. Most papers about parallel processing start with this sentence or something similar. Parallel programming in itself is not that difficult, but writing efficient and error-free parallel programs can be a very tedious job.
In this paper we propose a strategy for debugging parallel programs on distributed memory machines. We follow a method called trace driven simulation [10, 11, 12], which has proven successful in finding several types of errors in parallel programs. During our work we developed a monitoring strategy as well as tools for visualizing and inspecting recorded program traces. The whole concept follows a philosophy similar to that of UNIX (“keep the tools small and clear”), extended by the “look and feel” concept known from modern graphical user interface design. We are therefore developing handy tools in a modular way; integrating all of these tools yields our debugging environment.
Keywords: Distributed Memory, Debugging, Monitoring, Event Graph, Trace Driven Simulation, Communication Events
- D.P. Agrawal, V.K. Janakiram, G.C. Pathak, “Evaluating the performance of multicomputer configurations”, IEEE Computer 19 (7), pp. 23–37, July 1986
- W.J. Dally, C.L. Seitz, “Deadlock-Free Message Routing in Multiprocessor Interconnection Networks”, IEEE Trans. Computers 36 (5), pp. 547–553, May 1987
- A. Erzmann, “Messung des Kommunikationsverhaltens des nCUBE 2-Parallelrechners”, Technical Report, University Hannover, May 1993
- C.J. Fidge, “Partial orders for parallel debugging”, Proc. Workshop on Parallel and Distributed Debugging, ACM, pp. 183–194, 1988
- G.A. Geist, M.T. Heath, B.W. Peyton, P.H. Worley, “A Users' Guide to PICL: A Portable Instrumented Communication Library”, ORNL/TM-11616, Oak Ridge National Lab, August 1990
- S. Grabner, D. Kranzlmüller, “ATEMPT: A Tool for Event Manipulation”, submitted to HICSS-28, Maui, Hawaii, 1995
- S. Grabner, J. Volkert, “Debugging Parallel Programs using Event Manipulation”, Proc. 1st Intl. Meeting on Vector and Parallel Processing, Porto, Portugal, Sept. 1993
- M.T. Heath, J.E. Finger, “ParaGraph: A Tool for Visualizing Performance of Parallel Programs”, Technical Report, Oak Ridge Natl. Lab., Sept. 1993
- R. Kolmhofer, “Kommunikation in Parallelrechnern mit verteiltem Speicher”, Masters Thesis, Institute of Computer Science, Johannes Kepler University Linz, May 1993
- T.J. LeBlanc, J.M. Mellor-Crummey, “Debugging parallel programs with instant replay”, IEEE Trans. on Computing, pp. 471–482, April 1987
- D.C. Marinescu, J.E. Lumpp Jr., T.L. Casavant, “Models for Monitoring and Debugging Tools for Parallel and Distributed Software”, Journal of Parallel and Distributed Computing 9 (2), pp. 171–184, 1990
- C.E. McDowell, D.P. Helmbold, “Debugging Concurrent Programs”, ACM Computing Surveys 21 (4), pp. 593–622, Dec. 1989
- nCUBE Corporation, nCUBE 2 Processor Manual Rel. 3.0, 1992
- nCUBE Corporation, nCUBE 2 Programmer's Guide Rel. 3.0, 1992
- L.M. Ni, P.K. McKinley, “A Survey of Wormhole Routing Techniques in Direct Networks”, IEEE Computer, Vol. 26, No. 2, Feb. 93, pp. 62–76
- D.F. Snelling, G.-R. Hoffmann, “A comparative study of libraries for parallel processing”, Proc. of the Intl. Conference on Vector and Parallel Processors, in Computational Science III, Parallel Computing, Vol. 8, (1–3), pp. 255–266, 1988