OCM — An OMIS compliant monitoring system

  • Thomas Ludwig
  • Roland Wismüller
  • Michael Oberhuber
Session F3: Tools for PVM
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1156)

Abstract

The OMIS project aims at defining a standard interface between tools for parallel systems and monitoring systems. Monitors act as mediators between tools and the parallel program running on some target architecture. Their task is to observe and manipulate the program according to the tool's commands. A standardized interface will allow different research groups to develop tools which can be used concurrently with the same program. OCM, an OMIS compliant monitoring system, is the first realization of such an environment. It is designed for PVM programs running on workstation clusters. The paper will give an outline of the goals of this project and describe important details of the monitoring system's design.

Keywords

Monitoring System, Parallel Program, Service Request, Programming Library, Target Architecture

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Thomas Ludwig (1)
  • Roland Wismüller (1)
  • Michael Oberhuber (1)
  1. Institut für Informatik, Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR), Technische Universität München (TUM), München
