Monitoring and management-support of distributed systems

  • Dieter Haban
  • Dieter Wybranietz
  • Amnon Barak
Technical Paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 433)


This paper describes a tool for on-line monitoring of distributed systems. The tool is a hybrid monitor, i.e., it combines a hardware component with a software layer. It presents the interactive user and the local operating system with high-level information about, and a performance evaluation of, the activities in the host system, with minimal interference. Special hardware support, consisting of a test and measurement processor (TMP), was designed and implemented in the nodes of an experimental multicomputer system. The main function of the TMP is to execute software that monitors the local system behavior and measures the performance of both the resident operating system and the application software. The TMP can also be used to execute low-level operating system functions, to manage local resources, and to trigger time-driven events, in order to reduce the overhead of the host operating system. The operation of the TMP is completely transparent to users and imposes a minimal overhead, less than 0.1%, on the hardware system. In the experimental system, all the TMPs were connected to a central monitoring station over an independent communication network in order to provide a global view of the monitored system. The central monitoring station displays the resulting information in easy-to-read charts and graphs. Our experience with the TMP shows that it promotes an improved understanding of run-time behavior and supports performance measurements from which qualitative and quantitative assessments of distributed systems can be derived.
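The data flow outlined in the abstract (per-node monitors that record events on behalf of the host, and a central station that gathers them into a global view) can be illustrated with a minimal software-only sketch. It is not the authors' implementation: the class names `NodeMonitor` and `CentralStation`, the `Event` fields, and the event labels are all hypothetical, and the real TMP is dedicated hardware attached to each node rather than a Python object.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    node: int       # hypothetical node identifier
    timestamp: int  # hypothetical local tick assigned by the node's monitor
    kind: str       # hypothetical event label, e.g. "send" or "receive"


class NodeMonitor:
    """Stands in for one node's TMP: records events emitted by the host."""

    def __init__(self, node):
        self.node = node
        self.clock = 0
        self.events = []

    def record(self, kind):
        # The monitor, not the host, timestamps and stores the event.
        self.clock += 1
        self.events.append(Event(self.node, self.clock, kind))


class CentralStation:
    """Stands in for the central monitoring station."""

    def __init__(self):
        self.events = []

    def collect(self, monitor):
        # Models a transfer over the separate monitoring network:
        # take the monitor's buffered events and empty its buffer.
        self.events.extend(monitor.events)
        monitor.events = []

    def counts_by_kind(self):
        return Counter(e.kind for e in self.events)

    def counts_by_node(self):
        return Counter(e.node for e in self.events)


# Example: three nodes, a few host events, one collection round.
monitors = [NodeMonitor(n) for n in range(3)]
monitors[0].record("send")
monitors[1].record("receive")
monitors[1].record("send")

station = CentralStation()
for m in monitors:
    station.collect(m)

summary = station.counts_by_kind()
```

In this sketch the host only calls `record`; timestamping, buffering, and forwarding are left to the monitor, which mirrors the abstract's point that monitoring work is moved off the host. The timestamps are local ticks only, so the sketch makes no claim about ordering events across nodes.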

Index Terms

Distributed systems, multicomputer, distributed operating system, real-time monitoring, measuring, management, events, user interface, graphical display, load balancing





Copyright information

© Springer-Verlag Berlin Heidelberg 1990
