Abstract
Large distributed systems such as Computational Grids require that a large amount of monitoring data be collected for a variety of tasks, such as fault detection, performance analysis, performance tuning, performance prediction, and scheduling. Ensuring that all necessary monitoring is turned on and that data is being collected can be a tedious and error-prone task. We have developed an agent-based system to automate the execution of monitoring sensors and the collection of event data.
Cite this article
Tierney, B., Crowley, B., Gunter, D. et al. A Monitoring Sensor Management System for Grid Environments. Cluster Computing 4, 19–28 (2001). https://doi.org/10.1023/A:1011408108941
DOI: https://doi.org/10.1023/A:1011408108941