An Improved Ganglia-Like Clusters Monitoring System
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. We propose an improved Ganglia-like cluster monitoring system that is more reliable when federation nodes and their associated links fail; restricts access to some monitoring data by permission; adds control functions such as restarting or shutting down processes that are in a confused state; notifies the cluster administrator by email or pager when an important event occurs; and optionally forwards only selected data to the federation node, based on a user policy, to speed up WAN access. We have implemented a prototype system.
Keywords: Monitoring Data, Leaf Node, Parent Node, Configuration File, Monitor System
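The policy-based selection mentioned in the abstract, where only some data is sent to the federation node to reduce WAN traffic, might look roughly like the following sketch. It is an illustration under stated assumptions, not the prototype's implementation: the function `select_for_federation`, the `FORWARD_POLICY` list, the glob-pattern policy format, and the sample metric names and values are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical user policy: glob patterns naming the metrics that may be
# forwarded to the federation node. The format is assumed for illustration.
FORWARD_POLICY = ["load_*", "cpu_user", "mem_free"]


def select_for_federation(metrics, policy=FORWARD_POLICY):
    """Return only the metrics whose names match at least one policy pattern."""
    return {
        name: value
        for name, value in metrics.items()
        if any(fnmatchcase(name, pattern) for pattern in policy)
    }


# Illustrative metrics collected locally on a cluster (made-up values).
local_metrics = {
    "load_one": 0.42,
    "load_five": 0.37,
    "cpu_user": 12.5,
    "disk_free": 81.0,
    "proc_total": 143,
}

# Only the policy-matching subset would cross the WAN link.
forwarded = select_for_federation(local_metrics)
print(forwarded)
```

Filtering at the cluster side, before transmission, is what would save WAN bandwidth in this sketch; the unselected metrics stay available locally.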