Abstract
Developers and users of high-performance distributed systems often observe performance problems such as unexpectedly low throughput or high latency. Determining the source of these problems requires detailed end-to-end instrumentation of all components, including the applications, operating systems, hosts, and networks. In this paper we describe a methodology that enables the real-time diagnosis of performance problems in complex high-performance distributed systems. The methodology includes tools for generating timestamped event logs that can be used to provide detailed end-to-end application and system level monitoring, and tools for visualizing the log data and the real-time state of the distributed system. This methodology, called NetLogger, has proven invaluable for diagnosing problems in networks and in distributed systems code. The approach is novel in that it combines network, host, and application-level monitoring, providing a complete view of the entire system. NetLogger is designed to be extremely lightweight, and includes a mechanism for reliably collecting monitoring events from multiple distributed locations.
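The idea of timestamped event logs can be illustrated with a small sketch. The Python code below is not NetLogger's API or log format. The function names, field names (`ts`, `event`, `host`, `id`), and event names are hypothetical, chosen only to show how events emitted at several components might be combined into per-stage latencies for one request. It assumes the host clocks are synchronized closely enough for timestamps from different hosts to be compared.

```python
import time
from collections import defaultdict


def make_event(name, host, request_id, ts=None):
    """Build one timestamped event record (hypothetical field names)."""
    return {
        "ts": time.time() if ts is None else ts,
        "event": name,
        "host": host,
        "id": request_id,
    }


def format_event(event):
    """Render an event as a single line of key=value pairs."""
    return " ".join(f"{key}={value}" for key, value in event.items())


def stage_latencies(events):
    """Group events by request id, sort each group by timestamp, and
    return the elapsed time between consecutive events per request."""
    by_request = defaultdict(list)
    for event in events:
        by_request[event["id"]].append(event)

    result = {}
    for request_id, group in by_request.items():
        group.sort(key=lambda e: e["ts"])
        result[request_id] = [
            (a["event"], b["event"], b["ts"] - a["ts"])
            for a, b in zip(group, group[1:])
        ]
    return result


# Events from a client and a server, with fixed timestamps for illustration.
log = [
    make_event("client.send", "hostA", 1, ts=10.0),
    make_event("server.recv", "hostB", 1, ts=10.25),
    make_event("server.disk_read.end", "hostB", 1, ts=10.75),
    make_event("client.recv", "hostA", 1, ts=11.0),
]

for line in map(format_event, log):
    print(line)

for start, end, elapsed in stage_latencies(log)[1]:
    print(f"{start} -> {end}: {elapsed:.2f}s")
```

In this made-up trace the longest interval lies between `server.recv` and `server.disk_read.end`, which is the kind of observation that end-to-end event logs are meant to make visible.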
The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-0-387-35674-7_66
Copyright information
© 2003 IFIP International Federation for Information Processing
About this chapter
Cite this chapter
Gunter, D., Tierney, B. (2003). Netlogger. In: Goldszmidt, G., Schönwälder, J. (eds) Integrated Network Management VIII. IM 2003. IFIP — The International Federation for Information Processing, vol 118. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-35674-7_9
DOI: https://doi.org/10.1007/978-0-387-35674-7_9
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4757-5521-3
Online ISBN: 978-0-387-35674-7
eBook Packages: Springer Book Archive