Towards Scalable Event Tracing for High End Systems

Mohror, Kathryn; Karavanic, Karen L.

doi:10.1007/978-3-540-75444-2_65


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4782)

Included in the following conference series:

International Conference on High Performance Computing and Communications

Abstract

Although event tracing of parallel applications offers highly detailed performance information, tracing on current leading edge systems may lead to unacceptable perturbation of the target program and unmanageably large trace files. High end systems of the near future promise even greater scalability challenges. Development of more scalable approaches requires a detailed understanding of the interactions between current approaches and high end runtime environments. In this paper we present the results of studies that examine several sources of overhead related to tracing: instrumentation, differing trace buffer sizes, periodic buffer flushes to disk, system changes, and increasing numbers of processors in the target application. As expected, the overhead of instrumentation correlates strongly with the number of events; however, our results indicate that the contribution of writing the trace buffer increases with increasing numbers of processors. We include evidence that the total overhead of tracing is sensitive to the underlying file system.
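
As a rough illustration of two of the overhead sources named in the abstract (trace buffer size and periodic buffer flushes), the sketch below models a fixed-capacity event buffer that writes its contents to a sink whenever it fills. It is not the tracing tool or methodology used in the paper. The names `TraceBuffer` and `trace_run`, the 12-byte record layout, and the in-memory sink are all assumptions made for the example.

```python
import io
import struct
import time


class TraceBuffer:
    """Hypothetical fixed-capacity event buffer that flushes to a sink when full."""

    # Assumed record layout: float64 timestamp + uint32 event id, no padding (12 bytes).
    RECORD = struct.Struct("<dI")

    def __init__(self, sink, capacity_events):
        if capacity_events < 1:
            raise ValueError("capacity_events must be at least 1")
        self.sink = sink
        self.capacity = capacity_events
        self.events = []
        self.flushes = 0
        self.bytes_written = 0

    def record(self, event_id):
        # Per-event instrumentation cost: take a timestamp and append a record.
        self.events.append((time.perf_counter(), event_id))
        if len(self.events) >= self.capacity:
            self.flush()

    def flush(self):
        # Flush cost: serialize the buffered records and write them to the sink.
        if not self.events:
            return
        data = b"".join(self.RECORD.pack(t, e) for t, e in self.events)
        self.sink.write(data)
        self.bytes_written += len(data)
        self.flushes += 1
        self.events.clear()


def trace_run(num_events, capacity_events):
    """Record num_events events and return (flush count, bytes written)."""
    sink = io.BytesIO()  # stand-in for a trace file on a real file system
    buf = TraceBuffer(sink, capacity_events)
    for i in range(num_events):
        buf.record(i % 16)
    buf.flush()  # final flush of any partially filled buffer
    return buf.flushes, buf.bytes_written


if __name__ == "__main__":
    for capacity in (64, 256, 1000):
        flushes, nbytes = trace_run(1000, capacity)
        print(f"capacity={capacity:5d} events  flushes={flushes:3d}  bytes={nbytes}")
```

In this toy model the trace size depends only on the number of events, while the number of write operations depends on the buffer capacity. It says nothing about how those writes behave on a shared parallel file system, which is the effect the paper measures.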



Author information

Authors and Affiliations

Department of Computer Science, Portland State University, P.O. Box 751, Portland, OR 97207-0751
Kathryn Mohror & Karen L. Karavanic

Editor information

Ronald Perrott, Barbara M. Chapman, Jaspal Subhlok, Rodrigo Fernandes de Mello, Laurence T. Yang

Cite this paper

Mohror, K., Karavanic, K.L. (2007). Towards Scalable Event Tracing for High End Systems. In: Perrott, R., Chapman, B.M., Subhlok, J., de Mello, R.F., Yang, L.T. (eds) High Performance Computing and Communications. HPCC 2007. Lecture Notes in Computer Science, vol 4782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75444-2_65

DOI: https://doi.org/10.1007/978-3-540-75444-2_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75443-5
Online ISBN: 978-3-540-75444-2
eBook Packages: Computer Science, Computer Science (R0)
