Abstract
Although event tracing of parallel applications offers highly detailed performance information, tracing on current leading-edge systems may lead to unacceptable perturbation of the target program and unmanageably large trace files. High-end systems of the near future promise even greater scalability challenges. Development of more scalable approaches requires a detailed understanding of the interactions between current approaches and high-end runtime environments. In this paper, we present the results of studies that examine several sources of overhead related to tracing: instrumentation, differing trace buffer sizes, periodic buffer flushes to disk, system changes, and increasing numbers of processors in the target application. As expected, the overhead of instrumentation correlates strongly with the number of events; however, our results indicate that the contribution of writing the trace buffer increases with increasing numbers of processors. We include evidence that the total overhead of tracing is sensitive to the underlying file system.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mohror, K., Karavanic, K.L. (2007). Towards Scalable Event Tracing for High End Systems. In: Perrott, R., Chapman, B.M., Subhlok, J., de Mello, R.F., Yang, L.T. (eds) High Performance Computing and Communications. HPCC 2007. Lecture Notes in Computer Science, vol 4782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75444-2_65
DOI: https://doi.org/10.1007/978-3-540-75444-2_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75443-5
Online ISBN: 978-3-540-75444-2
eBook Packages: Computer Science (R0)