Abstract
Communication traces are increasingly important, both for the performance analysis and optimization of parallel applications and for the design of next-generation HPC systems. Meanwhile, problem sizes and execution scales on supercomputers keep growing, producing a prohibitive volume of communication traces. Existing dynamic compression methods for reducing trace size incur large compression overhead as the job scale increases. We propose a hybrid static-dynamic method, called Cypress, which leverages information acquired from static analysis to enable more effective and efficient dynamic trace compression. Cypress extracts a program communication structure tree at compile time using inter-procedural analysis. This tree naturally captures crucial iterative computing features such as loop structure, allowing subsequent runtime compression to fill event details into the known communication template in a top-down manner. Results show that Cypress reduces intra-process and inter-process compression overhead by up to 5× and 9×, respectively, over state-of-the-art dynamic methods, while introducing only very low compile-time overhead. (Ⓒ 2014 IEEE. Reproduced, with permission, from Jidong Zhai, et al., Cypress: combining static and dynamic analysis for top-down communication trace compression, SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014.)
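To make the idea concrete, the following is a minimal sketch of the top-down scheme the abstract describes: a loop node known from static analysis receives the per-iteration communication events at runtime, and consecutive iterations with identical parameters are merged into a single entry with a repeat count. All names (`LoopNode`, `record_iteration`, `compress`) are illustrative assumptions, not the authors' actual API.

```python
# Hypothetical sketch of Cypress-style top-down trace compression.
# Static analysis would supply the loop node before execution; at runtime,
# each iteration's MPI events are slotted into it and run-length encoded.

from dataclasses import dataclass, field

@dataclass
class LoopNode:
    """A loop in the statically extracted communication structure tree."""
    name: str
    iterations: list = field(default_factory=list)  # one event tuple per iteration

    def record_iteration(self, events):
        # Each event is an (mpi_call, dest_rank, byte_count) triple.
        self.iterations.append(tuple(events))

    def compress(self):
        """Run-length encode identical consecutive iterations."""
        out = []
        for it in self.iterations:
            if out and out[-1][0] == it:
                out[-1][1] += 1          # same events as last entry: bump count
            else:
                out.append([it, 1])      # new pattern: start a fresh entry
        return out

loop = LoopNode("solver_loop")
for _ in range(100):
    # Every iteration issues the same send, so 100 iterations
    # collapse into a single compressed entry.
    loop.record_iteration([("MPI_Send", 1, 4096)])
loop.record_iteration([("MPI_Send", 1, 8192)])  # final, different message

print(loop.compress())  # two entries: the repeated send x100, then the last x1
```

The dynamic-only methods cited below must instead discover the loop structure from the raw event stream, which is what drives their compression overhead up at scale.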
References
Vetter, J. S., & Mueller, F. (2002). Communication characteristics of large-scale scientific applications for contemporary cluster architectures. In Proceedings 16th International Parallel and Distributed Processing Symposium (pp. 853–865).
Becker, D., et al. (2007). Automatic trace-based performance analysis of metacomputing applications. In 2007 IEEE International Parallel and Distributed Processing Symposium (pp. 1–10). IEEE.
Snavely, A., et al. (2002). A framework for application performance modeling and prediction. In SC ’02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing (pp. 1–17).
Choudhury, N., Mehta, Y., & Wilmarth, T. L., et al. (2005). Scaling an optimistic parallel simulation of large-scale interconnection networks. In Proceedings of the Winter Simulation Conference, 2005 (pp. 591–600).
Susukita, R., Ando, H., & Aoyagi, M., et al. (2008). Performance prediction of large-scale parallel system and application using macro-level simulation. In Proceedings of SC’08 (pp. 1–9).
Zhai, J., Chen, W., & Zheng, W. (2010). PHANTOM: Predicting performance of parallel applications on large-scale parallel machines using a single node. In PPoPP.
Intel Corporation. Intel Trace Analyzer & Collector. http://www.intel.com/cd/software/products/asmo-na/eng/244171.htm
Nagel, W. E., et al. (1996). VAMPIR: Visualization and analysis of MPI resources.
Shende, S. S., & Malony, A. D. (2006). The TAU parallel performance system. The International Journal of High Performance Computing Applications, 20(2), 287–311.
Mohr, B., & Wolf, F. (2003). KOJAK-A tool set for automatic performance analysis of parallel programs. In Euro-Par 2003 Parallel Processing: 9th International Euro-Par Conference.
Geimer, M., et al. (2010). The Scalasca performance toolset architecture. Concurrency and Computation: Practice and Experience, 22(6), 702–719.
Advanced Simulation and Computing Program. The ASC SMG2000 benchmark code, https://asc.llnl.gov/computing_resources/purple/archive/benchmarks/smg/.
Wylie, B. J. N., Geimer, M., & Wolf, F. (2008). Performance measurement and analysis of large-scale parallel applications on leadership computing systems. Scientific Programming, 16(2–3), 167–181.
Noeth, M., et al. (2009). ScalaTrace: Scalable compression and replay of communication traces for high-performance computing. Journal of Parallel and Distributed Computing, 69(8), 696–710.
Xu, Q., Subhlok, J., & Hammen, N. (2010). Efficient discovery of loop nests in execution traces. In 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (pp. 193–202).
Krishnamoorthy, S., & Agarwal, K. (2010). Scalable communication trace compression. In Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (pp. 408–417). IEEE Computer Society.
Shao, S., Jones, A. K., & Melhem, R. G. (2006). A compiler-based communication analysis approach for multiprocessor systems. In Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
Wu, X., & Mueller, F. (2013). Elastic and scalable tracing and accurate replay of non-deterministic events. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing. ICS’13 (pp. 59–68). ACM.
Zhai, J., et al. (2014). Cypress: Combining static and dynamic analysis for top-down communication trace compression. In SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 143–153). IEEE.
The LLVM Compiler Framework. http://llvm.org
Muchnick, S. S. (1997). Advanced compiler design and implementation. San Francisco, CA, USA: Morgan Kaufmann Publishers.
Emami, M., Ghiya, R., & Hendren, L. J. (1994). Context-sensitive interprocedural points-to analysis in the presence of function pointers. In PLDI’94 (pp. 242–256). ACM.
Alexandrov, A., et al. (1997). LogGP: Incorporating long messages into the LogP model for parallel computation. Journal of Parallel and Distributed Computing, 44(1), 71–79.
Zhang, J., et al. (2009). Process mapping for MPI collective communications. In Euro-Par 2009 Parallel Processing: 15th International Euro-Par Conference.
Bailey, D., et al. (1995). The NAS Parallel Benchmarks 2.0. Moffett Field, CA: NAS Systems Division, NASA Ames Research Center.
Duque, E. P., et al. (2012). IFDT: Intelligent in-situ feature detection, extraction, tracking and visualization for turbulent flow simulations. In 7th International Conference on Computational Fluid Dynamics (Vol. 2).
Knüpfer, A., et al. (2006). Introducing the Open Trace Format (OTF). In International Conference on Computational Science (pp. 526–533).
Chen, H., et al. (2006). MPIPP: An automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters. In Proceedings of the 20th Annual International Conference on Supercomputing (pp. 353–360). ACM.
Vetter, J. S., & McCracken, M. O. (2001). Statistical scalability analysis of communication operations in distributed applications. In Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming (pp. 123–132).
Knüpfer, A., & Nagel, W. E. (2005). Construction and compression of complete call graphs for post-mortem program trace analysis. In 2005 International Conference on Parallel Processing (pp. 165–172). IEEE.
Ratn, P., et al. (2008). Preserving time in large-scale communication traces. In Proceedings of the 22nd Annual International Conference on Supercomputing (pp. 46–55). New York, NY, USA: ACM.
Zhai, J., et al. (2009). FACT: Fast communication trace collection for parallel applications through program slicing. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. https://doi.org/10.1145/1654059.1654087
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Zhai, J., Jin, Y., Chen, W., Zheng, W. (2023). Structure-Based Communication Trace Compression. In: Performance Analysis of Parallel Applications for HPC. Springer, Singapore. https://doi.org/10.1007/978-981-99-4366-1_3
DOI: https://doi.org/10.1007/978-981-99-4366-1_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4365-4
Online ISBN: 978-981-99-4366-1
eBook Packages: Computer Science, Computer Science (R0)