Abstract
Communication traces are increasingly important, both for the performance analysis and optimization of parallel applications and for the design of next-generation HPC systems. Meanwhile, problem sizes and execution scales on supercomputers keep growing, producing a prohibitive volume of communication traces. Existing dynamic compression methods for reducing trace size incur large compression overhead as the job scale increases. We propose a hybrid static-dynamic method, called Cypress, which leverages information acquired from static analysis to enable more effective and efficient dynamic trace compression. Cypress extracts a program communication structure tree at compile time using inter-procedural analysis. This tree naturally captures crucial iterative computing features such as loop structure, allowing subsequent runtime compression to fill event details into the known communication template in a top-down manner. Results show that Cypress reduces intra-process and inter-process compression overhead by up to 5× and 9×, respectively, over state-of-the-art dynamic methods, while introducing only very low compile-time overhead. (Ⓒ 2014 IEEE. Reproduced, with permission, from Jidong Zhai, et al., Cypress: combining static and dynamic analysis for top-down communication trace compression, SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014.)
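To make the idea concrete, the following is a minimal sketch of the top-down scheme the abstract describes: a loop node known from static analysis receives the per-iteration communication events at runtime, and consecutive iterations with identical parameters are merged into a single entry with a repeat count. All names (`LoopNode`, `record_iteration`, `compress`) are illustrative assumptions, not the authors' actual API.

```python
# Hypothetical sketch of Cypress-style top-down trace compression.
# Static analysis would supply the loop node before execution; at runtime,
# each iteration's MPI events are slotted into it and run-length encoded.

from dataclasses import dataclass, field

@dataclass
class LoopNode:
    """A loop in the statically extracted communication structure tree."""
    name: str
    iterations: list = field(default_factory=list)  # one event tuple per iteration

    def record_iteration(self, events):
        # Each event is an (mpi_call, dest_rank, byte_count) triple.
        self.iterations.append(tuple(events))

    def compress(self):
        """Run-length encode identical consecutive iterations."""
        out = []
        for it in self.iterations:
            if out and out[-1][0] == it:
                out[-1][1] += 1          # same events as last entry: bump count
            else:
                out.append([it, 1])      # new pattern: start a fresh entry
        return out

loop = LoopNode("solver_loop")
for _ in range(100):
    # Every iteration issues the same send, so 100 iterations
    # collapse into a single compressed entry.
    loop.record_iteration([("MPI_Send", 1, 4096)])
loop.record_iteration([("MPI_Send", 1, 8192)])  # final, different message

print(loop.compress())  # two entries: the repeated send x100, then the last x1
```

The dynamic-only methods cited below must instead discover the loop structure from the raw event stream, which is what drives their compression overhead up at scale.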
References
Vetter, J. S., & Mueller, F. (2002). Communication characteristics of large-scale scientific applications for contemporary cluster architectures. In Proceedings 16th International Parallel and Distributed Processing Symposium (pp. 853–865).
Becker, D., et al. (2007). Automatic trace-based performance analysis of metacomputing applications. In 2007 IEEE International Parallel and Distributed Processing Symposium (pp. 1–10). IEEE.
Snavely, A., et al. (2002). A framework for application performance modeling and prediction. In SC ’02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing (pp. 1–17).
Choudhury, N., Mehta, Y., & Wilmarth, T. L., et al. (2005). Scaling an optimistic parallel simulation of large-scale interconnection networks. In Proceedings of the Winter Simulation Conference, 2005 (pp. 591–600).
Susukita, R., Ando, H., & Aoyagi, M., et al. (2008). Performance prediction of large-scale parallel system and application using macro-level simulation. In Proceedings of SC’08 (pp. 1–9).
Zhai, J., Chen, W., & Zheng, W. (2010). PHANTOM: Predicting performance of parallel applications on large-scale parallel machines using a single node. In PPoPP.
Intel Corporation. Intel Trace Analyzer & Collector. http://www.intel.com/cd/software/products/asmo-na/eng/244171.htm
Nagel, W. E., et al. (1996). VAMPIR: Visualization and analysis of MPI resources.
Shende, S. S., & Malony, A. D. (2006). The TAU parallel performance system. The International Journal of High Performance Computing Applications, 20(2), 287–311.
Mohr, B., & Wolf, F. (2003). KOJAK-A tool set for automatic performance analysis of parallel programs. In Euro-Par 2003 Parallel Processing: 9th International Euro-Par Conference.
Geimer, M., et al. (2010). The Scalasca performance toolset architecture. Concurrency and Computation: Practice and Experience, 22(6), 702–719.
Advanced Simulation and Computing Program. The ASC SMG2000 benchmark code, https://asc.llnl.gov/computing_resources/purple/archive/benchmarks/smg/.
Wylie, B. J. N., Geimer, M., & Wolf, F. (2008). Performance measurement and analysis of large-scale parallel applications on leadership computing systems. Scientific Programming, 16(2–3), 167–181.
Noeth, M., et al. (2009). ScalaTrace: Scalable compression and replay of communication traces for high-performance computing. Journal of Parallel and Distributed Computing, 69(8), 696–710.
Xu, Q., Subhlok, J., & Hammen, N. (2010). Efficient discovery of loop nests in execution traces. In 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (pp. 193–202).
Krishnamoorthy, S., & Agarwal, K. (2010). Scalable communication trace compression. In Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (pp. 408–417). IEEE Computer Society.
Shao, S., Jones, A. K., & Melhem, R. G. (2006). A compiler-based communication analysis approach for multiprocessor systems. In Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
Wu, X., & Mueller, F. (2013). Elastic and scalable tracing and accurate replay of non-deterministic events. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing. ICS’13 (pp. 59–68). ACM.
Zhai, J., et al. (2014). Cypress: Combining static and dynamic analysis for top-down communication trace compression. In SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 143–153). IEEE.
The LLVM Compiler Framework. http://llvm.org
Muchnick, S. S. (1997). Advanced compiler design and implementation. San Francisco, CA, USA: Morgan Kaufmann Publishers.
Emami, M., Ghiya, R., & Hendren, L. J. (1994). Context-sensitive interprocedural points-to analysis in the presence of function pointers. In PLDI’94 (pp. 242–256). ACM.
Alexandrov, A., et al. (1997). LogGP: Incorporating long messages into the LogP model for parallel computation. Journal of Parallel and Distributed Computing, 44(1), 71–79.
Zhang, J., et al. (2009). Process mapping for MPI collective communications. In Euro-Par 2009 Parallel Processing: 15th International Euro-Par Conference.
Bailey, D., et al. (1995). The NAS Parallel Benchmarks 2.0. Moffett Field, CA: NAS Systems Division, NASA Ames Research Center.
Duque, E. P., et al. (2012). IFDT: Intelligent in-situ feature detection, extraction, tracking and visualization for turbulent flow simulations. In 7th International Conference on Computational Fluid Dynamics (Vol. 2).
Knüpfer, A., et al. (2006). Introducing the Open Trace Format (OTF). In International Conference on Computational Science (pp. 526–533).
Chen, H., et al. (2006). MPIPP: An automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters. In Proceedings of the 20th Annual International Conference on Supercomputing (pp. 353–360). ACM.
Vetter, J. S., & McCracken, M. O. (2001). Statistical scalability analysis of communication operations in distributed applications. In Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming (pp. 123–132).
Knüpfer, A., & Nagel, W. E. (2005). Construction and compression of complete call graphs for post-mortem program trace analysis. In 2005 International Conference on Parallel Processing (pp. 165–172). IEEE.
Ratn, P., et al. (2008). Preserving time in large-scale communication traces. In Proceedings of the 22nd Annual International Conference on Supercomputing (pp. 46–55). New York, NY, USA: ACM.
Zhai, J., et al. (2009). FACT: Fast communication trace collection for parallel applications through program slicing. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. https://doi.org/10.1145/1654059.1654087
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Zhai, J., Jin, Y., Chen, W., Zheng, W. (2023). Structure-Based Communication Trace Compression. In: Performance Analysis of Parallel Applications for HPC. Springer, Singapore. https://doi.org/10.1007/978-981-99-4366-1_3
DOI: https://doi.org/10.1007/978-981-99-4366-1_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4365-4
Online ISBN: 978-981-99-4366-1
eBook Packages: Computer Science, Computer Science (R0)