
Structure-Based Communication Trace Compression

Chapter in Performance Analysis of Parallel Applications for HPC

Abstract

Communication traces are increasingly important, both for analyzing and optimizing the performance of parallel applications and for designing next-generation HPC systems. Meanwhile, problem sizes and execution scales on supercomputers keep growing, producing prohibitive volumes of communication traces. Existing dynamic compression methods reduce trace size, but their compression overhead grows large as the job scale increases. We propose a hybrid static-dynamic method, called Cypress, which leverages information acquired from static analysis to enable more effective and efficient dynamic trace compression. Cypress extracts a program communication structure tree at compile time using inter-procedural analysis. This tree naturally captures crucial iterative computing features such as loop structure, allowing the subsequent runtime compression to fill event details into the known communication template in a top-down manner. Results show that Cypress reduces intra-process and inter-process compression overhead by up to 5× and 9×, respectively, over state-of-the-art dynamic methods, while introducing only very low compilation overhead. (© 2014 IEEE. Reproduced, with permission, from Jidong Zhai et al., Cypress: Combining static and dynamic analysis for top-down communication trace compression, SC'14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014.)
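The top-down template idea from the abstract can be sketched in a few lines of Python. This is a toy illustration, not the actual Cypress implementation: `CommNode`, `fill`, and the hand-built tree are hypothetical names standing in for the communication structure tree that Cypress derives via inter-procedural analysis. The key point it demonstrates is that once the loop structure is known statically, identical runtime events across iterations collapse into a single compressed record instead of 100 separate trace entries.

```python
from dataclasses import dataclass, field

@dataclass
class CommNode:
    kind: str                                    # "loop" or "call"
    name: str                                    # e.g. an MPI call name
    trip_count: int = 1                          # iterations, for "loop" nodes
    children: list = field(default_factory=list)
    records: list = field(default_factory=list)  # compressed (params, count) pairs

def fill(node: CommNode, params: tuple) -> None:
    """Fill one runtime event into a call node, merging identical repeats."""
    if node.records and node.records[-1][0] == params:
        node.records[-1] = (params, node.records[-1][1] + 1)
    else:
        node.records.append((params, 1))

# "Static" phase: a hand-built tree for a loop of 100 identical sends,
# standing in for what Cypress extracts at compile time.
send = CommNode("call", "MPI_Send")
tree = CommNode("loop", "main_loop", trip_count=100, children=[send])

# "Dynamic" phase: 100 identical events collapse into a single record.
for _ in range(100):
    fill(send, ("dest=1", "bytes=4096"))

print(send.records)  # [(('dest=1', 'bytes=4096'), 100)]
```

Because the template already encodes the loop, the runtime only has to record parameter values and repeat counts; events whose parameters vary across iterations would simply produce additional `(params, count)` records under the same call node.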


References

  1. Vetter, J. S., & Mueller, F. (2002). Communication characteristics of large-scale scientific applications for contemporary cluster architectures. In Proceedings 16th International Parallel and Distributed Processing Symposium (pp. 853–865).


  2. Becker, D., et al. (2007). Automatic trace-based performance analysis of metacomputing applications. In 2007 IEEE International Parallel and Distributed Processing Symposium (pp. 1–10). IEEE.


  3. Snavely, A., et al. (2002). A framework for application performance modeling and prediction. In SC ’02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing (pp. 1–17).


  4. Choudhury, N., Mehta, Y., & Wilmarth, T. L., et al. (2005). Scaling an optimistic parallel simulation of large-scale interconnection networks. In Proceedings of the Winter Simulation Conference, 2005 (pp. 591–600).


  5. Susukita, R., Ando, H., & Aoyagi, M., et al. (2008). Performance prediction of large-scale parallel system and application using macro-level simulation. In Proceedings SC’08 (pp. 1–9).


  6. Zhai, J., Chen, W., & Zheng, W. (2010). PHANTOM: Predicting performance of parallel applications on large-scale parallel machines using a single node. In PPoPP.


  7. Intel Ltd. Intel trace analyzer & collector. http://www.intel.com/cd/software/products/asmo-na/eng/244171.htm

  8. Nagel, W. E., et al. (1996). VAMPIR: Visualization and analysis of MPI resources.


  9. Shende, S. S., & Malony, A. D. (2006). The TAU parallel performance system. The International Journal of High Performance Computing Applications, 20(2), 287–311.


  10. Mohr, B., & Wolf, F. (2003). KOJAK-A tool set for automatic performance analysis of parallel programs. In Euro-Par 2003 Parallel Processing: 9th International Euro-Par Conference.


  11. Geimer, M., et al. (2010). The Scalasca performance toolset architecture. Concurrency and Computation: Practice and Experience, 22(6), 702–719.


  12. Advanced Simulation and Computing Program. The ASC SMG2000 benchmark code, https://asc.llnl.gov/computing_resources/purple/archive/benchmarks/smg/.

  13. Wylie, B. J. N., Geimer, M., & Wolf, F. (2008). Performance measurement and analysis of large-scale parallel applications on leadership computing systems. Scientific Programming, 16(2–3), 167–181.


  14. Noeth, M., et al. (2009). ScalaTrace: Scalable compression and replay of communication traces for high-performance computing. Journal of Parallel and Distributed Computing, 69(8), 696–710.


  15. Xu, Q., Subhlok, J., & Hammen, N. (2010). Efficient discovery of loop nests in execution traces. In 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (pp. 193–202).


  16. Krishnamoorthy, S., & Agarwal, K. (2010). Scalable communication trace compression. In Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (pp. 408–417). IEEE Computer Society.


  17. Shao, S., Jones, A. K., & Melhem, R. G. (2006). A compiler-based communication analysis approach for multiprocessor systems. In Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.


  18. Wu, X., & Mueller, F. (2013). Elastic and scalable tracing and accurate replay of non-deterministic events. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing. ICS’13 (pp. 59–68). ACM.


  19. Zhai, J., et al. (2014). Cypress: Combining static and dynamic analysis for top-down communication trace compression. In SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 143–153). IEEE.


  20. The LLVM Compiler Framework. http://llvm.org

  21. Muchnick, S. S. (1997). Advanced compiler design and implementation. San Francisco, CA, USA: Morgan Kaufmann Publishers.


  22. Emami, M., Ghiya, R., & Hendren, L. J. (1994). Context-sensitive interprocedural points-to analysis in the presence of function pointers. In PLDI’94 (pp. 242–256). ACM.


  23. Alexandrov, A., et al. (1997). LogGP: Incorporating long messages into the LogP model for parallel computation. Journal of Parallel and Distributed Computing, 44(1), 71–79.


  24. Zhang, J., et al. (2009). Process mapping for MPI collective communications. In Euro-Par 2009 Parallel Processing: 15th International Euro-Par Conference.


  25. Bailey, D., et al. (1995). The NAS Parallel Benchmarks 2.0. Moffett Field, CA: NAS Systems Division, NASA Ames Research Center.


  26. Duque, E. P., et al. (2012). IFDT: Intelligent in-situ feature detection, extraction, tracking and visualization for turbulent flow simulations. In 7th International Conference on Computational Fluid Dynamics (Vol. 2).


  27. Knupfer, A., et al. (2006). Introducing the open trace format (OTF). In International Conference on Computational Science (pp. 526–533).


  28. Chen, H., et al. (2006). MPIPP: An automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters. In Proceedings of the 20th Annual International Conference on supercomputing (pp. 353–360) ACM.


  29. Vetter, J. S., & McCracken, M. O. (2001). Statistical scalability analysis of communication operations in distributed applications. In Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming (pp. 123–132).


  30. Knupfer, A., & Nagel, W. E. (2005). Construction and compression of complete call graphs for post-mortem program trace analysis. In 2005 International Conference on Parallel Processing (pp. 165–172). IEEE.


  31. Ratn, P., et al. (2008). Preserving time in large-scale communication traces. In Proceedings of the 22nd Annual International Conference on Supercomputing (pp. 46–55). New York, NY, USA: ACM.


  32. Zhai, J., et al. (2009). FACT: Fast communication trace collection for parallel applications through program slicing. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. https://doi.org/10.1145/1654059.1654087


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Zhai, J., Jin, Y., Chen, W., Zheng, W. (2023). Structure-Based Communication Trace Compression. In: Performance Analysis of Parallel Applications for HPC. Springer, Singapore. https://doi.org/10.1007/978-981-99-4366-1_3


  • DOI: https://doi.org/10.1007/978-981-99-4366-1_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4365-4

  • Online ISBN: 978-981-99-4366-1

  • eBook Packages: Computer Science (R0)
