
Abstract

Performance analysis is essential for understanding the performance behavior of large-scale parallel applications on modern supercomputers. Current performance analysis techniques are based on either profiling or tracing. Profiling incurs low runtime cost but misses information that is important for identifying underlying bottlenecks, while tracing incurs unacceptable overhead at large scales. In this book, we leverage static information, such as program structure and data dependences, extracted from source code and executable binaries to guide dynamic analysis. This approach achieves the analyzability of tracing with the overhead of profiling. We apply it to a range of performance analysis tasks, including memory monitoring, communication analysis, scalability analysis, and noise detection.




© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Zhai, J., Jin, Y., Chen, W., Zheng, W. (2023). Background and Overview. In: Performance Analysis of Parallel Applications for HPC. Springer, Singapore. https://doi.org/10.1007/978-981-99-4366-1_1

  • DOI: https://doi.org/10.1007/978-981-99-4366-1_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4365-4

  • Online ISBN: 978-981-99-4366-1

  • eBook Packages: Computer Science (R0)
