Lightweight Noise Detection

Chapter in Performance Analysis of Parallel Applications for HPC

Abstract

Performance variance in parallel and distributed systems is becoming increasingly severe: the runtimes of different executions of the same program can vary greatly even on a fixed number of computing nodes, and many HPC applications on supercomputers exhibit such variance. Efficient online detection of performance variance remains an open problem in HPC research. To address it, we propose an approach, called vSensor, for detecting the performance variance of systems. The key finding of this study is that a program's own source code can characterize runtime performance better than an external detector. Specifically, many HPC applications contain code snippets whose execution follows a predictable-workload pattern, e.g., a workload that stays invariant or grows linearly across invocations. This observation allows us to automatically identify these workload-related code snippets and use them as built-in sensors (v-sensors) to detect performance variance. We evaluate vSensor on the Tianhe-2A system with a large number of parallel applications; the results indicate that it can efficiently identify variations in system performance. The average overhead with 4,096 processes is less than 6% for fixed-workload v-sensors. Using vSensor, we also identified a problematic node on the Tianhe-2A system whose slow memory and network issues degraded program performance by 21% and 3.37×, respectively. (© 2022 IEEE. Reproduced, with permission, from Jidong Zhai et al., Leveraging code snippets to detect variations in the performance of HPC systems, IEEE Transactions on Parallel and Distributed Systems, 2022.)
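
To make the approach concrete, the sketch below illustrates, in plain C with MPI, how a fixed-workload code snippet can act as a built-in performance sensor: it times the snippet on every execution, keeps the fastest time seen so far as a baseline, and flags executions that fall far behind it. This is a minimal illustration under stated assumptions: the vsensor_begin/vsensor_end helpers and the 1.5× slowdown threshold are hypothetical names for this example, not vSensor's actual API, and the real system identifies such snippets automatically rather than by hand annotation.

```c
/* Minimal sketch of the v-sensor idea (hypothetical names and threshold):
 * time a fixed-workload snippet each time it runs and report executions
 * that are much slower than the best time observed so far. */
#include <stdio.h>
#include <mpi.h>

#define N (1 << 16)

static double t_start;
static double t_best = -1.0;   /* fastest observed time for this snippet */

static void vsensor_begin(void) { t_start = MPI_Wtime(); }

static void vsensor_end(const char *tag) {
    double t = MPI_Wtime() - t_start;
    if (t_best < 0.0 || t < t_best)
        t_best = t;                    /* tighten the baseline */
    else if (t > 1.5 * t_best)         /* assumed variance threshold */
        printf("[v-sensor %s] possible system variance: %.6f s (best %.6f s)\n",
               tag, t, t_best);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = (double)i;

    for (int iter = 0; iter < 100; iter++) {
        vsensor_begin();
        /* Fixed-workload snippet: identical work on every iteration, so a
         * timing change indicates system variance, not program behavior. */
        for (int i = 0; i < N; i++) a[i] = a[i] * 0.5 + 1.0;
        vsensor_end("update-loop");
    }

    printf("checksum: %f\n", a[0]);   /* keep the work observable */
    MPI_Finalize();
    return 0;
}
```

A linearly growing workload (the other pattern named above) could be handled the same way after normalizing each measured time by the current workload size; and in the full system, per-process readings from many such sensors would be compared across nodes to localize slow ones, rather than each process checking only against its own history.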


References

  1. Petrini, F., Kerbyson, D. J., & Pakin, S. (2003). The case of the missing supercomputer performance: Achieving optimal performance on the 8,192 processors of ASCI Q. In Proceedings of the 2003 ACM/IEEE Conference on Supercomputing. SC’03. Phoenix, AZ, USA: ACM.

  2. Ferreira, K. B., et al. (2013). The impact of system design parameters on application noise sensitivity. Cluster Computing, 16(1), 117–129.

  3. Mondragon, O. H., et al. (2016). Understanding performance interference in next-generation HPC systems. In SC16: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 384–395). IEEE.

  4. Wright, N. J., et al. (2009). Measuring and understanding variation in benchmark performance. In DoD High Performance Computing Modernization Program Users Group Conference (HPCMP-UGC), 2009 (pp. 438–443). IEEE.

  5. TOP500 website (2020). http://top500.org/.

  6. Skinner, D., & Kramer, W. (2005). Understanding the causes of performance variability in HPC workloads. In Proceedings of the IEEE International Workload Characterization Symposium, 2005 (pp. 137–149). IEEE.

  7. Hoefler, T., Schneider, T., & Lumsdaine, A. (2010). Characterizing the influence of system noise on large-scale applications by simulation. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. SC’10 (pp. 1–11).

  8. Gong, Y., He, B., & Li, D. (2014). Finding constant from change: Revisiting network performance aware optimizations on IaaS clouds. In SC14: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 982–993). IEEE.

  9. Jones, T. R., Brenner, L. B., & Fier, J. M. (2003). Impacts of operating systems on the scalability of parallel applications. Technical Report UCRL-MI-202629, Lawrence Livermore National Laboratory.

  10. Tallent, N. R., Adhianto, L., & Mellor-Crummey, J. M. (2010). Scalable identification of load imbalance in parallel executions using call path profiles. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1–11). IEEE Computer Society.

  11. Wylie, B. J. N., Geimer, M., & Wolf, F. (2008). Performance measurement and analysis of large-scale parallel applications on leadership computing systems. Scientific Programming, 16(2–3), 167–181.

  12. Geimer, M., et al. (2010). The Scalasca performance toolset architecture. Concurrency and Computation: Practice and Experience, 22(6), 702–719.

  13. Zhai, J., et al. (2014). Cypress: Combining static and dynamic analysis for top-down communication trace compression. In SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 143–153). IEEE.

  14. Zhai, J., et al. (2022). Leveraging code snippets to detect variations in the performance of HPC systems. IEEE Transactions on Parallel and Distributed Systems, 33(12), 3558–3574.

  15. Lattner, C., & Adve, V. (2004). LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (p. 75). IEEE Computer Society.

  16. MPI Documents. http://mpi-forum.org/docs/

  17. Mucci, P., et al. (2004). Automating the large-scale collection and analysis of performance data on Linux clusters. In Proceedings of the 5th LCI International Conference on Linux Clusters: The HPC Revolution.

  18. Bailey, D., et al. (1995). The NAS Parallel Benchmarks 2.0. Moffett Field, CA: NAS Systems Division, NASA Ames Research Center.

  19. Pfeiffer, W., & Stamatakis, A. (2010). Hybrid MPI/Pthreads parallelization of the RAxML phylogenetics code. In 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) (pp. 1–8). IEEE.

  20. Yang, U. M., et al. (2002). BoomerAMG: A parallel algebraic multigrid solver and preconditioner. Applied Numerical Mathematics, 41(1), 155–177.

  21. Karlin, I., Keasler, J., & Neely, J. R. (2013). LULESH 2.0 updates and changes. Technical report, Lawrence Livermore National Laboratory (LLNL), Livermore, CA.

  22. Weaver, V. M., Terpstra, D., & Moore, S. (2013). Non-determinism and overcount on modern hardware performance counter implementations. In 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (pp. 215–224). IEEE.

  23. Vetter, J., & Chambreau, C. (2005). mpiP: Lightweight, scalable MPI profiling.

  24. Intel Trace Analyzer and Collector. https://software.intel.com/en-us/trace-analyzer

  25. Tsafrir, D., et al. (2005). System noise, OS clock ticks, and fine-grained parallel applications. In Proceedings of the 19th Annual International Conference on Supercomputing. ICS’05 (pp. 303–312). New York, NY, USA: ACM. ISBN: 1-59593-167-8.

  26. Jones, T. (2012). Linux kernel co-scheduling and bulk synchronous parallelism. International Journal of High Performance Computing Applications, 26, 1094342011433523.

  27. Agarwal, S., Garg, R., & Vishnoi, N. K. (2005). The impact of noise on the scaling of collectives: A theoretical approach. In High Performance Computing–HiPC 2005 (pp. 280–289). Springer.

  28. Beckman, P., et al. (2006). The influence of operating systems on the performance of collective operations at extreme scale. In 2006 IEEE International Conference on Cluster Computing (pp. 1–12). IEEE.

  29. Phillips, J. C., et al. (2002). NAMD: Biomolecular simulation on thousands of processors. In Supercomputing, ACM/IEEE 2002 Conference (pp. 36–36).

  30. Lo, Y. J., et al. (2014). Roofline model toolkit: A practical tool for architectural and program analysis. In International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (pp. 129–148). Springer.

  31. Williams, S., Waterman, A., & Patterson, D. (2009). Roofline: An insightful visual performance model for multicore architectures. Communications of the ACM, 52(4), 65–76.

  32. Calotoiu, A., et al. (2016). Fast multi-parameter performance modeling. In 2016 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 172–181). IEEE.

  33. Yeom, J.-S., et al. (2016). Data-driven performance modeling of linear solvers for sparse matrices. In International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) (pp. 32–42). IEEE.

  34. Lee, S., Meredith, J. S., & Vetter, J. S. (2015). Compass: A framework for automated performance modeling and prediction. In Proceedings of the 29th ACM on International Conference on Supercomputing (pp. 405–414). ACM.

  35. Wu, X., & Mueller, F. (2013). Elastic and scalable tracing and accurate replay of non-deterministic events. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing. ICS’13 (pp. 59–68). ACM.

  36. Tallent, N. R., et al. (2011). Scalable fine-grained call path tracing. In Proceedings of the International Conference on Supercomputing (pp. 63–74). ACM.

  37. Mitra, S., et al. (2014). Accurate application progress analysis for large-scale parallel debugging. ACM SIGPLAN Notices, 49(6), 193–203.

  38. Laguna, I., et al. (2015). Diagnosis of performance faults in large-scale MPI applications via probabilistic progress-dependence inference. IEEE Transactions on Parallel and Distributed Systems, 26(5), 1280–1289.

  39. Arnold, D. C., et al. (2007). Stack trace analysis for large scale debugging. In 2007 IEEE International Parallel and Distributed Processing Symposium (pp. 1–10). IEEE.

  40. Dean, D. J., et al. (2014). PerfScope: Practical online server performance bug inference in production cloud computing infrastructures. In Proceedings of the ACM Symposium on Cloud Computing (pp. 1–13).

  41. Sahoo, S. K., et al. (2013). Using likely invariants for automated software fault localization. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (pp. 139–152).


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Zhai, J., Jin, Y., Chen, W., Zheng, W. (2023). Lightweight Noise Detection. In: Performance Analysis of Parallel Applications for HPC. Springer, Singapore. https://doi.org/10.1007/978-981-99-4366-1_7

  • DOI: https://doi.org/10.1007/978-981-99-4366-1_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4365-4

  • Online ISBN: 978-981-99-4366-1
