Abstract
Performance variance in parallel and distributed systems is becoming increasingly severe: even with a fixed number of computing nodes, the runtimes of different executions of the same program can vary greatly, and many HPC applications on supercomputers exhibit such variance. Efficient online detection of performance variance remains an open problem in HPC research. To address it, we propose vSensor, an approach for detecting the performance variance of systems. The key finding of this study is that a program's own source code can characterize runtime performance better than an external detector. Specifically, many HPC applications contain code snippets with fixed-workload execution patterns, e.g., a workload of invariant size or one that grows linearly. This observation allows us to automatically identify such workload-related snippets and use them to detect performance variance. We evaluated vSensor on the Tianhe-2A system with a large number of parallel applications; the results show that it efficiently identifies variations in system performance, with an average overhead below 6% at 4,096 processes for fixed-workload v-sensors. Using vSensor, we also identified a problematic node on Tianhe-2A whose slow memory and network issues degraded programs' performance by 21% and 3.37×, respectively. (© 2022 IEEE. Reproduced, with permission, from Jidong Zhai et al., Leveraging code snippets to detect variations in the performance of HPC systems, IEEE Transactions on Parallel and Distributed Systems, 2022.)
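The core idea of fixed-workload v-sensors can be illustrated with a minimal sketch. This is not vSensor's actual implementation (which identifies snippets via compiler analysis and instruments them at runtime); the function name, baseline window, and threshold below are illustrative assumptions. A snippet whose workload is invariant should take roughly constant time, so an execution whose runtime deviates substantially from a baseline signals a system-level performance problem, such as a node with degraded memory or network bandwidth.

```python
import statistics

def detect_variance(timings, baseline_count=5, threshold=0.2):
    """Flag iterations of a fixed-workload snippet whose runtime deviates
    from the baseline median by more than `threshold` (relative).

    timings        -- per-iteration runtimes of one fixed-workload snippet
    baseline_count -- number of initial iterations used to build the baseline
    threshold      -- relative deviation (0.2 = 20%) beyond which an
                      iteration is reported as anomalous
    Returns the indices of anomalous iterations.
    """
    baseline = statistics.median(timings[:baseline_count])
    return [i for i, t in enumerate(timings)
            if abs(t - baseline) / baseline > threshold]

# A fixed-workload snippet should take near-constant time; a slow node
# shows up as an outlier (index 6 here, ~35% above the baseline).
timings = [1.00, 1.02, 0.99, 1.01, 1.00, 1.00, 1.35, 1.01]
print(detect_variance(timings))  # -> [6]
```

In the actual system, such checks run online per process, so a single slow node in a 4,096-process job can be localized from the snippets executing on it.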
References
Petrini, F., Kerbyson, D. J., & Pakin, S. (2003). The case of the missing supercomputer performance: Achieving optimal performance on the 8,192 processors of ASCI Q. In Proceedings of the 2003 ACM/IEEE Conference on Supercomputing. SC’03. Phoenix, AZ, USA: ACM.
Ferreira, K. B., et al. (2013). The impact of system design parameters on application noise sensitivity. Cluster Computing, 16(1), 117–129.
Mondragon, O. H., et al. (2016). Understanding performance interference in next-generation HPC systems. In SC16: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 384–395). IEEE.
Wright, N. J., et al. (2009). Measuring and understanding variation in benchmark performance. In DoD High Performance Computing Modernization Program Users Group Conference (HPCMP-UGC), 2009 (pp. 438–443). IEEE.
TOP500 website (2020). http://top500.org/.
Skinner, D., & Kramer, W. (2005). Understanding the causes of performance variability in HPC workloads. In Proceedings of the IEEE International Workload Characterization Symposium, 2005 (pp. 137–149). IEEE.
Hoefler, T., Schneider, T., & Lumsdaine, A. (2010). Characterizing the influence of system noise on large-scale applications by simulation. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. SC’10 (pp. 1–11).
Gong, Y., He, B., & Li, D. (2014). Finding constant from change: Revisiting network performance aware optimizations on iaas clouds. In SC14: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 982–993). IEEE.
Jones, T. R., Brenner, L. B., & Fier, J. M. (2003). Impacts of operating systems on the scalability of parallel applications. Technical Report UCRL-MI-202629, Lawrence Livermore National Laboratory.
Tallent, N. R., Adhianto, L., & Mellor-Crummey, J. M. (2010). Scalable identification of load imbalance in parallel executions using call path profiles. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1–11). IEEE Computer Society.
Wylie, B. J. N., Geimer, M., & Wolf, F. (2008). Performance measurement and analysis of large-scale parallel applications on leadership computing systems. Scientific Programming, 16(2–3), 167–181.
Geimer, M., et al. (2010). The Scalasca performance toolset architecture. Concurrency and Computation: Practice and Experience, 22(6), 702–719.
Zhai, J., et al. (2014). Cypress: Combining static and dynamic analysis for top-down communication trace compression. In SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 143–153). IEEE.
Zhai, J., et al. (2022). Leveraging code snippets to detect variations in the performance of HPC systems. IEEE Transactions on Parallel and Distributed Systems, 33(12), 3558–3574.
Lattner, C., & Adve, V. (2004). LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (p. 75). IEEE Computer Society.
MPI Documents. http://mpi-forum.org/docs/
Mucci, P., et al. (2004). Automating the large-scale collection and analysis of performance data on Linux clusters. In Proceedings of the 5th LCI International Conference on Linux Clusters: The HPC Revolution.
Bailey, D., et al. (1995). The NAS Parallel Benchmarks 2.0. Moffett Field, CA: NAS Systems Division, NASA Ames Research Center.
Pfeiffer, W., & Stamatakis, A. (2010). Hybrid MPI/Pthreads parallelization of the RAxML phylogenetics code. In 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) (pp. 1–8). IEEE.
Yang, U. M., et al. (2002). BoomerAMG: A parallel algebraic multigrid solver and preconditioner. Applied Numerical Mathematics, 41(1), 155–177.
Karlin, I., Keasler, J., & Neely, J. R. (2013). LULESH 2.0 updates and changes. Technical Report, Lawrence Livermore National Laboratory (LLNL), Livermore, CA.
Weaver, V. M., Terpstra, D., & Moore, S. (2013). Non-determinism and overcount on modern hardware performance counter implementations. In 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (pp. 215–224). IEEE.
Vetter, J., & Chambreau, C. (2005). mpiP: Lightweight, scalable MPI profiling.
Intel Trace Analyzer and Collector. https://software.intel.com/en-us/trace-analyzer
Tsafrir, D., et al. (2005). System noise, OS clock ticks, and fine-grained parallel applications. In Proceedings of the 19th Annual International Conference on Supercomputing. ICS’05 (pp. 303–312). New York, NY, USA: ACM. ISBN: 1-59593-167-8.
Jones, T. (2012). Linux kernel co-scheduling and bulk synchronous parallelism. International Journal of High Performance Computing Applications, 26, 1094342011433523.
Agarwal, S., Garg, R., & Vishnoi, N. K. (2005). The impact of noise on the scaling of collectives: A theoretical approach. In High Performance Computing–HiPC 2005 (pp. 280–289). Springer.
Beckman, P., et al. (2006). The influence of operating systems on the performance of collective operations at extreme scale. In 2006 IEEE International Conference on Cluster Computing (pp. 1–12). IEEE.
Phillips, J. C., et al. (2002). NAMD: Biomolecular simulation on thousands of processors. In Supercomputing, ACM/IEEE 2002 Conference (p. 36).
Lo, Y. J., et al. (2014). Roofline model toolkit: A practical tool for architectural and program analysis. In International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (pp. 129–148). Springer.
Williams, S., Waterman, A., & Patterson, D. (2009). Roofline: An insightful visual performance model for multicore architectures. Communications of the ACM, 52(4), 65–76.
Calotoiu, A., et al. (2016). Fast multi-parameter performance modeling. In 2016 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 172–181). IEEE.
Yeom, J.-S., et al. (2016). Data-driven performance modeling of linear solvers for sparse matrices. In International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) (pp. 32–42). IEEE.
Lee, S., Meredith, J. S., & Vetter, J. S. (2015). Compass: A framework for automated performance modeling and prediction. In Proceedings of the 29th ACM on International Conference on Supercomputing (pp. 405–414). ACM.
Wu, X., & Mueller, F. (2013). Elastic and scalable tracing and accurate replay of non-deterministic events. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing. ICS’13 (pp. 59–68). ACM.
Tallent, N. R., et al. (2011). Scalable fine-grained call path tracing. In Proceedings of the International Conference on Supercomputing (pp. 63–74). ACM.
Mitra, S., et al. (2014). Accurate application progress analysis for large-scale parallel debugging. ACM SIGPLAN Notices, 49(6), 193–203. ACM.
Laguna, I., et al. (2015). Diagnosis of performance faults in large-scale MPI applications via probabilistic progress-dependence inference. IEEE Transactions on Parallel and Distributed Systems, 26(5), 1280–1289.
Arnold, D. C., et al. (2007). Stack trace analysis for large scale debugging. In 2007 IEEE International Parallel and Distributed Processing Symposium (pp. 1–10). IEEE.
Dean, D. J., et al. (2014). Perfscope: Practical online server performance bug inference in production cloud computing infrastructures. In Proceedings of the ACM Symposium on Cloud Computing (pp. 1–13).
Sahoo, S. K. et al. (2013). Using likely invariants for automated software fault localization. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (pp. 139–152).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this chapter
Zhai, J., Jin, Y., Chen, W., Zheng, W. (2023). Lightweight Noise Detection. In: Performance Analysis of Parallel Applications for HPC. Springer, Singapore. https://doi.org/10.1007/978-981-99-4366-1_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4365-4
Online ISBN: 978-981-99-4366-1
eBook Packages: Computer Science (R0)