Abstract
Performance gains in HPC nodes have come from widening SIMD units and aggressively packing more cores into each processor. With multiple processors and so many cores, it has become necessary to understand and manage process and thread affinity and pinning. However, existing affinity tools were not designed specifically for HPC users to quickly evaluate process affinity and execution location. To fill this gap, three HPC user-friendly tools, core_usage, show_affinity, and amask, have been designed to eliminate barriers that frustrate users and keep them from evaluating and analyzing affinity for their applications. These tools focus on providing convenient methods, easy-to-understand affinity representations for large process counts, process locality, and run-time core load with socket aggregation. They will help HPC users, developers, and site administrators monitor processor utilization from an affinity perspective.
Acknowledgments
We would like to thank all our users who worked with these new tools and provided us with constructive feedback and suggestions to make improvements. We would also like to thank our colleagues in the High-Performance Computing group and Advanced Computing Systems group who provided expertise and insight that significantly assisted this work. Particularly, we would like to show our gratitude to Hang Liu, Albert Lu, John Cazes, Robert McLay, Victor Eijkhout, and Bill Barth who helped us design, test, and debug the early versions of these products. We also appreciate the technical writing assistance from Bob Garza.
All these tools were developed and tested mainly on TACC’s supercomputer systems, including Stampede, Stampede2, Lonestar5, Wrangler, Maverick2, and Frontera. The computation for all experiments was supported by the National Science Foundation through the Frontera (OAC-1818253), Stampede2 (OAC-1540931), and XSEDE (ACI-1953575) awards.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, L., Milfeld, K., Liu, S. (2020). Tools for Monitoring CPU Usage and Affinity in Multicore Supercomputers. In: Juckeland, G., Chandrasekaran, S. (eds) Tools and Techniques for High Performance Computing. HUST 2019, SE-HER 2019, WIHPC 2019. Communications in Computer and Information Science, vol 1190. Springer, Cham. https://doi.org/10.1007/978-3-030-44728-1_4
DOI: https://doi.org/10.1007/978-3-030-44728-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44727-4
Online ISBN: 978-3-030-44728-1
eBook Packages: Computer Science (R0)