
When Fewer Cores Is Faster: A Parametric Study of Undersubscription in High-Performance Computing

Cluster Computing

Abstract

In high-performance computing, it is known that leaving a small number of CPU cores unused, a practice termed undersubscription, can be beneficial. Nevertheless, undersubscription is rarely applied in scientific HPC workloads. We demonstrate the importance of calibrated undersubscription in computational fluid dynamics (CFD) simulations through the aggregated results of 1844 benchmarks spanning three hardware configurations and five CFD models. On average, performance increased by 14% (weighted by node count). Gains were largest at high node counts, particularly when approaching a regime of negative scalability. Undersubscription increased maximum performance by up to 50%; this advantage diminished as node count decreased but remained as high as 13% on a single node. In some cases, maximum performance was achieved with a large number of free cores (nearly half of the cores in one case). From a regression over our dataset, we predict the optimal number of free cores as a universal function of cells per core; applying this regression yields a 15% average speed increase (again weighted by node count).
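The regression described above maps cells per core to a recommended number of free cores. As a purely illustrative sketch of how such a predictor could be applied when sizing a job, the snippet below uses a placeholder log-linear form; the functional form, the coefficients a and b, the per-node interpretation, and the 64-core node are assumptions for demonstration, not the paper's fitted model.

```python
import math

def optimal_free_cores(cells_per_core, cores_per_node=64, a=20.0, b=4.0):
    """Predict how many cores per node to leave idle.

    Placeholder log-linear form: more cells per core -> fewer free cores.
    The coefficients a and b are illustrative, NOT the paper's fitted values.
    """
    predicted = a - b * math.log10(cells_per_core)
    # Clamp to a sensible range: never negative, never more than half a node.
    return int(round(min(max(predicted, 0.0), cores_per_node / 2)))

# Hypothetical usage: a 40-million-cell model on 32 nodes of 64 cores each.
cores_per_node, nodes = 64, 32
cells_per_core = 40_000_000 / (nodes * cores_per_node)
free = optimal_free_cores(cells_per_core, cores_per_node)
print(f"run with {cores_per_node - free} of {cores_per_node} cores per node "
      f"({free} left idle)")
```

In practice such a prediction would feed directly into the MPI launch line (for example, requesting fewer ranks per node than the node has cores), which is the point of expressing it as a simple function of cells per core.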




Abbreviations

CFD: Computational fluid dynamics
DRAM: Dynamic random-access memory
HPC: High-performance computing
MPI: Message passing interface
NUMA: Non-uniform memory access
A: Activity factor
\(c_v\): Coefficient of variation
f: CPU clock frequency
N: Node count
\(r_u\): Undersubscription speed ratio
s: Standard deviation
\(\textrm{SEM}'\): Modified standard error of the mean
V: CPU core voltage
\(\varepsilon\): Parallel efficiency
\(\varepsilon_u\): Undersubscribed parallel efficiency
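As a rough companion to the nomenclature above, the sketch below shows how the timing quantities \(\varepsilon\), \(\varepsilon_u\), and \(r_u\) might be computed from wall-clock measurements. It assumes the conventional strong-scaling definition of parallel efficiency and defines \(r_u\) as the ratio of fully subscribed to undersubscribed runtime; the paper's exact formulas (and its modified standard error \(\textrm{SEM}'\)) are not reproduced in this excerpt, so these definitions, and the example timings, are assumptions.

```python
def parallel_efficiency(t1, tn, n):
    """Conventional strong-scaling efficiency: epsilon = t1 / (n * tn),
    where t1 is single-node wall time and tn is wall time on n nodes."""
    return t1 / (n * tn)

def undersubscription_speed_ratio(t_full, t_under):
    """r_u > 1 means the undersubscribed run (some cores left idle)
    finished faster than the fully subscribed run on the same nodes."""
    return t_full / t_under

# Example with made-up timings (seconds per simulation step):
t1, t8_full, t8_under = 800.0, 140.0, 118.0
eps   = parallel_efficiency(t1, t8_full, 8)    # fully subscribed efficiency
eps_u = parallel_efficiency(t1, t8_under, 8)   # undersubscribed efficiency
r_u   = undersubscription_speed_ratio(t8_full, t8_under)
print(f"epsilon = {eps:.2f}, epsilon_u = {eps_u:.2f}, r_u = {r_u:.2f}")
```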


Author information


Contributions

R.P. wrote the main manuscript text and prepared all figures. All authors reviewed the manuscript.

Corresponding author

Correspondence to Reid Prichard.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Prichard, R., Strasser, W. When Fewer Cores Is Faster: A Parametric Study of Undersubscription in High-Performance Computing. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04353-2
