
When Fewer Cores Is Faster: A Parametric Study of Undersubscription in High-Performance Computing

Cluster Computing

Abstract

In high-performance computing, it is known that leaving a small number of CPU cores unused, a practice termed undersubscription, can be beneficial. Nevertheless, undersubscription is rarely applied in scientific HPC workloads. We demonstrate the importance of calibrated undersubscription in computational fluid dynamics (CFD) simulations through the aggregated results of 1844 benchmarks spanning three hardware configurations and five CFD models. On average, performance increased by 14% (weighted by node count). Gains were largest at high node counts, particularly when approaching a regime of negative scalability. Undersubscription increased maximum performance by up to 50%; this advantage diminished as node count decreased but remained as high as 13% on a single node. In some cases, maximum performance was achieved with a large number of free cores (nearly half of the cores in one case). From a regression over our dataset, we predict the optimal number of free cores as a universal function of cells per core; applying this regression yields a 15% average speed increase (again weighted by node count).
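The regression described above maps cells per core to a recommended number of free cores. As a purely illustrative sketch of how such a predictor could be applied when sizing a job, the snippet below uses a placeholder log-linear form; the functional form, the coefficients a and b, the per-node interpretation, and the 64-core node are assumptions for demonstration, not the paper's fitted model.

```python
import math

def optimal_free_cores(cells_per_core, cores_per_node=64, a=20.0, b=4.0):
    """Predict how many cores per node to leave idle.

    Placeholder log-linear form: more cells per core -> fewer free cores.
    The coefficients a and b are illustrative, NOT the paper's fitted values.
    """
    predicted = a - b * math.log10(cells_per_core)
    # Clamp to a sensible range: never negative, never more than half a node.
    return int(round(min(max(predicted, 0.0), cores_per_node / 2)))

# Hypothetical usage: a 40-million-cell model on 32 nodes of 64 cores each.
cores_per_node, nodes = 64, 32
cells_per_core = 40_000_000 / (nodes * cores_per_node)
free = optimal_free_cores(cells_per_core, cores_per_node)
print(f"run with {cores_per_node - free} of {cores_per_node} cores per node "
      f"({free} left idle)")
```

In practice such a prediction would feed directly into the MPI launch line (for example, requesting fewer ranks per node than the node has cores), which is the point of expressing it as a simple function of cells per core.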




Abbreviations

CFD: Computational fluid dynamics
DRAM: Dynamic random-access memory
HPC: High-performance computing
MPI: Message passing interface
NUMA: Non-uniform memory access
A: Activity factor
\(c_v\): Coefficient of variation
f: CPU clock frequency
N: Node count
\(r_u\): Undersubscription speed ratio
s: Standard deviation
\(\textrm{SEM}'\): Modified standard error of the mean
V: CPU core voltage
\(\varepsilon\): Parallel efficiency
\(\varepsilon_u\): Undersubscribed parallel efficiency
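As a rough companion to the nomenclature above, the sketch below shows how the timing quantities \(\varepsilon\), \(\varepsilon_u\), and \(r_u\) might be computed from wall-clock measurements. It assumes the conventional strong-scaling definition of parallel efficiency and defines \(r_u\) as the ratio of fully subscribed to undersubscribed runtime; the paper's exact formulas (and its modified standard error \(\textrm{SEM}'\)) are not reproduced in this excerpt, so these definitions, and the example timings, are assumptions.

```python
def parallel_efficiency(t1, tn, n):
    """Conventional strong-scaling efficiency: epsilon = t1 / (n * tn),
    where t1 is single-node wall time and tn is wall time on n nodes."""
    return t1 / (n * tn)

def undersubscription_speed_ratio(t_full, t_under):
    """r_u > 1 means the undersubscribed run (some cores left idle)
    finished faster than the fully subscribed run on the same nodes."""
    return t_full / t_under

# Example with made-up timings (seconds per simulation step):
t1, t8_full, t8_under = 800.0, 140.0, 118.0
eps   = parallel_efficiency(t1, t8_full, 8)    # fully subscribed efficiency
eps_u = parallel_efficiency(t1, t8_under, 8)   # undersubscribed efficiency
r_u   = undersubscription_speed_ratio(t8_full, t8_under)
print(f"epsilon = {eps:.2f}, epsilon_u = {eps_u:.2f}, r_u = {r_u:.2f}")
```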


Author information


Contributions

R.P. wrote the main manuscript text and prepared all figures. All authors reviewed the manuscript.

Corresponding author

Correspondence to Reid Prichard.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Prichard, R., Strasser, W. When Fewer Cores Is Faster: A Parametric Study of Undersubscription in High-Performance Computing. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04353-2
