FILL: a heterogeneous resource scheduling system addressing the low throughput problem in GROMACS

Zhou, Yueyuan; Ren, ZiYi; Shao, En; Ma, Lixian; Hu, Qiang; Wang, Leping; Tan, Guangming

doi:10.1007/s42514-023-00169-5

Regular Paper
Published: 23 September 2023

Volume 6, pages 17–31, (2024)

CCF Transactions on High Performance Computing

Yueyuan Zhou^1,na1 (ORCID: orcid.org/0009-0005-1735-5072),
ZiYi Ren^1,2,na1,
En Shao^1,2,3,na1,
Lixian Ma^1,
Qiang Hu^1,2,
Leping Wang^1 &
Guangming Tan^1,2

Abstract

Despite advances in computer hardware, the performance of GROMACS simulations has not improved significantly, primarily because substantial hardware resources are used inefficiently. Resource utilization can be improved through effective resource scheduling when multiple simulations run concurrently on a single computing node, which particularly benefits the small-scale system simulations that are frequently employed. Previous research co-ran multiple GROMACS simulations using time-slice technology. However, this approach introduced notable context-switching overhead and concentrated on optimizing GPU resource utilization while neglecting the collaborative scheduling of heterogeneous CPU and GPU devices. GPU vendors have since introduced hardware partitioning technologies for spatial resource allocation, complementing traditional time-sharing techniques. Moreover, GROMACS is a heterogeneous computing application that alternates computation between the CPU and the GPU, and GPU utilization is sometimes as low as 35%. Consequently, coordinated scheduling of both the GPU and the CPU is imperative. To leverage hardware partitioning technologies in alignment with GROMACS' runtime characteristics, we propose FILL, a resource scheduling system designed for co-running multiple GROMACS jobs. FILL employs space partitioning technology to allocate hardware resources and performs collaborative scheduling of CPU and GPU resources, thereby ensuring precise and deterministic allocation of resources to GROMACS jobs. The scheduling aims to improve system throughput while considering the turnaround time of simulations. FILL was implemented on servers equipped with NVIDIA and AMD GPUs and improved system throughput on both.
On NVIDIA GPU servers, FILL improved throughput by up to 167% compared to the baseline approach and by 27,928% compared to state-of-the-art alternatives. On AMD GPU servers, FILL achieved improvements of 459% and 24% over the baseline and state-of-the-art methods, respectively. These results validate the effectiveness of FILL in optimizing system throughput for multiple GROMACS simulations.
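
To make the idea of spatially co-scheduling CPU and GPU resources concrete, the following is a minimal sketch and not FILL's implementation, which this page does not describe. It assumes one GPU partition per job (for example an NVIDIA MIG instance selected through `CUDA_VISIBLE_DEVICES`), an even split of CPU cores, and single-rank thread-MPI runs. The names `JobSlot`, `partition_node`, and `mdrun_command` are hypothetical; only the `gmx mdrun` options shown (`-s`, `-ntmpi`, `-ntomp`, `-pin`, `-pinoffset`, `-pinstride`) are existing GROMACS flags.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobSlot:
    """One co-running job: a GPU partition plus a disjoint CPU core range."""
    gpu_partition: str  # hypothetical identifier, e.g. a MIG instance UUID
    first_core: int     # first logical core reserved for this job
    n_cores: int        # number of consecutive logical cores reserved


def partition_node(total_cores: int, gpu_partitions: List[str]) -> List[JobSlot]:
    """Split the CPU cores evenly across GPU partitions (assumed policy).

    Cores left over by the integer division stay unused.
    """
    if not gpu_partitions:
        raise ValueError("need at least one GPU partition")
    per_job = total_cores // len(gpu_partitions)
    if per_job == 0:
        raise ValueError("more GPU partitions than CPU cores")
    return [JobSlot(p, i * per_job, per_job) for i, p in enumerate(gpu_partitions)]


def mdrun_command(slot: JobSlot, tpr: str) -> Dict[str, object]:
    """Build the environment and argv for one pinned gmx mdrun process."""
    return {
        # Restrict the process to its own GPU partition (NVIDIA-specific).
        "env": {"CUDA_VISIBLE_DEVICES": slot.gpu_partition},
        "argv": [
            "gmx", "mdrun", "-s", tpr,
            "-ntmpi", "1",                      # one thread-MPI rank
            "-ntomp", str(slot.n_cores),        # OpenMP threads = reserved cores
            "-pin", "on",                       # fix threads to cores
            "-pinoffset", str(slot.first_core), # start of this job's core range
            "-pinstride", "1",
        ],
    }


if __name__ == "__main__":
    # Placeholder partition names; real MIG UUIDs would come from the driver.
    for slot in partition_node(32, ["MIG-a", "MIG-b", "MIG-c", "MIG-d"]):
        print(mdrun_command(slot, "topol.tpr"))
```

Each job receives a disjoint core range and its own GPU partition, so co-running jobs do not time-share either device. A real scheduler would also need to size partitions per job and account for turnaround time, which this sketch leaves out.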

Acknowledgements

This work was sponsored in part by NKRDP (2021YFB0300800), and in part by NSFC (62102396), the Beijing Nova Program (Z211100002121143, 20220484217), the Youth Innovation Promotion Association of Chinese Academy of Sciences (2021099), and the Pilot for Major Scientific Research Facility of Jiangsu Province of China (No. BM2021800).

Author information

Y. Zhou, Z. Ren and E. Shao have contributed equally to this work.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Authors and Affiliations

State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, 100190, China
Yueyuan Zhou, ZiYi Ren, En Shao, Lixian Ma, Qiang Hu, Leping Wang & Guangming Tan
University of Chinese Academy of Sciences, Beijing, 100049, China
ZiYi Ren, En Shao, Qiang Hu & Guangming Tan
Nanjing Institute of InforSuperBahn, Nanjing, 211100, China
En Shao

Corresponding authors

Correspondence to Yueyuan Zhou or En Shao.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, Y., Ren, Z., Shao, E. et al. FILL: a heterogeneous resource scheduling system addressing the low throughput problem in GROMACS. CCF Trans. HPC 6, 17–31 (2024). https://doi.org/10.1007/s42514-023-00169-5

Received: 12 July 2023
Accepted: 28 August 2023
Published: 23 September 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s42514-023-00169-5
