FILL: a heterogeneous resource scheduling system addressing the low throughput problem in GROMACS

  • Regular Paper
  • CCF Transactions on High Performance Computing

Abstract

Despite advances in computer hardware, the performance of GROMACS simulations has not improved significantly, primarily because substantial hardware resources are used inefficiently. Resource utilization can be improved through effective resource scheduling when multiple simulations run concurrently on a single computing node, which particularly benefits the small-scale system simulations that are frequently employed. Previous research co-ran multiple GROMACS simulations using time-slicing. However, that approach introduced notable context-switching overhead and concentrated on GPU resource utilization while neglecting the collaborative scheduling of heterogeneous CPU and GPU devices. GPU vendors have since introduced hardware partitioning technologies for spatial resource allocation, complementing traditional time-sharing techniques. Moreover, GROMACS is a heterogeneous computing application that alternates computation between the CPU and the GPU, and GPU utilization can be as low as 35%. Consequently, coordinated scheduling of both the GPU and the CPU is imperative. To exploit hardware partitioning in line with GROMACS’ runtime characteristics, we propose FILL, a resource scheduling system designed for co-running multiple GROMACS jobs. FILL uses space partitioning to allocate hardware resources and schedules CPU and GPU resources collaboratively, ensuring precise and deterministic resource allocation for each GROMACS job. The scheduling aims to improve system throughput while taking the turnaround time of simulations into account. We implemented FILL on servers equipped with NVIDIA and AMD GPUs. On NVIDIA GPU servers, FILL improved system throughput by up to 167% over the baseline approach and by 27,928% over state-of-the-art alternatives. On AMD GPU servers, the improvements were 459% and 24% over the baseline and state-of-the-art methods, respectively. These results validate the effectiveness of FILL in optimizing system throughput for multiple GROMACS simulations.
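
To make the idea of spatial partitioning with coordinated CPU and GPU allocation more concrete, the following is a minimal sketch, not FILL's actual scheduler, whose policy the abstract does not specify. It assumes an even static split: each co-running job receives a contiguous block of CPU cores and an equal share of GPU compute. The names `Partition`, `plan_partitions`, and `launch_spec` are hypothetical. The `gmx mdrun` pinning options and the NVIDIA MPS variable `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` are existing interfaces, but using them this way is an assumption about how such a partition could be applied on an NVIDIA server.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Resources assigned to one co-running job (hypothetical structure)."""
    job_id: int
    first_core: int   # index of the first CPU core given to the job
    n_cores: int      # number of contiguous CPU cores
    gpu_percent: int  # share of GPU compute, in percent


def plan_partitions(n_jobs: int, total_cores: int) -> list[Partition]:
    """Split CPU cores and GPU compute evenly among n_jobs jobs.

    Assumption: an even static split. A real scheduler would likely size
    partitions per job from profiling data.
    """
    if n_jobs < 1 or n_jobs > total_cores:
        raise ValueError("need 1 <= n_jobs <= total_cores")
    cores_per_job = total_cores // n_jobs
    gpu_percent = 100 // n_jobs
    return [
        Partition(i, i * cores_per_job, cores_per_job, gpu_percent)
        for i in range(n_jobs)
    ]


def launch_spec(p: Partition, tpr: str) -> tuple[dict[str, str], list[str]]:
    """Build the environment and command line for one partitioned job.

    Nothing is executed here; the caller decides how to launch the job.
    """
    # Caps the fraction of GPU threads the MPS client may use.
    env = {"CUDA_MPS_ACTIVE_THREAD_PERCENTAGE": str(p.gpu_percent)}
    cmd = [
        "gmx", "mdrun", "-s", tpr,
        "-ntmpi", "1",                     # one thread-MPI rank per job
        "-ntomp", str(p.n_cores),          # OpenMP threads = assigned cores
        "-pin", "on",                      # pin threads to cores
        "-pinoffset", str(p.first_core),   # start of this job's core block
        "-pinstride", "1",
    ]
    return env, cmd


if __name__ == "__main__":
    # Example: four jobs on a 32-core node; input file names are placeholders.
    for part in plan_partitions(n_jobs=4, total_cores=32):
        env, cmd = launch_spec(part, f"job{part.job_id}.tpr")
        print(env, " ".join(cmd))
```

Because every job gets a fixed core range and a fixed GPU share, the allocation is deterministic and avoids the context switching that time-slicing incurs. Note that the pin offset counts logical cores, so on machines with simultaneous multithreading the offsets and stride would need adjusting.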

Acknowledgements

This work was sponsored in part by NKRDP (2021YFB0300800), and in part by NSFC (62102396), the Beijing Nova Program (Z211100002121143, 20220484217), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021099), and the Pilot for Major Scientific Research Facility of Jiangsu Province of China (No. BM2021800).

Author information

Corresponding authors

Correspondence to Yueyuan Zhou or En Shao.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, Y., Ren, Z., Shao, E. et al. FILL: a heterogeneous resource scheduling system addressing the low throughput problem in GROMACS. CCF Trans. HPC 6, 17–31 (2024). https://doi.org/10.1007/s42514-023-00169-5

  • DOI: https://doi.org/10.1007/s42514-023-00169-5
