Abstract
Data centers and cloud environments have recently begun offering graphics processing unit (GPU)-based infrastructure services. In practice, general-purpose GPU (GPGPU) applications exhibit low GPU utilization, unlike GPU-friendly workloads. To improve GPU resource utilization, different applications must execute concurrently while sharing the resources of a streaming multiprocessor (SM). However, intra-SM multitasking causes resource contention, which makes application performance difficult to predict. Moreover, among many candidate applications, it is crucial to find the resource partitioning and the set of co-executing applications that yield the best performance. To address this, the current paper proposes K-Scheduler, a multitasking placement scheduler based on the intra-SM resource-use characteristics of applications. First, the resource-use and multitasking characteristics of applications are analyzed according to their classification and their individual execution behavior. Rules for concurrent execution are then derived from these observations, and scheduling is performed according to those rules. The results show that K-Scheduler improves total workload execution performance by 18% and individual execution performance by 32% compared to previous studies.
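The scheduling idea summarized above, classifying applications by their intra-SM resource-use profiles and co-locating complementary ones, can be sketched in a few lines. This is an illustrative toy, not the paper's K-Scheduler implementation; the `compute_util`/`memory_util` profile fields, the 0.6 thresholds, and the pairing rule are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class AppProfile:
    name: str
    compute_util: float  # fraction of SM compute pipelines kept busy (0..1)
    memory_util: float   # fraction of memory bandwidth consumed (0..1)

def classify(app: AppProfile) -> str:
    # Label an application by its dominant intra-SM resource demand.
    if app.compute_util >= 0.6 and app.memory_util < 0.6:
        return "compute-bound"
    if app.memory_util >= 0.6 and app.compute_util < 0.6:
        return "memory-bound"
    return "balanced"

def pair_for_multitasking(apps):
    # Rule: co-schedule a compute-bound app with a memory-bound one, so the
    # two kernels contend for different SM resources; leftovers run alone.
    compute = [a for a in apps if classify(a) == "compute-bound"]
    memory = [a for a in apps if classify(a) == "memory-bound"]
    pairs = list(zip(compute, memory))
    paired = {a.name for c, m in pairs for a in (c, m)}
    solo = [a for a in apps if a.name not in paired]
    return pairs, solo
```

With profiles such as a compute-heavy matrix multiply and a bandwidth-heavy streaming kernel, the rule pairs the two on one SM while a balanced kernel is left to run by itself; the actual scheduler additionally searches over resource partitionings, which this sketch omits.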
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Funding
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A2C1003379).
Ethics declarations
Informed consent
Written informed consent for publication of this paper was obtained from Sookmyung Women’s University and all authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Kim, S., Kim, Y. K-Scheduler: dynamic intra-SM multitasking management with execution profiles on GPUs. Cluster Comput 25, 597–617 (2022). https://doi.org/10.1007/s10586-021-03429-7