
K-Scheduler: dynamic intra-SM multitasking management with execution profiles on GPUs

Published in Cluster Computing

Abstract

Data centers and cloud environments have recently begun offering graphics processing unit (GPU)-based infrastructure services. In practice, many general-purpose GPU (GPGPU) applications exhibit low GPU utilization, unlike GPU-friendly applications. To improve GPU resource utilization, different applications need to run concurrently while sharing the resources of a streaming multiprocessor (SM). However, the resource contention caused by intra-SM multitasking makes it difficult to predict application performance, and it is crucial to find the resource partitioning and the set of co-executed applications that yield the best performance among many candidates. To address this, this paper proposes K-Scheduler, a multitasking placement scheduler based on the intra-SM resource-use characteristics of applications. First, the resource-use and multitasking characteristics of applications are analyzed according to their classification and their individual execution behavior. Rules for concurrent execution are then derived from these observations, and scheduling is performed according to those rules. The results show that K-Scheduler improves total workload execution performance by 18% and individual application performance by 32% compared to previous studies.
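
The abstract describes the overall flow of K-Scheduler: profile applications, classify them by their intra-SM resource use, derive co-execution rules from those classes, and place applications accordingly. As a rough illustration of that flow only, the Python sketch below pairs applications whose profiled bottlenecks differ (compute-bound with memory-bound). The class names, thresholds, and greedy pairing rule are illustrative assumptions, not the rules derived in the paper.

# Illustrative sketch of profile-driven co-scheduling (not the paper's actual rules).
# Assumption: each application has a profile with normalized compute and memory
# intensities; applications with complementary bottlenecks are paired for
# intra-SM co-execution, and the rest run alone.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AppProfile:
    name: str
    compute_intensity: float   # e.g., achieved arithmetic throughput / peak (0..1)
    memory_intensity: float    # e.g., achieved DRAM bandwidth / peak (0..1)

def classify(app: AppProfile) -> str:
    """Coarse classification by dominant resource (thresholds are assumptions)."""
    if app.compute_intensity >= 0.5 and app.memory_intensity < 0.5:
        return "compute-bound"
    if app.memory_intensity >= 0.5 and app.compute_intensity < 0.5:
        return "memory-bound"
    return "balanced"

def schedule(apps: List[AppProfile]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair compute-bound with memory-bound apps; leave the rest to run alone."""
    compute = [a for a in apps if classify(a) == "compute-bound"]
    memory = [a for a in apps if classify(a) == "memory-bound"]
    rest = [a for a in apps if classify(a) == "balanced"]

    pairs = []
    while compute and memory:
        # Greedy rule: co-locate the most compute-heavy with the most memory-heavy app.
        c = max(compute, key=lambda a: a.compute_intensity)
        m = max(memory, key=lambda a: a.memory_intensity)
        compute.remove(c)
        memory.remove(m)
        pairs.append((c.name, m.name))

    solo = [a.name for a in compute + memory + rest]
    return pairs, solo

if __name__ == "__main__":
    workload = [
        AppProfile("matrixMul", 0.8, 0.2),
        AppProfile("streamTriad", 0.1, 0.9),
        AppProfile("bfs", 0.2, 0.7),
        AppProfile("nbody", 0.9, 0.3),
        AppProfile("histogram", 0.4, 0.4),
    ]
    pairs, solo = schedule(workload)
    print("co-scheduled pairs:", pairs)
    print("run alone:", solo)

A pairing heuristic this simple ignores the intra-SM resource partitioning and the per-class multitasking behavior that the paper's scheduler also takes into account; it only conveys the idea of placing applications with complementary resource demands on the same SM.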

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A2C1003379).

Author information

Corresponding author

Correspondence to Yoonhee Kim.

Ethics declarations

Informed consent

Written informed consent for publication of this paper was obtained from Sookmyung Women’s University and all authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kim, S., Kim, Y. K-Scheduler: dynamic intra-SM multitasking management with execution profiles on GPUs. Cluster Comput 25, 597–617 (2022). https://doi.org/10.1007/s10586-021-03429-7
