
Dynamic Selective Warp Scheduling for GPUs Using L1 Data Cache Locality Information

  • Conference paper
  • In: Parallel and Distributed Computing, Applications and Technologies (PDCAT 2018)

Abstract

The warp scheduling policy of a GPU has a significant impact on performance, since the order in which warps execute determines the degree of data cache locality. A greedy warp scheduling policy such as greedy-then-oldest (GTO) outperforms a fair scheduling policy such as loose round-robin (LRR) for numerous applications. However, when GTO is adopted, cache locality shared by multiple warps is underutilized, degrading overall performance. In this paper, we propose a dynamic selective warp scheduling technique that exploits the data locality of the workload. Inter-warp locality and intra-warp locality are detected from the access history of the L1 data cache, and the scheduling policy is adjusted dynamically accordingly. As a result, performance and cache efficiency improve significantly compared with LRR and GTO. According to our experimental results, the proposed technique improves IPC by 19% over LRR and 3.8% over GTO.
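The abstract does not spell out the detection mechanism, but the idea of classifying L1D locality and selecting a policy can be sketched as follows. This is a minimal illustrative model, not the paper's implementation: the class name, the per-line fetcher tag, the sampling window, and the LRU cache model are all assumptions introduced here. A hit on a line fetched by the same warp counts as intra-warp reuse; a hit on a line fetched by a different warp counts as inter-warp reuse, and the dominant kind over a sampling window selects GTO or LRR.

```python
from collections import OrderedDict

class LocalityAwareScheduler:
    """Hypothetical sketch of locality-driven policy selection.

    Each L1D line is tagged with the warp that fetched it. Hits are
    classified as intra-warp (same warp reuses its own line) or
    inter-warp (another warp's line is reused); after every sampling
    window the dominant kind of locality picks the scheduling policy.
    """

    def __init__(self, num_lines=32, window=1024):
        self.cache = OrderedDict()   # line address -> fetching warp id, LRU order
        self.num_lines = num_lines
        self.window = window         # accesses per sampling interval (assumed)
        self.intra_hits = 0
        self.inter_hits = 0
        self.accesses = 0
        self.policy = "GTO"          # start greedy; GTO is the stronger default

    def access(self, warp_id, line_addr):
        self.accesses += 1
        if line_addr in self.cache:
            fetcher = self.cache.pop(line_addr)
            self.cache[line_addr] = fetcher      # refresh LRU position, keep fetcher tag
            if fetcher == warp_id:
                self.intra_hits += 1
            else:
                self.inter_hits += 1
        else:
            if len(self.cache) >= self.num_lines:
                self.cache.popitem(last=False)   # evict least-recently-used line
            self.cache[line_addr] = warp_id
        if self.accesses % self.window == 0:
            self._update_policy()

    def _update_policy(self):
        # Inter-warp locality favours fair scheduling (LRR), which keeps many
        # warps advancing together; intra-warp locality favours greedy
        # scheduling (GTO), which lets one warp keep reusing its own lines.
        self.policy = "LRR" if self.inter_hits > self.intra_hits else "GTO"
        self.intra_hits = self.inter_hits = 0
```

For example, four warps streaming over the same few lines produce mostly inter-warp hits and flip the model to LRR, while a single warp repeatedly touching its own lines produces intra-warp hits and restores GTO. A hardware realization would replace the dictionary with per-line warp-id tags and saturating counters, but the decision rule is the same.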



Acknowledgements

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF2018R1A2B6005740).

Author information

Correspondence to Cheol Hong Kim.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Kim, G.B., Kim, J.M., Kim, C.H. (2019). Dynamic Selective Warp Scheduling for GPUs Using L1 Data Cache Locality Information. In: Park, J., Shen, H., Sung, Y., Tian, H. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2018. Communications in Computer and Information Science, vol 931. Springer, Singapore. https://doi.org/10.1007/978-981-13-5907-1_24


  • DOI: https://doi.org/10.1007/978-981-13-5907-1_24

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-5906-4

  • Online ISBN: 978-981-13-5907-1
