
Dynamic Selective Warp Scheduling for GPUs Using L1 Data Cache Locality Information

  • Conference paper
  • In: Parallel and Distributed Computing, Applications and Technologies (PDCAT 2018)

Abstract

The warp scheduling policy of a GPU has a significant impact on performance, since the order in which warps execute determines the degree of data cache locality. A greedy warp scheduling policy such as greedy-then-oldest (GTO) outperforms a fair scheduling policy such as loose round-robin (LRR) for numerous applications. However, when GTO is adopted, cache locality shared by multiple warps is underutilized, degrading overall performance. In this paper, we propose a dynamic selective warp scheduling technique that exploits the data locality of the workload. Inter-warp locality and intra-warp locality are detected from the access history of the L1 data cache, and the scheduling policy is adjusted dynamically accordingly. As a result, performance and cache efficiency improve significantly compared with LRR and GTO. According to our experimental results, the proposed technique improves IPC by 19% over LRR and 3.8% over GTO.
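The abstract does not spell out the detection mechanism, but the idea of classifying L1D locality and selecting a policy can be sketched as follows. This is a minimal illustrative model, not the paper's implementation: the class name, the per-line fetcher tag, the sampling window, and the LRU cache model are all assumptions introduced here. A hit on a line fetched by the same warp counts as intra-warp reuse; a hit on a line fetched by a different warp counts as inter-warp reuse, and the dominant kind over a sampling window selects GTO or LRR.

```python
from collections import OrderedDict

class LocalityAwareScheduler:
    """Hypothetical sketch of locality-driven policy selection.

    Each L1D line is tagged with the warp that fetched it. Hits are
    classified as intra-warp (same warp reuses its own line) or
    inter-warp (another warp's line is reused); after every sampling
    window the dominant kind of locality picks the scheduling policy.
    """

    def __init__(self, num_lines=32, window=1024):
        self.cache = OrderedDict()   # line address -> fetching warp id, LRU order
        self.num_lines = num_lines
        self.window = window         # accesses per sampling interval (assumed)
        self.intra_hits = 0
        self.inter_hits = 0
        self.accesses = 0
        self.policy = "GTO"          # start greedy; GTO is the stronger default

    def access(self, warp_id, line_addr):
        self.accesses += 1
        if line_addr in self.cache:
            fetcher = self.cache.pop(line_addr)
            self.cache[line_addr] = fetcher      # refresh LRU position, keep fetcher tag
            if fetcher == warp_id:
                self.intra_hits += 1
            else:
                self.inter_hits += 1
        else:
            if len(self.cache) >= self.num_lines:
                self.cache.popitem(last=False)   # evict least-recently-used line
            self.cache[line_addr] = warp_id
        if self.accesses % self.window == 0:
            self._update_policy()

    def _update_policy(self):
        # Inter-warp locality favours fair scheduling (LRR), which keeps many
        # warps advancing together; intra-warp locality favours greedy
        # scheduling (GTO), which lets one warp keep reusing its own lines.
        self.policy = "LRR" if self.inter_hits > self.intra_hits else "GTO"
        self.intra_hits = self.inter_hits = 0
```

For example, four warps streaming over the same few lines produce mostly inter-warp hits and flip the model to LRR, while a single warp repeatedly touching its own lines produces intra-warp hits and restores GTO. A hardware realization would replace the dictionary with per-line warp-id tags and saturating counters, but the decision rule is the same.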



Acknowledgements

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF2018R1A2B6005740).

Author information

Correspondence to Cheol Hong Kim.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Kim, G.B., Kim, J.M., Kim, C.H. (2019). Dynamic Selective Warp Scheduling for GPUs Using L1 Data Cache Locality Information. In: Park, J., Shen, H., Sung, Y., Tian, H. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2018. Communications in Computer and Information Science, vol 931. Springer, Singapore. https://doi.org/10.1007/978-981-13-5907-1_24


  • DOI: https://doi.org/10.1007/978-981-13-5907-1_24

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-5906-4

  • Online ISBN: 978-981-13-5907-1
