Static detection of uncoalesced accesses in GPU programs

Alur, Rajeev; Devietti, Joseph; Leija, Omar S. Navarro; Singhania, Nimit

doi:10.1007/s10703-021-00362-8

Published: 05 March 2021

Volume 60, pages 1–32, (2022)

Formal Methods in System Design

Rajeev Alur¹,
Joseph Devietti¹,
Omar S. Navarro Leija¹ &
Nimit Singhania ORCID: orcid.org/0000-0003-1345-0505¹

Abstract

GPU programming has become popular due to the high computational capabilities of GPUs. Obtaining significant performance gains with GPUs is, however, challenging, and the programmer needs to be aware of various subtleties of the GPU architecture. One such subtlety lies in accessing GPU memory, where certain access patterns can lead to poor performance. Such access patterns are referred to as uncoalesced global memory accesses. This work presents a lightweight compile-time static analysis to identify such accesses in GPU programs. The analysis relies on a novel abstraction that tracks the access pattern across multiple threads. The abstraction enables quick prediction while providing correctness guarantees. We have implemented the analysis in LLVM and compared it against a dynamic analysis implementation. The static analysis identifies 95 pre-existing uncoalesced accesses in Rodinia, a popular benchmark suite of GPU programs, and finishes within seconds for most programs, whereas the dynamic analysis finds 69 accesses and takes orders of magnitude longer to finish.
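To make the notion of an uncoalesced access concrete, the following is a minimal sketch that enumerates the addresses touched by the threads of one warp and counts how many memory segments they span. It is a brute-force illustration only, not the static analysis or abstraction described in the abstract. The warp size of 32, the 128-byte segment size, the 4-byte element size, and the names `segments_touched`, `coalesced_index` and `strided_index` are all assumptions introduced for the example.

```python
WARP_SIZE = 32       # assumed number of threads that issue a memory access together
SEGMENT_BYTES = 128  # assumed size of one global-memory transaction
ELEM_BYTES = 4       # assumed element size (e.g. a 32-bit float)


def segments_touched(index_of_thread):
    """Count the distinct memory segments one warp touches.

    index_of_thread maps a thread id to the array index that thread accesses.
    """
    addresses = [index_of_thread(tid) * ELEM_BYTES for tid in range(WARP_SIZE)]
    return len({addr // SEGMENT_BYTES for addr in addresses})


def coalesced_index(tid):
    # Pattern a[tid]: neighbouring threads read neighbouring elements.
    return tid


def strided_index(tid):
    # Pattern a[tid * 32]: neighbouring threads read elements 32 apart.
    return tid * 32


print(segments_touched(coalesced_index))  # 1 segment: the accesses coalesce
print(segments_touched(strided_index))    # 32 segments: one per thread
```

Under these assumptions the first pattern is served by a single transaction per warp, while the second needs one transaction per thread, which is the kind of access pattern the analysis aims to flag at compile time.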

Author information

Authors and Affiliations

University of Pennsylvania, Philadelphia, USA
Rajeev Alur, Joseph Devietti, Omar S. Navarro Leija & Nimit Singhania

Corresponding author

Correspondence to Nimit Singhania.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Alur, R., Devietti, J., Leija, O.S.N. et al. Static detection of uncoalesced accesses in GPU programs. Form Methods Syst Des 60, 1–32 (2022). https://doi.org/10.1007/s10703-021-00362-8

Received: 18 November 2018
Accepted: 10 February 2021
Published: 05 March 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s10703-021-00362-8
