Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels

  • Ethel Bardsley
  • Alastair F. Donaldson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8430)


We describe the design and implementation of methods to support reasoning about data races in GPU kernels where constructs other than the standard barrier primitive are used for synchronization. At one extreme we consider kernels that exploit implicit, coarse-grained synchronization between threads in the same warp, a feature provided by many architectures. At the other extreme we consider kernels that reduce or avoid barrier synchronization through the use of atomic operations. We discuss design decisions associated with providing support for warps and atomics in GPUVerify, a formal verification tool for OpenCL and CUDA kernels. We evaluate the practical impact of these design decisions using a large set of benchmarks, showing that warps can be supported in a scalable manner, that a coarse abstraction suffices for efficient reasoning about most practical uses of atomic operations, and that a novel, refined abstraction captures an important design pattern where atomic operations are used to compute unique array indices. Our evaluation revealed two previously unknown bugs in publicly available benchmark suites.
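The design pattern mentioned at the end of the abstract, where an atomic operation hands each thread a unique array index, can be illustrated with a small CPU-side analogue. This is only a sketch under stated assumptions, not code from the paper or from GPUVerify: the function name `compact_evens` is hypothetical, `std::thread` workers stand in for GPU threads, and `std::atomic::fetch_add` stands in for a kernel-level atomic increment such as CUDA's `atomicAdd`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: each worker plays the role of one GPU thread.
// A worker holding an even input value reserves an output slot with an
// atomic fetch-and-add, so no two workers ever receive the same index.
std::vector<int> compact_evens(const std::vector<int>& input) {
    std::vector<int> output(input.size(), -1);  // -1 marks an unused slot
    std::atomic<std::size_t> next{0};           // shared counter of reserved slots

    std::vector<std::thread> workers;
    workers.reserve(input.size());
    for (std::size_t tid = 0; tid < input.size(); ++tid) {
        workers.emplace_back([&, tid] {
            if (input[tid] % 2 == 0) {
                // fetch_add returns the previous value, which is unique per caller.
                std::size_t slot = next.fetch_add(1, std::memory_order_relaxed);
                assert(slot < output.size());
                output[slot] = input[tid];  // no data race: each slot has one writer
            }
        });
    }
    for (auto& w : workers) w.join();  // joining orders all writes before the reads below

    output.resize(next.load());
    return output;
}
```

The writes to `output` need no barrier between workers because each index returned by `fetch_add` is distinct. The order of the results is nondeterministic, so callers should not rely on it.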


Keywords: Data Race, Atomic Operation, Race Analysis, Barrier Synchronization, Master Thread
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ethel Bardsley (1)
  • Alastair F. Donaldson (1)

  1. Imperial College London, UK
