Interleaving and Lock-Step Semantics for Analysis and Verification of GPU Kernels

  • Peter Collingbourne
  • Alastair F. Donaldson
  • Jeroen Ketema
  • Shaz Qadeer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7792)


We study semantics of GPU kernels — the parallel programs that run on Graphics Processing Units (GPUs). We provide a novel lock-step execution semantics for GPU kernels represented by arbitrary reducible control flow graphs and compare this semantics with a traditional interleaving semantics. We show for terminating kernels that either both semantics compute identical results or both behave erroneously.
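The agreement between the two semantics can be illustrated with a toy model. The sketch below is not the paper's formal semantics. It assumes a hypothetical two-statement kernel in which each thread touches only its own memory slot, so it is race-free. All names (`kernel_steps`, `run_lock_step`, `run_interleavings`) are invented for illustration. The kernel is run once in lock-step and once under every possible interleaving, and the final memories are compared.

```python
from itertools import permutations

STEPS_PER_THREAD = 2  # the hypothetical kernel has two statements


def kernel_steps(tid):
    """Statements of a hypothetical race-free kernel for thread `tid`."""
    def s0(mem):
        mem[tid] = tid + 1        # write own slot

    def s1(mem):
        mem[tid] = mem[tid] * 10  # read and write own slot only

    return [s0, s1]


def run_lock_step(num_threads):
    """All threads execute statement i before any thread executes i + 1."""
    mem = [0] * num_threads
    steps = [kernel_steps(t) for t in range(num_threads)]
    for i in range(STEPS_PER_THREAD):
        for t in range(num_threads):
            steps[t][i](mem)
    return mem


def run_interleavings(num_threads):
    """Run every interleaving of the threads; collect the distinct final memories."""
    base = [t for t in range(num_threads) for _ in range(STEPS_PER_THREAD)]
    results = set()
    for schedule in set(permutations(base)):
        mem = [0] * num_threads
        pc = [0] * num_threads            # per-thread program counter
        steps = [kernel_steps(t) for t in range(num_threads)]
        for t in schedule:                # thread t executes its next statement
            steps[t][pc[t]](mem)
            pc[t] += 1
        results.add(tuple(mem))
    return results


if __name__ == "__main__":
    print(run_lock_step(2))
    print(run_interleavings(2))
```

In this race-free toy, every interleaving ends in the same memory as the lock-step run. The example covers only the "identical results" half of the claim. A kernel with a data race would fall under the "both behave erroneously" half, which this sketch does not model.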

The result induces a method that allows GPU kernels with arbitrary reducible control flow graphs to be verified via transformation to a sequential program that employs predicated execution. We implemented this method in the GPUVerify tool and evaluated it experimentally against the previous version of the tool, which was based on a similar method for structured programs, i.e., programs where control is organised using if and while statements. The evaluation was based on a set of 163 open source and commercial GPU kernels. Among these kernels, 42 exhibit unstructured control flow, which our novel method can handle fully automatically but the previous method could not. Overall, the generality of the new method comes at a modest price: verification across our benchmark set was 2.25 times slower overall; however, the median slowdown across all kernels was 0.77, indicating that our novel technique yielded faster analysis in many cases.
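Predicated execution can be sketched on a single conditional. The example below is a minimal illustration and not the transformation implemented in GPUVerify. It assumes a hypothetical kernel body `if tid % 2 == 0: x = x + 1 else: x = x * 2`, and the function names are invented. In the predicated form, every thread steps through both branches in sequence, and a per-thread predicate decides whether each statement takes effect.

```python
def branching(tid, x):
    """Original per-thread code with ordinary control flow."""
    if tid % 2 == 0:
        return x + 1
    return x * 2


def predicated(xs):
    """Sequential program: all threads run every statement, guarded by a predicate."""
    n = len(xs)
    xs = list(xs)
    # Evaluate the branch condition once per thread.
    p = [tid % 2 == 0 for tid in range(n)]
    # Then-branch: takes effect only where p holds.
    xs = [x + 1 if p[tid] else x for tid, x in enumerate(xs)]
    # Else-branch: takes effect only where p does not hold.
    xs = [x * 2 if not p[tid] else x for tid, x in enumerate(xs)]
    return xs


if __name__ == "__main__":
    inputs = [3, 3, 3, 3]
    print(predicated(inputs))
    print([branching(tid, x) for tid, x in enumerate(inputs)])
```

Both print statements show the same list, since guarding each statement with the predicate reproduces the effect of the branch for every thread. Handling loops and unstructured control flow needs more machinery than this one-conditional sketch shows.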


Keywords: Graphic Processing Unit; Operational Semantic; Execution Trace; Data Race; Local Outlier Factor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Peter Collingbourne (1)
  • Alastair F. Donaldson (1)
  • Jeroen Ketema (1)
  • Shaz Qadeer (2)
  1. Imperial College London, UK
  2. Microsoft Research, USA
