Engineering a Static Verification Tool for GPU Kernels

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8559)


We report on practical experiences over the last 2.5 years related to the engineering of GPUVerify, a static verification tool for OpenCL and CUDA GPU kernels, plotting the progress of GPUVerify from a prototype to a fully functional and relatively efficient analysis tool. Our hope is that this experience report will serve the verification community by helping to inform future tooling efforts.


Sequential Program Invariant Generation Data Race Abstract Syntax Tree Cache Coherence Protocol 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. 1.
    Allen, J., Kennedy, K., Porterfield, C., Warren, J.: Conversion of control dependence to data dependence. In: POPL, pp. 177–189 (1983)Google Scholar
  2. 2.
    AMD: AMD Accelerated Parallel Processing (APP) SDK,
  3. 3.
    Bakhoda, A., Yuan, G.L., Fung, W.W.L., Wong, H., Aamodt, T.M.: Analyzing CUDA workloads using a detailed gpu simulator. In: ISPASS, pp. 163–174 (2009)Google Scholar
  4. 4.
    Bardsley, E., Donaldson, A.F.: Warps and atomics: Beyond barrier synchronization in the verification of GPU kernels. In: Badger, J.M., Rozier, K.Y. (eds.) NFM 2014. LNCS, vol. 8430, pp. 230–245. Springer, Heidelberg (2014)CrossRefGoogle Scholar
  5. 5.
    Bardsley, E., Donaldson, A.F., Wickerson, J.: KernelInterceptor: Automating gpu kernel verification by intercepting kernels and their parameters. In: IWOCL (2014)Google Scholar
  6. 6.
    Barnett, M., Chang, B.-Y.E., DeLine, R., Jacobs, B., M. Leino, K.R.: Boogie: A modular reusable verifier for object-oriented programs. In: de Boer, F.S., Bonsangue, M.M., Graf, S., de Roever, W.-P. (eds.) FMCO 2005. LNCS, vol. 4111, pp. 364–387. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  7. 7.
    Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T., Reynolds, A., Tinelli, C.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 171–177. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  8. 8.
    Betts, A., Chong, N., Donaldson, A.F., Qadeer, S., Thomson, P.: GPUVerify: a verifier for GPU kernels. In: OOPSLA, pp. 113–132 (2012)Google Scholar
  9. 9.
    Che, S., et al.: Rodinia: A benchmark suite for heterogeneous computing. In: Workload Characterization, pp. 44–54 (2009)Google Scholar
  10. 10.
    Chiang, W.-F., Gopalakrishnan, G., Li, G., Rakamarić, Z.: Formal analysis of GPU programs with atomics via conflict-directed delay-bounding. In: Brat, G., Rungta, N., Venet, A. (eds.) NFM 2013. LNCS, vol. 7871, pp. 213–228. Springer, Heidelberg (2013)CrossRefGoogle Scholar
  11. 11.
    Chong, N., Donaldson, A.F., Kelly, P., Ketema, J., Qadeer, S.: Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels. In: OOPSLA (2013)Google Scholar
  12. 12.
    Chong, N., Donaldson, A.F., Ketema, J.: A sound and complete abstraction for reasoning about parallel prefix sums. In: POPL, pp. 397–410 (2014)Google Scholar
  13. 13.
    Chou, C.-T., Mannava, P.K., Park, S.: A simple method for parameterized verification of cache coherence protocols. In: Hu, A.J., Martin, A.K. (eds.) FMCAD 2004. LNCS, vol. 3312, pp. 382–398. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  14. 14.
    Collingbourne, P., Cadar, C., Kelly, P.H.J.: Symbolic crosschecking of data-parallel floating-point code. IEEE Trans. Software Eng. (to appear, 2014)Google Scholar
  15. 15.
    Collingbourne, P., Donaldson, A.F., Ketema, J., Qadeer, S.: Interleaving and lock-step semantics for analysis and verification of GPU kernels. In: Felleisen, M., Gardner, P. (eds.) Programming Languages and Systems. LNCS, vol. 7792, pp. 270–289. Springer, Heidelberg (2013)CrossRefGoogle Scholar
  16. 16.
    Danalis, A., et al.: The scalable heterogeneous computing (SHOC) benchmark suite. In: GPGPU 2010, pp. 63–74 (2010)Google Scholar
  17. 17.
    Donaldson, A.F., Kroening, D., Rümmer, P.: Automatic analysis of DMA races using model checking and k-induction. Formal Methods in System Design 39(1), 83–113 (2011)CrossRefzbMATHGoogle Scholar
  18. 18.
    Ferrante, J., Ottenstein, K.J., Warren, J.D.: The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst. 9(3), 319–349 (1987)CrossRefzbMATHGoogle Scholar
  19. 19.
    Filliâtre, J.-C., Paskevich, A.: Why3 — where programs meet provers. In: Felleisen, M., Gardner, P. (eds.) Programming Languages and Systems. LNCS, vol. 7792, pp. 125–128. Springer, Heidelberg (2013)CrossRefGoogle Scholar
  20. 20.
    Flanagan, C., M. Leino, K.R.: Houdini, an annotation assistant for eSC/Java. In: Oliveira, J.N., Zave, P. (eds.) FME 2001. LNCS, vol. 2021, pp. 500–517. Springer, Heidelberg (2001)Google Scholar
  21. 21.
    Grauer-Gray, S., Xu, L., Searles, R., Ayalasomayajula, S., Cavazos, J.: Auto-tuning a high-level language targeted to GPU codes. In: InPar (2012)Google Scholar
  22. 22.
    Harris, M., Buck, I.: GPU flow-control idioms. In: GPU Gems 2. Addison-Wesley (2005)Google Scholar
  23. 23.
    Huisman, M., Mihelčić, M.: Specification and verification of GPGPU programs using permission-based separation logic. In: BYTECODE (2013)Google Scholar
  24. 24.
    Karrenberg, R., Hack, S.: Improving performance of openCL on cPUs. In: O’Boyle, M. (ed.) CC 2012. LNCS, vol. 7210, pp. 1–20. Springer, Heidelberg (2012)CrossRefGoogle Scholar
  25. 25.
    Khronos OpenCL Working Group: The OpenCL specification, version 2.0 (2013)Google Scholar
  26. 26.
    Leung, A., Gupta, M., Agarwal, Y., et al.: Verifying GPU kernels by test amplification. In: PLDI, pp. 383–394 (2012)Google Scholar
  27. 27.
    Li, G., Gopalakrishnan, G.: Scalable SMT-based verification of GPU kernel functions. In: FSE, pp. 187–196 (2010)Google Scholar
  28. 28.
    Li, G., Li, P., Sawaya, G., Gopalakrishnan, G., Ghosh, I., Rajan, S.P.: GKLEE: Concolic verification and test generation for GPUs. In: PPoPP, pp. 215–224 (2012)Google Scholar
  29. 29.
    Li, P., Li, G., Gopalakrishnan, G.: Parametric flows: Automated behavior equivalencing for symbolic analysis of races in CUDA programs. In: SC, pp. 29:1–29:10 (2012)Google Scholar
  30. 30.
    McMillan, K.L.: Verification of infinite state systems by compositional model checking. In: Pierre, L., Kropf, T. (eds.) CHARME 1999. LNCS, vol. 1703, pp. 219–237. Springer, Heidelberg (1999)CrossRefGoogle Scholar
  31. 31.
    Microsoft Corporation: C++ AMP sample projects for download (MSDN blog),
  32. 32.
    de Moura, L., Bjørner, N.S.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  33. 33.
    NVIDIA: CUDA C programming guide, version 5.5 (2013)Google Scholar
  34. 34.
    NVIDIA: GPU Computing SDK (accessed 2013),
  35. 35.
    Rakamaric, Z., Hu, A.J.: Automatic inference of frame axioms using static analysis. In: ASE, pp. 89–98 (2008)Google Scholar
  36. 36.
    Rakamarić, Z., Hu, A.J.: A scalable memory model for low-level code. In: Jones, N.D., Müller-Olm, M. (eds.) VMCAI 2009. LNCS, vol. 5403, pp. 290–304. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  37. 37.
  38. 38.
    Stratton, J., et al.: Parboil: A revised benchmark suite for scientific and commercial throughput computing. Tech. Rep. IMPACT-12-01, UIUC (2012)Google Scholar
  39. 39.
    Talupur, M., Tuttle, M.R.: Going with the flow: Parameterized verification using message flows. In: FMCAD, pp. 1–8 (2008)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. 1.Imperial College LondonUK
  2. 2.GoogleUK
  3. 3.Microsoft ResearchUK

Personalised recommendations