Formal Methods for GPGPU Programming: Is the Demand Met?

  • Conference paper
Integrated Formal Methods (IFM 2020)

Abstract

Over the years, researchers have developed many formal method tools to support software development. However, hardly any studies have been conducted to determine whether these tools sufficiently address the problems developers actually encounter. For the relatively young field of GPU programming, we would like to know whether the tools developed so far are sufficient, or whether some problems still need attention. To this end, we first look at what kinds of problems programmers encounter in OpenCL and CUDA. We gather problems from Stack Overflow and categorise them with card sorting. We find that problems related to memory, synchronisation of threads, threads in general, and performance are essential topics. Next, we look at (verification) tools in industry and research to see how these tools address the problems we discovered. We think many problems are already properly addressed, but there is still a need for easy-to-use, sound tools. Alternatively, languages or programming styles can be created that allow for easier soundness checking.

Notes

  1. Although this is not exactly true any more for Nvidia’s Volta architecture and onward. See https://developer.nvidia.com/blog/inside-volta/.

  2. https://data.stackexchange.com/stackoverflow/query/1258739/gpgpu-tags.

  3. https://data.stackexchange.com/stackoverflow/query/1258838/gpgpu.

  4. https://www.iwocl.org/resources/opencl-libraries-and-toolkits/.

  5. https://academic.microsoft.com/home.

  6. https://developer.nvidia.com/blog/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/.

References

  1. CUDA-MEMCHECK, June 2020. https://docs.nvidia.com/cuda/cuda-memcheck

  2. CUDA Programming Guide, July 2020. http://docs.nvidia.com/cuda/cuda-c-programming-guide/

  3. Parallel Thread Execution ISA Version 7.0, July 2020. http://docs.nvidia.com/cuda/parallel-thread-execution/index.html

  4. SPIR - The Industry Open Standard Intermediate Language for Parallel Compute and Graphics, July 2020. https://www.khronos.org/spir/

  5. Ahmed, S., Bagherzadeh, M.: What do concurrency developers ask about? a large-scale study using stack overflow. In: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2018, pp. 1–10. Association for Computing Machinery, Oulu, October 2018. https://doi.org/10.1145/3239235.3239524

  6. Alglave, J., et al.: GPU concurrency: weak behaviours and programming assumptions. In: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS 2015, pp. 577–591. ACM Press, Istanbul (2015). https://doi.org/10.1145/2694344.2694391

  7. Alur, R., Devietti, J., Singhania, N.: Block-size independence for GPU programs. In: Podelski, A. (ed.) SAS 2018. LNCS, vol. 11002, pp. 107–126. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99725-4_9

  8. Atzeni, S., et al.: ARCHER: effectively spotting data races in large OpenMP applications. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 53–62. IEEE, Chicago, May 2016. https://doi.org/10.1109/IPDPS.2016.68

  9. Azad, H.S.: Advances in GPU Research and Practice, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco (2016)

  10. Banerjee, K., Banerjee, S., Sarkar, S.: Data-race detection: the missing piece for an end-to-end semantic equivalence checker for parallelizing transformations of array-intensive programs. In: Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, ARRAY 2016, pp. 1–8. Association for Computing Machinery, Santa Barbara, June 2016. https://doi.org/10.1145/2935323.2935324

  11. Betts, A., Chong, N., Donaldson, A., Qadeer, S., Thomson, P.: GPUVerify: a verifier for GPU kernels. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA 2012, pp. 113–132. ACM, New York (2012). https://doi.org/10.1145/2384616.2384625

  12. Blom, S., Huisman, M.: The VerCors tool for verification of concurrent programs. In: Jones, C., Pihlajasaari, P., Sun, J. (eds.) FM 2014. LNCS, vol. 8442, pp. 127–131. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06410-9_9

  13. Blom, S., Huisman, M., Mihelčić, M.: Specification and verification of GPGPU programs. Sci. Comput. Program. 95, 376–388 (2014). https://doi.org/10.1016/j.scico.2014.03.013

  14. Collingbourne, P., Cadar, C., Kelly, P.H.J.: Symbolic testing of OpenCL code. In: Eder, K., Lourenço, J., Shehory, O. (eds.) HVC 2011. LNCS, vol. 7261, pp. 203–218. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34188-5_18

  15. Donaldson, A.F., Ketema, J., Sorensen, T., Wickerson, J.: Forward progress on GPU concurrency (invited talk). In: Meyer, R., Nestmann, U. (eds.) 28th International Conference on Concurrency Theory (CONCUR 2017). Leibniz International Proceedings in Informatics (LIPIcs), vol. 85, pp. 1:1–1:13. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl (2017). https://doi.org/10.4230/LIPIcs.CONCUR.2017.1

  16. Eizenberg, A., Peng, Y., Pigli, T., Mansky, W., Devietti, J.: BARRACUDA: binary-level analysis of runtime RAces in CUDA programs. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pp. 126–140. Association for Computing Machinery, Barcelona, June 2017. https://doi.org/10.1145/3062341.3062342

  17. Holey, A., Mekkat, V., Zhai, A.: HAccRG: hardware-accelerated data race detection in GPUs. In: 2013 42nd International Conference on Parallel Processing, pp. 60–69. IEEE, Lyon, October 2013. https://doi.org/10.1109/ICPP.2013.15

  18. Islam, M.J., Nguyen, H.A., Pan, R., Rajan, H.: What do developers ask about ML libraries? A large-scale study using stack overflow. ArXiv: 1906.11940 (Cs), June 2019

  19. Kamil, S., Cheung, A., Itzhaky, S., Solar-Lezama, A.: Verified lifting of stencil computations. In: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, pp. 711–726. Association for Computing Machinery, Santa Barbara, June 2016. https://doi.org/10.1145/2908080.2908117

  20. Kojima, K., Igarashi, A.: A Hoare logic for GPU kernels. ACM Trans. Comput. Log. 18(1), 3:1–3:43 (2017). https://doi.org/10.1145/3001834

  21. Kojima, K., Imanishi, A., Igarashi, A.: Automated verification of functional correctness of race-free GPU programs. J. Autom. Reason. 60(3), 279–298 (2018). https://doi.org/10.1007/s10817-017-9428-2

  22. Leung, A., Gupta, M., Agarwal, Y., Gupta, R., Jhala, R., Lerner, S.: Verifying GPU kernels by test amplification. SIGPLAN Not. 47(6), 383–394 (2012). https://doi.org/10.1145/2345156.2254110

  23. Li, G., Gopalakrishnan, G.: Scalable SMT-based verification of GPU kernel functions. In: Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2010, p. 187. ACM Press, Santa Fe (2010). https://doi.org/10.1145/1882291.1882320

  24. Li, G., Li, P., Sawaya, G., Gopalakrishnan, G., Ghosh, I., Rajan, S.P.: GKLEE: concolic verification and test generation for GPUs. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2012, pp. 215–224. ACM, New York (2012). https://doi.org/10.1145/2145816.2145844

  25. Li, P., Li, G., Gopalakrishnan, G.: Practical symbolic race checking of GPU programs. In: SC 2014: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 179–190, November 2014. https://doi.org/10.1109/SC.2014.20

  26. Li, P., et al.: LD: low-overhead GPU race detection without access monitoring. ACM Trans. Archit. Code Optim. 14(1), 1–25 (2017). https://doi.org/10.1145/3046678

  27. Menzies, T., Williams, L., Zimmermann, T.: Perspectives on Data Science for Software Engineering, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco (2016)

  28. Monteiro, F.R., et al.: ESBMC-GPU a context-bounded model checking tool to verify CUDA programs. Sci. Comput. Program. 152, 63–69 (2018). https://doi.org/10.1016/j.scico.2017.09.005

  29. Peng, Y., Grover, V., Devietti, J.: CURD: a dynamic CUDA race detector. In: PLDI 2018, pp. 390–403. Association for Computing Machinery, Philadelphia, June 2018. https://doi.org/10.1145/3192366.3192368

  30. Pinto, G., Torres, W., Castor, F.: A study on the most popular questions about concurrent programming. In: Proceedings of the 6th Workshop on Evaluation and Usability of Programming Languages and Tools - PLATEAU 2015, pp. 39–46. ACM Press, Pittsburgh (2015). https://doi.org/10.1145/2846680.2846687

  31. Price, J., McIntosh-Smith, S.: Oclgrind: an extensible OpenCL device simulator. In: Proceedings of the 3rd International Workshop on OpenCL - IWOCL 2015, pp. 1–7. ACM Press, Palo Alto (2015). https://doi.org/10.1145/2791321.2791333

  32. Rosen, C., Shihab, E.: What are mobile developers asking about? A large scale study using stack overflow. Empir. Softw. Eng. 21(3), 1192–1223 (2016). https://doi.org/10.1007/s10664-015-9379-3

  33. Safari, M., Oortwijn, W., Joosten, S., Huisman, M.: Formal verification of parallel prefix sum. In: Lee, R., Jha, S., Mavridou, A. (eds.) NFM 2020. LNCS, vol. 12229, pp. 170–186. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-55754-6_10

  34. Schmitz, A., Protze, J., Yu, L., Schwitanski, S., Müller, M.S.: DataRaceOnAccelerator – a micro-benchmark suite for evaluating correctness tools targeting accelerators. In: Schwardmann, U., et al. (eds.) Euro-Par 2019. LNCS, vol. 11997, pp. 245–257. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-48340-1_19

  35. Sharma, R., Bauer, M., Aiken, A.: Verification of producer-consumer synchronization in GPU programs. In: PLDI 2015, pp. 88–98. Association for Computing Machinery, Portland, June 2015. https://doi.org/10.1145/2737924.2737962

  36. Siegel, S.F., et al.: CIVL: the concurrency intermediate verification language. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC 2015, pp. 1–12. ACM Press, Austin (2015). https://doi.org/10.1145/2807591.2807635

  37. Sorensen, T., Donaldson, A.F., Batty, M., Gopalakrishnan, G., Rakamaric, Z.: Portable inter-workgroup barrier synchronisation for GPUs. In: OOPSLA 2016, p. 20 (2016). https://doi.org/10.1145/3022671.2984032

  38. van den Haak, L.B., Wijs, A., van den Brand, M., Huisman, M.: Card sorting data for Formal methods for GPGPU programming: is the demand met?, September 2020. https://doi.org/10.4121/12988781

  39. Wu, M., Zhou, H., Zhang, L., Liu, C., Zhang, Y.: Characterizing and detecting CUDA program bugs. ArXiv: 1905.01833 (Cs), May 2019

  40. Xing, Y., Huang, B.Y., Gupta, A., Malik, S.: A formal instruction-level GPU model for scalable verification. In: 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–8, November 2018. https://doi.org/10.1145/3240765.3240771

  41. Zheng, M., Ravi, V.T., Qin, F., Agrawal, G.: GRace: a low-overhead mechanism for detecting data races in GPU programs. SIGPLAN Not. 46(8), 135–146 (2011). https://doi.org/10.1145/2038037.1941574

Acknowledgements and Data Availability Statement

We want to thank Jan Martens for his help with the card sorting.

The data used for the categorization with card sorting is available in the Figshare repository: https://doi.org/10.4121/12988781 [38].

Author information

Corresponding author

Correspondence to Lars B. van den Haak.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

van den Haak, L.B., Wijs, A., van den Brand, M., Huisman, M. (2020). Formal Methods for GPGPU Programming: Is the Demand Met? In: Dongol, B., Troubitsyna, E. (eds) Integrated Formal Methods. IFM 2020. Lecture Notes in Computer Science, vol 12546. Springer, Cham. https://doi.org/10.1007/978-3-030-63461-2_9

  • DOI: https://doi.org/10.1007/978-3-030-63461-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63460-5

  • Online ISBN: 978-3-030-63461-2

  • eBook Packages: Computer Science (R0)
