Abstract
Over the years, researchers have developed many formal method tools to support software development. However, hardly any studies have been conducted to determine whether these tools sufficiently address the problems developers actually encounter. For the relatively young field of GPU programming, we would like to know whether the tools developed so far are sufficient, or whether some problems still need attention. To this end, we first look at what kinds of problems programmers encounter in OpenCL and CUDA. We gather problems from Stack Overflow and categorise them with card sorting. We find that problems related to memory, synchronisation of threads, threads in general, and performance are essential topics. Next, we look at (verification) tools in industry and research to see how these tools address the problems we discovered. We think many problems are already properly addressed, but there is still a need for easy-to-use, sound tools. Alternatively, languages or programming styles can be created that allow for easier checking of soundness.
Notes
- 1. Although this is not exactly true any more for Nvidia’s Volta architecture and onward. See https://developer.nvidia.com/blog/inside-volta/.
References
CUDA-MEMCHECK, June 2020. https://docs.nvidia.com/cuda/cuda-memcheck
CUDA Programming Guide, July 2020. http://docs.nvidia.com/cuda/cuda-c-programming-guide/
Parallel Thread Execution ISA Version 7.0, July 2020. http://docs.nvidia.com/cuda/parallel-thread-execution/index.html
SPIR - The Industry Open Standard Intermediate Language for Parallel Compute and Graphics, July 2020. https://www.khronos.org/spir/
Ahmed, S., Bagherzadeh, M.: What do concurrency developers ask about? A large-scale study using stack overflow. In: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2018, pp. 1–10. Association for Computing Machinery, Oulu, October 2018. https://doi.org/10.1145/3239235.3239524
Alglave, J., et al.: GPU concurrency: weak behaviours and programming assumptions. In: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS 2015, pp. 577–591. ACM Press, Istanbul (2015). https://doi.org/10.1145/2694344.2694391
Alur, R., Devietti, J., Singhania, N.: Block-size independence for GPU programs. In: Podelski, A. (ed.) SAS 2018. LNCS, vol. 11002, pp. 107–126. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99725-4_9
Atzeni, S., et al.: ARCHER: effectively spotting data races in large OpenMP applications. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 53–62. IEEE, Chicago, May 2016. https://doi.org/10.1109/IPDPS.2016.68
Azad, H.S.: Advances in GPU Research and Practice, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco (2016)
Banerjee, K., Banerjee, S., Sarkar, S.: Data-race detection: the missing piece for an end-to-end semantic equivalence checker for parallelizing transformations of array-intensive programs. In: Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, ARRAY 2016, pp. 1–8. Association for Computing Machinery, Santa Barbara, June 2016. https://doi.org/10.1145/2935323.2935324
Betts, A., Chong, N., Donaldson, A., Qadeer, S., Thomson, P.: GPUVerify: a verifier for GPU kernels. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA 2012, pp. 113–132. ACM, New York (2012). https://doi.org/10.1145/2384616.2384625
Blom, S., Huisman, M.: The VerCors tool for verification of concurrent programs. In: Jones, C., Pihlajasaari, P., Sun, J. (eds.) FM 2014. LNCS, vol. 8442, pp. 127–131. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06410-9_9
Blom, S., Huisman, M., Mihelčić, M.: Specification and verification of GPGPU programs. Sci. Comput. Program. 95, 376–388 (2014). https://doi.org/10.1016/j.scico.2014.03.013
Collingbourne, P., Cadar, C., Kelly, P.H.J.: Symbolic testing of OpenCL code. In: Eder, K., Lourenço, J., Shehory, O. (eds.) HVC 2011. LNCS, vol. 7261, pp. 203–218. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34188-5_18
Donaldson, A.F., Ketema, J., Sorensen, T., Wickerson, J.: Forward progress on GPU concurrency (invited talk). In: Meyer, R., Nestmann, U. (eds.) 28th International Conference on Concurrency Theory (CONCUR 2017). Leibniz International Proceedings in Informatics (LIPIcs), vol. 85, pp. 1:1–1:13. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl (2017). https://doi.org/10.4230/LIPIcs.CONCUR.2017.1
Eizenberg, A., Peng, Y., Pigli, T., Mansky, W., Devietti, J.: BARRACUDA: binary-level analysis of runtime RAces in CUDA programs. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pp. 126–140. Association for Computing Machinery, Barcelona, June 2017. https://doi.org/10.1145/3062341.3062342
Holey, A., Mekkat, V., Zhai, A.: HAccRG: hardware-accelerated data race detection in GPUs. In: 2013 42nd International Conference on Parallel Processing, pp. 60–69. IEEE, Lyon, October 2013. https://doi.org/10.1109/ICPP.2013.15
Islam, M.J., Nguyen, H.A., Pan, R., Rajan, H.: What do developers ask about ML libraries? A large-scale study using stack overflow. ArXiv: 1906.11940 (Cs), June 2019
Kamil, S., Cheung, A., Itzhaky, S., Solar-Lezama, A.: Verified lifting of stencil computations. In: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, pp. 711–726. Association for Computing Machinery, Santa Barbara, June 2016. https://doi.org/10.1145/2908080.2908117
Kojima, K., Igarashi, A.: A Hoare logic for GPU kernels. ACM Trans. Comput. Log. 18(1), 3:1–3:43 (2017). https://doi.org/10.1145/3001834
Kojima, K., Imanishi, A., Igarashi, A.: Automated verification of functional correctness of race-free GPU programs. J. Autom. Reason. 60(3), 279–298 (2018). https://doi.org/10.1007/s10817-017-9428-2
Leung, A., Gupta, M., Agarwal, Y., Gupta, R., Jhala, R., Lerner, S.: Verifying GPU kernels by test amplification. SIGPLAN Not. 47(6), 383–394 (2012). https://doi.org/10.1145/2345156.2254110
Li, G., Gopalakrishnan, G.: Scalable SMT-based verification of GPU kernel functions. In: Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2010, p. 187. ACM Press, Santa Fe (2010). https://doi.org/10.1145/1882291.1882320
Li, G., Li, P., Sawaya, G., Gopalakrishnan, G., Ghosh, I., Rajan, S.P.: GKLEE: concolic verification and test generation for GPUs. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2012, pp. 215–224. ACM, New York (2012). https://doi.org/10.1145/2145816.2145844
Li, P., Li, G., Gopalakrishnan, G.: Practical symbolic race checking of GPU programs. In: SC 2014: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 179–190, November 2014. https://doi.org/10.1109/SC.2014.20
Li, P., et al.: LD: low-overhead GPU race detection without access monitoring. ACM Trans. Archit. Code Optim. 14(1), 1–25 (2017). https://doi.org/10.1145/3046678
Menzies, T., Williams, L., Zimmermann, T.: Perspectives on Data Science for Software Engineering, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco (2016)
Monteiro, F.R., et al.: ESBMC-GPU: a context-bounded model checking tool to verify CUDA programs. Sci. Comput. Program. 152, 63–69 (2018). https://doi.org/10.1016/j.scico.2017.09.005
Peng, Y., Grover, V., Devietti, J.: CURD: a dynamic CUDA race detector. In: PLDI 2018, pp. 390–403. Association for Computing Machinery, Philadelphia, June 2018. https://doi.org/10.1145/3192366.3192368
Pinto, G., Torres, W., Castor, F.: A study on the most popular questions about concurrent programming. In: Proceedings of the 6th Workshop on Evaluation and Usability of Programming Languages and Tools - PLATEAU 2015, pp. 39–46. ACM Press, Pittsburgh (2015). https://doi.org/10.1145/2846680.2846687
Price, J., McIntosh-Smith, S.: Oclgrind: an extensible OpenCL device simulator. In: Proceedings of the 3rd International Workshop on OpenCL - IWOCL 2015, pp. 1–7. ACM Press, Palo Alto (2015). https://doi.org/10.1145/2791321.2791333
Rosen, C., Shihab, E.: What are mobile developers asking about? A large scale study using stack overflow. Empir. Softw. Eng. 21(3), 1192–1223 (2016). https://doi.org/10.1007/s10664-015-9379-3
Safari, M., Oortwijn, W., Joosten, S., Huisman, M.: Formal verification of parallel prefix sum. In: Lee, R., Jha, S., Mavridou, A. (eds.) NFM 2020. LNCS, vol. 12229, pp. 170–186. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-55754-6_10
Schmitz, A., Protze, J., Yu, L., Schwitanski, S., Müller, M.S.: DataRaceOnAccelerator – a micro-benchmark suite for evaluating correctness tools targeting accelerators. In: Schwardmann, U., et al. (eds.) Euro-Par 2019. LNCS, vol. 11997, pp. 245–257. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-48340-1_19
Sharma, R., Bauer, M., Aiken, A.: Verification of producer-consumer synchronization in GPU programs. In: PLDI 2015, pp. 88–98. Association for Computing Machinery, Portland, June 2015. https://doi.org/10.1145/2737924.2737962
Siegel, S.F., et al.: CIVL: the concurrency intermediate verification language. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC 2015, pp. 1–12. ACM Press, Austin (2015). https://doi.org/10.1145/2807591.2807635
Sorensen, T., Donaldson, A.F., Batty, M., Gopalakrishnan, G., Rakamaric, Z.: Portable inter-workgroup barrier synchronisation for GPUs. In: OOPSLA 2016, p. 20 (2016). https://doi.org/10.1145/3022671.2984032
van den Haak, L.B., Wijs, A., van den Brand, M., Huisman, M.: Card sorting data for Formal methods for GPGPU programming: is the demand met?, September 2020. https://doi.org/10.4121/12988781
Wu, M., Zhou, H., Zhang, L., Liu, C., Zhang, Y.: Characterizing and detecting CUDA program bugs. ArXiv: 1905.01833 (Cs), May 2019
Xing, Y., Huang, B.Y., Gupta, A., Malik, S.: A formal instruction-level GPU model for scalable verification. In: 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–8, November 2018. https://doi.org/10.1145/3240765.3240771
Zheng, M., Ravi, V.T., Qin, F., Agrawal, G.: GRace: a low-overhead mechanism for detecting data races in GPU programs. SIGPLAN Not. 46(8), 135–146 (2011). https://doi.org/10.1145/2038037.1941574
Acknowledgements and Data Availability Statement
We want to thank Jan Martens for his help with the card sorting.
The data used for the categorisation with card sorting is available in the Figshare repository: https://doi.org/10.4121/12988781 [38].
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
van den Haak, L.B., Wijs, A., van den Brand, M., Huisman, M. (2020). Formal Methods for GPGPU Programming: Is the Demand Met? In: Dongol, B., Troubitsyna, E. (eds) Integrated Formal Methods. IFM 2020. Lecture Notes in Computer Science, vol 12546. Springer, Cham. https://doi.org/10.1007/978-3-030-63461-2_9
DOI: https://doi.org/10.1007/978-3-030-63461-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63460-5
Online ISBN: 978-3-030-63461-2
eBook Packages: Computer Science (R0)