Formal Methods for GPGPU Programming: Is the Demand Met?

van den Haak, Lars B.; Wijs, Anton; van den Brand, Mark; Huisman, Marieke

doi:10.1007/978-3-030-63461-2_9

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12546)

Included in the following conference series:

International Conference on Integrated Formal Methods

Abstract

Over the years, researchers have developed many formal method tools to support software development. However, hardly any studies have been conducted to determine whether these tools sufficiently address the actual problems developers encounter. For the relatively young field of GPU programming, we would like to know whether the tools developed so far are sufficient, or whether some problems still need attention. To this end, we first look at what kind of problems programmers encounter in OpenCL and CUDA. We gather problems from Stack Overflow and categorise them with card sorting. We find that problems related to memory, synchronisation of threads, threads in general, and performance are essential topics. Next, we look at (verification) tools in industry and research to see how these tools address the problems we discovered. We think many problems are already properly addressed, but there is still a need for easy-to-use, sound tools. Alternatively, languages or programming styles can be created that allow for easier soundness checking.

Notes

1. Although this is not exactly true any more for Nvidia’s Volta architecture and onward. See https://developer.nvidia.com/blog/inside-volta/.
2. https://data.stackexchange.com/stackoverflow/query/1258739/gpgpu-tags.
3. https://data.stackexchange.com/stackoverflow/query/1258838/gpgpu.
4. https://www.iwocl.org/resources/opencl-libraries-and-toolkits/.
5. https://academic.microsoft.com/home.
6. https://developer.nvidia.com/blog/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/.

References

CUDA-MEMCHECK, June 2020. https://docs.nvidia.com/cuda/cuda-memcheck
CUDA Programming Guide, July 2020. http://docs.nvidia.com/cuda/cuda-c-programming-guide/
Parallel Thread Execution ISA Version 7.0, July 2020. http://docs.nvidia.com/cuda/parallel-thread-execution/index.html
SPIR - The Industry Open Standard Intermediate Language for Parallel Compute and Graphics, July 2020. https://www.khronos.org/spir/
Ahmed, S., Bagherzadeh, M.: What do concurrency developers ask about? a large-scale study using stack overflow. In: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2018, pp. 1–10. Association for Computing Machinery, Oulu, October 2018. https://doi.org/10.1145/3239235.3239524
Alglave, J., et al.: GPU concurrency: weak behaviours and programming assumptions. In: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS 2015, pp. 577–591. ACM Press, Istanbul (2015). https://doi.org/10.1145/2694344.2694391
Alur, R., Devietti, J., Singhania, N.: Block-size independence for GPU programs. In: Podelski, A. (ed.) SAS 2018. LNCS, vol. 11002, pp. 107–126. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99725-4_9
Atzeni, S., et al.: ARCHER: effectively spotting data races in large OpenMP applications. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 53–62. IEEE, Chicago, May 2016. https://doi.org/10.1109/IPDPS.2016.68
Azad, H.S.: Advances in GPU Research and Practice, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco (2016)
Banerjee, K., Banerjee, S., Sarkar, S.: Data-race detection: the missing piece for an end-to-end semantic equivalence checker for parallelizing transformations of array-intensive programs. In: Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, ARRAY 2016, pp. 1–8. Association for Computing Machinery, Santa Barbara, June 2016. https://doi.org/10.1145/2935323.2935324
Betts, A., Chong, N., Donaldson, A., Qadeer, S., Thomson, P.: GPUVerify: a verifier for GPU kernels. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA 2012, pp. 113–132. ACM, New York (2012). https://doi.org/10.1145/2384616.2384625
Blom, S., Huisman, M.: The VerCors tool for verification of concurrent programs. In: Jones, C., Pihlajasaari, P., Sun, J. (eds.) FM 2014. LNCS, vol. 8442, pp. 127–131. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06410-9_9
Blom, S., Huisman, M., Mihelčić, M.: Specification and verification of GPGPU programs. Sci. Comput. Program. 95, 376–388 (2014). https://doi.org/10.1016/j.scico.2014.03.013
Collingbourne, P., Cadar, C., Kelly, P.H.J.: Symbolic testing of OpenCL code. In: Eder, K., Lourenço, J., Shehory, O. (eds.) HVC 2011. LNCS, vol. 7261, pp. 203–218. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34188-5_18
Donaldson, A.F., Ketema, J., Sorensen, T., Wickerson, J.: Forward progress on GPU concurrency (invited talk). In: Meyer, R., Nestmann, U. (eds.) 28th International Conference on Concurrency Theory (CONCUR 2017). Leibniz International Proceedings in Informatics (LIPIcs), vol. 85, pp. 1:1–1:13. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl (2017). https://doi.org/10.4230/LIPIcs.CONCUR.2017.1
Eizenberg, A., Peng, Y., Pigli, T., Mansky, W., Devietti, J.: BARRACUDA: binary-level analysis of runtime RAces in CUDA programs. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pp. 126–140. Association for Computing Machinery, Barcelona, June 2017. https://doi.org/10.1145/3062341.3062342
Holey, A., Mekkat, V., Zhai, A.: HAccRG: hardware-accelerated data race detection in GPUs. In: 2013 42nd International Conference on Parallel Processing, pp. 60–69. IEEE, Lyon, October 2013. https://doi.org/10.1109/ICPP.2013.15
Islam, M.J., Nguyen, H.A., Pan, R., Rajan, H.: What do developers ask about ML libraries? A large-scale study using stack overflow. ArXiv: 1906.11940 (Cs), June 2019
Kamil, S., Cheung, A., Itzhaky, S., Solar-Lezama, A.: Verified lifting of stencil computations. In: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, pp. 711–726. Association for Computing Machinery, Santa Barbara, June 2016. https://doi.org/10.1145/2908080.2908117
Kojima, K., Igarashi, A.: A Hoare logic for GPU kernels. ACM Trans. Comput. Log. 18(1), 3:1–3:43 (2017). https://doi.org/10.1145/3001834
Kojima, K., Imanishi, A., Igarashi, A.: Automated verification of functional correctness of race-free GPU programs. J. Autom. Reason. 60(3), 279–298 (2018). https://doi.org/10.1007/s10817-017-9428-2
Leung, A., Gupta, M., Agarwal, Y., Gupta, R., Jhala, R., Lerner, S.: Verifying GPU kernels by test amplification. SIGPLAN Not. 47(6), 383–394 (2012). https://doi.org/10.1145/2345156.2254110
Li, G., Gopalakrishnan, G.: Scalable SMT-based verification of GPU kernel functions. In: Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2010, p. 187. ACM Press, Santa Fe (2010). https://doi.org/10.1145/1882291.1882320
Li, G., Li, P., Sawaya, G., Gopalakrishnan, G., Ghosh, I., Rajan, S.P.: GKLEE: concolic verification and test generation for GPUs. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2012, pp. 215–224. ACM, New York (2012). https://doi.org/10.1145/2145816.2145844
Li, P., Li, G., Gopalakrishnan, G.: Practical symbolic race checking of GPU programs. In: SC 2014: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 179–190, November 2014. https://doi.org/10.1109/SC.2014.20
Li, P., et al.: LD: low-overhead GPU race detection without access monitoring. ACM Trans. Archit. Code Optim. 14(1), 1–25 (2017). https://doi.org/10.1145/3046678
Menzies, T., Williams, L., Zimmermann, T.: Perspectives on Data Science for Software Engineering, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco (2016)
Monteiro, F.R., et al.: ESBMC-GPU: a context-bounded model checking tool to verify CUDA programs. Sci. Comput. Program. 152, 63–69 (2018). https://doi.org/10.1016/j.scico.2017.09.005
Peng, Y., Grover, V., Devietti, J.: CURD: a dynamic CUDA race detector. In: PLDI 2018, pp. 390–403. Association for Computing Machinery, Philadelphia, June 2018. https://doi.org/10.1145/3192366.3192368
Pinto, G., Torres, W., Castor, F.: A study on the most popular questions about concurrent programming. In: Proceedings of the 6th Workshop on Evaluation and Usability of Programming Languages and Tools - PLATEAU 2015, pp. 39–46. ACM Press, Pittsburgh (2015). https://doi.org/10.1145/2846680.2846687
Price, J., McIntosh-Smith, S.: Oclgrind: an extensible OpenCL device simulator. In: Proceedings of the 3rd International Workshop on OpenCL - IWOCL 2015, pp. 1–7. ACM Press, Palo Alto (2015). https://doi.org/10.1145/2791321.2791333
Rosen, C., Shihab, E.: What are mobile developers asking about? A large scale study using stack overflow. Empir. Softw. Eng. 21(3), 1192–1223 (2016). https://doi.org/10.1007/s10664-015-9379-3
Safari, M., Oortwijn, W., Joosten, S., Huisman, M.: Formal verification of parallel prefix sum. In: Lee, R., Jha, S., Mavridou, A. (eds.) NFM 2020. LNCS, vol. 12229, pp. 170–186. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-55754-6_10
Schmitz, A., Protze, J., Yu, L., Schwitanski, S., Müller, M.S.: DataRaceOnAccelerator – a micro-benchmark suite for evaluating correctness tools targeting accelerators. In: Schwardmann, U., et al. (eds.) Euro-Par 2019. LNCS, vol. 11997, pp. 245–257. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-48340-1_19
Sharma, R., Bauer, M., Aiken, A.: Verification of producer-consumer synchronization in GPU programs. In: PLDI 2015, pp. 88–98. Association for Computing Machinery, Portland, June 2015. https://doi.org/10.1145/2737924.2737962
Siegel, S.F., et al.: CIVL: the concurrency intermediate verification language. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC 2015, pp. 1–12. ACM Press, Austin (2015). https://doi.org/10.1145/2807591.2807635
Sorensen, T., Donaldson, A.F., Batty, M., Gopalakrishnan, G., Rakamaric, Z.: Portable inter-workgroup barrier synchronisation for GPUs. In: OOPSLA 2016, p. 20 (2016). https://doi.org/10.1145/3022671.2984032
van den Haak, L.B., Wijs, A., van den Brand, M., Huisman, M.: Card sorting data for Formal methods for GPGPU programming: is the demand met?, September 2020. https://doi.org/10.4121/12988781
Wu, M., Zhou, H., Zhang, L., Liu, C., Zhang, Y.: Characterizing and detecting CUDA program bugs. ArXiv: 1905.01833 (Cs), May 2019
Xing, Y., Huang, B.Y., Gupta, A., Malik, S.: A formal instruction-level GPU model for scalable verification. In: 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–8, November 2018. https://doi.org/10.1145/3240765.3240771
Zheng, M., Ravi, V.T., Qin, F., Agrawal, G.: GRace: a low-overhead mechanism for detecting data races in GPU programs. SIGPLAN Not. 46(8), 135–146 (2011). https://doi.org/10.1145/2038037.1941574

Acknowledgements and Data Availability Statement

We want to thank Jan Martens for his help with the card sorting.

The data used for the categorization with card sorting is available in the Figshare repository: https://doi.org/10.4121/12988781 [38].

Author information

Authors and Affiliations

Eindhoven University of Technology, Eindhoven, The Netherlands
Lars B. van den Haak, Anton Wijs & Mark van den Brand
University of Twente, Enschede, The Netherlands
Marieke Huisman

Authors

Lars B. van den Haak
Anton Wijs
Mark van den Brand
Marieke Huisman

Corresponding author

Correspondence to Lars B. van den Haak.

Editor information

Editors and Affiliations

University of Surrey, Guildford, UK
Brijesh Dongol
Royal Institute of Technology - KTH, Stockholm, Sweden
Elena Troubitsyna

About this paper

Cite this paper

van den Haak, L.B., Wijs, A., van den Brand, M., Huisman, M. (2020). Formal Methods for GPGPU Programming: Is the Demand Met? In: Dongol, B., Troubitsyna, E. (eds) Integrated Formal Methods. IFM 2020. Lecture Notes in Computer Science, vol 12546. Springer, Cham. https://doi.org/10.1007/978-3-030-63461-2_9

DOI: https://doi.org/10.1007/978-3-030-63461-2_9
Published: 13 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63460-5
Online ISBN: 978-3-030-63461-2
eBook Packages: Computer Science, Computer Science (R0)