Alpinist: An Annotation-Aware GPU Program Optimizer

Şakar, Ömer; Safari, Mohsen; Huisman, Marieke; Wijs, Anton

doi:10.1007/978-3-030-99527-0_18

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13244))

Included in the following conference series:

International Conference on Tools and Algorithms for the Construction and Analysis of Systems

3349 Accesses
6 Citations

Abstract

GPU programs are widely used in industry. To obtain the best performance, a typical development process involves the manual or semi-automatic application of optimizations prior to compiling the code. To avoid the introduction of errors, we can augment GPU programs with (pre- and postcondition-style) annotations to capture functional properties. However, keeping these annotations correct when optimizing GPU programs is labor-intensive and error-prone.

This paper introduces Alpinist, an annotation-aware GPU program optimizer. It applies frequently-used GPU optimizations, but besides transforming code, it also transforms the annotations. We evaluate Alpinist, in combination with the VerCors program verifier, to automatically optimize a collection of verified programs and reverify them.

This work is supported by NWO grant 639.023.710 for the Mercedes project and by NWO TTW grant 17249 for the ChEOPS project.

Download to read the full chapter text

Chapter PDF

Dataset Sensitive Autotuning of Multi-versioned Code Based on Monotonic Properties

Performance Exploration Through Optimistic Static Program Annotations

Polytypic Genetic Programming

Keywords

References

Allen, R., Kennedy, K.: Automatic translation of Fortran programs to vector form. ACM Transactions on Programming Languages and Systems (TOPLAS) 9(4), 491–542 (1987)
Google Scholar
Ashari, A., Tatikonda, S., Boehm, M., Reinwald, B., Campbell, K., Keenleyside, J., Sadayappan, P.: On optimizing machine learning workloads via kernel fusion. ACM SIGPLAN Notices 50(8), 173–182 (2015)
Google Scholar
Ashouri, A., Killian, W., Cavazos, J., Palermo, G., Silvano, C.: A Survey on Compiler Autotuning using Machine Learning. ACM Computing Surveys 51(5), 96:1–96:42 (2018)
Google Scholar
Ayers, G., Litz, H., Kozyrakis, C., Ranganathan, P.: Classifying memory access patterns for prefetching. In: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. pp. 513–526 (2020)
Google Scholar
Bell, N., Garland, M.: Efficient sparse matrix-vector multiplication on CUDA. Tech. rep., Citeseer (2008)
Google Scholar
Berdine, J., Calcagno, C., O’Hearn, P.: Smallfoot: Modular Automatic Assertion Checking with Separation Logic. In: de Boer, F., Bonsangue, M., Graf, S., de Roever, W. (eds.) FMCO. LNCS, vol. 4111, pp. 115–137. Springer (2005)
Google Scholar
Bertolli, C., Betts, A., Mudalige, G., Giles, M., Kelly, P.: Design and Performance of the OP2 Library for Unstructured Mesh Applications. In: Proceedings of the 1st Workshop on Grids, Clouds and P2P Programming (CGWS). Lecture Notes in Computer Science, vol. 7155, pp. 191–200. Springer (2011). https://doi.org/10.1007/978-3-642-29737-3_22
Betts, A., Chong, N., Donaldson, A., Qadeer, S., Thomson, P.: GPUVerify: a verifier for GPU kernels. In: OOPSLA. pp. 113–132. ACM (2012)
Google Scholar
Blom, S., Darabi, S., Huisman, M., Oortwijn, W.: The VerCors Tool Set: Verification of Parallel and Concurrent Software. In: iFM. LNCS, vol. 10510, pp. 102 – 110. Springer (2017)
Google Scholar
Blom, S., Huisman, M., Mihelčić, M.: Specification and Verification of GPGPU programs. Science of Computer Programming 95, 376–388 (2014)
Google Scholar
Bornat, R., Calcagno, C., O’Hearn, P., Parkinson, M.: Permission accounting in separation logic. In: Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL). pp. 259–270 (2005)
Google Scholar
Boyland, J.: Checking Interference with Fractional Permissions. In: SAS. LNCS, vol. 2694, pp. 55–72. Springer (2003)
Google Scholar
Catanzaro, B., Keller, A., Garland, M.: A decomposition for in-place matrix transposition. ACM SIGPLAN Notices 49(8), 193–206 (2014)
Google Scholar
Collingbourne, P., Cadar, C., Kelly, P.H.: Symbolic testing of OpenCL code. In: Haifa Verification Conference. pp. 203–218. Springer (2011)
Google Scholar
Şakar, O., Safari, M., Huisman, M., Wijs, A.: The repository for the examples used in Alpinist, https://github.com/OmerSakar/Alpinist-Examples.git
Şakar, O., Safari, M., Huisman, M., Wijs, A.: The repository for the implementations of Alpinist, https://github.com/utwente-fmt/vercors/tree/gpgpu-optimizations/src/main/java/vct/col/rewrite/gpgpuoptimizations
DeFrancisco, R., Cho, S., Ferdman, M., Smolka, S.: Swarm Model Checking on the GPU. International Journal on Software Tools for Technology Transfer 22, 583–599 (2020). https://doi.org/10.1007/s10009-020-00576-x
Dross, C., Furia, C.A., Huisman, M., Monahan, R., Müller, P.: Verifythis 2019: a program verification competition. International Journal on Software Tools for Technology Transfer pp. 1–11 (2021)
Google Scholar
Filipovič, J., Madzin, M., Fousek, J., Matyska, L.: Optimizing CUDA code by kernel fusion: application on BLAS. The Journal of Supercomputing 71(10), 3934–3957 (2015)
Google Scholar
Gjomemo, R., Namjoshi, K.S., Phung, P.H., Venkatakrishnan, V., Zuck, L.D.: From verification to optimizations. In: International Workshop on Verification, Model Checking, and Abstract Interpretation. pp. 300–317. Springer (2015)
Google Scholar
Grauer-Gray, S., Xu, L., Searles, R., Ayalasomayajula, S., Cavazos, J.: Auto-tuning a High-Level Language Targeted to GPU Codes. In: Proc. 2012 Innovative Parallel Computing (InPar). pp. 1–10. IEEE (2012). https://doi.org/10.1109/InPar.2012.6339595
van den Haak, L., Wijs, A., M.G.J. van den Brand, Huisman, M.: Formal Methods for GPGPU Programming: Is The Demand Met? In: Proceedings of the 16th International Conference on Integrated Formal Methods (IFM 2020). Lecture Notes in Computer Science, vol. 12546, pp. 160–177. Springer (2020). https://doi.org/10.1007/978-3-030-63461-2_9
Hamers, R., Jongmans, S.S.: Safe sessions of channel actions in Clojure: a tour of the discourje project. In: International Symposium on Leveraging Applications of Formal Methods. pp. 489–508. Springer (2020)
Google Scholar
Herrmann, F., Silberholz, J., Tiglio, M.: Black Hole Simulations with CUDA. In: GPU Computing Gems Emerald Edition, chap. 8, pp. 103–111. Morgan Kaufmann (2011)
Google Scholar
Hong, C., Sukumaran-Rajam, A., Nisa, I., Singh, K., Sadayappan, P.: Adaptive sparse tiling for sparse matrix multiplication. In: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. pp. 300–314 (2019)
Google Scholar
Huisman, M., Blom, S., Darabi, S., Safari, M.: Program correctness by transformation. In: 8th International Symposium On Leveraging Applications of Formal Methods, Verification and Validation (ISoLA). LNCS, vol. 11244. Springer (2018)
Google Scholar
Huisman, M., Joosten, S.: A solution to VerifyThis 2019 challenge 1, https://github.com/utwente-fmt/vercors/blob/97c49d6dc1097ded47a5ed53143695ace6904865/examples/verifythis/2019/challenge1.pvl
Konstantinidis, A., Kelly, P.H., Ramanujam, J., Sadayappan, P.: Parametric GPU code generation for affine loop programs. In: International Workshop on Languages and Compilers for Parallel Computing. pp. 136–151. Springer (2013)
Google Scholar
Le, Q., Ngiam, J., Coates, A., Lahiri, A., Prochnow, B., Ng, A.: On Optimization Methods for Deep Learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML). pp. 265–272. Omnipress (2011)
Google Scholar
Leroy, X.: Formal certification of a compiler back-end or: programming a compiler with a proof assistant. In: Conference record of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. pp. 42–54 (2006)
Google Scholar
Leroy, X.: A formally verified compiler back-end. Journal of Automated Reasoning 43(4), 363–446 (2009)
Google Scholar
Li, G., Gopalakrishnan, G.: Scalable SMT-based verification of GPU kernel functions. In: SIGSOFT FSE 2010, Santa Fe, NM, USA. pp. 187–196. ACM (2010)
Google Scholar
Li, G., Li, P., Sawaya, G., Gopalakrishnan, G., Ghosh, I., Rajan, S.P.: GKLEE: concolic verification and test generation for GPUs. In: ACM SIGPLAN Notices. vol. 47, pp. 215–224. ACM (2012)
Google Scholar
Lindholm, L., Nickolls, J., Oberman, S., Montrym, J.: NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro 28(2), 39–55 (2008). https://doi.org/10.1109/MM.2008.31
Liu, X., Tan, S., Wang, H.: Parallel Statistical Analysis of Analog Circuits by GPU-Accelerated Graph-Based Approach. In: Proceedings of the 2012 Conference and Exhibition on Design, Automation & Test in Europe (DATE). pp. 852–857. IEEE Computer Society (2012). https://doi.org/10.1109/DATE.2012.6176615
de Moura, L.M., Bjørner, N.: Z3: An efficient SMT solver. In: Ramakrishnan, C., Rehof, J. (eds.) TACAS. LNCS, vol. 4963, pp. 337–340. Springer (2008)
Google Scholar
Müller, P., Schwerhoff, M., Summers, A.: Viper - a verification infrastructure for permission-based reasoning. In: VMCAI (2016)
Google Scholar
Murthy, G.S., Ravishankar, M., Baskaran, M.M., Sadayappan, P.: Optimal loop unrolling for GPGPU programs. In: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). pp. 1–11. IEEE (2010)
Google Scholar
Namjoshi, K.S., Pavlinovic, Z.: The impact of program transformations on static program analysis. In: International Static Analysis Symposium. pp. 306–325. Springer (2018)
Google Scholar
Namjoshi, K.S., Singhania, N.: Loopy: Programmable and formally verified loop transformations. In: International Static Analysis Symposium. pp. 383–402. Springer (2016)
Google Scholar
Namjoshi, K.S., Xue, A.: A Self-certifying Compilation Framework for WebAssembly. In: International Conference on Verification, Model Checking, and Abstract Interpretation. pp. 127–148. Springer (2021)
Google Scholar
The OpenCL 1.2 specification (2011)
Google Scholar
Osama, M., Wijs, A.: Parallel SAT Simplification on GPU Architectures. In: TACAS, Part I. LNCS, vol. 11427, pp. 21–40. Springer (2019)
Google Scholar
Osama, M., Wijs, A., Biere, A.: SAT Solving with GPU Accelerated Inprocessing. In: Proceedings of the 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Part I. Lecture Notes in Computer Science, vol. 12651, pp. 133–151. Springer (2021). https://doi.org/10.1007/978-3-030-72016-2_8
de Putter, S., Wijs, A.: Verifying a verifier: on the formal correctness of an LTS transformation verification technique. In: International Conference on Fundamental Approaches to Software Engineering. pp. 383–400. Springer (2016)
Google Scholar
Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., Amarasinghe, S.: Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. Acm Sigplan Notices 48(6), 519–530 (2013)
Google Scholar
Rocha, R.C., Pereira, A.D., Ramos, L., Góes, L.F.: Toast: Automatic tiling for iterative stencil computations on GPUs. Concurrency and Computation: Practice and Experience 29(8), e4053 (2017)
Google Scholar
Safari, M., Huisman, M.: Formal verification of parallel stream compaction and summed-area table algorithms. In: International Colloquium on Theoretical Aspects of Computing. pp. 181–199. Springer (2020)
Google Scholar
Safari, M., Huisman, M.: A generic approach to the verification of the permutation property of sequential and parallel swap-based sorting algorithms. In: International Conference on Integrated Formal Methods. pp. 257–275. Springer (2020)
Google Scholar
Safari, M., Oortwijn, W., Huisman, M.: Automated verification of the parallel Bellman–Ford algorithm. In: Drăgoi, C., Mukherjee, S., Namjoshi, K. (eds.) Static Analysis. pp. 346–358. Springer International Publishing, Cham (2021)
Google Scholar
Safari, M., Oortwijn, W., Joosten, S., Huisman, M.: Formal verification of parallel prefix sum. In: NASA Formal Methods Symposium. pp. 170–186. Springer (2020)
Google Scholar
Şakar, O.: Extending support for axiomatic data types in vercors (April 2020), http://essay.utwente.nl/80892/
Shimobaba, T., Ito, T., Masuda, N., Ichihashi, Y., Takada, N.: Fast calculation of computer-generated-hologram on AMD HD5000 series GPU and OpenCL. Optics express 18(10), 9955–9960 (2010)
Google Scholar
Sundfeld, D., Havgaard, J.H., Gorodkin, J., De Melo, A.C.: CUDA-Sankoff: using GPU to accelerate the pairwise structural RNA alignment. In: 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). pp. 295–302. IEEE (2017)
Google Scholar
The CUDA team: Documentation of the CUDA unroll pragma (Accessed Oct 6, 2021), https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#pragma-unroll
The Halide team: Documentation of the Halide unroll function (Accessed Oct 6, 2021), https://halide-lang.org/docs/class_halide_1_1_func.html#a05935caceb6efb8badd85f306dd33034
The verification of tictactoe program, https://github.com/utwente-fmt/vercors/blob/0a2fdc24419466c2d3b7a853a2908c37e7a8daa7/examples/session-generate/MatrixGrid.pvl
Unkule, S., Shaltz, C., Qasem, A.: Automatic restructuring of GPU kernels for exploiting inter-thread data locality. In: International Conference on Compiler Construction. pp. 21–40. Springer (2012)
Google Scholar
Van Werkhoven, B., Maassen, J., Bal, H.E., Seinstra, F.J.: Optimizing convolution operations on GPUs using adaptive tiling. Future Generation Computer Systems 30, 14–26 (2014)
Google Scholar
Viper project website: (2016), http://www.pm.inf.ethz.ch/research/viper
Wahib, M., Maruyama, N.: Scalable kernel fusion for memory-bound GPU applications. In: SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. pp. 191–202. IEEE (2014)
Google Scholar
Wang, G., Lin, Y., Yi, W.: Kernel fusion: An effective method for better power efficiency on multithreaded GPU. In: 2010 IEEE/ACM Int’l Conference on Green Computing and Communications & Int’l Conference on Cyber, Physical and Social Computing. pp. 344–350. IEEE (2010)
Google Scholar
Werkhoven, B.v.: Kernel Tuner: A search-optimizing GPU code auto-tuner. Future Generation Computer Systems 90, 347–358 (2019)
Google Scholar
Wienke, S., Springer, P., Terboven, C., Mey, D.: OpenACC - First Experiences with Real-World Applications. In: Proceedings of the 18th European Conference on Parallel and Distributed Computing (EuroPar). Lecture Notes in Computer Science, vol. 7484, pp. 859–870. Springer (2012). https://doi.org/10.1007/978-3-642-32820-6_85
Wijs, A.: BFS-Based Model Checking of Linear-Time Properties With An Application on GPUs. In: CAV, Part II. LNCS, vol. 9780, pp. 472–493. Springer (2016)
Google Scholar
Wijs, A., Engelen, L.: REFINER: Towards Formal Verification of Model Transformations. In: NFM. LNCS, vol. 8430, pp. 258–263. Springer (2014)
Google Scholar
Wijs, A., Neele, T., Bošnački, D.: GPUexplore 2.0: Unleashing GPU Explicit-State Model Checking. In: Proceedings of the 21st International Symposium on Formal Methods. Lecture Notes in Computer Science, vol. 9995, pp. 694–701. Springer (2016). https://doi.org/10.1007/978-3-319-48989-6_42
Wu, H., Diamos, G., Wang, J., Cadambi, S., Yalamanchili, S., Chakradhar, S.: Optimizing data warehousing applications for GPUs using kernel fusion/fission. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum. pp. 2433–2442. IEEE (2012)
Google Scholar
Xu, C., Kirk, S.R., Jenkins, S.: Tiling for performance tuning on different models of GPUs. In: 2009 Second International Symposium on Information Science and Engineering. pp. 500–504. IEEE (2009)
Google Scholar
Yang, Y., Xiang, P., Kong, J., Zhou, H.: A GPGPU compiler for memory optimization and parallelism management. ACM Sigplan Notices 45(6), 86–97 (2010)
Google Scholar

Download references

Author information

Authors and Affiliations

Formal Methods and Tools, University of Twente, Enschede, The Netherlands
Ömer Şakar, Mohsen Safari & Marieke Huisman
Software Engineering and Technology, Eindhoven University of Technology, Eindhoven, The Netherlands
Anton Wijs

Authors

Ömer Şakar
View author publications
You can also search for this author in PubMed Google Scholar
Mohsen Safari
View author publications
You can also search for this author in PubMed Google Scholar
Marieke Huisman
View author publications
You can also search for this author in PubMed Google Scholar
Anton Wijs
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ömer Şakar .

Editor information

Editors and Affiliations

Ben-Gurion University of the Negev, Be’er Sheva, Israel
Dana Fisman
University of Illinois Urbana-Champaign, Urbana, IL, USA
Grigore Rosu

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Reprints and permissions

Copyright information

About this paper

Cite this paper

Şakar, Ö., Safari, M., Huisman, M., Wijs, A. (2022). Alpinist: An Annotation-Aware GPU Program Optimizer. In: Fisman, D., Rosu, G. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2022. Lecture Notes in Computer Science, vol 13244. Springer, Cham. https://doi.org/10.1007/978-3-030-99527-0_18

Download citation

DOI: https://doi.org/10.1007/978-3-030-99527-0_18
Published: 30 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99526-3
Online ISBN: 978-3-030-99527-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The European Joint Conferences on Theory and Practice of Software. (opens in a new tab)

Alpinist: An Annotation-Aware GPU Program Optimizer

Abstract

Chapter PDF

Similar content being viewed by others

Dataset Sensitive Autotuning of Multi-versioned Code Based on Monotonic Properties

Performance Exploration Through Optimistic Static Program Annotations

Polytypic Genetic Programming

Keywords

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Societies and partnerships

Navigation

Alpinist: An Annotation-Aware GPU Program Optimizer

Abstract

Chapter PDF

Similar content being viewed by others

Dataset Sensitive Autotuning of Multi-versioned Code Based on Monotonic Properties

Performance Exploration Through Optimistic Static Program Annotations

Polytypic Genetic Programming

Keywords

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Societies and partnerships

Search

Navigation