Automatic C-to-CUDA Code Generation for Affine Programs

Baskaran, Muthu Manikandan; Ramanujam, J.; Sadayappan, P.

doi:10.1007/978-3-642-11970-5_14

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6011)

Included in the following conference series:

International Conference on Compiler Construction

Abstract

Graphics Processing Units (GPUs) offer tremendous computational power. CUDA (Compute Unified Device Architecture) provides a multi-threaded parallel programming model, facilitating high-performance implementations of general-purpose computations. However, the explicitly managed memory hierarchy and multi-level parallel view make manual development of high-performance CUDA code rather complicated. Hence the automatic transformation of sequential input programs into efficient parallel CUDA programs is of considerable interest.

This paper describes an automatic code transformation system that generates parallel CUDA code from input sequential C code, for regular (affine) programs. Using and adapting publicly available tools that have made polyhedral compiler optimization practically effective, we develop a C-to-CUDA transformation system that generates two-level parallel CUDA code that is optimized for efficient data access. The performance of the automatically generated code is compared with that of manually optimized CUDA code for a number of benchmarks. The performance of the automatically generated CUDA code is quite close to that of hand-optimized CUDA code and considerably better than the benchmarks’ performance on a multicore CPU.
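To make the terms "affine program" and "two-level parallel" concrete, the following is a minimal sketch in plain C, not the code the system described here generates. It shows a sequential affine loop nest (matrix-vector product) and a hand-restructured version whose outer loops mimic CUDA's block and thread levels, with a small staging buffer standing in for on-chip shared memory. The names `matvec_seq` and `matvec_two_level` and the sizes `N` and `BLOCK` are hypothetical, chosen only for illustration, and the sketch assumes `N` is a multiple of `BLOCK`.

```c
/* Hypothetical sizes for illustration; N is assumed to be a multiple of BLOCK. */
#define N 64      /* problem size */
#define BLOCK 16  /* rows handled by one emulated thread block */

/* Sequential affine loop nest: y = A * x, with A stored row-major.
 * Loop bounds and array subscripts are affine functions of i and j. */
void matvec_seq(const float *A, const float *x, float *y) {
    for (int i = 0; i < N; i++) {
        float acc = 0.0f;
        for (int j = 0; j < N; j++)
            acc += A[i * N + j] * x[j];
        y[i] = acc;
    }
}

/* Same computation with the parallel i loop split into two levels.
 * On a GPU, b would correspond to blockIdx.x and t to threadIdx.x;
 * here both levels run sequentially on the CPU. */
void matvec_two_level(const float *A, const float *x, float *y) {
    for (int b = 0; b < N / BLOCK; b++) {        /* block level */
        float acc[BLOCK] = {0.0f};               /* one accumulator per thread */
        for (int jt = 0; jt < N; jt += BLOCK) {
            float x_tile[BLOCK];                 /* stands in for shared memory */
            for (int t = 0; t < BLOCK; t++)      /* cooperative load of a tile of x */
                x_tile[t] = x[jt + t];
            for (int t = 0; t < BLOCK; t++) {    /* thread level */
                int i = b * BLOCK + t;           /* row owned by this thread */
                for (int j = 0; j < BLOCK; j++)
                    acc[t] += A[i * N + jt + j] * x_tile[j];
            }
        }
        for (int t = 0; t < BLOCK; t++)
            y[b * BLOCK + t] = acc[t];
    }
}
```

Both functions take a row-major `N * N` array for `A` and length-`N` arrays for `x` and `y`, and they accumulate each row in the same order, so they should produce identical results. The staging of `x` into `x_tile` is one simple way to express data reuse across the threads of a block; how the actual system chooses what to stage is not specified in the abstract.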

Author information

Authors and Affiliations

The Ohio State University, USA
Muthu Manikandan Baskaran & P. Sadayappan
Louisiana State University, USA
J. Ramanujam

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
Rajiv Gupta

About this paper

Cite this paper

Baskaran, M.M., Ramanujam, J., Sadayappan, P. (2010). Automatic C-to-CUDA Code Generation for Affine Programs. In: Gupta, R. (eds) Compiler Construction. CC 2010. Lecture Notes in Computer Science, vol 6011. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11970-5_14

DOI: https://doi.org/10.1007/978-3-642-11970-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11969-9
Online ISBN: 978-3-642-11970-5
eBook Packages: Computer Science, Computer Science (R0)
