
Directive-Based Compilers for GPUs

  • Conference paper
  • In: Languages and Compilers for Parallel Computing (LCPC 2014)
  • Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8967)

Abstract

General-Purpose Graphics Processing Units (GPUs) can be used effectively to enhance the performance of many contemporary scientific applications. However, programming GPUs with machine-specific notations such as CUDA or OpenCL can be complex and time consuming. In addition, the resulting programs are typically fine-tuned for a particular target device. A promising alternative is to program in a conventional, machine-independent notation extended with directives and to let compilers generate GPU code automatically. These compilers enable portability and increase programmer productivity and, if effective, would not impose much of a performance penalty.
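To make the directive-based style concrete, the following is a minimal, hypothetical OpenACC sketch in C (not taken from the paper): the loop is written in plain, machine-independent C, and the directive asks a compiler such as PGI or Cray to generate the GPU kernel and the host-device transfers.

    /* Hypothetical example: vector addition in plain C. The pragma asks an
       OpenACC-capable compiler to generate a GPU kernel and manage the data
       movement; without the pragma the code compiles and runs sequentially. */
    void vector_add(const float *a, const float *b, float *c, int n)
    {
        #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }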

This paper evaluates two such compilers, PGI and Cray. We first identify a collection of standard transformations that these compilers can apply. Then, we propose a sequence of manual transformations that programmers can apply to enable the generation of efficient GPU kernels. Lastly, using the Rodinia benchmark suite, we compare the performance of the code generated by the PGI and Cray compilers with that of code written in CUDA. Our evaluation shows that the code produced by the PGI and Cray compilers can perform well. For 6 of the 15 benchmarks that we evaluated, the compiler-generated code achieved over 85% of the performance of a hand-tuned CUDA version.
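The specific manual transformations are described in the paper itself. As a generic, hypothetical illustration of the kind of restructuring that helps directive-based compilers, the sketch below (invented for this summary, not from the paper) hoists data transfers out of an iterative loop with a data region, so that the generated kernels reuse device-resident arrays instead of copying them on every iteration.

    /* Hypothetical sketch: without the outer data region, each parallel loop
       may trigger its own host-device copies of u and v. Enclosing the time
       loop in an acc data region keeps both arrays on the device across
       iterations; only the directives change, not the algorithm. */
    void relax(float *u, float *v, int n, int steps)
    {
        #pragma acc data copy(u[0:n]) create(v[0:n])
        for (int t = 0; t < steps; t++) {
            #pragma acc parallel loop present(u[0:n], v[0:n])
            for (int i = 1; i < n - 1; i++)
                v[i] = 0.5f * (u[i - 1] + u[i + 1]);

            #pragma acc parallel loop present(u[0:n], v[0:n])
            for (int i = 1; i < n - 1; i++)
                u[i] = v[i];
        }
    }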


Notes

  1. In the PGI version of CFD Solver, we also had to separate the individual float values included in a structure (a sketch of this kind of transformation is shown below), but this was most probably due to a bug.
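For illustration only (the actual CFD Solver code is not reproduced here), separating the float values of a structure might look like the hypothetical sketch below, where an array of structures is replaced by independent float arrays so that the compiler sees simple, contiguous accesses.

    /* Hypothetical before/after sketch of the restructuring described in this
       note; the names and fields are invented, not from the benchmark. */

    /* Before: array of structures holding several float fields. */
    struct cell { float density; float energy; };
    struct cell cells[1024];

    /* After: the float values are separated into individual arrays. */
    float density[1024];
    float energy[1024];

    void scale_density(int n, float factor)
    {
        #pragma acc parallel loop copy(density[0:n])
        for (int i = 0; i < n; i++)
            density[i] *= factor;
    }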


Acknowledgments

This research is part of the Blue Waters sustained-petascale computing project, which is supported by NSF (award number OCI 07-25070) and the state of Illinois. It was also supported by NSF under Award CNS 1111407 and by grants TIN2007-60625, TIN2010-21291-C02-01 and TIN2013-64957-C2-1-P (Spanish Government and European ERDF), gaZ: T48 research group (Aragon Government and European ESF).

Author information

Corresponding author

Correspondence to Rubén Gran.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ghike, S., Gran, R., Garzarán, M.J., Padua, D. (2015). Directive-Based Compilers for GPUs. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol. 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_2


  • DOI: https://doi.org/10.1007/978-3-319-17473-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17472-3

  • Online ISBN: 978-3-319-17473-0

  • eBook Packages: Computer Science (R0)
