
Directive-Based Compilers for GPUs

  • Conference paper
  • In: Languages and Compilers for Parallel Computing (LCPC 2014)
  • Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8967)

Abstract

General-Purpose Graphics Processing Units (GPUs) can be used effectively to enhance the performance of many contemporary scientific applications. However, programming GPUs with machine-specific notations such as CUDA or OpenCL can be complex and time consuming. In addition, the resulting programs are typically fine-tuned for a particular target device. A promising alternative is to program in a conventional, machine-independent notation extended with directives and to let compilers generate GPU code automatically. These compilers enable portability and increase programmer productivity and, if effective, would not impose much of a performance penalty.
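To make the directive-based style concrete, the following is a minimal, hypothetical OpenACC sketch in C (not taken from the paper): the loop is written in plain, machine-independent C, and the directive asks a compiler such as PGI or Cray to generate the GPU kernel and the host-device transfers.

    /* Hypothetical example: vector addition in plain C. The pragma asks an
       OpenACC-capable compiler to generate a GPU kernel and manage the data
       movement; without the pragma the code compiles and runs sequentially. */
    void vector_add(const float *a, const float *b, float *c, int n)
    {
        #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }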

This paper evaluates two such compilers, PGI and Cray. We first identify a collection of standard transformations that these compilers can apply. Then, we propose a sequence of manual transformations that programmers can apply to enable the generation of efficient GPU kernels. Lastly, using the Rodinia benchmark suite, we compare the performance of the code generated by the PGI and Cray compilers with that of code written in CUDA. Our evaluation shows that the code produced by the PGI and Cray compilers can perform well. For 6 of the 15 benchmarks that we evaluated, the compiler-generated code achieved over 85% of the performance of a hand-tuned CUDA version.
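The specific manual transformations are described in the paper itself. As a generic, hypothetical illustration of the kind of restructuring that helps directive-based compilers, the sketch below (invented for this summary, not from the paper) hoists data transfers out of an iterative loop with a data region, so that the generated kernels reuse device-resident arrays instead of copying them on every iteration.

    /* Hypothetical sketch: without the outer data region, each parallel loop
       may trigger its own host-device copies of u and v. Enclosing the time
       loop in an acc data region keeps both arrays on the device across
       iterations; only the directives change, not the algorithm. */
    void relax(float *u, float *v, int n, int steps)
    {
        #pragma acc data copy(u[0:n]) create(v[0:n])
        for (int t = 0; t < steps; t++) {
            #pragma acc parallel loop present(u[0:n], v[0:n])
            for (int i = 1; i < n - 1; i++)
                v[i] = 0.5f * (u[i - 1] + u[i + 1]);

            #pragma acc parallel loop present(u[0:n], v[0:n])
            for (int i = 1; i < n - 1; i++)
                u[i] = v[i];
        }
    }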


Notes

  1. In the PGI version of CFD Solver, we also had to separate the individual float values included in a structure (a sketch of this kind of transformation is shown below), but this was most probably due to a bug.
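For illustration only (the actual CFD Solver code is not reproduced here), separating the float values of a structure might look like the hypothetical sketch below, where an array of structures is replaced by independent float arrays so that the compiler sees simple, contiguous accesses.

    /* Hypothetical before/after sketch of the restructuring described in this
       note; the names and fields are invented, not from the benchmark. */

    /* Before: array of structures holding several float fields. */
    struct cell { float density; float energy; };
    struct cell cells[1024];

    /* After: the float values are separated into individual arrays. */
    float density[1024];
    float energy[1024];

    void scale_density(int n, float factor)
    {
        #pragma acc parallel loop copy(density[0:n])
        for (int i = 0; i < n; i++)
            density[i] *= factor;
    }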


Acknowledgments

This research is part of the Blue Waters sustained-petascale computing project, which is supported by NSF (award number OCI 07-25070) and the state of Illinois. It was also supported by NSF under Award CNS 1111407 and by grants TIN2007-60625, TIN2010-21291-C02-01 and TIN2013-64957-C2-1-P (Spanish Government and European ERDF), gaZ: T48 research group (Aragon Government and European ESF).

Author information

Corresponding author

Correspondence to Rubén Gran.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ghike, S., Gran, R., Garzarán, M.J., Padua, D. (2015). Directive-Based Compilers for GPUs. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol. 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_2


  • DOI: https://doi.org/10.1007/978-3-319-17473-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17472-3

  • Online ISBN: 978-3-319-17473-0

  • eBook Packages: Computer Science (R0)
