Evaluating Performance Portability of OpenACC

  • Amit Sabne
  • Putt Sakdhnagool
  • Seyong Lee
  • Jeffrey S. Vetter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8967)

Abstract

Accelerator-based heterogeneous computing is gaining momentum in the High Performance Computing arena. However, the increased complexity of accelerator architectures demands more generic, high-level programming models. OpenACC is one such attempt to tackle the problem. While the abstraction provided by OpenACC offers productivity, it raises questions about portability. This paper evaluates the performance portability of twelve OpenACC programs across the NVIDIA CUDA, AMD GCN, and Intel MIC architectures. We study the effects of various compiler optimizations and OpenACC program settings on these architectures to provide insights into the achieved performance portability.

Keywords

OpenACC · Performance portability · High performance computing

Notes

Acknowledgements

This paper has been authored by Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract #DE-AC05-00OR22725 with the U.S. Government. Accordingly, the U.S. Government retains a non-exclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes. This research is sponsored by the Office of Advanced Scientific Computing Research in the U.S. Department of Energy.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Amit Sabne (1)
  • Putt Sakdhnagool (1)
  • Seyong Lee (2)
  • Jeffrey S. Vetter (2, 3)
  1. Purdue University, West Lafayette, USA
  2. Oak Ridge National Laboratory, Oak Ridge, USA
  3. Georgia Institute of Technology, Atlanta, USA