From Describing to Prescribing Parallelism: Translating the SPEC ACCEL OpenACC Suite to OpenMP Target Directives

  • Guido Juckeland (Email author)
  • Oscar Hernandez
  • Arpith C. Jacob
  • Daniel Neilson
  • Verónica G. Vergara Larrea
  • Sandra Wienke
  • Alexander Bobyr
  • William C. Brantley
  • Sunita Chandrasekaran
  • Mathew Colgrove
  • Alexander Grund
  • Robert Henschel
  • Wayne Joubert
  • Matthias S. Müller
  • Dave Raddatz
  • Pavel Shelepugin
  • Brian Whitney
  • Bo Wang
  • Kalyan Kumaran
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9945)

Abstract

Current and next-generation HPC systems will exploit accelerators and self-hosting devices within their compute nodes to accelerate applications. This comes at a time when programmer productivity and the ability to produce portable code have been recognized as major concerns. One of the goals of OpenMP and OpenACC is to allow the user to specify parallelism via directives so that compilers can generate device-specific code and optimizations. However, porting codes becomes more complex because of the different types of parallelism and memory hierarchies available on different architectures. In this paper we discuss our experience with porting the SPEC ACCEL benchmarks from OpenACC to OpenMP 4.5 using a performance-portable style that lets the compiler make platform-specific optimizations to achieve good performance on a variety of systems. The ported SPEC ACCEL OpenMP benchmarks were validated on different platforms, including Xeon Phi, GPUs, and CPUs. We believe that this experience can help the community and compiler vendors understand how users plan to write OpenMP 4.5 applications in a performance-portable style.
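
As a rough illustration of the kind of translation the abstract describes, the following minimal sketch shows a single loop offloaded with OpenMP 4.5 target directives, with one possible OpenACC formulation kept in a comment for comparison. The kernel, the function name `saxpy_target`, and the chosen data clauses are assumptions made for this example; they are not taken from the SPEC ACCEL sources or from the paper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not from SPEC ACCEL): y = a * x + y.
 *
 * One possible OpenACC formulation, which describes the loop as parallel
 * and leaves the mapping onto the device to the compiler:
 *
 *   #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
 *
 * The OpenMP 4.5 version below prescribes the levels of parallelism
 * explicitly (teams, then threads within each team) and maps the arrays
 * with map clauses instead of copyin/copy.
 */
void saxpy_target(int n, float a, const float *x, float *y)
{
    /* Host-side precondition check before entering the target region. */
    assert(n >= 0 && x != NULL && y != NULL);

    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

If the file is compiled without OpenMP support, the pragma is ignored and the loop runs serially on the host. With an OpenMP 4.5 compiler and no available device, the target region typically falls back to host execution.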

Keywords

SPEC, SPEC ACCEL, OpenMP, OpenACC, Offloading

Notes

Acknowledgments

The authors thank Cloyce Spradling for his work on the SPEC harness as well as the SPEC POWER group for their work on enabling the integration of power measurements into other SPEC suites.

SPEC®, SPEC ACCEL™, SPEC CPU™, SPEC MPI®, and SPEC OMP® are registered trademarks of the Standard Performance Evaluation Corporation (SPEC).

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Guido Juckeland (1, 2), Email author
  • Oscar Hernandez (1, 3)
  • Arpith C. Jacob (1, 4)
  • Daniel Neilson (1, 5)
  • Verónica G. Vergara Larrea (1, 3)
  • Sandra Wienke (1, 6)
  • Alexander Bobyr (1, 7)
  • William C. Brantley (1, 8)
  • Sunita Chandrasekaran (1, 9)
  • Mathew Colgrove (1, 10)
  • Alexander Grund (1, 2)
  • Robert Henschel (1, 11)
  • Wayne Joubert (1, 3)
  • Matthias S. Müller (1, 6)
  • Dave Raddatz (1, 12)
  • Pavel Shelepugin (1, 7)
  • Brian Whitney (1, 13)
  • Bo Wang (1, 6)
  • Kalyan Kumaran (1, 14)

  1. SPEC High Performance Group (HPG), Gainesville, USA
  2. Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
  3. Oak Ridge National Laboratory, Oak Ridge, USA
  4. IBM T. J. Watson Research Center, Yorktown Heights, USA
  5. IBM, Markham, Canada
  6. RWTH Aachen University, Aachen, Germany
  7. Intel, Nizhny Novgorod, Russia
  8. AMD, Austin, USA
  9. University of Delaware, Newark, USA
  10. NVIDIA, Santa Clara, USA
  11. Indiana University, Bloomington, USA
  12. SGI, Milpitas, USA
  13. Oracle, Redwood Shores, USA
  14. Argonne National Laboratory, Lemont, USA
