Advertisement

OpenCL Performance Portability for Xeon Phi Coprocessor and NVIDIA GPUs: A Case Study of Finite Element Numerical Integration

  • Krzysztof Banaś
  • Filip Krużel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8806)

Abstract

We present the performance analysis of OpenCL kernels for three recently introduced many-core accelerator architectures: Intel Xeon Phi coprocessor and NVIDIA Kepler and Fermi GPUs. We use a case study of finite element numerical integration, a practically important and theoretically interesting algorithm used in scientific computing. We design a single parametrized kernel for all three architectures and test the performance obtained in numerical tests. We indicate possible further, architecture dependent, optimizations and draw conclusions on the performance portability for different accelerator architectures and OpenCL programming model.

Keywords

OpenCL performance portability performance analysis Xeon Phi coprocessor GPU Kepler architecture Fermi architecture finite elements numerical integration 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Banaś, K., Płaszewski, P., Macioł, P.: Numerical integration on GPUs for higher order finite elements. Computers and Mathematics with Applications 67(6), 1319–1344 (2014)CrossRefMathSciNetGoogle Scholar
  2. 2.
    Becker, E., Carey, G., Oden, J.: Finite Elements. An Introduction. Prentice Hall, Englewood Cliffs (1981)zbMATHGoogle Scholar
  3. 3.
    Benkner, S., Pllana, S., Traff, J., Tsigas, P., Dolinsky, U., Augonnet, C., Bachmayer, B., Kessler, C., Moloney, D., Osipov, V.: Peppher: Efficient and productive usage of hybrid computing systems. IEEE Micro 31(5), 28–41 (2011)CrossRefGoogle Scholar
  4. 4.
    Cecka, C., Lew, A.J., Darve, E.: Assembly of finite element methods on graphics processors. International Journal for Numerical Methods in Engineering 85(5), 640–669 (2011), http://dx.doi.org/10.1002/nme.2989 CrossRefzbMATHGoogle Scholar
  5. 5.
    Goto, K., van de Geijn, R.A.: Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw. 34(3), 12:1–12:25 (2008), http://doi.acm.org/10.1145/1356052.1356053
  6. 6.
    Group, K.O.W.: The OpenCL Specification, version 1.1 (2010), http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf
  7. 7.
    Intel: Intel SDK for OpenCL Applications XE 2013 R3. User’s Guide (2013)Google Scholar
  8. 8.
    Jeffers, J., Reinders, J.: Intel Xeon Phi Coprocessor High Performance Programming, 1st edn. Morgan Kaufmann (2013)Google Scholar
  9. 9.
    Krużel, F., Banaś, K.: Vectorized OpenCL implementation of numerical integration for higher order finite elements. Computers and Mathematics with Applications 66(10), 2030–2044 (2013)CrossRefGoogle Scholar
  10. 10.
    Markall, G.R., Ham, D.A., Kelly, P.H.: Towards generating optimised finite element solvers for gpus from high-level specifications. Procedia Computer Science 1(1), 1815–1823 (2010); iCCS 2010CrossRefGoogle Scholar
  11. 11.
    Marr, D.T., Binns, F., Hill, D.L., Hinton, G., Koufaty, D.A., Miller, A.J., Upton, M.: Hyper-Threading Technology Architecture and Microarchitecture. Intel Technology Journal 6(1), 4–15 (2002)Google Scholar
  12. 12.
    NVIDIA: NVIDIA CUDA C Programming Guide Version 5.0 (2012)Google Scholar
  13. 13.
    Reguly, I., Giles, M.: Finite element algorithms and data structures on graphical processing units. International Journal of Parallel Programming, 1–37 (2013), http://dx.doi.org/10.1007/s10766-013-0301-6
  14. 14.
    Rul, S., Vandierendonck, H., D’Haene, J., De Bosschere, K.: An experimental study on performance portability of opencl kernels. In: Application Accelerators in High Performance Computing, 2010 Symposium, Papers, Knoxville, TN, USA, p. 3 (2010)Google Scholar
  15. 15.
  16. 16.
    Wienke, S., an Mey, D., Müller, M.S.: Accelerators for technical computing: Is it worth the pain? A TCO perspective. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2013. LNCS, vol. 7905, pp. 330–342. Springer, Heidelberg (2013)CrossRefGoogle Scholar
  17. 17.
    Williams, S., Waterman, A., Patterson, D.: Roofline: An insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009), http://doi.acm.org/10.1145/1498765.1498785 CrossRefGoogle Scholar
  18. 18.
    Yuen, D., Wang, L., Chi, X., Johnsson, L., Ge, W., Shi, Y. (eds.): GPU Solutions to Multi-scale Problems in Science and Engineering. Springer (2013)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Krzysztof Banaś
    • 1
  • Filip Krużel
    • 2
  1. 1.AGH University of Science and TechnologyKrakówPoland
  2. 2.Institute of Computer ModellingCracow University of TechnologyKrakówPoland

Personalised recommendations