A Comparison of Performance Tuning Process for Different Generations of NVIDIA GPUs and an Example Scientific Computing Algorithm

  • Krzysztof Banaś
  • Filip Krużel
  • Jan Bielański
  • Kazimierz Chłoń
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10777)


We consider the performance of a selected computational kernel from a scientific code on different generations of NVIDIA GPUs. The code used for the tests is an OpenCL implementation of a finite element numerical integration algorithm. In this contribution, we describe how the code is tuned for performance by searching a parameter space associated with it. The tuning results for different generations of NVIDIA GPUs serve as the basis for our analyses and conclusions.
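Tuning by searching a parameter space can be illustrated with a minimal exhaustive search. The following is only a sketch: the parameter names `work_group_size` and `elements_per_thread`, the function `exhaustive_tune` and the cost model `toy_cost` are hypothetical. The abstract does not state which parameters or which search strategy the authors use. In a real tuner, the cost of each configuration would be a measured kernel execution time (for example, obtained from OpenCL event profiling with `clGetEventProfilingInfo`), not a formula.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]


def exhaustive_tune(space: Dict[str, List[int]],
                    cost: Callable[[Config], float]) -> Tuple[Optional[Config], float]:
    """Try every combination in the Cartesian product of `space`; return the cheapest one."""
    names = list(space)
    best_cfg: Optional[Config] = None
    best_cost = float("inf")
    for values in product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        c = cost(cfg)  # in a real tuner: measured kernel time for this configuration
        if c < best_cost:  # strict '<' keeps the first configuration among ties
            best_cfg, best_cost = cfg, c
    return best_cfg, best_cost


# Hypothetical tuning parameters and candidate values (not taken from the paper).
space = {
    "work_group_size": [32, 64, 128, 256],
    "elements_per_thread": [1, 2, 4],
}


def toy_cost(cfg: Config) -> float:
    # Hypothetical stand-in for a timing measurement; it is minimal at (128, 2).
    return abs(cfg["work_group_size"] - 128) / 128 + abs(cfg["elements_per_thread"] - 2)


best, best_time = exhaustive_tune(space, toy_cost)
print(best, best_time)
```

Exhaustive search is practical only for small spaces. Each candidate configuration typically requires recompiling and timing the kernel, so larger spaces usually call for pruning or heuristic search instead.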


Keywords: Graphics processors · Performance tuning · OpenCL · Finite element method · Numerical integration


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. AGH University of Science and Technology, Kraków, Poland
  2. Cracow University of Technology, Kraków, Poland
