GPU Computing Using Concurrent Kernels: A Case Study

  • Fengshun Lu (Email author)
  • Junqiang Song
  • Fukang Yin
  • Xiaoqian Zhu
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 126)


As processor architectures evolve rapidly, hardware-oriented numerical applications have attracted growing attention. Based on the newly released Fermi architecture, we investigate how to accelerate high performance computing (HPC) applications with concurrent kernels. We concentrate on two performance factors: the launching order of concurrent kernels and the kernel granularity. Extensive experiments show that the launching order of concurrent kernels has little effect on application performance. We also identify a heuristic for the kernel granularity that may yield the best performance: the occupancy of each kernel should lie in the interval [30%, 50%].
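The occupancy heuristic can be illustrated with a rough estimate. The sketch below is not the authors' method: it assumes the usual CUDA definition of occupancy (resident warps divided by the maximum warps per multiprocessor) and the published per-multiprocessor limits for Fermi (compute capability 2.0). It ignores register and shared-memory allocation granularity, and the names `KernelConfig`, `occupancy`, and `in_suggested_range` are hypothetical.

```python
from dataclasses import dataclass

# Per-multiprocessor limits for Fermi (compute capability 2.0).
# Assumption: allocation granularity is ignored, so results are approximate.
WARP_SIZE = 32
MAX_WARPS_PER_SM = 48
MAX_BLOCKS_PER_SM = 8
REGISTERS_PER_SM = 32768
SHARED_MEM_PER_SM = 48 * 1024  # bytes, using the 48 KB shared-memory split


@dataclass
class KernelConfig:
    """Hypothetical description of one kernel's launch resources."""
    threads_per_block: int
    registers_per_thread: int
    shared_mem_per_block: int  # bytes


def occupancy(cfg: KernelConfig) -> float:
    """Estimate occupancy as resident warps / maximum warps per SM."""
    warps_per_block = -(-cfg.threads_per_block // WARP_SIZE)  # ceiling division
    limit_warps = MAX_WARPS_PER_SM // warps_per_block

    regs_per_block = cfg.registers_per_thread * warps_per_block * WARP_SIZE
    limit_regs = (REGISTERS_PER_SM // regs_per_block
                  if regs_per_block else MAX_BLOCKS_PER_SM)

    limit_smem = (SHARED_MEM_PER_SM // cfg.shared_mem_per_block
                  if cfg.shared_mem_per_block else MAX_BLOCKS_PER_SM)

    # The tightest resource limit decides how many blocks are resident.
    blocks = min(MAX_BLOCKS_PER_SM, limit_warps, limit_regs, limit_smem)
    return blocks * warps_per_block / MAX_WARPS_PER_SM


def in_suggested_range(cfg: KernelConfig, low: float = 0.30, high: float = 0.50) -> bool:
    """Check a kernel against the [30%, 50%] interval from the abstract."""
    return low <= occupancy(cfg) <= high


if __name__ == "__main__":
    small = KernelConfig(threads_per_block=128, registers_per_thread=32,
                         shared_mem_per_block=8192)
    print(occupancy(small), in_suggested_range(small))
```

Under these assumptions, a kernel that leaves roughly half of a multiprocessor's warp slots free would leave room for other kernels to run alongside it, which is one plausible reading of why moderate occupancy suits concurrent execution.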


Legendre Transform; Concurrent Execution; Small Kernel; Fermi Architecture; Concurrent Kernel
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  • Fengshun Lu (1, Email author)
  • Junqiang Song (1)
  • Fukang Yin (1)
  • Xiaoqian Zhu (1, 2)
  1. College of Computer, National University of Defense Technology, Changsha, P.R. China
  2. National Supercomputing Center in Tianjin, Tianjin, P.R. China
