Fine-Grained Parallelization of a Vlasov-Poisson Application on GPU
Understanding turbulent transport in magnetised plasmas is of major importance for optimising experiments in tokamak fusion reactors. Simulations of fusion plasmas also consume a large amount of CPU time on today’s supercomputers. The Vlasov equation provides a useful framework for modelling such plasmas. In this paper, we focus on the parallelization of a 2D semi-Lagrangian Vlasov solver on a GPU. The originality of the approach lies in the overhaul of both the numerical scheme and the algorithms, which is needed to compute accurately and efficiently in the CUDA framework. First, we show how to deal with 32-bit floating-point precision, and we examine the resulting accuracy issues. Second, we present a very fine-grained parallelization that fits well on a many-core architecture. A speed-up of almost 80 is obtained by using a GPU instead of a single CPU core. As far as we know, this work presents the first semi-Lagrangian Vlasov solver ported to a GPU.
Keywords: Central Processing Unit, Global Memory, Double Precision, Vlasov Equation, Single Precision