Abstract
The particle-in-cell (PIC) method has benefited greatly from GPU-accelerated heterogeneous systems. However, the performance of PIC on the graphics processing unit (GPU) is constrained by the interpolation operations in the weighting process. To address this problem, a fast weighting method for PIC simulation on GPU-accelerated systems is proposed that avoids atomic memory operations during the weighting process. The method is implemented by exploiting the GPU’s thread synchronization mechanism and partitioning the problem space appropriately. In addition, the software-managed shared memory on the GPU is used to buffer intermediate data. Experimental results show that the method achieves speedups of up to 3.5 times over previous work, and runs 20.08 times faster on one NVIDIA Tesla M2090 GPU than on a single core of an Intel Xeon X5670 CPU.
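The abstract does not spell out the algorithm, so the following is only a minimal CPU-side sketch of one common way to make weighting free of write conflicts, not the authors' implementation. It assumes a 1D grid with linear (cloud-in-cell) weights, and the names `weight_scatter` and `weight_gather` are hypothetical. The scatter version lets every particle add to its two neighbouring nodes, which on a GPU would need atomic additions. The gather version bins particles by cell and lets each grid node sum its own contributions, so every node has exactly one writer.

```python
import numpy as np

def weight_scatter(x, q, n_cells, dx):
    """Reference version: every particle adds charge to its two nearest nodes."""
    rho = np.zeros(n_cells + 1)
    cell = np.floor(x / dx).astype(int)
    frac = x / dx - cell
    # np.add.at accumulates correctly on repeated indices; on a GPU this
    # role would be played by atomic additions.
    np.add.at(rho, cell, q * (1.0 - frac))
    np.add.at(rho, cell + 1, q * frac)
    return rho

def weight_gather(x, q, n_cells, dx):
    """Conflict-free version: each node is written by exactly one worker."""
    rho = np.zeros(n_cells + 1)
    cell = np.floor(x / dx).astype(int)
    frac = x / dx - cell
    # Bin particles by cell so each cell owns a contiguous slice.
    order = np.argsort(cell, kind="stable")
    cell, frac, qs = cell[order], frac[order], q[order]
    # Particles of cell c occupy indices start[c] .. start[c + 1] - 1.
    start = np.searchsorted(cell, np.arange(n_cells + 1))
    for node in range(n_cells + 1):  # each iteration stands in for one thread
        total = 0.0
        if node < n_cells:  # cell to the right contributes its left-node weight
            s, e = start[node], start[node + 1]
            total += np.sum(qs[s:e] * (1.0 - frac[s:e]))
        if node > 0:  # cell to the left contributes its right-node weight
            s, e = start[node - 1], start[node]
            total += np.sum(qs[s:e] * frac[s:e])
        rho[node] = total
    return rho
```

Both functions expect particle positions in the half-open interval from 0 to `n_cells * dx` and per-particle charges as NumPy arrays. On a real GPU the per-node partial sums would plausibly be staged in shared memory and separated by thread-block barriers, which is the role the abstract assigns to synchronization and buffering; that mapping is an assumption here.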
Additional information
Foundation item: Projects(61170049, 60903044) supported by National Natural Science Foundation of China; Project(2012AA010903) supported by National High Technology Research and Development Program of China
Cite this article
Yang, Cq., Wu, Q., Hu, Hl. et al. Fast weighting method for plasma PIC simulation on GPU-accelerated heterogeneous systems. J. Cent. South Univ. 20, 1527–1535 (2013). https://doi.org/10.1007/s11771-013-1644-2
DOI: https://doi.org/10.1007/s11771-013-1644-2