
Efficient Strict-Binning Particle-in-Cell Algorithm for Multi-core SIMD Processors

  • Yann Barsamian
  • Arthur Charguéraud
  • Sever A. Hirstoaga
  • Michel Mehrenberger
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11014)

Abstract

Particle-in-Cell (PIC) codes are widely used for plasma simulations. On recent multi-core hardware, the performance of these codes is often limited by memory bandwidth. We describe a multi-core PIC algorithm that achieves a close-to-minimal number of memory transfers with the main memory, while at the same time exploiting SIMD instructions for numerical computations and exhibiting a high degree of OpenMP-level parallelism. Our algorithm keeps particles sorted by cell at every time step, and represents the particles of a given cell using a linked list of fixed-capacity arrays, called chunks. Chunks support either sequential or atomic insertions, the latter being used to handle fast-moving particles. To validate our code, called Pic-Vert, we consider a 3d electrostatic Landau-damping simulation as well as a 2d3v transverse instability of magnetized electron holes. Performance results on 24-core Intel Skylake hardware confirm the effectiveness of our algorithm, in particular its high throughput and its ability to cope with fast-moving particles.
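The per-cell chunk list described above can be sketched in C11 as follows. This is a minimal illustration under our own assumptions, not the Pic-Vert implementation: the names `chunk`, `bag`, `bag_push_seq` and `bag_push_atomic`, the particle record and the value of `CHUNK_SIZE` are all hypothetical. Each cell owns one bag, a linked list of fixed-capacity chunks with the newest chunk at the front. Sequential insertion is a plain append. Atomic insertion claims a slot with `atomic_fetch_add` and, when the front chunk is full, races to publish a new chunk with a compare-and-swap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical capacity, kept tiny for the demo; a SIMD code would use more. */
#define CHUNK_SIZE 4

typedef struct { double x, vx; } particle; /* hypothetical 1d particle record */

/* Fixed-capacity array of particles, linked to the previously filled chunk. */
typedef struct chunk {
    struct chunk *next;
    atomic_int size; /* slots claimed; may transiently exceed CHUNK_SIZE */
    particle data[CHUNK_SIZE];
} chunk;

/* All particles of one cell: a list of chunks, newest (partially filled) first. */
typedef struct { _Atomic(chunk *) front; } bag;

chunk *chunk_new(chunk *next) {
    chunk *c = malloc(sizeof *c);
    if (c == NULL) { perror("chunk_new"); abort(); }
    c->next = next;
    atomic_init(&c->size, 0);
    return c;
}

void bag_init(bag *b) { atomic_init(&b->front, chunk_new(NULL)); }

/* Sequential insertion: the caller is the only writer to this bag,
   so relaxed loads and stores are enough. */
void bag_push_seq(bag *b, particle p) {
    chunk *c = atomic_load_explicit(&b->front, memory_order_relaxed);
    int n = atomic_load_explicit(&c->size, memory_order_relaxed);
    assert(n <= CHUNK_SIZE); /* holds once concurrent atomic pushes are done */
    if (n >= CHUNK_SIZE) {   /* front chunk full: prepend a fresh one */
        c = chunk_new(c);
        atomic_store_explicit(&b->front, c, memory_order_relaxed);
        n = 0;
    }
    c->data[n] = p;
    atomic_store_explicit(&c->size, n + 1, memory_order_relaxed);
}

/* Atomic insertion: several threads may push into the same bag concurrently. */
void bag_push_atomic(bag *b, particle p) {
    for (;;) {
        chunk *c = atomic_load(&b->front);
        int idx = atomic_fetch_add(&c->size, 1); /* claim a slot */
        if (idx < CHUNK_SIZE) {
            c->data[idx] = p;
            return;
        }
        /* Chunk full: try to publish a fresh chunk that already holds p. */
        chunk *fresh = chunk_new(c);
        fresh->data[0] = p;
        atomic_store_explicit(&fresh->size, 1, memory_order_relaxed);
        if (atomic_compare_exchange_strong(&b->front, &c, fresh))
            return;
        free(fresh); /* another thread installed a new chunk first: retry */
    }
}

/* Number of stored particles; call only after all insertions have finished. */
size_t bag_count(bag *b) {
    size_t total = 0;
    for (chunk *c = atomic_load(&b->front); c != NULL; c = c->next) {
        int n = atomic_load(&c->size);
        total += (size_t)(n < CHUNK_SIZE ? n : CHUNK_SIZE);
    }
    return total;
}

void bag_free(bag *b) {
    chunk *c = atomic_load(&b->front);
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    atomic_store(&b->front, NULL);
}
```

In this sketch, keeping particles strictly binned amounts to pushing each particle, after its position update, into the bag of its new cell. `bag_push_seq` is used when the current thread is the only writer to that bag, and `bag_push_atomic` is used otherwise. A production SIMD code would more likely lay out each chunk as a structure of arrays, with one array per particle field, so that loops over a chunk vectorize. The array-of-structs record here only keeps the example short.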

Keywords

Particle-in-Cell, Plasma physics, Multi-core SIMD architecture, Shared memory, Chunks, Strict binning, Magnetized electron holes

Notes

Acknowledgments

We would like to thank the anonymous reviewers for their valuable suggestions and comments. This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom Research and Training Program 2014–2018 under Grant Agreement No. 633053. Simulations were run on the EUROfusion Marconi supercomputer, in the context of the Selavlas project led by K. Kormann. The views and opinions expressed herein do not necessarily reflect those of the European Commission.


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Université de Strasbourg, CNRS, ICube UMR 7357, Strasbourg, France
  2. Inria, Nancy, France
  3. Université de Strasbourg, CNRS, IRMA UMR 7501, Strasbourg, France
