Astrophysical particle simulations with large custom GPU clusters on three continents

  • R. Spurzem
  • P. Berczik
  • I. Berentzen
  • K. Nitadori
  • T. Hamada
  • G. Marcus
  • A. Kugel
  • R. Männer
  • J. Fiestas
  • R. Banerjee
  • R. Klessen
Special Issue Paper


We present direct astrophysical N-body simulations with up to six million bodies, run with our parallel MPI-CUDA code on large GPU clusters with different kinds of GPU hardware in Beijing, Berkeley, and Heidelberg. The clusters are linked through the cooperation of the ICCS (International Center for Computational Science). In a realistic application scenario, with hierarchical block time-steps and a core-halo density structure of the stellar system, we reach about one third of the peak performance for this code. The code and hardware are used to simulate dense star clusters with many binaries and galactic nuclei with supermassive black holes, systems in which correlations between distant particles cannot be neglected.
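The abstract names two ingredients of the method: direct summation of all pairwise gravitational forces, and hierarchical block time-steps. The pure-Python sketch below illustrates both under stated assumptions. It assumes a fourth-order Hermite-type integrator, which needs the acceleration and its time derivative (the jerk); it uses units with G = 1; and the function names `acc_jerk`, `block_step`, and `active_block`, the softening `eps2`, and the bounds `dt_max`/`dt_min` are hypothetical illustrations, not the interface of the authors' MPI-CUDA code.

```python
import math

def acc_jerk(i, pos, vel, mass, eps2=0.0):
    """Direct O(N) sum of acceleration and jerk on particle i (G = 1).

    eps2 is an assumed Plummer softening length squared; with eps2 = 0,
    coincident particles would cause a division by zero.
    """
    acc = [0.0, 0.0, 0.0]
    jerk = [0.0, 0.0, 0.0]
    for j in range(len(mass)):
        if j == i:
            continue
        dx = [pos[j][k] - pos[i][k] for k in range(3)]   # relative position
        dv = [vel[j][k] - vel[i][k] for k in range(3)]   # relative velocity
        r2 = sum(c * c for c in dx) + eps2
        rinv2 = 1.0 / r2
        mrinv3 = mass[j] * rinv2 * math.sqrt(rinv2)      # m_j / r^3
        rv = 3.0 * sum(a * b for a, b in zip(dx, dv)) * rinv2  # 3 (r.v) / r^2
        for k in range(3):
            acc[k] += mrinv3 * dx[k]
            # jerk = m_j / r^3 * (v - 3 (r.v) r / r^2)
            jerk[k] += mrinv3 * (dv[k] - rv * dx[k])
    return acc, jerk

def block_step(dt_desired, t, dt_max=1.0, dt_min=2.0 ** -30):
    """Quantize an individual time-step to a power of two (hypothetical bounds).

    The step is halved until it is no larger than dt_desired and until the
    current time t is an integer multiple of it, so particles stay synchronized.
    """
    dt = dt_max
    while dt > dt_desired and dt > dt_min:
        dt *= 0.5
    while math.fmod(t, dt) != 0.0 and dt > dt_min:
        dt *= 0.5
    return dt

def active_block(t_last, dt):
    """Return the next block time and the indices of particles due at it."""
    t_next = min(tl + d for tl, d in zip(t_last, dt))
    active = [i for i, (tl, d) in enumerate(zip(t_last, dt)) if tl + d == t_next]
    return t_next, active
```

For example, `block_step(0.3, 0.125)` first rounds 0.3 down to 0.25 and then halves to 0.125, because t = 0.125 is not a multiple of 0.25. Only the particles returned by `active_block` are advanced at each step, but each of them still needs a force sum over all N bodies; in GPU codes of this kind that inner loop over j is typically the part offloaded to the accelerator.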


Keywords: N-body simulations · Computational astrophysics · GPU clusters





Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • R. Spurzem (1, 2; corresponding author)
  • P. Berczik (1, 2, 6)
  • I. Berentzen (2, 5)
  • K. Nitadori (3)
  • T. Hamada (3)
  • G. Marcus (4)
  • A. Kugel (4)
  • R. Männer (4)
  • J. Fiestas (1, 2)
  • R. Banerjee (5)
  • R. Klessen (5)
  1. National Astronomical Observatories of China, Chinese Academy of Sciences, Beijing, China
  2. Astronomisches Rechen-Institut (ZAH), University of Heidelberg, Heidelberg, Germany
  3. Nagasaki Advanced Computing Center, University of Nagasaki, RIKEN Institute, Tokyo, Japan
  4. Department of Computer Science V, Central Inst. of Computer Engineering, University of Heidelberg, Heidelberg, Germany
  5. Institut für Theor. Astrophysik (ZAH), University of Heidelberg, Heidelberg, Germany
  6. Main Astronomical Observatory, National Academy of Sciences of Ukraine, Kyiv, Ukraine
