Computer Science - Research and Development

Volume 24, Issue 1–2, pp 21–31

A novel multiple-walk parallel algorithm for the Barnes–Hut treecode on GPUs – towards cost effective, high performance N-body simulation

  • Tsuyoshi Hamada
  • Keigo Nitadori
  • Khaled Benkrid
  • Yousuke Ohno
  • Gentaro Morimoto
  • Tomonari Masada
  • Yuichiro Shibata
  • Kiyoshi Oguri
  • Makoto Taiji
Special Issue Paper

Abstract

Recently, general-purpose computation on graphics processing units (GPGPU) has become an increasingly popular field of study, as graphics processing units (GPUs) continue to be proposed as high-performance, relatively low-cost platforms for scientific computing. Among these applications are astrophysical N-body simulations, which form one of the most challenging problems in computational science. However, most reported studies used a simple \( \mathcal{O}(N^{2})\) algorithm on the GPU, and the resulting performance was not observed to be better than that of conventional CPUs running more efficient \( \mathcal{O}(N \log N)\) algorithms such as the tree algorithm or the particle-particle particle-mesh algorithm. Because such algorithms are difficult to implement efficiently on GPUs, a GPU cluster had no practical advantage over a general-purpose PC cluster for N-body simulations. In this paper, we report a new method for an efficient parallel implementation of the tree algorithm on GPUs. Our novel tree code allows an N-body simulation to run on a GPU cluster at much higher performance than on a general PC cluster. We performed a cosmological simulation with 562 million particles on a GPU cluster of 128 NVIDIA GeForce 8800 GTS GPUs at an overall cost of $168,172. We obtained a sustained performance of 20.1 Tflops, which, when normalized against a general-purpose CPU implementation, corresponds to 8.50 Tflops. The achieved cost/performance was hence a mere $19.8/Gflops, which shows the high competitiveness of GPGPU.
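To make the abstract's contrast concrete, the sketch below illustrates the two ingredients it refers to: the \( \mathcal{O}(N^{2})\) direct-summation force loop used by most early GPGPU studies, and the Barnes–Hut multipole acceptance criterion that lets a tree code replace distant particle groups with a single pseudo-particle, reducing the cost to \( \mathcal{O}(N \log N)\). This is a minimal illustration, not the authors' GPU implementation; the function names, the softening parameter `eps`, and the opening angle `theta` are our own assumptions.

```python
import math

def direct_forces(pos, mass, eps=1e-3):
    """O(N^2) direct-summation gravitational accelerations (G = 1).

    pos  : list of (x, y, z) position tuples
    mass : list of particle masses
    eps  : Plummer softening length that avoids the singularity at r = 0
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = pos[i]
        for j in range(n):          # every pair is visited: N*(N-1) interactions
            if i == j:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            mjinv_r3 = mass[j] / (r2 * math.sqrt(r2))   # m_j / r^3
            acc[i][0] += dx * mjinv_r3
            acc[i][1] += dy * mjinv_r3
            acc[i][2] += dz * mjinv_r3
    return acc

def should_open(cell_size, dist, theta=0.5):
    """Barnes-Hut multipole acceptance criterion.

    A tree cell of side length `cell_size` at distance `dist` is opened
    (its children are visited) when it subtends too large an angle;
    otherwise the whole cell is treated as one pseudo-particle.
    """
    return cell_size / dist >= theta
```

For two unit-mass particles separated by unit distance (with `eps=0`), `direct_forces` returns equal and opposite unit accelerations along the separation axis; lowering `theta` makes the tree walk open more cells, trading speed for accuracy.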

Keywords

GPU · N-body simulation · Tree algorithm


References

  1. Barnes J, Hut P (1986) A hierarchical O(N log N) force-calculation algorithm. Nature 324:446–449
  2. Warren MS, Salmon JK (1992) Astrophysical N-body simulations using hierarchical tree data structures. In: Supercomputing ’92: Proceedings of the 1992 ACM/IEEE conference on Supercomputing, pp 570–576. IEEE Computer Society Press, Los Alamitos, CA, USA
  3. Fukushige T, Makino J (1996) N-body simulation of galaxy formation on GRAPE-4 special-purpose computer. In: Supercomputing ’96: Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p 48. IEEE Computer Society, Washington, DC, USA. doi: http://doi.acm.org/10.1145/369028.369130
  4. Warren MS, Germann TC, Lomdahl PS, Beazley DM, Salmon JK (1998) Avalon: an Alpha/Linux cluster achieves 10 Gflops for $15k. In: Supercomputing ’98: Proceedings of the 1998 ACM/IEEE conference on Supercomputing (CDROM), pp 1–11. IEEE Computer Society, Washington, DC, USA
  5. Kawai A, Fukushige T, Makino J (1999) $7.0/Mflops astrophysical N-body simulation with treecode on GRAPE-5. In: Proc of Supercomputing ’99 (Gordon Bell Prize winner), pp 197–206
  6. Makino J, Taiji M (1995) Astrophysical N-body simulations on GRAPE-4 special-purpose computer. In: Supercomputing ’95: Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), p 63. ACM, New York, NY, USA. doi: http://doi.acm.org/10.1145/224170.224400
  7. Makino J, Fukushige T, Koga M (2000) A 1.349 Tflops simulation of black holes in a galactic center on GRAPE-6. In: Supercomputing ’00: Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM), p 43. IEEE Computer Society, Washington, DC, USA
  8. Makino J, Kokubo E, Fukushige T (2003) Performance evaluation and tuning of GRAPE-6 – towards 40 “real” Tflops. In: SC ’03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing, p 2. IEEE Computer Society, Washington, DC, USA
  9. Makino J, Kokubo E, Fukushige T, Daisaka H (2002) A 29.5 Tflops simulation of planetesimals in Uranus–Neptune region on GRAPE-6. In: Supercomputing ’02: Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pp 1–14. IEEE Computer Society Press, Los Alamitos, CA, USA
  10. Warren MS, Salmon JK, Becker DJ, Goda MP, Sterling T (1997) Pentium Pro inside: I. A treecode at 430 Gflops on ASCI Red, II. Price/performance of $50/Mflop on Loki and Hyglac. In: Proc Supercomputing ’97, in CD-ROM. IEEE, Los Alamitos, CA
  11. Springel V, White SDM, Jenkins A, Frenk CS, Yoshida N, Gao L, Navarro J, Thacker R, Croton D, Helly J, Peacock JA, Cole S, Thomas P, Couchman H, Evrard A, Colberg J, Pearce F (2005) Simulating the joint evolution of quasars, galaxies and their large-scale distribution. doi:10.1038/nature03597
  12. Moore B, Diemand J, Madau P, Zemp M, Stadel J (2005) Globular clusters, satellite galaxies and stellar haloes from early dark matter peaks. doi:10.1111/j.1365-2966.2006.10116.x
  13. Nyland L, Harris M, Prins J (2004) N-body simulations on a GPU. In: Proc of the ACM Workshop on General-Purpose Computation on Graphics Processors
  14. Harris M (2005) GPGPU: general-purpose computation on GPUs. In: SIGGRAPH 2005 GPGPU course. http://www.gpgpu.org/s2005/
  15. Harris M (2005) GPGPU: general-purpose computation on GPUs. In: Game Developers Conference
  16. Portegies Zwart S, Belleman R, Geldof P (2007) High performance direct gravitational N-body simulations on graphics processing units. astro-ph/0702058
  17. Hamada T, Iitaka T (2007) The Chamomile scheme: an optimized algorithm for N-body simulations on programmable graphics processing units. http://arxiv.org/abs/astro-ph/0703100
  18. Nyland L, Harris M, Prins J (2007) Fast N-body simulation with CUDA. In: Nguyen H (ed) GPU Gems 3, chap 31. Addison-Wesley Professional
  19. Belleman RG, Bedorf J, Portegies Zwart S (2007) High performance direct gravitational N-body simulations on graphics processing units – II: an implementation in CUDA. doi:10.1016/j.newast.2007.07.004
  20. Hamada T, Narumi T, Sakamaki T, Yasuoka K, Taiji M, Sagara T, Egami YKO (2008) The earliest scientific computation using CUDA. In: Japan CUDA Conference 2008, University of Tokyo
  21. Barnes J (1990) A modified tree code: don’t laugh; it runs. J Comput Phys 87:161–170
  22. Hamada T, Ohno Y, Morimoto G, Taiji M, Iitaka T, Nitadori K (2007) Internals of the CUNBODY-1 library: particle/force decomposition and reduction. Princeton, NJ
  23. Makino J (2004) A fast parallel treecode with GRAPE. Publ Astron Soc Japan 56(3):521–531. http://grape.astron.s.u-tokyo.ac.jp/ makino/softwares/pC++tree
  24. Nitadori K, Makino J, Hut P (2006) Performance tuning of N-body codes on modern microprocessors: I. Direct integration with a Hermite scheme on x86_64 architecture. New Astron 12:169. http://www.citebase.org/abstract?id=oai:arXiv.org:astro-ph/0511062
  25. Sengupta S, Harris M, Zhang Y, Owens JD (2007) Scan primitives for GPU computing. In: GH ’07: Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, pp 97–106. Eurographics Association, Aire-la-Ville, Switzerland

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Tsuyoshi Hamada (1)
  • Keigo Nitadori (2)
  • Khaled Benkrid (3)
  • Yousuke Ohno (4)
  • Gentaro Morimoto (4)
  • Tomonari Masada (1)
  • Yuichiro Shibata (1)
  • Kiyoshi Oguri (1)
  • Makoto Taiji (4)
  1. Faculty of Engineering, Department of Computer and Information Sciences, Nagasaki University, Nagasaki, Japan
  2. Department of Astronomy, University of Tokyo, Tokyo, Japan
  3. School of Engineering, The University of Edinburgh, Edinburgh, UK
  4. RIKEN (The Institute of Physical and Chemical Research), Kanagawa, Japan