Direct N-body Code on Low-Power Embedded ARM GPUs

  • David Goz
  • Sara Bertocco
  • Luca Tornatore
  • Giuliano Taffoni
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 997)


This work arises in the context of the ExaNeSt project, which aims at the design and development of an exascale-ready supercomputer with a low energy-consumption profile, yet able to support the most demanding scientific and technical applications. The ExaNeSt compute unit consists of densely packed low-power 64-bit ARM processors embedded within Xilinx FPGA SoCs. SoC boards are heterogeneous architectures in which computing power is supplied by both CPUs and GPUs, and they are emerging as a possible low-power and low-cost alternative to clusters based on traditional CPUs. A state-of-the-art direct N-body code suitable for astrophysical simulations has been re-engineered to exploit SoC heterogeneous platforms based on ARM CPUs and embedded GPUs. Performance tests show that embedded GPUs can be used effectively to accelerate real-life scientific calculations, and that they are also promising because of their energy efficiency, a crucial design constraint for future exascale platforms.
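The direct N-body method mentioned above evaluates all pairwise gravitational interactions at every step. As a minimal illustrative sketch (not the authors' actual GPU kernel; function name, softening default, and G = 1 units are assumptions for the example), the O(N²) softened force summation can be written as:

```python
import numpy as np

def direct_nbody_acc(pos, mass, eps=1e-4):
    """All-pairs gravitational accelerations via O(N^2) direct summation.

    pos  : (N, 3) array of particle positions
    mass : (N,)   array of particle masses
    eps  : softening length, regularizes close encounters (G = 1 units)
    """
    # Pairwise separation vectors r_ij = x_j - x_i, shape (N, N, 3)
    dr = pos[None, :, :] - pos[:, None, :]
    # Softened inverse-cube distances |r_ij|^-3, shape (N, N);
    # the i == j term contributes zero because dr[i, i] = 0
    inv_r3 = (np.sum(dr * dr, axis=-1) + eps**2) ** -1.5
    # a_i = sum_j m_j * r_ij / (|r_ij|^2 + eps^2)^(3/2)
    return np.einsum('ij,ijk->ik', mass[None, :] * inv_r3, dr)
```

The independence of each particle's outer summation is what makes the method map naturally onto GPU threads, as exploited by the codes this paper builds on.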


ExaNeSt · HPC · N-body solver · ARM SoC · GPU computing · Parallel algorithms · Heterogeneous architecture



This work was carried out within the ExaNeSt (FET-HPC) project (grant no. 671553) and the ASTERICS project (grant no. 653477), funded by the European Union’s Horizon 2020 research and innovation programme.

This research has made use of IPython [18], Scipy [10], Numpy [24] and MatPlotLib [9].


  1. Ammendola, R., Biagioni, A., Cretaro, P., Frezza, O., Cicero, F.L., Lonardo, A., Martinelli, M., Paolucci, P.S., Pastorelli, E., Simula, F., Vicini, P., Taffoni, G., Pascual, J.A., Navaridas, J., Luján, M., Goodacre, J., Chrysos, N., Katevenis, M.: The next generation of Exascale-class systems: the ExaNeSt project. In: 2017 Euromicro Conference on Digital System Design (DSD), pp. 510–515, August 2017
  2. Berczik, P., Nitadori, K., Zhong, S., Spurzem, R., Hamada, T., Wang, X., Berentzen, I., Veles, A., Ge, W.: High performance massively parallel direct N-body simulations on large GPU clusters. In: International Conference on High Performance Computing, Kyiv, Ukraine, 8–10 October 2011, pp. 8–18
  3. Bertocco, S., Goz, D., Tornatore, L., Taffoni, G.: INCAS: INtensive Clustered ARM SoC - Cluster Deployment. INAF-OATs technical report, 222, August 2018
  4. Bortolas, E., Gualandris, A., Dotti, M., Spera, M., Mapelli, M.: Brownian motion of massive black hole binaries and the final parsec problem. MNRAS 461, 1023–1031 (2016)
  5. Brodtkorb, A.R., Dyken, C., Hagen, T.R., Hjelmervik, J.M., Storaasli, O.O.: State-of-the-art in heterogeneous computing. Sci. Program. 18(1), 1–33 (2010)
  6. Capuzzo-Dolcetta, R., Spera, M.: A performance comparison of different graphics processing units running direct N-body simulations. Comput. Phys. Commun. 184, 2528–2539 (2013)
  7. Capuzzo-Dolcetta, R., Spera, M., Punzo, D.: A fully parallel, high precision, N-body code running on hybrid computing platforms. J. Comput. Phys. 236, 580–593 (2013)
  8. Harfst, S., Gualandris, A., Merritt, D., Spurzem, R., Portegies Zwart, S., Berczik, P.: Performance analysis of direct N-body algorithms on special-purpose supercomputers. New Astron. 12, 357–377 (2007)
  9. Hunter, J.: Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9(3), 90–95 (2007)
  10. Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: open source scientific tools for Python (2001). Accessed 13 Sept 2015
  11. Katevenis, M., Chrysos, N., Marazakis, M., et al.: The ExaNeSt project: interconnects, storage, and packaging for Exascale systems. In: 2016 Euromicro Conference on Digital System Design (DSD), pp. 60–67, August 2016
  12. Katevenis, M., Ammendola, R., Biagioni, A., Cretaro, P., Frezza, O., Cicero, F.L., Lonardo, A., Martinelli, M., Paolucci, P.S., Pastorelli, E., Simula, F., Vicini, P., Taffoni, G., Pascual, J.A., Navaridas, J., Luján, M., Goodacre, J., Lietzow, B., Mouzakitis, A., Chrysos, N., Marazakis, M., Gorlani, P., Cozzini, S., Brandino, G.P., Koutsourakis, P., van Ruth, J., Zhang, Y., Kersten, M.: Next generation of Exascale-class systems: ExaNeSt project and the status of its interconnect and storage development. Microprocess. Microsyst. 61, 58–71 (2018)
  13. Konstantinidis, S., Kokkotas, K.D.: MYRIAD: a new N-body code for simulations of star clusters. Astron. Astrophys. 522, A70 (2010)
  14. Maghazeh, A., Bordoloi, U.D., Eles, P., Peng, Z.: General purpose computing on low-power embedded GPUs: has it come of age? In: 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), pp. 1–10, July 2013
  15. Morganti, L., Cesini, D., Ferraro, A.: Evaluating systems on chip through HPC bioinformatic and astrophysic applications, pp. 541–544, February 2016
  16. Nitadori, K., Aarseth, S.J.: Accelerating NBODY6 with graphics processing units. MNRAS 424, 545–552 (2012)
  17. Nitadori, K., Makino, J.: Sixth- and eighth-order Hermite integrator for N-body simulations. New Astron. 13, 498–507 (2008)
  18. Perez, F., Granger, B.: IPython: a system for interactive scientific computing. Comput. Sci. Eng. 9(3), 21–29 (2007)
  19. Sirowy, S., Forin, A.: Where’s the beef? Why FPGAs are so fast. Technical report, September 2008
  20. Spera, M.: Using Graphics Processing Units to solve the classical N-body problem in physics and astrophysics. ArXiv e-prints, November 2014
  21. Spera, M., Capuzzo-Dolcetta, R.: Rapid mass segregation in small stellar clusters. ArXiv e-prints, January 2015
  22. Spera, M., Mapelli, M., Bressan, A.: The mass spectrum of compact remnants from the PARSEC stellar evolution tracks. MNRAS 451, 4086–4103 (2015)
  23. Thall, A.: Extended-precision floating-point numbers for GPU computation, p. 52, January 2006
  24. van der Walt, S., Colbert, S., Varoquaux, G.: The NumPy array: a structure for efficient numerical computation. Comput. Sci. Eng. 13(2), 22–30 (2011)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. INAF - OATs, Trieste, Italy
