Early Performance Assessment of the ThunderX2 Processor for Lattice Based Simulations

  • Enrico Calore
  • Alessandro Gabbana
  • Fabio Rinaldi
  • Sebastiano Fabio Schifano
  • Raffaele Tripiccione
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12043)


This paper presents an early performance assessment of the ThunderX2, the most recent Arm-based multi-core processor designed for HPC applications. As benchmarks we use well-known stencil-based Lattice Boltzmann Method (LBM) and Lattice Quantum Chromodynamics (LQCD) algorithms, widely used to study fluid flows and the interaction properties of elementary particles, respectively. We run benchmark kernels derived from OpenMP production codes, measure performance as a function of the number of threads, and evaluate the impact of different data-layout choices. We then analyze our results in the framework of the roofline model and compare them with the performance measured on mainstream Intel Skylake processors. We find that these Arm-based processors reach levels of performance competitive with those of other state-of-the-art options.
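As a minimal illustration of the roofline analysis mentioned above, the sketch below evaluates the basic roofline bound: attainable performance is the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The function names and the machine parameters are hypothetical and chosen only for illustration; they are not measurements or specifications taken from this paper.

```python
def roofline_gflops(peak_gflops: float,
                    bandwidth_gbs: float,
                    intensity_flop_per_byte: float) -> float:
    """Attainable performance (GFLOP/s) under the basic roofline model."""
    # Memory-bound ceiling: bytes/s moved times FLOPs performed per byte.
    memory_ceiling = bandwidth_gbs * intensity_flop_per_byte
    # The kernel cannot exceed either the compute peak or the memory ceiling.
    return min(peak_gflops, memory_ceiling)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which a kernel turns compute-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters, for illustration only (not from the paper).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 200.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
    for intensity in (0.5, 2.0, 5.0, 10.0):
        bound = roofline_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
        regime = "memory-bound" if intensity < ridge else "compute-bound"
        print(f"AI = {intensity:5.1f} FLOP/byte -> {bound:7.1f} GFLOP/s ({regime})")
```

In this picture, a kernel whose arithmetic intensity lies below the ridge point is limited by memory bandwidth, so choices such as data layout matter mainly through how much of that bandwidth the kernel actually sustains.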


Keywords: ThunderX2, Lattice-Boltzmann, Lattice-QCD



This work was carried out in the framework of the COKA and COSA projects, funded by INFN. We would like to thank CINECA (Italy) and Università di Ferrara for access to their HPC systems. All runs on the ThunderX2 were performed on computational resources provided and supported by E4 Computer Engineering and installed at CINECA.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Enrico Calore (2)
  • Alessandro Gabbana (1, 2)
  • Fabio Rinaldi (1)
  • Sebastiano Fabio Schifano (1, 2), corresponding author
  • Raffaele Tripiccione (1, 2)
  1. Università degli Studi di Ferrara, Ferrara, Italy
  2. INFN Sezione di Ferrara, Ferrara, Italy
