Early Performance Evaluation of the Hybrid Cluster with Torus Interconnect Aimed at Molecular-Dynamics Simulations

  • Vladimir Stegailov
  • Alexander Agarkov
  • Sergey Biryukov
  • Timur Ismagilov
  • Mikhail Khalilov
  • Nikolay Kondratyuk
  • Evgeny Kushtanov
  • Dmitry Makagon
  • Anatoly Mukosey
  • Alexander Semenov
  • Alexey Simonov
  • Alexey Timofeev
  • Vyacheslav Vecher
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10777)

Abstract

In this paper, we describe the Desmos cluster, which consists of 32 hybrid nodes connected by a low-latency, high-bandwidth torus interconnect. The cluster is aimed at cost-effective classical molecular dynamics calculations. We present strong scaling benchmarks for GROMACS, LAMMPS and VASP and compare the results with those of other HPC systems. The cluster serves as a test bed for the Angara interconnect, which supports 3D and 4D torus network topologies, and verifies its ability to unite MPP systems and to speed up MPI-based applications effectively. We describe the interconnect and present typical MPI benchmarks.
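
The "typical MPI benchmarks" mentioned above are, in essence, point-to-point latency and bandwidth measurements of the ping-pong type. The following is a minimal illustrative sketch of such a measurement (our own example, not the benchmark code used for the Angara results in the paper): rank 0 sends a message of a given size to rank 1 and waits for the echo; half the averaged round-trip time approximates the one-way latency, and message size divided by one-way time gives the effective bandwidth.

    /* Minimal MPI ping-pong sketch (illustrative only). */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);
        if (nranks < 2) {
            if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        const int iters = 1000;
        for (int bytes = 8; bytes <= (1 << 20); bytes *= 2) {
            char *buf = malloc(bytes);

            MPI_Barrier(MPI_COMM_WORLD);
            double t0 = MPI_Wtime();
            for (int i = 0; i < iters; ++i) {
                if (rank == 0) {
                    /* send the message and wait for the echo */
                    MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else if (rank == 1) {
                    /* echo the message back to rank 0 */
                    MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            double dt = MPI_Wtime() - t0;

            if (rank == 0) {
                double oneway_s  = dt / (2.0 * iters);       /* one-way time  */
                double lat_us    = oneway_s * 1e6;           /* latency, us   */
                double bw_mb_s   = bytes / oneway_s / 1e6;   /* bandwidth     */
                printf("%8d bytes  %8.2f us  %8.1f MB/s\n",
                       bytes, lat_us, bw_mb_s);
            }
            free(buf);
        }

        MPI_Finalize();
        return 0;
    }

Run with, for example, "mpirun -np 2" across two nodes to probe the interconnect rather than shared memory; the small-message limit approximates the network latency, the large-message limit the sustained link bandwidth.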

Keywords

Atomistic simulations · Angara interconnect · GPU

Notes

Acknowledgments

The JIHT team was supported by the Russian Science Foundation (grant No. 14-50-00124). Their work included the development of the Desmos cluster architecture, tuning of the codes, and benchmarking (HSE and MIPT provided preliminary support). The NICEVT team developed the Angara interconnect and its low-level software stack, and built and tuned the Desmos cluster.

The authors are grateful to Dr. Maciej Cytowski and Dr. Jacek Piechota (ICM, University of Warsaw) for the data on the VASP benchmark [34].

References

  1. Heinecke, A., Eckhardt, W., Horsch, M., Bungartz, H.-J.: Supercomputing for Molecular Dynamics Simulations. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-319-17148-7
  2. Eckhardt, W., Heinecke, A., Bader, R., Brehm, M., Hammer, N., Huber, H., Kleinhenz, H.-G., Vrabec, J., Hasse, H., Horsch, M., Bernreuther, M., Glass, C.W., Niethammer, C., Bode, A., Bungartz, H.-J.: 591 TFLOPS multi-trillion particles simulation on SuperMUC. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2013. LNCS, vol. 7905, pp. 1–12. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38750-0_1
  3. Piana, S., Klepeis, J.L., Shaw, D.E.: Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations. Curr. Opin. Struct. Biol. 24, 98–105 (2014)
  4. Begau, C., Sutmann, G.: Adaptive dynamic load-balancing with irregular domain decomposition for particle simulations. Comput. Phys. Commun. 190, 51–61 (2015)
  5. Smirnov, G.S., Stegailov, V.V.: Efficiency of classical molecular dynamics algorithms on supercomputers. Math. Models Comput. Simul. 8(6), 734–743 (2016)
  6. Stegailov, V.V., Orekhov, N.D., Smirnov, G.S.: HPC hardware efficiency for quantum and classical molecular dynamics. In: Malyshkin, V. (ed.) PaCT 2015. LNCS, vol. 9251, pp. 469–473. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21909-7_45
  7. Rojek, K., Wyrzykowski, R., Kuczynski, L.: Systematic adaptation of stencil-based 3D MPDATA to GPU architectures. Concurr. Comput. Pract. Exp. 29, e3970 (2016)
  8. Berendsen, H.J.C., van der Spoel, D., van Drunen, R.: GROMACS: a message-passing parallel molecular dynamics implementation. Comput. Phys. Commun. 91(1–3), 43–56 (1995)
  9. Plimpton, S.: Fast parallel algorithms for short-range molecular dynamics. J. Comput. Phys. 117(1), 1–19 (1995)
  10. Trott, C.R., Winterfeld, L., Crozier, P.S.: General-purpose molecular dynamics simulations on GPU-based clusters. arXiv e-prints (2010)
  11. Brown, W.M., Wang, P., Plimpton, S.J., Tharrington, A.N.: Implementing molecular dynamics on hybrid high performance computers - short range forces. Comput. Phys. Commun. 182(4), 898–911 (2011)
  12. Brown, W.M., Wang, P., Plimpton, S.J., Tharrington, A.N.: Implementing molecular dynamics on hybrid high performance computers - particle-particle particle-mesh. Comput. Phys. Commun. 183(3), 449–459 (2012)
  13. Edwards, H.C., Trott, C.R., Sunderland, D.: Kokkos: enabling manycore performance portability through polymorphic memory access patterns. J. Parallel Distrib. Comput. 74(12), 3202–3216 (2014)
  14. Abraham, M.J., Murtola, T., Schulz, R., Páll, S., Smith, J.C., Hess, B., Lindahl, E.: GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2, 19–25 (2015)
  15. Ohmura, I., Morimoto, G., Ohno, Y., Hasegawa, A., Taiji, M.: MDGRAPE-4: a special-purpose computer system for molecular dynamics simulations. Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 372(2021) (2014)
  16. Kutzner, C., Páll, S., Fechner, M., Esztermann, A., de Groot, B.L., Grubmüller, H.: Best bang for your buck: GPU nodes for GROMACS biomolecular simulations. J. Comput. Chem. 36(26), 1990–2008 (2015)
  17. Scott, S.L., Thorson, G.M.: The Cray T3E network: adaptive routing in a high performance 3D torus. In: HOT Interconnects IV, Stanford University, 15–16 August 1996
  18. Adiga, N.R., Blumrich, M.A., Chen, D., Coteus, P., Gara, A., Giampapa, M.E., Heidelberger, P., Singh, S., Steinmacher-Burow, B.D., Takken, T., Tsao, M., Vranas, P.: Blue Gene/L torus interconnection network. IBM J. Res. Dev. 49(2), 265–276 (2005)
  19. Ajima, Y., Inoue, T., Hiramoto, S., Takagi, Y., Shimizu, T.: The Tofu interconnect. IEEE Micro 32(1), 21–31 (2012)
  20. Neuwirth, S., Frey, D., Nuessle, M., Bruening, U.: Scalable communication architecture for network-attached accelerators. In: 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 627–638, February 2015
  21. Elizarov, G.S., Gorbunov, V.S., Levin, V.K., Latsis, A.O., Korneev, V.V., Sokolov, A.A., Andryushin, D.V., Klimov, Y.A.: Communication fabric MVS-Express. Vychisl. Metody Programm. 13(3), 103–109 (2012)
  22. Adamovich, I.A., Klimov, A.V., Klimov, Y.A., Orlov, A.Y., Shvorin, A.B.: Thoughts on the development of SKIF-Aurora supercomputer interconnect. Programmnye Sistemy: Teoriya i Prilozheniya 1(3), 107–123 (2010)
  23. Klimov, Y.A., Shvorin, A.B., Khrenov, A.Y., Adamovich, I.A., Orlov, A.Y., Abramov, S.M., Shevchuk, Y.V., Ponomarev, A.Y.: Pautina: the high performance interconnect. Programmnye Sistemy: Teoriya i Prilozheniya 6(1), 109–120 (2015)
  24. Korzh, A.A., Makagon, D.V., Borodin, A.A., Zhabin, I.A., Kushtanov, E.R., Syromyatnikov, E.L., Cheryomushkina, E.V.: Russian 3D-torus interconnect with globally addressable memory support. Vestnik YuUrGU. Ser. Mat. Model. Progr. 6, 41–53 (2010)
  25. Mukosey, A.V., Semenov, A.S., Simonov, A.S.: Simulation of collective operations hardware support for Angara interconnect. Vestn. YuUrGU. Ser. Vych. Mat. Inf. 4(3), 40–55 (2015)
  26. Agarkov, A.A., Ismagilov, T.F., Makagon, D.V., Semenov, A.S., Simonov, A.S.: Performance evaluation of the Angara interconnect. In: Proceedings of the International Conference “Russian Supercomputing Days” 2016, pp. 626–639 (2016)
  27. Corsetti, F.: Performance analysis of electronic structure codes on HPC systems: a case study of SIESTA. PLoS ONE 9(4), 1–8 (2014)
  28. Haque, I.S., Pande, V.S.: Hard data on soft errors: a large-scale assessment of real-world error rates in GPGPU. In: Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, CCGRID 2010, pp. 691–696. IEEE Computer Society, Washington (2010)
  29. Puente, V., Beivide, R., Gregorio, J.A., Prellezo, J.M., Duato, J., Izu, C.: Adaptive bubble router: a design to improve performance in torus networks. In: Proceedings of the 1999 International Conference on Parallel Processing, pp. 58–67 (1999)
  30. Hoefler, T., Snir, M.: Generic topology mapping strategies for large-scale parallel architectures. In: Proceedings of the International Conference on Supercomputing, ICS 2011, pp. 75–84. ACM, New York (2011)
  31. Höhnerbach, M., Ismail, A.E., Bientinesi, P.: The vectorization of the Tersoff multi-body potential: an exercise in performance portability. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016, pp. 7:1–7:13. IEEE Press, Piscataway (2016)
  32. Bethune, I.: Ab Initio Molecular Dynamics. Introduction to Molecular Dynamics on ARCHER (2015)
  33. Hutchinson, M.: VASP on GPUs. When and how. In: GPU Technology Theater, SC15 (2015)
  34. Cytowski, M.: Best Practice Guide – IBM Power 775. PRACE (2013)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Joint Institute for High Temperatures of RAS, Moscow, Russia
  2. Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  3. National Research University Higher School of Economics, Moscow, Russia
  4. JSC NICEVT, Moscow, Russia