Multidimensional Performance and Scalability Analysis for Diverse Applications Based on System Monitoring Data
The availability of high performance computing resources enables us to perform very large numerical simulations and thereby tackle challenging real-life problems. At the same time, the ever-growing complexity of computer architectures places high demands on algorithms and their implementations if the available computational power is to be used efficiently.
Large-scale high performance simulations can be performed by using available general-purpose libraries, by writing libraries that suit particular classes of problems, or by developing software from scratch. Clearly, the possibilities for enhancing the efficiency of the software tools differ greatly among the three cases, ranging from almost none to complete freedom. In this work we exemplify the efficiency of the three approaches on benchmark problems, using monitoring tools that provide a very rich spectrum of data on the performance of the applied codes as well as on the utilization of the supercomputer itself.
Keywords: Supercomputing application efficiency analysis; Parallel program; High-performance computing
The research work of the authors was partly supported by the Swedish Foundation for International Cooperation in Research and Higher Education (STINT), Initiation grant IB2016-6543, entitled ‘Large scale complex numerical simulations on large scale complex computer facilities - identifying performance and scalability issues’, 2016–2017.
The performance evaluation and all large-scale tests were made possible by access to the supercomputer Lomonosov-2 at the Research Computing Center of Lomonosov Moscow State University, Russia.
The results were obtained at Lomonosov Moscow State University with the financial support of the Russian Science Foundation (agreement No. 17-71-20114) for the efficiency analysis of the Chunks and Tasks model (Sect. 4.3). The work on the applications described in Sects. 4.2 and 4.4 was supported by the Russian Foundation for Basic Research (project 16-07-01003 for the scalability analysis and project 17-07-00719 for the management of system monitoring data). This is hereby gratefully acknowledged.
Numerous valuable discussions with Emanuel H. Rubensson and Elias Rudberg, as well as their contributions to correcting the paper, are also gratefully acknowledged.