
Multidimensional Performance and Scalability Analysis for Diverse Applications Based on System Monitoring Data

  • Maya Neytcheva
  • Sverker Holmgren
  • Jonathan Bull
  • Ali Dorostkar
  • Anastasia Kruchinina
  • Dmitry Nikitenko
  • Nina Popova
  • Pavel Shvets
  • Alexey Teplov
  • Vadim Voevodin
  • Vladimir Voevodin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10777)

Abstract

The availability of high-performance computing resources enables us to perform very large numerical simulations and thereby tackle challenging real-life problems. At the same time, the ever-growing complexity of computer architectures places high demands on the algorithms and their implementation if the computational power at our disposal is to be utilized efficiently.

Large-scale high-performance simulations can be performed by utilizing available general-purpose libraries, by writing libraries tailored to particular classes of problems, or by developing software from scratch. Clearly, the scope for enhancing the efficiency of the software differs greatly among the three cases, ranging from nearly impossible to fully possible. In this work we exemplify the efficiency of the three approaches on benchmark problems, using monitoring tools that provide a very rich spectrum of data on the performance of the applied codes as well as on the utilization of the supercomputer itself.
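As a minimal illustration of the kind of metrics that such a performance and scalability analysis rests on, the following Python sketch (with purely hypothetical timing values, not taken from the paper) computes strong-scaling speedup and parallel efficiency from wall-clock times:

    # Minimal sketch with hypothetical numbers (not from the paper):
    # strong-scaling speedup and parallel efficiency from wall-clock
    # timings, the basic quantities a scalability analysis builds on.

    # (process count, wall-clock time in seconds) -- illustrative values only
    timings = [(1, 1200.0), (2, 640.0), (4, 340.0), (8, 190.0), (16, 115.0)]

    t_ref = timings[0][1]  # reference time on a single process

    print(f"{'procs':>5} {'time [s]':>9} {'speedup':>8} {'efficiency':>10}")
    for procs, t in timings:
        speedup = t_ref / t            # S(p) = T(1) / T(p)
        efficiency = speedup / procs   # E(p) = S(p) / p
        print(f"{procs:>5} {t:>9.1f} {speedup:>8.2f} {efficiency:>10.2f}")

In practice these quantities are derived from system monitoring data rather than from manually instrumented timings, but the interpretation of the resulting speedup and efficiency curves is the same.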

Keywords

Supercomputing application efficiency analysis · Parallel program · High-performance computing

Notes

Acknowledgements

The research work of the authors was partly supported by the Swedish Foundation for International Cooperation in Research and Higher Education (STINT) Initiation grant IB2016-6543, entitled ‘Large scale complex numerical simulations on large scale complex computer facilities - identifying performance and scalability issues’, 2016–2017.

The performance evaluation and all large-scale tests were made possible by access to the supercomputer Lomonosov-2 at the Research Computing Center of Lomonosov Moscow State University, Russia.

The results were obtained at Lomonosov Moscow State University with the financial support of the Russian Science Foundation (agreement N 17-71-20114) for the part concerning the efficiency analysis of the Chunks and Tasks model (Sect. 4.3). The work on the applications described in Sects. 4.2 and 4.4 was supported by the Russian Foundation for Basic Research (project 16-07-01003, for the scalability analysis, and project 17-07-00719, for the system monitoring data management). This support is hereby gratefully acknowledged.

Numerous valuable discussions with Emanuel H. Rubensson and Elias Rudberg, as well as their contributions to correcting the paper, are also gratefully acknowledged.


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Maya Neytcheva (1)
  • Sverker Holmgren (1)
  • Jonathan Bull (1)
  • Ali Dorostkar (1)
  • Anastasia Kruchinina (1)
  • Dmitry Nikitenko (2), corresponding author
  • Nina Popova (2)
  • Pavel Shvets (2)
  • Alexey Teplov (2)
  • Vadim Voevodin (2)
  • Vladimir Voevodin (2)

  1. Department of Information Technology, Uppsala University, Uppsala, Sweden
  2. Research Computing Center, Lomonosov Moscow State University, Moscow, Russia
