Event Flow Graphs for MPI Performance Monitoring and Analysis

  • Xavier Aguilar
  • Karl Fürlinger
  • Erwin Laure
Conference paper

Abstract

Classical performance analysis methodologies rely either on execution traces with fine-grained data or on profiles with aggregated statistics. Event traces provide the finest level of detail on application behavior; however, they become infeasible at extreme scale because of the huge amount of information they contain. Profiles, in contrast, are far more scalable but lose the temporal order between events. In this paper, we present the use of event flow graphs for the performance characterization of MPI applications. Event flow graphs capture statistics on the events performed by the application while preserving the temporal order of those events. They therefore stand between tracing and profiling and are a good complement to both classical approaches. Furthermore, event flow graphs can be used for purposes other than the visual exploration of performance data: for example, as compressed representations of event traces, or to automatically detect the program structure of iterative applications at runtime without any source code analysis.
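To make the idea concrete, the sketch below shows a minimal event flow graph builder in Python. It is an illustrative assumption, not the IPM-based implementation described in the paper: nodes are MPI call events, and each directed edge counts how often one event immediately follows another, so aggregate statistics are kept without losing temporal order. The EventFlowGraph class and the example event stream are hypothetical.

    from collections import defaultdict

    class EventFlowGraph:
        def __init__(self):
            # edge_counts[(prev_event, next_event)] -> number of observed transitions
            self.edge_counts = defaultdict(int)
            self.prev = None

        def record(self, event):
            # Add one observed event (e.g. an MPI call name) to the per-rank graph.
            if self.prev is not None:
                self.edge_counts[(self.prev, event)] += 1
            self.prev = event

    # Hypothetical per-rank event stream for a small iterative MPI code.
    stream = ["MPI_Irecv", "MPI_Send", "MPI_Wait", "MPI_Allreduce"] * 3

    g = EventFlowGraph()
    for ev in stream:
        g.record(ev)

    # Print edges with their transition counts; the MPI_Allreduce -> MPI_Irecv
    # edge closes a cycle and reveals the iterative structure of the run.
    for (src, dst), count in g.edge_counts.items():
        print(src, "->", dst, ":", count)

In this toy example the graph stores only five edges regardless of how many iterations the loop executes, which illustrates why such graphs can act as compressed trace representations and how cycles in them expose the application's loop structure at runtime.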


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. High Performance Computing and Visualization Department (HPCViz) and Swedish E-Science Research Center (SeRC), KTH Royal Institute of Technology, Stockholm, Sweden
  2. Computer Science Department, MNM Team, Ludwig-Maximilians-Universität (LMU) Munich, Munich, Germany