PAMDA: Performance Assessment Using MAQAO Toolset and Differential Analysis

  • Zakaria Bendifallah
  • William Jalby
  • José Noudohouenou
  • Emmanuel Oseret
  • Vincent Palomares
  • Andres Charif Rubial
Conference paper


Identifying performance bottlenecks in applications is crucial to improving their efficiency, but precisely assessing their impact on performance can be difficult: in particular, two performance problems can interact, making them hard to isolate and therefore to correct. We propose PAMDA, a methodology to single out performance problems through hierarchical bottleneck detection. Important potential performance issues are classified in a ‘Performance Breakdown Tree’, which drives our iterative analysis cycle and prioritizes the most relevant problems. Our system relies on the MAQAO toolset and on differential analysis of code. MAQAO is a performance analysis and optimization tool suite; the differential analysis approach, implemented in the DECAN tool, consists of quantifying performance changes when controlled transformations are applied to the target code. We focus on performance issues raised by processors and memory subsystems in multicore architectures, and demonstrate the approach on loops extracted from real-life HPC applications.
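The idea of differential analysis described above can be illustrated with a minimal sketch. It is not the DECAN implementation: the function name `potential_speedups`, the variant names, and the cycle counts are all hypothetical and chosen only for illustration. The sketch assumes that a reference loop and several transformed variants of it (each with one class of instructions removed) have already been timed, and it ranks the variants by how much faster they run than the reference.

```python
def potential_speedups(reference_cycles, variant_cycles):
    """Rank transformed variants by their speedup over the reference.

    reference_cycles: measured cost of the unmodified loop (hypothetical unit).
    variant_cycles: mapping from variant name to its measured cost.
    Returns a dict ordered from largest to smallest speedup.
    """
    if reference_cycles <= 0:
        raise ValueError("reference_cycles must be positive")
    ratios = {}
    for name, cycles in variant_cycles.items():
        if cycles <= 0:
            raise ValueError(f"cycle count for {name!r} must be positive")
        # A large ratio means the removed instructions were costly.
        ratios[name] = reference_cycles / cycles
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


# Made-up measurements, for illustration only (not data from the paper).
reference = 1000.0
variants = {
    "memory_ops_removed": 400.0,  # hypothetical variant without loads/stores
    "fp_ops_removed": 950.0,      # hypothetical variant without FP arithmetic
}

ranking = potential_speedups(reference, variants)
for name, ratio in ranking.items():
    print(f"{name}: {ratio:.2f}x")
```

With these invented numbers, removing the memory operations speeds the loop up far more than removing the floating-point operations, which would suggest that memory accesses are the bottleneck to investigate first.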


Keywords: Access Pattern; Memory Hierarchy; Performance Bottleneck; Memory Subsystem; Reverse Time Migration



We would like to thank Michel Masella for access to his POLARIS(MD) code, and Henri Calandra and Asma Farjallah for access to the RTM code.

This work has been carried out by the Exascale Computing Research laboratory, thanks to the support of CEA, GENCI, Intel, and UVSQ, and by the PRiSM laboratory, thanks to the support of the French Ministry for Economy, Industry, and Employment through the PERFCLOUD project. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the CEA, GENCI, Intel, or UVSQ.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Zakaria Bendifallah (1, 2)
  • William Jalby (1, 2), corresponding author
  • José Noudohouenou (1, 2)
  • Emmanuel Oseret (1, 2)
  • Vincent Palomares (1, 2)
  • Andres Charif Rubial (1, 2)

  1. Exascale Computing Research, Versailles, France
  2. University of Versailles Saint-Quentin-en-Yvelines, Versailles, France
