Scalable Parallel Trace-Based Performance Analysis

  • Markus Geimer
  • Felix Wolf
  • Brian J. N. Wylie
  • Bernd Mohr
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4192)


Automatic trace analysis is an effective method for identifying complex performance phenomena in parallel applications. However, as the size of parallel systems and the number of processors used by individual applications continue to grow, the traditional approach of analyzing a single global trace file, as done by KOJAK’s EXPERT trace analyzer, becomes increasingly constrained by the large number of events. In this article, we present a scalable version of the EXPERT analysis based on analyzing separate local trace files with a parallel tool that ‘replays’ the target application’s communication behavior. We describe the new parallel analyzer architecture and discuss first empirical results.
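
The replay idea can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation. It assumes a hypothetical minimal event model (`Event`) with only point-to-point send and receive events, and it uses Python threads and `queue.Queue` mailboxes as stand-ins for analyzer processes and MPI messages. Each "rank" walks only its own local trace. At a send event it forwards the send timestamp to the receiver, re-enacting the original message. At the matching receive, the receiver compares timestamps to compute a Late Sender waiting time, that is, how long the receive was entered before the matching send. The function name `replay` is hypothetical, and a real parallel analyzer would detect many more patterns.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical minimal event model, not a real trace format.
    kind: str    # "send" or "recv"
    time: float  # timestamp of entering the communication call
    peer: int    # destination rank for a send, source rank for a recv


def replay(local_traces):
    """Replay each rank's local trace concurrently and return the
    Late Sender waiting time accumulated by each rank."""
    n = len(local_traces)
    # One FIFO per (receiver, sender) pair, created up front so threads never
    # race to create a channel. FIFO order mirrors MPI's non-overtaking rule
    # for messages exchanged between the same two ranks.
    mailboxes = {(dst, src): queue.Queue() for dst in range(n) for src in range(n)}
    waiting = [0.0] * n

    def run(rank):
        for ev in local_traces[rank]:
            if ev.kind == "send":
                # Re-enact the original message: ship the send timestamp.
                mailboxes[(ev.peer, rank)].put(ev.time)
            elif ev.kind == "recv":
                # Blocks until the matching send has been replayed.
                send_time = mailboxes[(rank, ev.peer)].get()
                # Late Sender: receive entered before the matching send.
                waiting[rank] += max(0.0, send_time - ev.time)

    threads = [threading.Thread(target=run, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return waiting


if __name__ == "__main__":
    traces = [
        [Event("recv", 1.0, peer=1), Event("send", 5.0, peer=1)],  # rank 0
        [Event("send", 3.0, peer=0), Event("recv", 6.0, peer=0)],  # rank 1
    ]
    print(replay(traces))  # [2.0, 0.0]
```

In this example, rank 0 enters its receive at time 1.0, but rank 1 only starts sending at 3.0, so rank 0 accumulates 2.0 time units of Late Sender waiting. The second message arrives before rank 1 needs it, so rank 1 waits for nothing. The point of the design is that no process ever needs the global trace. Matching information travels only along the application's own communication paths.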







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Markus Geimer (1)
  • Felix Wolf (1)
  • Brian J. N. Wylie (1)
  • Bernd Mohr (1)
  1. John von Neumann Institute for Computing (NIC), Forschungszentrum Jülich, Jülich, Germany
