MPI Trace Compression Using Event Flow Graphs

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8632)


Understanding how parallel applications behave is crucial for using high-performance computing (HPC) resources efficiently. However, the task of performance analysis is becoming increasingly difficult due to the growing complexity of scientific codes and the size of machines. Even though many tools have been developed over the years to help with this task, current approaches either offer only an overview of the application, discarding temporal information, or generate huge trace files that are often difficult to handle.

In this paper we propose the use of event flow graphs for monitoring MPI applications, an approach that balances the low overhead of profiling tools with the wealth of information available from tracers. Event flow graphs are captured with very low overhead, require orders of magnitude less storage than standard trace files, and can still recover the full sequence of events in the application. We evaluate this approach with the NERSC-8/Trinity Benchmark suite and achieve compression ratios of up to 119x.
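To illustrate the core idea, the following is a minimal sketch (not the authors' implementation) of how an event flow graph can compress and replay an event trace: nodes are event signatures (e.g. MPI call sites), edges record observed transitions with repetition counts, and replaying edges in first-occurrence order recovers traces with simple loop structure. The graphs described in the paper carry richer annotations and handle more irregular control flow.

```python
def build_event_flow_graph(events):
    """Compress an event sequence into per-node edge lists.

    For each node, successors are stored with repetition counts in
    first-occurrence order (Python dicts preserve insertion order).
    """
    graph = {}
    for src, dst in zip(events, events[1:]):
        edges = graph.setdefault(src, {})
        edges[dst] = edges.get(dst, 0) + 1
    return graph


def reconstruct(graph, start, length):
    """Replay the graph to recover the original event sequence.

    At each node, follow the front edge of its list, consuming one unit
    of its count; this recovers loop-structured traces exactly.
    """
    work = {node: list(edges.items()) for node, edges in graph.items()}
    sequence = [start]
    node = start
    while len(sequence) < length:
        dst, count = work[node][0]
        if count == 1:
            work[node].pop(0)       # edge fully consumed
        else:
            work[node][0] = (dst, count - 1)
        sequence.append(dst)
        node = dst
    return sequence


if __name__ == "__main__":
    # A toy MPI event trace with a repeated send/receive phase.
    trace = ["MPI_Init", "MPI_Send", "MPI_Recv",
             "MPI_Send", "MPI_Recv", "MPI_Finalize"]
    graph = build_event_flow_graph(trace)
    assert reconstruct(graph, trace[0], len(trace)) == trace
    # The graph stores fewer edges than the trace has events.
    print(sum(len(e) for e in graph.values()), "edges for", len(trace), "events")
```

The storage saving grows with the number of loop iterations: a phase repeated a thousand times still contributes only a handful of edges, which is the intuition behind the compression ratios reported in the paper.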


Keywords: MPI · event flow graphs · trace compression · trace reconstruction · performance monitoring



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. High Performance Computing and Visualization Department (HPCViz) and Swedish e-Science Research Center (SeRC), KTH Royal Institute of Technology, Stockholm, Sweden
  2. Computer Science Department, MNM Team, Ludwig-Maximilians-Universität (LMU) Munich, Munich, Germany
