Gleaming the Cube: Online Performance Analysis and Visualization Using MALP

  • Jean-Baptiste Besnard
  • Allen D. Malony
  • Sameer Shende
  • Marc Pérache
  • Julien Jaeger
Conference paper


Multi-Application onLine Profiling (MALP) is a performance tool developed as an alternative to the trace-based approach for fine-grained event collection. Any performance measurement and analysis system must address the problem of managing its data and projecting it into meaningful forms. We introduce the concept of a valorization chain to capture this fundamental principle. MALP is a dramatic departure from performance tool dogma in that it advocates an online valorization architecture that integrates data producers with transformers, consumers, and visualizers, all operating simultaneously and in concert. MALP provides a powerful, dynamic framework for performance processing, as demonstrated by unique performance analysis and application dashboard examples. Our experience with MALP has identified opportunities for data queries in an MPI context and, more generally, for creating a “constellation of services” that allows parallel processes and tools to collaborate through a common mediation layer.
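To make the idea of an online valorization chain more concrete, here is a minimal Python sketch under simplifying assumptions: threads stand in for parallel processes, `queue.Queue` stands in for the common mediation layer, and the names `Event`, `producer`, `transformer`, `consumer`, and `run_chain` are hypothetical. This is not MALP's API. It only illustrates producers, a transformer, and a consumer running concurrently, so that data is reduced while it is generated rather than after a complete trace has been written to disk.

```python
import queue
import threading
from collections import defaultdict
from dataclasses import dataclass

# Illustrative sketch only: threads and in-memory queues are assumptions
# standing in for MALP's processes and mediation layer.


@dataclass(frozen=True)
class Event:
    rank: int        # producing process (hypothetical MPI rank)
    name: str        # instrumented region, e.g. "MPI_Send"
    duration: float  # seconds spent in the region


_STOP = object()  # sentinel marking the end of a stream


def producer(rank, events, out):
    """Stage 1: an instrumented process streams events as they happen."""
    for name, duration in events:
        out.put(Event(rank, name, duration))
    out.put(_STOP)


def transformer(inp, out, n_producers):
    """Stage 2: reduce the raw stream online into per-region totals,
    forwarding a snapshot after each event instead of storing a trace."""
    totals = defaultdict(float)
    finished = 0
    while finished < n_producers:
        item = inp.get()
        if item is _STOP:
            finished += 1
            continue
        totals[item.name] += item.duration
        out.put(dict(totals))  # copy, so later updates do not alter the snapshot
    out.put(_STOP)


def consumer(inp, snapshots):
    """Stage 3: a dashboard/visualizer that receives each snapshot."""
    while True:
        item = inp.get()
        if item is _STOP:
            return
        snapshots.append(item)


def run_chain(streams):
    """Run all stages concurrently; `streams` maps rank -> [(name, duration)]."""
    raw, reduced = queue.Queue(), queue.Queue()
    snapshots = []
    threads = [
        threading.Thread(target=producer, args=(rank, evs, raw))
        for rank, evs in streams.items()
    ]
    threads.append(threading.Thread(target=transformer, args=(raw, reduced, len(streams))))
    threads.append(threading.Thread(target=consumer, args=(reduced, snapshots)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return snapshots
```

For example, `run_chain({0: [("MPI_Send", 0.5), ("compute", 2.0)], 1: [("MPI_Send", 0.25)]})` yields one snapshot per event. The final snapshot is `{"MPI_Send": 0.75, "compute": 2.0}`, and the consumer could have rendered the intermediate ones while the producers were still running.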


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Jean-Baptiste Besnard (1)
  • Allen D. Malony (2)
  • Sameer Shende (2)
  • Marc Pérache (3)
  • Julien Jaeger (3)

  1. ParaTools SAS, Bruyères-le-Châtel, France
  2. ParaTools Inc., Eugene, USA
  3. CEA, DAM, DIF, Arpajon, France
