Tool Support for Developing DASH Applications

  • Denis Hünich
  • Andreas Knüpfer
  • Sebastian Oeste
  • Karl Fürlinger
  • Tobias Fuchs
Conference paper
Part of the Lecture Notes in Computational Science and Engineering book series (LNCSE, volume 113)


DASH is a new parallel programming model for HPC, implemented as a C++ template library on top of a runtime library that provides various PGAS (Partitioned Global Address Space) communication substrates. DASH’s goal is to offer an easy-to-use and efficient approach to parallel programming in C++. Software tool support is an important part of the DASH project, especially for debugging and performance monitoring. Debugging is particularly necessary when adopting a new parallelization model, while performance assessment is by nature crucial for High Performance Computing applications. Tools are fundamental to a programming ecosystem, and we are convinced that providing tools early brings multiple advantages, benefiting application developers using DASH as well as developers of the DASH library itself. This work first briefly introduces DASH, its underlying runtime system, and existing debuggers and performance analysis tools. We then demonstrate the specific debugging and performance monitoring extensions for DASH in exemplary use cases and discuss an early assessment of the results.


Keywords: Template Library · Parallel Programming Model · Communication Substrate · Hardware Counter · Instrumentation Level



The DASH concept and its current implementation have been developed in the DFG project “Hierarchical Arrays for Efficient and Productive Data-Intensive Exascale Computing”, funded under the German Priority Programme 1648 “Software for Exascale Computing” (SPPEXA).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Denis Hünich (1)
  • Andreas Knüpfer (1)
  • Sebastian Oeste (1)
  • Karl Fürlinger (2)
  • Tobias Fuchs (2)

  1. TU Dresden, Dresden, Germany
  2. LMU München, München, Germany
