
On the Instrumentation of OpenMP and OmpSs Tasking Constructs

  • Harald Servat
  • Xavier Teruel
  • Germán Llort
  • Alejandro Duran
  • Judit Giménez
  • Xavier Martorell
  • Eduard Ayguadé
  • Jesús Labarta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7640)

Abstract

Parallelism has become increasingly commonplace with the advent of multicore processors. Although different parallel programming models have arisen to exploit the computing capabilities of such processors, developing applications that benefit from them is not always easy. Worse, the performance achieved by the parallel version of an application may fall short of the developer's expectations because of poor utilization of the resources offered by the processor.

In this paper we present a fruitful synergy between a shared-memory parallel compiler and runtime, and a performance extraction library. The objective of this work is not only to shorten the performance-analysis life-cycle when parallelizing an application, but also to enrich the analysis of the parallel application by incorporating data that is known only on the compiler and runtime side. Additionally, we present performance results obtained from executions of instrumented applications and evaluate the overhead of the instrumentation.
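
To illustrate the kind of construct this instrumentation targets, the sketch below (not taken from the paper) shows a minimal C program that creates one OpenMP 3.0 task per block of a vector; compiler- and runtime-side instrumentation of such code would record events like task creation, task execution, and the taskwait synchronization. The block size, function names, and the OmpSs-style dependence clause mentioned in the comment are illustrative assumptions, not the paper's benchmark code.

    #include <stdio.h>

    #define N  1024
    #define BS 256

    /* Work performed by each task: scale one block of the vector. */
    static void scale_block(double *block, int n, double factor)
    {
        for (int i = 0; i < n; i++)
            block[i] *= factor;
    }

    int main(void)
    {
        static double v[N];
        for (int i = 0; i < N; i++)
            v[i] = i;

        #pragma omp parallel
        {
            #pragma omp single
            {
                for (int b = 0; b < N; b += BS) {
                    /* One task per block; an OmpSs version could instead
                       declare dependences, e.g. inout(v[b;BS]).          */
                    #pragma omp task firstprivate(b)
                    scale_block(&v[b], BS, 2.0);
                }
                #pragma omp taskwait  /* synchronization point */
            }
        }

        printf("v[%d] = %.1f\n", N - 1, v[N - 1]);
        return 0;
    }

Built with any tasking-capable OpenMP compiler (for example, gcc -fopenmp) or with the Mercurium/Nanos++ OmpSs toolchain, each task spawned by the loop becomes an event source to which a tracing library such as Extrae can attach compiler- and runtime-level information.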

Keywords

Multicore Processor, Task Migration, Parallel Programming Model, Runtime Library, Instrumented Version

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Harald Servat (1, 2)
  • Xavier Teruel (1)
  • Germán Llort (1, 2)
  • Alejandro Duran (1, 3)
  • Judit Giménez (1, 2)
  • Xavier Martorell (1, 2)
  • Eduard Ayguadé (1, 2)
  • Jesús Labarta (1, 2)

  1. Barcelona Supercomputing Center, Spain
  2. Universitat Politècnica de Catalunya, Spain
  3. Intel Corporation, Barcelona, Spain
