International Journal of Parallel Programming, Volume 35, Issue 5, pp 459–476

Nested Parallelization with OpenMP

  • Dieter an Mey
  • Samuel Sarholz
  • Christian Terboven


OpenMP is widely accepted as a de facto standard for shared memory parallel programming in Fortran, C and C++. Nested parallelization was included in the first OpenMP specification, but it took a few years until the first commercially available compilers supported this optional part of the specification. We employed nested parallelization using OpenMP in three production codes: a C++ code for content-based image retrieval, a C++ code for the computation of critical points in multi-block CFD datasets, and a multi-block Navier-Stokes solver written in Fortran 90. In this paper we discuss the opportunities as well as the deficiencies of the nested parallelization support in OpenMP.
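As a rough illustration of what nested parallelization means in OpenMP, the following sketch opens an outer parallel region over the blocks of a multi-block data set and an inner parallel region over the cells of each block. It is not taken from any of the three production codes: the `Block` type, the `nested_sum` function, and the thread-count parameters are hypothetical names chosen for this example. The sketch enables nesting with `omp_set_max_active_levels`, which was introduced in OpenMP 3.0; implementations of the OpenMP 2.5 era would use `omp_set_nested` or the `OMP_NESTED` environment variable instead.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical multi-block data set: each block holds a list of cell values.
using Block = std::vector<double>;

// Sum all cell values using two levels of parallelism:
// an outer team over the blocks and an inner team over the cells of a block.
double nested_sum(const std::vector<Block>& blocks,
                  int outer_threads, int inner_threads) {
#ifdef _OPENMP
    // Nested parallelism is an optional feature and is often off by default;
    // request up to two active levels of parallel regions.
    omp_set_max_active_levels(2);
#endif
    double total = 0.0;
    const long nblocks = static_cast<long>(blocks.size());

    // Outer level: distribute the blocks among the outer team.
    #pragma omp parallel for num_threads(outer_threads) reduction(+ : total)
    for (long b = 0; b < nblocks; ++b) {
        double block_sum = 0.0;
        const long ncells = static_cast<long>(blocks[b].size());

        // Inner level: each outer thread becomes the master of its own team.
        #pragma omp parallel for num_threads(inner_threads) reduction(+ : block_sum)
        for (long c = 0; c < ncells; ++c) {
            block_sum += blocks[b][c];
        }
        total += block_sum;
    }
    return total;
}
```

Calling `nested_sum(blocks, 2, 4)` requests up to 2 × 4 threads when the implementation supports nesting. If nesting is unsupported, or the code is compiled without OpenMP, the inner regions run serially and the result is unchanged.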


OpenMP, Nested parallelization, ccNUMA, Shared memory parallelization





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Dieter an Mey (1)
  • Samuel Sarholz (1)
  • Christian Terboven (1)
  1. Center for Computing and Communication, RWTH Aachen University, Aachen, Germany
