Making OpenMP Ready for C++ Executors

  • Thomas R. W. Scogland
  • Dan Sunderland
  • Stephen L. Olivier
  • David S. Hollman
  • Noah Evans
  • Bronis R. de Supinski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11718)


For at least the last 20 years, many have tried to create a general resource management system to support interoperability across concurrent libraries. Previous strategies all suffered from additional toolchain requirements, or relied on a shared programming model that assumed it owned and controlled all resources available to the program; none of these techniques has achieved widespread adoption. The ubiquity of OpenMP, coupled with C++ developing a standard way to describe many different concurrent paradigms (C++23 executors), would allow OpenMP to assume the role of a general resource manager without requiring user code written directly in OpenMP. With a few added features, such as the ability to use otherwise idle threads to execute tasks and to specify a task “width”, many interesting concurrent frameworks could be implemented in native OpenMP and achieve high performance. Further, one could create concrete C++ OpenMP executors that support general C++ executor-based codes, allowing Fortran, C, and C++ codes to share the same underlying concurrent framework whether expressed as native OpenMP or through language-specific features. Effectively, OpenMP would become the de facto solution to a problem that has long plagued the HPC community.


Keywords: C++ executors · OpenMP tasks



This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525.



Copyright information

© This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2019

Authors and Affiliations

  • Thomas R. W. Scogland (1)
  • Dan Sunderland (2)
  • Stephen L. Olivier (2)
  • David S. Hollman (2)
  • Noah Evans (2)
  • Bronis R. de Supinski (1)
  1. Lawrence Livermore National Laboratory, Livermore, USA
  2. Center for Computing Research, Sandia National Laboratories, Albuquerque, USA
