Abstract
OpenMP, the de facto standard programming model for shared-memory parallelism in HPC, has seen its performance continuously improved by the community, through both implementation enhancements and specification extensions. The language has also evolved from a prescriptive nature, defined by the thread-centric model, to a descriptive behavior, defined by the task-centric model. However, the overhead of orchestrating tasks remains relatively high, so applications exploiting very fine-grained parallelism on systems with a large number of cores may fail to scale.
In this work, we propose to include the concept of the Task Dependency Graph (TDG) in the specification by introducing a new clause, named taskgraph, attached to task or target directives. By design, the TDG helps alleviate the overhead associated with the OpenMP tasking model, and it also facilitates connecting OpenMP with other programming models that support task parallelism. According to our experiments, a GCC implementation of the taskgraph clause significantly reduces the execution time of fine-grained task applications and improves their scalability with the number of threads.
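To give a flavor of the proposal, the following minimal sketch shows how the taskgraph clause might be attached to a task directive enclosing a set of fine-grained, dependent tasks. The clause name comes from this work, but the integer argument used to identify the graph and the record-and-replay behavior across loop iterations are illustrative assumptions, not final specification syntax; the code will not compile with an unmodified upstream compiler.

#include <stdio.h>

#define N     1024
#define STEPS 100

int main(void) {
    static double a[N], b[N];   /* static storage: shared by default in tasks */

    #pragma omp parallel
    #pragma omp single
    for (int step = 0; step < STEPS; ++step) {
        /* Hypothetical syntax: the taskgraph clause marks the enclosed region so
         * the runtime can record its Task Dependency Graph once and replay it in
         * later iterations, avoiding repeated task creation and dependency
         * resolution. The argument "0" is an illustrative graph identifier. */
        #pragma omp task taskgraph(0)
        {
            for (int i = 0; i < N; ++i) {
                #pragma omp task depend(out: a[i]) firstprivate(i)
                a[i] = i * 0.5;

                #pragma omp task depend(in: a[i]) depend(out: b[i]) firstprivate(i)
                b[i] = a[i] + 1.0;
            }
            #pragma omp taskwait   /* complete this graph's tasks */
        }
        #pragma omp taskwait       /* finish this step before the next one */
    }

    printf("b[0] = %f, b[%d] = %f\n", b[0], N - 1, b[N - 1]);
    return 0;
}

In such a scheme, only the first execution of the region would pay the full cost of building the TDG; subsequent executions could reuse it, which is where the reported savings for fine-grained tasks would come from.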
Notes
- 1.
The experiments were run on a node of the MareNostrum IV [1] supercomputer, equipped with an Intel Xeon Platinum 8160 CPU (2 sockets of 24 physical cores each) running at 2.1 GHz with a 33 MB L3 cache.
References
BSC: Marenostrum IV User’s Guide (2017). https://www.bsc.es/support/MareNostrum4-ug.pdf
Castello, A., Seo, S., Mayo, R., Balaji, P., Quintana-Orti, E.S., Pena, A.J.: GLTO: on the adequacy of lightweight thread approaches for OpenMP implementations. In: Proceedings of the International Conference on Parallel Processing, pp. 60–69 (2017)
Gautier, T., Perez, C., Richard, J.: On the impact of OpenMP task granularity. In: de Supinski, B.R., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds.) IWOMP 2018. LNCS, vol. 11128, pp. 205–221. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98521-3_14
Giannozzi, P., et al.: QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys. Condens. Matter 21(39), 395502 (2009)
Kalray MPPA products (2021). https://www.kalrayinc.com/
Komatitsch, D., Tromp, J.: SPECFEM3D Cartesian (2021). https://github.com/geodynamics/specfem3d
Kukanov, A., Voss, M.J.: The foundations for scalable multi-core software in Intel Threading Building Blocks. Intel Technol. J. 11(4), 309–322 (2007). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.71.8289&rep=rep1&type=pdf
Lagrone, J., Aribuki, A., Chapman, B.: A set of microbenchmarks for measuring OpenMP task overheads. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications II, pp. 594–600 (2011). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.217.9615&rep=rep1&type=pdf
Leiserson, C.E.: The Cilk++ concurrency platform. J. Supercomput. 51(3), 244–257 (2010)
Munera, A., Royuela, S., Quinones, E.: Towards a qualifiable OpenMP framework for embedded systems. In: Proceedings of the 2020 Design, Automation and Test in Europe Conference and Exhibition, DATE 2020, no. 2, pp. 903–908 (2020)
Nvidia: CUDA Graph programming guide (2021). https://docs.nvidia.com/cuda/cuda-c-programming-guide/#cuda-graphs
Olivier, S.L., Prins, J.F.: Evaluating OpenMP 3.0 run time systems on unbalanced task graphs. In: Müller, M.S., de Supinski, B.R., Chapman, B.M. (eds.) IWOMP 2009. LNCS, vol. 5568, pp. 63–78. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02303-3_6
Perez, J.M., Beltran, V., Labarta, J., Ayguade, E.: Improving the integration of task nesting and dependencies in OpenMP. In: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017, pp. 809–818 (2017)
Sainz, F., Mateo, S., Beltran, V., Bosque, J.L., Martorell, X., Ayguadé, E.: Leveraging OmpSs to exploit hardware accelerators. In: 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing, pp. 112–119. IEEE (2014)
Schuchart, J., Nachtmann, M., Gracia, J.: Patterns for OpenMP task data dependency overhead measurements. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) Scaling OpenMP for Exascale Performance and Portability, pp. 156–168. Springer International Publishing, Cham (2017)
Serrano, M.A., Melani, A., Vargas, R., Marongiu, A., Bertogna, M., Quiñones, E.: Timing characterization of OpenMP4 tasking model. In: 2015 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, CASES 2015, pp. 157–166 (2015)
Stpiczyński, P.: Language-based vectorization and parallelization using intrinsics, OpenMP, TBB and Cilk Plus. J. Supercomput. 74(4), 1461–1472 (2018)
TOP500 (2020). https://www.top500.org/lists/top500/2020/11/
Valero-Lara, P., Catalán, S., Martorell, X., Usui, T., Labarta, J.: sLASs: a fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs (LASs library). J. Parallel Distrib. Comput. 138, 153–171 (2020)
Yu, C., Royuela, S., Quiñones, E.: OpenMP to CUDA graphs: a compiler-based transformation to enhance the programmability of NVIDIA devices. In: Proceedings of the 23rd International Workshop on Software and Compilers for Embedded Systems, SCOPES 2020, pp. 42–47 (2020)
Acknowledgements
This work has been supported by the EU H2020 project AMPERE under grant agreement no. 871669.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Yu, C., Royuela, S., Quiñones, E. (2021). Enhancing OpenMP Tasking Model: Performance and Portability. In: McIntosh-Smith, S., de Supinski, B.R., Klinkenberg, J. (eds.) OpenMP: Enabling Massive Node-Level Parallelism. IWOMP 2021. Lecture Notes in Computer Science, vol. 12870. Springer, Cham. https://doi.org/10.1007/978-3-030-85262-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85261-0
Online ISBN: 978-3-030-85262-7
eBook Packages: Computer Science, Computer Science (R0)