Patterns for OpenMP Task Data Dependency Overhead Measurements
Version 4.0 of the OpenMP standard introduced data dependencies as a way to synchronize the concurrent execution of tasks based on dataflow information. This indirect approach to fine-grained synchronization makes it convenient to create a task graph without explicitly synchronizing individual tasks. It can be used to parallelize both regular and irregular applications and to expose a higher level of concurrency to the runtime system. However, the cost of task creation and management, including the matching of input and output dependencies, is a crucial factor in choosing the granularity of individual tasks, i.e., the amount of work to encapsulate in a task. In this work, we present a set of benchmarks designed to determine the overhead associated with dependency management, and we give an overview of the performance characteristics of a set of compilers widely used in parallel computing. We hope to enable application developers to make informed decisions on the granularity of their tasks given the dependency patterns dictated by the algorithm. Our benchmark results show that performance characteristics vary between implementations, which is important to keep in mind throughout the task design process.
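To make the notion of a dependency pattern concrete, the following is a minimal sketch in C of two such patterns: a chain in which every task names the same variable in an `inout` dependency, and a set of tasks whose `out` dependencies never overlap. It is an illustration only and not the benchmark code of this work; the function names `run_dependent_chain`, `run_independent_tasks`, and `time_per_task` are hypothetical, and the timing helper assumes a C11 library that provides `timespec_get`.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical pattern 1: n tasks that all declare an inout dependency on
   the same variable, so the runtime must execute them one after another.
   Returns the final counter value, which should equal n. */
long run_dependent_chain(int n)
{
    long counter = 0;
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < n; ++i) {
            #pragma omp task depend(inout: counter) shared(counter)
            counter += 1;
        }
        #pragma omp taskwait
    }
    return counter;
}

/* Hypothetical pattern 2: n tasks whose out dependencies refer to distinct
   array elements, so dependencies are registered but never matched.
   Returns the number of elements written, or -1 if allocation fails. */
long run_independent_tasks(int n)
{
    long *slot = calloc((size_t)n, sizeof *slot);
    if (slot == NULL)
        return -1;
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < n; ++i) {
            #pragma omp task depend(out: slot[i]) shared(slot) firstprivate(i)
            slot[i] = 1;
        }
        #pragma omp taskwait
    }
    long sum = 0;
    for (int i = 0; i < n; ++i)
        sum += slot[i];
    free(slot);
    return sum;
}

/* Rough wall-clock time per task for one pattern, in seconds. The interval
   also covers creation of the parallel region, so it is an upper bound on
   the per-task cost rather than a pure dependency overhead. */
double time_per_task(long (*pattern)(int), int n)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    long result = pattern(n);
    timespec_get(&t1, TIME_UTC);
    assert(result == n);  /* every task must have run exactly once */
    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return elapsed / n;
}
```

Calling `time_per_task(run_dependent_chain, n)` and `time_per_task(run_independent_tasks, n)` for the same `n` gives a first impression of how a given compiler and runtime handle matched versus unmatched dependencies. The task bodies are deliberately trivial so that the measured time is dominated by task creation and dependency handling. Compiled without OpenMP support, the pragmas are ignored and both functions run sequentially with the same results.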
Part of this work has been supported by the European Community through the project Mont Blanc 3 (H2020 programme under grant agreement number 671697). We gratefully acknowledge funding by the German Research Foundation (DFG) through the project SmartDASH under the German Priority Programme 1648 Software for Exascale Computing (SPPEXA). The authors would like to thank Christoph Niethammer for his initial input.