Abstract
Tasks offer a natural mechanism to express asynchronous operations in OpenMP, as well as parallel patterns with dynamic sizes and shapes. Since the release of OpenMP 4, task dependencies have made an already flexible tool practical in many more situations. Even so, while tasks can be made asynchronous with respect to the encountering thread, there is no mechanism to tie an OpenMP task into a truly asynchronous operation outside of OpenMP without blocking an OpenMP thread. Additionally, producer/consumer parallel patterns, and pipeline parallel patterns more generally, suffer from the lack of a convenient and efficient point-to-point synchronization and data-passing mechanism. This paper presents a set of extensions, leveraging the task and dependency mechanisms, that help users and implementers tie tasks into other asynchronous systems and express pipeline parallelism more naturally, while decreasing the overhead of passing data between otherwise small tasks by as much as 80%.
The rights of this work are transferred to the extent transferable according to title 17 U.S.C. 105.
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. (LLNL-CONF-694789).
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Scogland, T., de Supinski, B. (2016). A Case for Extending Task Dependencies. In: Maruyama, N., de Supinski, B., Wahib, M. (eds.) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol. 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_10
Print ISBN: 978-3-319-45549-5
Online ISBN: 978-3-319-45550-1