Abstract
OpenMP 3.0 introduced the concept of asynchronous tasks: independent units of work that may be dynamically created and scheduled. Task synchronization is accomplished by inserting taskwait and barrier constructs. However, the inappropriate use of these constructs may incur significant overhead for certain algorithms on large platforms, owing to the global synchronization they impose. The performance of such algorithms may benefit substantially from a mechanism for specifying finer-grained, point-to-point synchronization between tasks. In this paper we present extensions to the current OpenMP task directive that enable the specification of dependencies among tasks. A task waits only until the explicit dependencies specified by the programmer are satisfied, thereby enabling support for a dataflow model within OpenMP. We evaluate the extensions, implemented in the OpenUH OpenMP compiler, using LU decomposition and Smith-Waterman algorithms. By applying the extensions to these two algorithms, we demonstrate significant performance improvement over the standard tasking versions. When comparing our results with those obtained using the related dataflow models OmpSs and QUARK, we observed that the versions using our task extensions delivered an average speedup of 2-6x.
References
Intel Concurrent Collections, http://software.intel.com/en-us/articles/intel-concurrent-collections-for-cc/
OpenMP 4.0 release candidate 2, http://www.openmp.org/mp-documents/OpenMP_4.0_RC2.pdf/
Agullo, E., Demmel, J., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Ltaief, H., Luszczek, P., Tomov, S.: Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects. In: Journal of Physics: Conference Series, vol. 180, p. 012037. IOP Publishing (2009)
Chapman, B., Eachempati, D., Hernandez, O.: Experiences developing the OpenUH compiler and runtime infrastructure. International Journal of Parallel Programming, 1–30 (2012)
Dallou, T., Juurlink, B.: Hardware-based task dependency resolution for the StarSs programming model. In: 2012 41st International Conference on Parallel Processing Workshops (ICPPW), pp. 367–374. IEEE (2012)
Desprez, F., Domas, S., Tourancheau, B.: Optimization of the ScaLAPACK LU factorization routine using communication/computation overlap. In: Euro-Par 1996 Parallel Processing, pp. 1–10. Springer (1996)
Dios, A.J., Asenjo, R., Navarro, A., Corbera, F., Zapata, E.L.: Evaluation of the task programming model in the parallelization of wavefront problems. In: 2010 12th IEEE International Conference on High Performance Computing and Communications (HPCC), pp. 257–264. IEEE (2010)
Duran, A., Perez, J.M., Ayguadé, E., Badia, R.M., Labarta, J.: Extending the OpenMP tasking model to allow dependent tasks. In: Eigenmann, R., de Supinski, B.R. (eds.) IWOMP 2008. LNCS, vol. 5004, pp. 111–122. Springer, Heidelberg (2008)
Ghosh, P., Yan, Y., Chapman, B.: Support for dependency driven executions among OpenMP tasks. In: Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM 2012) in conjunction with PACT (September 2012)
Haidar, A., Ltaief, H., Luszczek, P., Dongarra, J.: A comprehensive study of task coalescing for selecting parallelism granularity in a two-stage bidiagonal reduction. In: 2012 IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS), pp. 25–35. IEEE (2012)
Olivier, S.L., de Supinski, B.R., Schulz, M., Prins, J.F.: Characterizing and mitigating work time inflation in task parallel programs. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2012, pp. 65:1–65:12. IEEE Computer Society Press, Los Alamitos (2012)
Taşırlar, S., Sarkar, V.: Data-Driven Tasks and their Implementation. In: Proceedings of the International Conference on Parallel Processing (September 2011)
Vajracharya, S., Karmesin, S., Beckman, P., Crotinger, J., Malony, A., Shende, S., Oldehoeft, R., Smith, S.: Smarts: Exploiting temporal locality and parallelism through vertical execution. In: Proceedings of the 13th International Conference on Supercomputing, pp. 302–310. ACM (1999)
Weng, T.H.: Translation of OpenMP to Dataflow Execution Model for Data locality and Efficient Parallel Execution. PhD thesis, Department of Computer Science, University of Houston (2003)
Yan, Y., Chatterjee, S., Orozco, D.A., Garcia, E., Budimlić, Z., Shirako, J., Pavel, R.S., Gao, G.R., Sarkar, V.: Hardware and software tradeoffs for task synchronization on manycore architectures. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011, Part II. LNCS, vol. 6853, pp. 112–123. Springer, Heidelberg (2011)
YarKhan, A., Kurzak, J., Dongarra, J.: QUARK users' guide: QUeueing And Runtime for Kernels. University of Tennessee Innovative Computing Laboratory Technical Report ICL-UT-11-02 (2011)
© 2013 Springer-Verlag Berlin Heidelberg
Ghosh, P., Yan, Y., Eachempati, D., Chapman, B. (2013). A Prototype Implementation of OpenMP Task Dependency Support. In: Rendell, A.P., Chapman, B.M., Müller, M.S. (eds) OpenMP in the Era of Low Power Devices and Accelerators. IWOMP 2013. Lecture Notes in Computer Science, vol 8122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40698-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40697-3
Online ISBN: 978-3-642-40698-0