Abstract
OpenMP is a widely used programming standard for a broad range of parallel systems. In the OpenMP programming model, synchronization points are specified by implicit or explicit barrier operations within a parallel region. However, certain classes of computations, such as stencil algorithms, achieve better synchronization efficiency and data locality with doacross parallelism using point-to-point synchronization than with wavefront parallelism using barrier synchronization. In this paper, we propose new synchronization constructs to enable doacross parallelism in the context of the OpenMP programming model. Experimental results on a 32-core IBM Power7 system with four benchmark programs show that the proposed doacross approach outperforms OpenMP barriers by factors of 1.4× to 5.2× when using all 32 cores.
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Shirako, J., Unnikrishnan, P., Chatterjee, S., Li, K., Sarkar, V. (2013). Expressing DOACROSS Loop Dependences in OpenMP. In: Rendell, A.P., Chapman, B.M., Müller, M.S. (eds) OpenMP in the Era of Low Power Devices and Accelerators. IWOMP 2013. Lecture Notes in Computer Science, vol 8122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40698-0_3
Print ISBN: 978-3-642-40697-3
Online ISBN: 978-3-642-40698-0