Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP

Gan, Ge; Wang, Xu; Manzano, Joseph; Gao, Guang R.

doi:10.1007/978-3-642-02303-3_12

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5568)

Included in the following conference series:

International Workshop on OpenMP

Abstract

Tiling is widely used by compilers and programmers to optimize scientific and engineering code for better performance. Many parallel programming languages support tiles and tiling directly through first-class language constructs or library routines. However, OpenMP is currently tile oblivious, although it is the de facto standard for writing parallel programs on shared memory systems. In this paper, we introduce tile aware parallelization into OpenMP. We propose tile reduction, an OpenMP tile aware parallelization technique that allows reduction to be performed on multi-dimensional arrays. The paper makes three contributions: (a) it is the first paper to propose and discuss tile aware parallelization in OpenMP, and we argue that it is not only necessary but also possible to have tile aware parallelization in OpenMP; (b) it introduces the methods used to implement tile reduction, including the required OpenMP API extension and the associated code generation techniques; (c) it applies tile reduction to a set of benchmarks. The experimental results show that tile reduction can make parallelization more natural and flexible. It can not only expose more parallelism in a program but also improve its data locality.
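
To make the idea of a reduction over a multi-dimensional array concrete, the following is a minimal hand-written sketch in C that uses only standard OpenMP constructs. It is not the API extension or the generated code described in the paper, neither of which is shown in the abstract. The function name tile_sum, the constant TILE, and the row-major input layout are assumptions made for illustration. Each thread accumulates into a private copy of the reduction tile, and the private copies are then merged under a critical section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 2  /* hypothetical tile edge length, chosen for illustration */

/*
 * Element-wise sum of all TILE x TILE tiles of a row-major n x n matrix.
 * The reduction variable is the whole tile `out`, not a scalar.
 * Assumes n is a positive multiple of TILE.
 */
void tile_sum(const double *a, int n, double out[TILE][TILE])
{
    assert(n > 0 && n % TILE == 0);

    int per_dim = n / TILE;          /* tiles along one dimension */
    int ntiles = per_dim * per_dim;  /* total number of tiles */

    memset(out, 0, sizeof(double) * TILE * TILE);

    #pragma omp parallel
    {
        /* Private copy of the reduction tile, one per thread. */
        double local[TILE][TILE] = {{0}};

        /* Whole tiles are distributed across threads. */
        #pragma omp for nowait
        for (int t = 0; t < ntiles; t++) {
            int ti = (t / per_dim) * TILE;  /* first row of tile t */
            int tj = (t % per_dim) * TILE;  /* first column of tile t */
            for (int i = 0; i < TILE; i++)
                for (int j = 0; j < TILE; j++)
                    local[i][j] += a[(ti + i) * n + (tj + j)];
        }

        /* Merge step: combine the private tiles into the shared result. */
        #pragma omp critical
        for (int i = 0; i < TILE; i++)
            for (int j = 0; j < TILE; j++)
                out[i][j] += local[i][j];
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result. Distributing whole tiles rather than single elements is the design choice that keeps each thread working on a contiguous block of data.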

Author information

Authors and Affiliations

Dep. of Electrical and Computer Engineering, University of Delaware, Newark, Delaware, 19716, U.S.A.
Ge Gan, Joseph Manzano & Guang R. Gao
Dep. of Electrical Engineering, Jilin University, Jilin, China, 130000
Xu Wang

Editor information

Editors and Affiliations

Center for Information Services and High Performance Computing (ZIH), Dresden University of Technology, 01062, Dresden, Germany
Matthias S. Müller
Lawrence Livermore National Laboratory, Center for Applied Scientific Computing, CA 94551-0808, Livermore, USA
Bronis R. de Supinski
Dept. of Computer Science, University of Houston, 501 Philip G. Hoffman Hall, 4800 Calhoun Rd, 77204-3475, Houston, TX, USA
Barbara M. Chapman

Cite this paper

Gan, G., Wang, X., Manzano, J., Gao, G.R. (2009). Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP. In: Müller, M.S., de Supinski, B.R., Chapman, B.M. (eds) Evolving OpenMP in an Age of Extreme Parallelism. IWOMP 2009. Lecture Notes in Computer Science, vol 5568. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02303-3_12

DOI: https://doi.org/10.1007/978-3-642-02303-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02284-5
Online ISBN: 978-3-642-02303-3
eBook Packages: Computer Science, Computer Science (R0)
