Abstract
Tiling is widely used by compilers and programmer to optimize scientific and engineering code for better performance. Many parallel programming languages support tile/tiling directly through first-class language constructs or library routines. However, the current OpenMP programming language is tile oblivious, although it is the de facto standard for writing parallel programs on shared memory systems. In this paper, we introduce tile aware parallelization into OpenMP. We propose tile reduction, an OpenMP tile aware parallelization technique that allows reduction to be performed on multi-dimensional arrays. The paper has three contributions: (a) it is the first paper that proposes and discusses tile aware parallelization in OpenMP. We argue that, it is not only necessary but also possible to have tile aware parallelization in OpenMP; (b) the paper introduces the methods used to implement tile reduction, including the required OpenMP API extension and the associated code generation techniques; (c) we have applied tile reduction on a set of benchmarks. The experimental results show that tile reduction can make parallelization more natural and flexible. It not only can expose more parallelism in a program, but also can improve its data locality.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Anderson, J.M., Amarasinghe, S.P., Lam, M.S.: Data and computation transformations for multiprocessors. In: Proceedings of the Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, Santa Barbara, California, July 19–21, pp. 166–178 (1995); SIGPLAN Notices 30(8) (August 1995)
Anderson, J.M., Lam, M.S.: Global optimizations for parallelism and locality on scalable parallel machines. In: Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation, Albuquerque, New Mexico, June 23–25, pp. 112–125 (1993); SIGPLAN Notices 28(6) (June 1993)
Wolf, M.E., Lam, M.S.: A data locality optimizing algorithm. In: Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, Toronto, Ontario, June 26–28, pp. 30–44 (1991); SIGPLAN Notices 26(6) (June 1991)
Lim, A.W., Lam, M.S.: Maximizing parallelism and minimizing synchronization with affine transforms. In: Conference Record of POPL 1997: The 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Paris, January 15–17, pp. 201–214 (1997)
High Performance Fortran Forum: High-performance fortran language specification version 2.0. Technical report, Rice University (1997)
El-Ghazawi, T., Carlson, W., Sterling, T., Yelick, K.: UPC: Distributed Shared-Memory Programming. Wiley-Interscience, Hoboken (2003)
Charles, P., Grothoff, C., Saraswat, V., Donawa, C., Kielstra, A., Ebcioglu, K., von Praun, C., Sarkar, V.: X10: an object-oriented approach to non-uniform cluster computing. In: OOPSLA 2005: Proceedings of the 20th annual ACM SIGPLAN conference on Object oriented programming, systems, languages and applications, pp. 519–538. ACM, New York (2005)
Deitz, S.J.: High-level programming language abstractions for advanced and dynamic parallel computations. Ph.D thesis, Seattle, WA, USA, Chair-Lawrence Snyder (2005)
Dotsenko, Y., Coarfa, C., Mellor-Crummey, J.: A multi-platform co-array fortran compiler. In: PACT 2004: Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, Washington, DC, USA, pp. 29–40. IEEE Computer Society, Los Alamitos (2004)
Hilfinger, P.N., Bonachea, D., Gay, D., Graham, S., Liblit, B., Pike, G., Yelick, K.: Titanium language reference manual. Technical report, Berkeley, CA, USA (2001)
Guo, J., Bikshandi, G., Fraguela, B.B., Garzaran, M.J., Padua, D.: Programming with tiles. In: PPoPP 2008: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pp. 111–122. ACM, New York (2008)
UPC Consortium: UPC Collective Operations Specifications V1.0 A publication of the UPC Consortium (2003)
Forum, M.P.I.: MPI: A message-passing interface standard (version 1.0). Technical report (May 1994), http://www.mcs.anl.gov/mpi/mpi-report.ps
Deitz, S.J., Chamberlain, B.L., Choi, S.E., Snyder, L.: The design and implementation of a parallel array operator for the arbitrary remapping of data. In: PPoPP 2003: Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 155–166. ACM, New York (2003)
OpenMP Architecture Review Board: OpenMP Application Program Interface Version 3.0 (May 2008), http://www.openmp.org/mp-documents/spec30.pdf
Deitz, S.J., Chamberlain, B.L., Snyder, L.: High-level language support for user-defined reductions. J. Supercomput. 23(1), 23–37 (2002)
Kusano, K., Satoh, S., Sato, M.: Performance evaluation of the omni openmp compiler. In: Valero, M., Joe, K., Kitsuregawa, M., Tanaka, H. (eds.) ISHPC 2000. LNCS, vol. 1940, pp. 403–414. Springer, Heidelberg (2000)
Viswanathan, G., Larus, J.R.: User-defined reductions for efficient communication in data-parallel languages. Technical Report 1293, University of Wisconsin-Madison (January 1996)
Scholz, S.B.: On defining application-specific high-level array operations by means of shape-invariant programming facilities. In: APL 1998: Proceedings of the APL 1998 conference on Array processing language, pp. 32–38. ACM, New York (1998)
Kambadur, P., Gregor, D., Lumsdaine, A.: Openmp extensions for generic libraries. In: Eigenmann, R., de Supinski, B.R. (eds.) IWOMP 2008. LNCS, vol. 5004, pp. 123–133. Springer, Heidelberg (2008)
Knight, T.J., Park, J.Y., Ren, M., Houston, M., Erez, M., Fatahalian, K., Aiken, A., Dally, W.J., Hanrahan, P.: Compilation for explicitly managed memory hierarchies. In: PPoPP 2007: Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 226–236. ACM, New York (2007)
Eichenberger, A.E., O’Brien, K., O’Brien, K., Wu, P., Chen, T., Oden, P.H., Prener, D.A., Shepherd, J.C., So, B., Sura, Z., Wang, A., Zhang, T., Zhao, P., Gschwind, M.: Optimizing compiler for the cell processor. In: PACT 2005: Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, Washington, DC, USA, pp. 161–172. IEEE Computer Society, Los Alamitos (2005)
del Cuvillo, J., Zhu, W., Hu, Z., Gao, G.R.: Fast: A functionally accurate simulation toolset for the cyclops-64 cellular architecture. In: Workshop on Modeling, Benchmarking and Simulation (MoBS 2005) of ISCA 2005, Madison, Wisconsin (June 2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gan, G., Wang, X., Manzano, J., Gao, G.R. (2009). Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP. In: Müller, M.S., de Supinski, B.R., Chapman, B.M. (eds) Evolving OpenMP in an Age of Extreme Parallelism. IWOMP 2009. Lecture Notes in Computer Science, vol 5568. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02303-3_12
Download citation
DOI: https://doi.org/10.1007/978-3-642-02303-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02284-5
Online ISBN: 978-3-642-02303-3
eBook Packages: Computer ScienceComputer Science (R0)