Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP

  • Conference paper
Evolving OpenMP in an Age of Extreme Parallelism (IWOMP 2009)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5568)

Abstract

Tiling is widely used by compilers and programmers to optimize scientific and engineering code for better performance. Many parallel programming languages support tiling directly through first-class language constructs or library routines. However, OpenMP, the de facto standard for writing parallel programs on shared memory systems, is currently tile oblivious. In this paper, we introduce tile aware parallelization into OpenMP. We propose tile reduction, an OpenMP tile aware parallelization technique that allows reduction to be performed on multi-dimensional arrays. The paper makes three contributions: (a) it is the first paper that proposes and discusses tile aware parallelization in OpenMP, and we argue that such parallelization is not only necessary but also possible; (b) it introduces the methods used to implement tile reduction, including the required OpenMP API extension and the associated code generation techniques; (c) it applies tile reduction to a set of benchmarks. The experimental results show that tile reduction can make parallelization more natural and flexible. It can not only expose more parallelism in a program but also improve its data locality.
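To illustrate what a reduction over a multi-dimensional array involves, the following is a minimal hand-written sketch using only standard OpenMP constructs (`parallel`, `for`, `critical`). It does not show the API extension proposed in the paper, whose syntax is not given in the abstract. The function name `tile_sum` and the constant `TILE` are hypothetical and chosen only for this example. Each thread accumulates into a private copy of the tile, and the private copies are merged into the shared result at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 4 /* hypothetical tile edge length, for illustration only */

/* Element-wise sum of `count` TILE x TILE slices of `src` into `dst`.
 * This is a hand-coded stand-in for a reduction whose target is a
 * two-dimensional tile rather than a scalar. */
void tile_sum(size_t count, double src[][TILE][TILE], double dst[TILE][TILE])
{
    assert(src != NULL && dst != NULL);
    memset(dst, 0, sizeof(double) * TILE * TILE);

    #pragma omp parallel
    {
        /* Private copy of the tile, initialised to the identity of "+". */
        double local[TILE][TILE] = {{0.0}};

        /* The slices are divided among the threads. */
        #pragma omp for nowait
        for (long k = 0; k < (long)count; ++k)
            for (int i = 0; i < TILE; ++i)
                for (int j = 0; j < TILE; ++j)
                    local[i][j] += src[k][i][j];

        /* Merge step: one thread at a time adds its partial tile. */
        #pragma omp critical
        for (int i = 0; i < TILE; ++i)
            for (int j = 0; j < TILE; ++j)
                dst[i][j] += local[i][j];
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result. With OpenMP enabled, the order in which partial tiles are merged is not fixed, so floating-point results may differ in the last bits between runs.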

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gan, G., Wang, X., Manzano, J., Gao, G.R. (2009). Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP. In: Müller, M.S., de Supinski, B.R., Chapman, B.M. (eds) Evolving OpenMP in an Age of Extreme Parallelism. IWOMP 2009. Lecture Notes in Computer Science, vol 5568. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02303-3_12

  • DOI: https://doi.org/10.1007/978-3-642-02303-3_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02284-5

  • Online ISBN: 978-3-642-02303-3

  • eBook Packages: Computer Science, Computer Science (R0)
