Polygonal Iteration Space Partitioning
This work presents a new set of loop transformations to expose and maximize data locality in loop-nests with non-uniform reuse patterns. The proposed transformations use the conventions of the Polyhedral Model to represent loop-nests and then leverage that representation to partition the iteration space into polygonally shaped partitions with maximum locality. However, the partitioning algorithm tends to produce partitions with complex geometry (shape) and progressively fewer iterations, which, in practice, introduces considerable run-time overhead. This work therefore also focuses on containing the number of partitions and properly managing their geometry at run-time, to limit unnecessary overhead. The proposed transformations also expose loop-level parallelism by grouping together independent iterations, thus improving the performance of both serial and parallel execution. For parallel execution, a selective mapping of partitions to threads, based on the type of reuse these partitions exhibit, is proposed.
The proposed transformations show a consistent performance speedup on some loop-nests, in both serial execution (up to 1.2x over Polly) and parallel execution (up to 3.17x over PLuTo).
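To make "non-uniform reuse" and "polygonally shaped partitions" concrete, the following is a minimal sketch on a hypothetical loop-nest that is not taken from this work: `B[i][j] = A[i][j] + A[j][i]` over a square `n x n` iteration space. The function names `reuse_distance` and `partition` are assumptions made for illustration, and the hand-written three-way split is not the partitioning algorithm proposed here. It only shows why the reuse distance varies from iteration to iteration and why the resulting groups of iterations are triangles rather than rectangular tiles.

```python
def reuse_distance(i, j):
    """Distance from iteration (i, j) to the iteration that touches A[i][j] again.

    In the hypothetical nest B[i][j] = A[i][j] + A[j][i], the element A[i][j]
    read at iteration (i, j) is read again, as A[j][i], at iteration (j, i).
    The distance depends on (i, j), so the reuse is non-uniform.
    """
    return (j - i, i - j)


def partition(n):
    """Split the n x n iteration space into three polygonal groups.

    'upper' (i < j) and 'lower' (i > j) are triangles whose iterations share
    data pairwise across the diagonal; 'diagonal' (i == j) reuses A[i][i]
    within a single iteration.
    """
    parts = {"upper": [], "diagonal": [], "lower": []}
    for i in range(n):
        for j in range(n):
            if i < j:
                parts["upper"].append((i, j))
            elif i == j:
                parts["diagonal"].append((i, j))
            else:
                parts["lower"].append((i, j))
    return parts


if __name__ == "__main__":
    n = 4
    distances = {reuse_distance(i, j) for i in range(n) for j in range(n) if i != j}
    print("distinct reuse distances:", len(distances))
    print({name: len(points) for name, points in partition(n).items()})
```

A single fixed tile shape cannot follow a reuse distance that changes with `(i, j)`, which is the situation the abstract describes. In this toy case the triangular groups also contain no dependences between their own iterations, since every iteration writes a distinct element of `B`.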
Keywords: Polygonal partitions, Shape and size independent tiling, Temporal locality, Polyhedral model
We would like to thank Benoît Meister and Vincent Loechner for providing us with their implementation which laid the foundation for this work. This work was supported in part by NSF award XPS 1533926.