Abstract
Code transformations in optimizing compilers can often be classified as loop transformations, which change the execution order of statement instances, and data layout transformations, which change the memory layouts of variables. The two are mutually dependent: the best statement execution order can depend on the underlying data layout, and vice versa. Existing approaches have typically addressed this interdependency by fixing a specific phase order, and can thereby miss opportunities to co-optimize loop and data layout transformations. In this paper, we propose a cost-based integration of loop and data layout transformations that aims to cover a broader optimization space than phase-ordered strategies and thereby find better solutions. Our approach builds on the polyhedral model and shows how both loop and data layout transformations can be represented as affine scheduling in a unified manner. To explore this broader optimization space efficiently, we build analytical memory and computational cost models parameterized by a range of machine features, including hardware parallelism, cache and TLB locality, and vectorization. Experimental results on 12-core Intel Xeon and 24-core IBM POWER8 platforms show that, for a set of 22 PolyBench benchmarks, our cost-based integration approach can deliver geometric mean improvements of 1.3× and 1.6×, respectively, over the state-of-the-art polyhedral optimizer PLuTo, and a 1.2× geometric mean improvement on both platforms over a phase-ordered approach in which loop transformations are followed by the best data layout transformations.
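The interdependence between execution order and data layout can be illustrated with a toy example. The Python sketch below is not the paper's framework or cost model; it is a hypothetical illustration that assumes a single statement `B[i][j] = A[j][i]`, represents both the loop order and each array's layout as 2×2 permutation matrices (i.e., affine maps), and scores every combination with a made-up cost of cache lines touched per innermost iteration, assuming 8-byte elements and 64-byte cache lines. All names (`lines_per_iteration`, `SCHEDULES`, `LAYOUTS`, and so on) are illustrative only.

```python
from itertools import product

# Toy parameters (assumptions, not taken from the paper).
N = 1024          # array extent in each dimension
ELEM_BYTES = 8    # double-precision elements
LINE_BYTES = 64   # cache-line size

# 2x2 integer matrices, stored as tuples of rows.
IDENTITY = ((1, 0), (0, 1))
SWAP = ((0, 1), (1, 0))


def matvec(m, v):
    """Multiply a small integer matrix by a vector."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)


# Statement S(i, j): B[i][j] = A[j][i].
# Access functions map the iteration vector (i, j) to array subscripts.
ACCESSES = {"A": SWAP, "B": IDENTITY}

# Both kinds of transformation are affine maps (here, permutations):
# a schedule maps (i, j) to a time vector; a layout maps subscripts to
# storage coordinates, which are then linearized in row-major order.
SCHEDULES = {"i-outer": IDENTITY, "j-outer": SWAP}
LAYOUTS = {"row-major": IDENTITY, "col-major": SWAP}
STORAGE_STRIDES = (N, 1)  # element strides of row-major linearization


def lines_per_iteration(schedule, layouts):
    """Hypothetical cost: cache lines touched per innermost iteration."""
    # For a permutation schedule T, the iteration vector is T^T * time, so
    # advancing the innermost time dimension by 1 moves the iteration vector
    # by the last row of T.
    step = schedule[-1]
    cost = 0.0
    for array, access in ACCESSES.items():
        delta = matvec(layouts[array], matvec(access, step))
        stride = abs(sum(s * d for s, d in zip(STORAGE_STRIDES, delta)))
        # Stride 0 means temporal reuse (cost 0); otherwise each iteration
        # touches stride * ELEM_BYTES / LINE_BYTES new lines, capped at 1.
        cost += min(1.0, stride * ELEM_BYTES / LINE_BYTES)
    return cost


def cheapest(candidates):
    """Return the (cost, schedule, layout_A, layout_B) tuple of least cost."""
    return min(candidates, key=lambda c: c[0])


# Loop transformations only: layouts stay as declared (row-major).
fixed = {"A": IDENTITY, "B": IDENTITY}
loop_only = cheapest(
    (lines_per_iteration(t, fixed), name, "row-major", "row-major")
    for name, t in SCHEDULES.items()
)

# Joint search over schedules and per-array layouts.
joint = cheapest(
    (lines_per_iteration(SCHEDULES[s], {"A": LAYOUTS[la], "B": LAYOUTS[lb]}),
     s, la, lb)
    for s, la, lb in product(SCHEDULES, LAYOUTS, LAYOUTS)
)

print("loop-only:", loop_only)  # (1.125, 'i-outer', 'row-major', 'row-major')
print("joint:    ", joint)      # (0.25, 'i-outer', 'col-major', 'row-major')
```

With the declared layouts fixed, either loop order leaves one access with stride N (1.125 lines per iteration in this toy model); searching schedules and layouts together finds a combination, here storing `A` transposed, at 0.25 lines per iteration. The paper's models additionally account for hardware parallelism, TLB locality, and vectorization, which this sketch ignores.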
Notes
1. A scalar variable is considered a degenerate case of an array.
2. Note that [25] (Shirako and Sarkar, IMPACT 2019) was presented at the IMPACT workshop, which does not have formally published proceedings.
3. We extended the existing ordered micro-benchmark to doacross loops.
References
The Polyhedral Compiler Collection. http://www.cs.ucla.edu/~pouchet/software/pocc/
Allen, J.R., Kennedy, K.: Automatic loop interchange. In: Proceedings of the 1984 SIGPLAN Symposium on Compiler Construction, SIGPLAN 1984, pp. 233–246. ACM, New York (1984)
Bacon, D.F., Chow, J.-H., Ju, D.-C.R., Muthukumar, K., Sarkar, V.: A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness. In: CASCON First Decade High Impact Papers, CASCON 2010, pp. 146–158. IBM Corp, USA (1994)
Bondhugula, U., Acharya, A., Cohen, A.: The Pluto+ algorithm: a practical approach for parallelization and locality optimization of affine loop nests. ACM Trans. Program. Lang. Syst. 38(3), 12:1–12:32 (2016)
Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: A practical automatic polyhedral parallelizer and locality optimizer. In: Proceedings of PLDI 2008. ACM, New York (2008)
Barik, R., Majeti, D., Meel, K.S., Sarkar, V.: Automatic data layout generation and kernel mapping for CPU+GPU architectures. In: 25th International Conference on Compiler Construction, March 2016
EPCC OpenMP micro-benchmarks. https://www.epcc.ed.ac.uk/research/computing/performance-characterisation-and-benchmarking/epcc-openmp-micro-benchmark-suite
Feautrier, P., Lengauer, C.: Polyhedron model. In: Padua, D. (ed.) Encyclopedia of Parallel Computing, pp. 1581–1592. Springer, US (2011)
Ferrante, J., Sarkar, V., Thrash, W.: On estimating and enhancing cache effectiveness. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds.) LCPC 1991. LNCS, vol. 589, pp. 328–343. Springer, Heidelberg (1992). https://doi.org/10.1007/BFb0038674
Grosser, T., Größlinger, A., Lengauer, C.: Polly - performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett. 22(4), 1250010 (2012)
Henretty, T., Stock, K., Pouchet, L.-N., Franchetti, F., Ramanujam, J., Sadayappan, P.: Data layout transformation for stencil computations on short-vector SIMD architectures. In: Knoop, J. (ed.) CC 2011. LNCS, vol. 6601, pp. 225–245. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19861-8_13
Irigoin, F., Triolet, R.: Supernode partitioning. In: Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 1988, pp. 319–329. ACM, New York (1988)
Integer set library. http://isl.gforge.inria.fr
Jung, C., Rus, S., Railing, B.P., Clark, N., Pande, S.: Brainy: effective selection of data structures. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, pp. 86–97. ACM, New York (2011)
Kandemir, M., Choudhary, A., Ramanujam, J., Banerjee, P.: Improving locality using loop and data transformations in an integrated framework. In: Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, MICRO 31, pp. 285–297. IEEE Computer Society Press, Los Alamitos (1998)
Kennedy, K., McKinley, K.S.: Maximizing loop parallelism and improving data locality via loop fusion and distribution. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds.) LCPC 1993. LNCS, vol. 768, pp. 301–320. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-57659-2_18
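Kong, M., Veras, R., Stock, K., Franchetti, F., Pouchet, L.-N., Sadayappan, P.: When polyhedral transformations meet SIMD code generation. In: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2013, pp. 127–138. ACM, New York (2013)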
McKinley, K.S., Carr, S., Tseng, C.-W.: Improving data locality with loop transformations. ACM Trans. Program. Lang. Syst. (TOPLAS) 18(4), 424–453 (1996)
Openscop specification and library. http://icps.u-strasbg.fr/bastoul/development/openscop/
PolyBench. The polyhedral benchmark suite. http://www.cse.ohio-state.edu/~pouchet/software/polybench/
Reddy, C., Bondhugula, U.: Effective automatic computation placement and data allocation for parallelization of regular programs. In: Proceedings of the 28th ACM International Conference on Supercomputing, ICS 2014, pp. 13–22. ACM, New York (2014)
Sarkar, V.: Automatic Selection of high order transformations in the IBM XL Fortran compilers. IBM J. Res. Dev. 41(3), 233–264 (1997)
Sharma, K., Karlin, I., Keasler, J., McGraw, J.R., Sarkar, V.: Data layout optimization for portable performance. In: Träff, J.L., Hunold, S., Versaci, F. (eds.) Euro-Par 2015. LNCS, vol. 9233, pp. 250–262. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48096-0_20
Shirako, J., Pouchet, L.-N., Sarkar, V.: Oil and water can mix: an integration of polyhedral and AST-based transformations. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014, pp. 287–298. IEEE Press, Piscataway (2014)
Shirako, J., Sarkar, V.: Integrating data layout transformations with the polyhedral model. In: Proceedings of IMPACT 2019, Valencia, Spain, January 2019
Shirako, J., et al.: Expressing DOACROSS loop dependencies in OpenMP. In: Proceedings of the 2012 SIGPLAN Symposium on Compiler Construction (2012)
Verdoolaege, S., Grosser, T.: Polyhedral extraction tool. In: Proceedings of IMPACT 2012, Paris, France, January 2012
Thies, W., Vivien, F., Amarasinghe, S.P.: A step towards unifying schedule and storage optimization. ACM Trans. Program. Lang. Syst. 29(6), 34 (2007)
Thies, W., Vivien, F., Sheldon, J., Amarasinghe, S.P.: A unified framework for schedule and storage optimization. In: Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Snowbird, Utah, USA, 20–22 June 2001, pp. 232–242 (2001)
Vasilache, N., Meister, B., Baskaran, M., Lethin, R.: Joint scheduling and layout optimization to enable multi-level vectorization. In: IMPACT-2: 2nd International Workshop on Polyhedral Compilation Techniques, Paris, France, January 2012
Wolf, M., Maydan, D., Chen, D.-K.: Combining loop transformations considering caches and scheduling. In: MICRO 29: Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture, pp. 274–286 (1996)
Wolfe, M.: Loop skewing: the wavefront method revisited. Int. J. Parallel Program. 15(4), 279–293 (1986)
Wolfe, M.: Iteration space tiling for memory hierarchies. In: Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, pp. 357–361. Society for Industrial and Applied Mathematics, Philadelphia (1989)
Wonnacott, D.G.: Constraint-based array dependence analysis. Ph.D. thesis. UMI Order No. GAX96-22167, College Park, MD, USA (1995)
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Shirako, J., Sarkar, V. (2022). An Affine Scheduling Framework for Integrating Data Layout and Loop Transformations. In: Chapman, B., Moreira, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2020. Lecture Notes in Computer Science, vol. 13149. Springer, Cham. https://doi.org/10.1007/978-3-030-95953-1_1
DOI: https://doi.org/10.1007/978-3-030-95953-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95952-4
Online ISBN: 978-3-030-95953-1
eBook Packages: Computer Science, Computer Science (R0)