An Affine Scheduling Framework for Integrating Data Layout and Loop Transformations

Shirako, Jun; Sarkar, Vivek

doi:10.1007/978-3-030-95953-1_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13149)

Included in the conference series: International Workshop on Languages and Compilers for Parallel Computing

Abstract

Code transformations in optimizing compilers can often be classified as loop transformations that change the execution order of statement instances and data layout transformations that change the memory layouts of variables. There is a mutually dependent relationship between the two, i.e., the best statement execution order can depend on the underlying data layout and vice versa. Existing approaches have typically addressed this inter-dependency by picking a specific phase order, and can thereby miss opportunities to co-optimize loop transformations and data layout transformations. In this paper, we propose a cost-based integration of loop and data layout transformations, aiming to cover a broader optimization space than phase-ordered strategies and thereby to find better solutions. Our approach builds on the polyhedral model, and shows how both loop and data layout transformations can be represented as affine scheduling in a unified manner. To efficiently explore the broader optimization space, we build analytical memory and computational cost models that are parameterized with a range of machine features including hardware parallelism, cache and TLB locality, and vectorization. Experimental results obtained on 12-core Intel Xeon and 24-core IBM POWER8 platforms demonstrate that, for a set of 22 Polybench benchmarks, our proposed cost-based integration approach can respectively deliver 1.3× and 1.6× geometric mean improvements over a state-of-the-art polyhedral optimizer, PLuTo, and a 1.2× geometric mean improvement on both platforms over a phase-ordered approach in which loop transformations are followed by the best data layout transformations.
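The inter-dependency described in the abstract can be illustrated with a small sketch. It is not the cost model of the paper, which accounts for parallelism, cache and TLB locality, and vectorization. It is a toy proxy that sums the innermost-loop memory strides of a two-deep loop nest reading A[i][j] and B[j][i]. All names (`inner_stride`, `total_cost`, the 64-element extents) are hypothetical and chosen only for illustration.

```python
from itertools import permutations, product


def inner_stride(subscripts, layout, loop_order, extents):
    """Stride (in elements) between consecutive innermost-loop iterations.

    subscripts: loop names indexing the array, e.g. ("i", "j") for A[i][j]
    layout:     array dimensions ordered slowest- to fastest-varying in memory;
                (0, 1) is row-major and (1, 0) is column-major for a 2-D array
    loop_order: loop names from outermost to innermost
    extents:    trip count per loop (assumed equal to the array extents)
    """
    inner = loop_order[-1]
    stride, weight = 0, 1
    # Walk the memory dimensions from fastest-varying to slowest-varying.
    for dim in reversed(layout):
        if subscripts[dim] == inner:
            stride += weight
        weight *= extents[subscripts[dim]]
    return stride


def total_cost(accesses, layouts, loop_order, extents):
    """Toy cost: sum of innermost strides over all array accesses."""
    return sum(
        inner_stride(subs, layouts[name], loop_order, extents)
        for name, subs in accesses
    )


accesses = [("A", ("i", "j")), ("B", ("j", "i"))]
extents = {"i": 64, "j": 64}
row_major = (0, 1)

# Loop transformations only: both arrays keep their row-major layout.
loops_only = min(
    total_cost(accesses, {"A": row_major, "B": row_major}, order, extents)
    for order in permutations(("i", "j"))
)

# Joint search: loop order and the layout of each array are chosen together.
joint = min(
    total_cost(accesses, {"A": la, "B": lb}, order, extents)
    for order in permutations(("i", "j"))
    for la, lb in product(permutations((0, 1)), repeat=2)
)

print(loops_only, joint)
```

Under these assumptions, no loop order alone gives both accesses unit stride, whereas a joint choice of loop order and layout does. This is the kind of opportunity that fixing the layout before choosing the loop order can miss.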

Notes

1. A scalar variable is considered as a degenerate case of an array.
2. Note that [25] was presented in the IMPACT workshop, which does not have a formal published proceedings.
3. We extended the existing ordered micro-benchmark to doacross.

References

1. The Polyhedral Compiler Collection. http://www.cs.ucla.edu/~pouchet/software/pocc/
2. Allen, J.R., Kennedy, K.: Automatic loop interchange. In: Proceedings of the 1984 SIGPLAN Symposium on Compiler Construction, SIGPLAN 1984, pp. 233–246. ACM, New York (1984)
3. Bacon, D.F., Chow, J.-H., Ju, D.-C.R., Muthukumar, K., Sarkar, V.: A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness. In: CASCON First Decade High Impact Papers, CASCON 2010, pp. 146–158. IBM Corp, USA (1994)
4. Bondhugula, U., Acharya, A., Cohen, A.: The pluto+ algorithm: a practical approach for parallelization and locality optimization of affine loop nests. ACM Trans. Program. Lang. Syst. 38(3), 12:1–12:32 (2016)
5. Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: A practical automatic polyhedral parallelizer and locality optimizer. In: Proceedings of PLDI 2008. ACM, New York (2008)
6. Barik, R., Majeti, D., Meel, K.S., Sarkar, V.: Automatic data layout generation and kernel mapping for CPU+GPU architectures. In: 25th International Conference on Compiler Construction, March 2016
7. EPCC OpenMP micro-benchmarks. https://www.epcc.ed.ac.uk/research/computing/performance-characterisation-and-benchmarking/epcc-openmp-micro-benchmark-suite
8. Feautrier, P., Lengauer, C.: Polyhedron model. In: Padua, D. (ed.) Encyclopedia of Parallel Computing, pp. 1581–1592. Springer, US (2011)
9. Ferrante, J., Sarkar, V., Thrash, W.: On estimating and enhancing cache effectiveness. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds.) LCPC 1991. LNCS, vol. 589, pp. 328–343. Springer, Heidelberg (1992). https://doi.org/10.1007/BFb0038674
10. Grosser, T., Größlinger, A., Lengauer, C.: Polly - performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett. 22(4), 1250010 (2012)
11. Henretty, T., Stock, K., Pouchet, L.-N., Franchetti, F., Ramanujam, J., Sadayappan, P.: Data layout transformation for stencil computations on short-vector SIMD architectures. In: Knoop, J. (ed.) CC 2011. LNCS, vol. 6601, pp. 225–245. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19861-8_13
12. Irigoin, F., Triolet, R.: Supernode partitioning. In: Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 1988, pp. 319–329. ACM, New York (1988)
13. Integer set library. http://isl.gforge.inria.fr
14. Jung, C., Rus, S., Railing, B.P., Clark, N., Pande, S.: Brainy: effective selection of data structures. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, pp. 86–97. ACM, New York (2011)
15. Kandemir, M., Choudhary, A., Ramanujam, J., Banerjee, P.: Improving locality using loop and data transformations in an integrated framework. In: Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, MICRO 31, pp. 285–297. IEEE Computer Society Press, Los Alamitos (1998)
16. Kennedy, K., McKinley, K.S.: Maximizing loop parallelism and improving data locality via loop fusion and distribution. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds.) LCPC 1993. LNCS, vol. 768, pp. 301–320. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-57659-2_18
17. Kong, M., Veras, R., Stock, K., Franchetti, F., Pouchet, L.-N., Sadayappan, P.: When polyhedral transformations meet SIMD code generation, vol. 48, pp. 127–138. ACM, New York, June 2013
18. McKinley, K.S., Carr, S., Tseng, C.-W.: Improving data locality with loop transformations. ACM Trans. Program. Lang. Syst. (TOPLAS) 18(4), 424–453 (1996)
19. Openscop specification and library. http://icps.u-strasbg.fr/bastoul/development/openscop/
20. PolyBench. The polyhedral benchmark suite. http://www.cse.ohio-state.edu/~pouchet/software/polybench/
21. Reddy, C., Bondhugula, U.: Effective automatic computation placement and data allocation for parallelization of regular programs. In: Proceedings of the 28th ACM International Conference on Supercomputing, ICS 2014, pp. 13–22. ACM, New York (2014)
22. Sarkar, V.: Automatic selection of high order transformations in the IBM XL Fortran compilers. IBM J. Res. Dev. 41(3), 233–264 (1997)
23. Sharma, K., Karlin, I., Keasler, J., McGraw, J.R., Sarkar, V.: Data layout optimization for portable performance. In: Träff, J.L., Hunold, S., Versaci, F. (eds.) Euro-Par 2015. LNCS, vol. 9233, pp. 250–262. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48096-0_20
24. Shirako, J., Pouchet, L.-N., Sarkar, V.: Oil and water can mix: an integration of polyhedral and ast-based transformations. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014, pp. 287–298. IEEE Press, Piscataway (2014)
25. Shirako, J., Sarkar, V.: Integrating data layout transformations with the polyhedral model. In: Proceedings of IMPACT 2019, Valencia, Spain, January 2019
26. Shirako, J., et al.: Expressing DOACROSS loop dependencies in OpenMP. In: Proceedings of the 2012 SIGPLAN Symposium on Compiler Construction (2012)
27. Verdoolaege, S., Grosser, T.: Polyhedral extraction tool. In: Proceedings of IMPACT 2012, Paris, France, January 2012
28. Thies, W., Vivien, F., Amarasinghe, S.P.: A step towards unifying schedule and storage optimization. ACM Trans. Program. Lang. Syst. 29(6), 34 (2007)
29. Thies, W., Vivien, F., Sheldon, J., Amarasinghe, S.P.: A unified framework for schedule and storage optimization. In: Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Snowbird, Utah, USA, 20–22 June 2001, pp. 232–242 (2001)
30. Vasilache, N., Meister, B., Baskaran, M., Lethin, R.: Joint scheduling and layout optimization to enable multi-level vectorization. In: IMPACT-2: 2nd International Workshop on Polyhedral Compilation Techniques, Paris, France, January 2012
31. Wolf, M., Maydan, D., Chen, D.-K.: Combining loop transformations considering caches and scheduling. In: MICRO 29: Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture, pp. 274–286 (1996)
32. Wolfe, M.: Loop skewing: the wavefront method revisited. Int. J. Parallel Program. 15(4), 279–293 (1986)
33. Wolfe, M.: Iteration space tiling for memory hierarchies. In: Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, pp. 357–361. Society for Industrial and Applied Mathematics, Philadelphia (1989)
34. Wonnacott, D.G.: Constraint-based array dependence analysis. Ph.D. thesis. UMI Order No. GAX96-22167, College Park, MD, USA (1995)

Author information

Authors and Affiliations

School of Computer Science, Georgia Institute of Technology, Atlanta, USA
Jun Shirako & Vivek Sarkar

Corresponding author

Correspondence to Jun Shirako.

Editor information

Editors and Affiliations

Inst for Advanced Computational Science, Stony Brook University, Stony Brook, NY, USA
Barbara Chapman
IBM TJ Watson Research Center, Yorktown Heights, NY, USA
José Moreira

About this paper

Cite this paper

Shirako, J., Sarkar, V. (2022). An Affine Scheduling Framework for Integrating Data Layout and Loop Transformations. In: Chapman, B., Moreira, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2020. Lecture Notes in Computer Science, vol 13149. Springer, Cham. https://doi.org/10.1007/978-3-030-95953-1_1

DOI: https://doi.org/10.1007/978-3-030-95953-1_1
Published: 16 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95952-4
Online ISBN: 978-3-030-95953-1
eBook Packages: Computer Science, Computer Science (R0)
