Abstract
Computation Decomposition and Alignment (CDA) is a new loop transformation framework that extends the linear loop transformation framework and the more recently proposed Computation Alignment frameworks by linearly transforming computations at the granularity of subexpressions. It can be applied to achieve a number of optimization objectives, including the removal of data alignment constraints, the elimination of ownership tests, the reduction of cache conflicts, and improvements in data access locality.
In this paper we show how CDA can be used to implement flexible computation rules effectively, with the objective of minimizing communication and, whenever possible, eliminating intrinsics that test whether a computation needs to be executed. We describe CDA and present an algorithm for deriving appropriate CDA transformations.
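To make the idea of a subexpression-level computation rule concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes a hypothetical statement A[i] = B[i] + C[i+1] * D[i+1] with all four arrays block-distributed identically, and it counts remote element reads under two rules: owner-computes for the whole statement, and a decomposed form in which a temporary t[i+1] = C[i+1] * D[i+1] is computed where C and D reside. The statement, the distribution, and the names `owner`, `remote_reads_owner_computes`, and `remote_reads_decomposed` are illustrative assumptions.

```python
def owner(index, block):
    """Processor that owns element `index` under a block distribution (assumed)."""
    return index // block


def remote_reads_owner_computes(n, block):
    """Whole statement A[i] = B[i] + C[i+1] * D[i+1] runs on the owner of A[i]."""
    count = 0
    for i in range(n - 1):
        me = owner(i, block)
        # Operands read by the statement: B[i], C[i+1], D[i+1].
        for idx in (i, i + 1, i + 1):
            if owner(idx, block) != me:
                count += 1
    return count


def remote_reads_decomposed(n, block):
    """Statement split into two aligned computations (hypothetical decomposition):
         t[i+1] = C[i+1] * D[i+1]   runs on the owner of element i+1 (all reads local)
         A[i]   = B[i] + t[i+1]     runs on the owner of A[i]
    """
    count = 0
    for i in range(n - 1):
        me = owner(i, block)
        # Operands read by the second computation: B[i], t[i+1].
        for idx in (i, i + 1):
            if owner(idx, block) != me:
                count += 1
    return count


if __name__ == "__main__":
    n, block = 16, 4
    print(remote_reads_owner_computes(n, block), remote_reads_decomposed(n, block))
```

In this toy setting the decomposed form moves one value across each block boundary instead of two, which is the kind of communication reduction a flexible computation rule aims for. A real compiler would also have to account for the storage and scheduling of the temporary.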
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kulkarni, D., Stumm, M., Unrau, R.C. (1995). Implementing flexible computation rules with subexpression-level loop transformations. In: Haridi, S., Ali, K., Magnusson, P. (eds) EURO-PAR '95 Parallel Processing. Euro-Par 1995. Lecture Notes in Computer Science, vol 966. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020475
DOI: https://doi.org/10.1007/BFb0020475
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60247-7
Online ISBN: 978-3-540-44769-6
eBook Packages: Springer Book Archive