Implementing flexible computation rules with subexpression-level loop transformations

Kulkarni, Dattatraya; Stumm, Michael; Unrau, Ronald C.

doi:10.1007/BFb0020475

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 966)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Computation Decomposition and Alignment (CDA) is a new loop transformation framework that extends the linear loop transformation framework and the more recently proposed Computation Alignment frameworks by linearly transforming computations at the granularity of subexpressions. It can be applied to achieve a number of optimization objectives, including the removal of data alignment constraints, the elimination of ownership tests, the reduction of cache conflicts, and improvements in data access locality.

In this paper we show how CDA can be used to implement flexible computation rules effectively, with the objective of minimizing communication and, whenever possible, eliminating the intrinsics that test whether a computation needs to be executed. We describe CDA, show how it implements flexible computation rules, and present an algorithm for deriving appropriate CDA transformations.
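As an informal illustration of what a flexible computation rule buys, the sketch below takes a single hypothetical statement, `A[i] = B[i-1] + C[i-1]`, splits off its right-hand subexpression, and executes that subexpression on `owner(i + offset)` instead of on the owner of `A[i]`. This shift is the simplest (translation) case of aligning a subexpression's computation. The block distribution, array sizes, remote-read cost model, and all names (`owner`, `remote_reads`, `best_offset`, `simulate`) are assumptions made for illustration; the exhaustive search over offsets is a stand-in, not the derivation algorithm presented in the paper.

```python
# Hypothetical sketch of a subexpression-level computation rule.
# Assumptions (not from the paper): 1-D block distribution, one statement
# A[i] = B[i-1] + C[i-1], and a cost model that counts remote reads.

N = 16                    # array length (assumed)
BLOCK = 4                 # elements per processor -> 4 processors (assumed)
ITERS = range(2, N - 2)   # iteration space of the original loop (assumed)


def owner(k: int) -> int:
    """Processor owning element k under a block distribution."""
    return k // BLOCK


def remote_reads(offset: int) -> int:
    """Remote reads when subexpression B[i-1] + C[i-1] of iteration i runs on
    owner(i + offset) and the assignment A[i] = T[i] stays on owner(i)."""
    cost = 0
    for i in ITERS:
        where = owner(i + offset)             # processor evaluating the subexpression
        cost += 2 * (owner(i - 1) != where)   # operands B[i-1] and C[i-1]
        cost += owner(i) != where             # temporary T[i] sent to A's owner
    return cost


def valid(offset: int) -> bool:
    """A shift is usable only if every shifted index stays inside the array."""
    return all(0 <= i + offset < N for i in ITERS)


def best_offset(candidates=range(-2, 3)) -> int:
    """Brute-force choice of the offset with the fewest remote reads."""
    return min((d for d in candidates if valid(d)), key=remote_reads)


def simulate(B, C, offset):
    """SPMD-style simulation: each processor executes the instances assigned
    to it. Placement changes where work runs, not the values computed."""
    T, A = {}, [0] * N
    for p in range(N // BLOCK):               # phase 1: subexpression instances
        for i in ITERS:
            if owner(i + offset) == p:        # explicit ownership test
                T[i] = B[i - 1] + C[i - 1]
    for p in range(N // BLOCK):               # phase 2: owner-computes for A
        for i in ITERS:
            if owner(i) == p:
                A[i] = T[i]
    return A


B = list(range(N))
C = [10 * k for k in range(N)]
reference = [0] * N
for i in ITERS:
    reference[i] = B[i - 1] + C[i - 1]

d = best_offset()
assert simulate(B, C, d) == reference
print(d, remote_reads(0), remote_reads(d))  # -1 6 3
```

Under the default owner-computes rule (offset 0), both operands cross a block boundary three times, giving six remote reads; aligning the subexpression with its operands (offset -1) leaves only the three temporaries to communicate. The simulation keeps explicit ownership tests for clarity, whereas removing such tests is one of the goals CDA targets.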

Author information

Authors and Affiliations

Department of Computer Science and Department of Electrical & Computer Engineering, University of Toronto, M5S 1A4, Toronto, Canada
Dattatraya Kulkarni & Michael Stumm
IBM Toronto Laboratory, Parallel Compiler Development, M3C 1V7, Toronto, Canada
Ronald C. Unrau

Editor information

Seif Haridi, Khayri Ali, Peter Magnusson

About this paper

Cite this paper

Kulkarni, D., Stumm, M., Unrau, R.C. (1995). Implementing flexible computation rules with subexpression-level loop transformations. In: Haridi, S., Ali, K., Magnusson, P. (eds) EURO-PAR '95 Parallel Processing. Euro-Par 1995. Lecture Notes in Computer Science, vol 966. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020475

DOI: https://doi.org/10.1007/BFb0020475
Published: 09 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60247-7
Online ISBN: 978-3-540-44769-6
eBook Packages: Springer Book Archive
