Compiling nested loops for limited connectivity VLIWs

Slowik, Adrian; Piepenbrock, Georg; Pfahler, Peter

doi:10.1007/3-540-57877-3_10

Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 786)

Abstract

Instruction level parallelism (ILP) is a generally accepted means to speed up the execution of both scientific and non-scientific programs. Compilation techniques for ILP are in a sense “general-purpose” in that they do not depend on whether the source program is scientific or not. In this paper we investigate what can be gained by ILP techniques that are specialized for scientific code in the form of nested loop programs. This regular program form allows us to apply well-known techniques taken from the theory of loop transformations. We present a compilation algorithm based on both standard and non-standard transformations that increases fine-grained parallelism for software pipelining, reduces communication overhead through integrated functional unit assignment, and minimizes memory traffic by maximizing data reuse between adjacent computations. We present first results, which show impressive speedups compared to conventionally software-pipelined code. Our investigations are based on the limited connectivity VLIW architectural model, a realistic (that is, realizable) VLIW machine made up of multiple clusters with private register files.
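
The abstract does not say which loop transformations the algorithm applies, so the following is only an illustrative sketch of the general idea of using a loop transformation to expose fine-grained parallelism in a loop nest. It assumes a hypothetical two-deep nest computing a[i][j] = a[i-1][j] + a[i][j-1], whose dependence vectors are (1, 0) and (0, 1), and applies a textbook skewing matrix. The names SKEW, DEPS, run_original and run_skewed are invented for this example and are not taken from the paper.

```python
def transform(T, v):
    """Multiply the 2x2 integer matrix T by the 2-element vector v."""
    return (T[0][0] * v[0] + T[0][1] * v[1],
            T[1][0] * v[0] + T[1][1] * v[1])

def is_unimodular(T):
    """A 2x2 integer matrix is unimodular if its determinant is +1 or -1."""
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    return det in (1, -1)

def lex_positive(v):
    """A dependence vector is legal if its first nonzero entry is positive."""
    for c in v:
        if c != 0:
            return c > 0
    return False

def inner_loop_parallel(deps):
    """True if every dependence is carried by the outer loop, so that
    iterations of the inner loop are independent of each other."""
    return all(d[0] > 0 for d in deps)

# Assumed example nest: a[i][j] = a[i-1][j] + a[i][j-1]
DEPS = [(1, 0), (0, 1)]
# Skewing (wavefront) transformation: (i, j) -> (i + j, j)
SKEW = [[1, 1], [0, 1]]
NEW_DEPS = [transform(SKEW, d) for d in DEPS]

def run_original(n):
    """Execute the nest in its original (i, j) order."""
    a = [[1] * n for _ in range(n)]
    for i in range(1, n):
        for j in range(1, n):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a

def run_skewed(n):
    """Execute the same nest in (t, j) order with t = i + j.
    All iterations sharing the same t are mutually independent."""
    a = [[1] * n for _ in range(n)]
    for t in range(2, 2 * n - 1):
        for j in range(max(1, t - (n - 1)), min(n - 1, t - 1) + 1):
            i = t - j
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a
```

In the original order the dependence (0, 1) is carried by the inner loop, so consecutive inner iterations cannot overlap freely. After skewing, both dependences have a positive first component, which leaves the inner loop free of carried dependences; this is the kind of independent work a software pipeliner can pack into wide instructions. Cluster assignment and register-file locality, which the paper also addresses, are not modelled here.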

Author information

Authors and Affiliations

Fachbereich Mathematik/Informatik, Universität-GH Paderborn, Warburger Str. 100, D-33098, Paderborn, Germany
Adrian Slowik, Georg Piepenbrock & Peter Pfahler

Editor information

Peter A. Fritzson

Cite this paper

Slowik, A., Piepenbrock, G., Pfahler, P. (1994). Compiling nested loops for limited connectivity VLIWs. In: Fritzson, P.A. (eds) Compiler Construction. CC 1994. Lecture Notes in Computer Science, vol 786. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57877-3_10

DOI: https://doi.org/10.1007/3-540-57877-3_10
Published: 30 May 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57877-2
Online ISBN: 978-3-540-48371-7
eBook Packages: Springer Book Archive
