Abstract
Instruction-level parallelism (ILP) is a generally accepted means of speeding up the execution of both scientific and non-scientific programs. Compilation techniques for ILP are in a sense “general-purpose” in that they do not depend on whether the source program is scientific or not. In this paper we investigate what can be gained by ILP techniques that are specialized for scientific code in the form of nested loop programs. This regular program form allows us to apply well-known techniques from the theory of loop transformations. We present a compilation algorithm based on both standard and non-standard transformations that increases fine-grained parallelism for software pipelining, reduces communication overhead through integrated functional unit assignment, and minimizes memory traffic by maximizing data reuse between adjacent computations. We present first results, which show impressive speedups compared to conventionally software-pipelined code. Our investigations are based on the limited connectivity VLIW architectural model, a realistic (that is, realizable) VLIW machine made up of multiple clusters with private register files.
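The abstract does not spell out the compilation algorithm, so the following is only a generic illustration of the kind of loop transformation theory it refers to, not the authors' method. The sketch assumes a two-deep loop nest described by dependence distance vectors and checks the textbook legality condition for a unimodular transformation: every transformed dependence vector must remain lexicographically positive. All names (`is_legal`, `inner_loop_parallel`, `WAVEFRONT`, and so on) are hypothetical and chosen for this example.

```python
from typing import List, Sequence

Vector = Sequence[int]
Matrix = Sequence[Sequence[int]]


def apply(t: Matrix, d: Vector) -> List[int]:
    """Multiply transformation matrix t by dependence distance vector d."""
    return [sum(row[k] * d[k] for k in range(len(d))) for row in t]


def lex_positive(v: Vector) -> bool:
    """True if the first non-zero component of v is positive."""
    for x in v:
        if x > 0:
            return True
        if x < 0:
            return False
    return False  # the zero vector is not lexicographically positive


def is_legal(t: Matrix, deps: Sequence[Vector]) -> bool:
    """A transformation is legal if it keeps every dependence lexicographically positive."""
    return all(lex_positive(apply(t, d)) for d in deps)


def inner_loop_parallel(t: Matrix, deps: Sequence[Vector]) -> bool:
    """The inner loop carries no dependence if the outer loop carries all of them."""
    return all(apply(t, d)[0] > 0 for d in deps)


# Assumed example nest: a[i][j] = a[i-1][j] + a[i][j-1]
DEPS = [(1, 0), (0, 1)]

IDENTITY = [[1, 0], [0, 1]]          # original loop order
WAVEFRONT = [[1, 1], [0, 1]]         # new outer index i + j, inner index j
OUTER_REVERSAL = [[-1, 0], [0, 1]]   # run the outer loop backwards

if __name__ == "__main__":
    for name, t in [("identity", IDENTITY),
                    ("wavefront", WAVEFRONT),
                    ("outer reversal", OUTER_REVERSAL)]:
        legal = is_legal(t, DEPS)
        parallel = legal and inner_loop_parallel(t, DEPS)
        print(f"{name}: legal={legal}, inner loop parallel={parallel}")
```

In this assumed example the wavefront transformation is legal and leaves the inner loop free of carried dependences, which is the sort of fine-grained parallelism a software pipeliner can then exploit. Reversing the outer loop violates a dependence and is rejected.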
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Slowik, A., Piepenbrock, G., Pfahler, P. (1994). Compiling nested loops for limited connectivity VLIWs. In: Fritzson, P.A. (eds) Compiler Construction. CC 1994. Lecture Notes in Computer Science, vol 786. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57877-3_10
DOI: https://doi.org/10.1007/3-540-57877-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57877-2
Online ISBN: 978-3-540-48371-7
eBook Packages: Springer Book Archive