Loop Striping: Maximize Parallelism for Nested Loops
The majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are generally applied to increase the parallelism of their nested loops. Most existing loop transformation techniques either cannot achieve maximum parallelism, or achieve it only at the cost of complicated loop bound and loop index calculations. This paper proposes a new technique, loop striping, that can maximize parallelism while preserving the original row-wise execution sequence with minimal overhead. Loop striping groups iterations into stripes, where a stripe is a group of iterations that are all mutually independent and can therefore be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by 50% and 54%, respectively.
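The stripe definition above can be illustrated with a small sketch. It is not the paper's algorithm: it is a hypothetical greedy grouping that assumes a two-dimensional loop nest with uniform dependence vectors, all lexicographically positive (so the original row-wise order is legal). It walks the iterations in row-wise order and starts a new stripe whenever the next iteration depends on a member of the current one. The function name `greedy_stripes` and its parameters are illustrative assumptions.

```python
from typing import List, Set, Tuple

Iter = Tuple[int, int]  # (row index i, column index j)


def greedy_stripes(n_rows: int, n_cols: int, deps: Set[Iter]) -> List[List[Iter]]:
    """Hypothetical helper: split a row-wise iteration order into stripes.

    Assumes `deps` holds uniform, lexicographically positive dependence
    vectors (di, dj): iteration (i, j) depends on (i - di, j - dj).
    Each stripe is a run of consecutive iterations (row-wise order) in which
    no iteration depends on another, so its members may run in parallel.
    """
    stripes: List[List[Iter]] = []
    current: List[Iter] = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Because all dependences point backward in row-wise order, only
            # "new iteration depends on an existing member" must be checked.
            conflict = any((i - a, j - b) in deps for (a, b) in current)
            if conflict:
                stripes.append(current)  # close the current stripe
                current = []
            current.append((i, j))
    if current:
        stripes.append(current)
    return stripes


if __name__ == "__main__":
    # Dependences of a recurrence like a[i][j] = a[i-1][j] + a[i][j-1].
    print(greedy_stripes(2, 2, {(0, 1), (1, 0)}))
```

For the recurrence in the usage example, the sketch yields the stripes `[(0, 0)]`, `[(0, 1), (1, 0)]` and `[(1, 1)]`: the two anti-diagonal iterations are independent and share a stripe. Executing the stripes one after another preserves every dependence, since a source iteration always precedes its sink in row-wise order and may not share its stripe. This greedy pass does not claim to reach the maximum parallelism that the paper's theorems guarantee.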
Keywords: Nested Loop, Static Schedule, Loop Index, Loop Body, Software Pipeline