Loop Striping: Maximize Parallelism for Nested Loops

  • Chun Xue
  • Zili Shao
  • Meilin Liu
  • Meikang Qiu
  • Edwin H.-M. Sha
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4096)


The majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are generally applied to increase the parallelism of the nested loops in these applications. Most existing loop transformation techniques either cannot achieve maximum parallelism, or achieve it only at the cost of complicated loop bound and loop index calculations. This paper proposes a new technique, loop striping, that maximizes parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping groups iterations into stripes, where a stripe is a group of iterations that are all mutually independent and can therefore be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by 50% and 54%, respectively.
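To make the definition of a stripe concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a two-dimensional loop nest whose dependences are given as constant distance vectors, and it only checks direct dependences between pairs of iterations. The function name `is_stripe` and the example dependence vectors are hypothetical, chosen for illustration.

```python
from itertools import combinations


def is_stripe(iterations, dependence_vectors):
    """Return True if no two iterations are separated by a dependence vector.

    Assumption: only direct dependences are checked; a full test would also
    rule out dependence chains (sums of several dependence vectors).
    """
    deps = set(dependence_vectors)
    for (i1, j1), (i2, j2) in combinations(iterations, 2):
        forward = (i2 - i1, j2 - j1)
        backward = (i1 - i2, j1 - j2)
        if forward in deps or backward in deps:
            return False
    return True


# Hypothetical loop body: iteration (i, j) reads results of (i-1, j) and (i, j-1).
example_deps = [(1, 0), (0, 1)]

# Iterations on one anti-diagonal do not depend on each other.
diagonal_group = [(0, 2), (1, 1), (2, 0)]

# Two neighbours in the same row are linked by the dependence (0, 1).
row_group = [(0, 0), (0, 1)]

print(is_stripe(diagonal_group, example_deps))  # True
print(is_stripe(row_group, example_deps))       # False
```

Under these assumptions, the first group could be executed in parallel while the second could not. How the paper actually constructs its stripes while preserving row-wise order is not described in this abstract.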


Nest Loop, Static Schedule, Loop Index, Loop Body, Software Pipeline
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Chun Xue (1)
  • Zili Shao (2)
  • Meilin Liu (1)
  • Meikang Qiu (1)
  • Edwin H.-M. Sha (1)

  1. University of Texas at Dallas, Richardson, USA
  2. Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong