Pipelining-dovetailing: A transformation to enhance software pipelining for nested loops

  • Jian Wang
  • Guang R. Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1060)


The objective of software pipelining is to generate code that maximally exploits instruction-level parallelism (ILP) in modern multi-issue processor architectures, such as VLIW and superscalar processors. Since the amount of ILP such machines offer is usually fixed at a small number (four to eight), modern compilers using state-of-the-art software pipelining scheduling techniques have been able to schedule instructions within a small window of successive iterations and keep the machine resources usefully busy. To take full advantage of software pipelining, the loop being software pipelined should have a large number of iterations (called its trip count in this paper). Software pipelining of nested loops therefore becomes important, especially when the innermost loops have small trip counts.
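The effect of the trip count can be illustrated with a small idealized model. The sketch below is not taken from the paper: it assumes a loop body split into a fixed number of pipeline stages, with one new iteration started per time step, and the names `pipeline_schedule` and `full_steps` are hypothetical. It lists which (iteration, stage) pairs are in flight at each step and counts the steps in which every stage is busy.

```python
def pipeline_schedule(trip_count, stages):
    """Return, for each time step, the (iteration, stage) pairs in flight.

    Idealized assumption: iteration i starts at step i and occupies
    stage (t - i) at step t, so the loop takes trip_count + stages - 1 steps.
    """
    steps = trip_count + stages - 1
    return [
        [(i, t - i) for i in range(trip_count) if 0 <= t - i < stages]
        for t in range(steps)
    ]


def full_steps(trip_count, stages):
    """Count the steps in which every pipeline stage is occupied."""
    schedule = pipeline_schedule(trip_count, stages)
    return sum(1 for step in schedule if len(step) == stages)


if __name__ == "__main__":
    print(len(pipeline_schedule(8, 3)), full_steps(8, 3))  # 10 6
    print(len(pipeline_schedule(2, 3)), full_steps(2, 3))  # 4 0
```

Under these assumptions, a trip count of 8 with 3 stages keeps all stages busy in 6 of 10 steps, while a trip count of 2 never fills the pipeline: the ramp-up and drain phases dominate when the trip count is small.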

This paper presents a loop transformation, pipelining-dovetailing, which extends software pipelining from the innermost loop to the enclosing loop nest. Some popular loop transformation techniques (e.g., unimodular transformations) target multiprocessor machines, where the goal is to maximally expose loop-level parallelism, i.e., the transformed loop nest has the maximum number of doall loops. In contrast, the goal of pipelining-dovetailing is to extend the software pipelining of the innermost loop to the surrounding loops, so that all iterations of the loop nest can be smoothly software pipelined through and the effective trip count is maximized. We also define the condition under which pipelining-dovetailing is valid. As a result, we derive a software pipelining framework for loop nests that integrates software pipelining and pipelining-dovetailing.
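The benefit of a larger effective trip count can be estimated with a standard cost model for a software-pipelined loop: N iterations with S stages and initiation interval II take about (N + S - 1) * II cycles. The sketch below is an illustration under that assumption, not the paper's algorithm. The function names are hypothetical, and it assumes the dependences of the loop nest allow the pipeline to continue across outer iterations, a validity condition the abstract mentions but does not state.

```python
def cycles_inner_only(outer_trips, inner_trips, stages, ii=1):
    """Pipeline only the innermost loop: the pipeline fills and drains
    once per outer iteration."""
    return outer_trips * (inner_trips + stages - 1) * ii


def cycles_pipelined_through(outer_trips, inner_trips, stages, ii=1):
    """Pipeline through the whole nest: all outer_trips * inner_trips
    iterations share a single fill and drain phase (assumed legal)."""
    return (outer_trips * inner_trips + stages - 1) * ii


if __name__ == "__main__":
    # Hypothetical nest: 100 outer iterations, inner trip count 4, 4 stages.
    print(cycles_inner_only(100, 4, 4))         # 700
    print(cycles_pipelined_through(100, 4, 4))  # 403
```

In this hypothetical nest, treating the 400 iterations as one pipelined stream removes 99 of the 100 fill-and-drain phases, which is the sense in which the effective trip count grows from 4 to 400.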


Keywords: Instruction-Level Parallelism, Fine-Grain Parallelism, Software Pipelining, Loop Scheduling, Nested Loop, Very Long Instruction Word (VLIW), Superscalar



Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Jian Wang (1)
  • Guang R. Gao (1)
  1. School of Computer Science, McGill University, Montréal, Canada
