Multi-dimensional Kernel Generation for Loop Nest Software Pipelining

  • Alban Douillet
  • Hongbo Rong
  • Guang R. Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4128)


Single-dimension Software Pipelining (SSP) has been proposed as an effective software pipelining technique for multi-dimensional loops [16]. This paper is the first to present the scheduling methods that actually produce the kernel code. Because the problem is multi-dimensional, scheduling is more complex and challenging than in traditional modulo scheduling. The scheduler must handle multiple subkernels and initiation rates under specific scheduling constraints, while producing a solution that minimizes the execution time of the final schedule.
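The sketch below illustrates the single-loop modulo scheduling that the proposed methods generalize. It shows the two core ideas: a resource-bound lower limit on the initiation interval (ResMII), and a modulo reservation table in which an operation issued at cycle t occupies its resource in row t mod II. This is a minimal illustration under simplifying assumptions: each operation uses one unit of one resource for one cycle, there are no loop-carried dependences, and the scheduler does not backtrack. The machine model, operation names, latencies, and function names are hypothetical. This is not the SSP scheduler described in the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical machine model: resource name -> number of identical units.
UNITS = {"alu": 1, "mem": 1}

# Hypothetical loop body in topological order:
# (name, resource, [(predecessor, latency), ...]); loop-carried edges omitted.
OPS = [
    ("ld", "mem", []),
    ("add", "alu", [("ld", 2)]),
    ("st", "mem", [("add", 1)]),
]

def res_mii(ops, units):
    """Resource-bound minimum initiation interval: max over resources of ceil(uses / units)."""
    uses = Counter(res for _, res, _ in ops)
    return max(math.ceil(n / units[r]) for r, n in uses.items())

def modulo_schedule(ops, units, ii):
    """Greedy placement into a modulo reservation table (MRT).

    An operation issued at cycle t occupies one unit of its resource in row t % ii,
    so only ii consecutive cycles need to be tried. Returns None on failure.
    """
    mrt = defaultdict(int)  # (row, resource) -> units already reserved
    sched = {}
    for name, res, preds in ops:
        # Earliest start allowed by intra-iteration dependences.
        earliest = max((sched[p] + lat for p, lat in preds), default=0)
        for t in range(earliest, earliest + ii):
            if mrt[(t % ii, res)] < units[res]:
                mrt[(t % ii, res)] += 1
                sched[name] = t
                break
        else:
            return None  # every row is full for this resource
    return sched

def schedule_loop(ops, units):
    """Try II = ResMII, then II + 1, ... until placement succeeds."""
    ii = res_mii(ops, units)
    while (sched := modulo_schedule(ops, units, ii)) is None:
        ii += 1
    return ii, sched

ii, sched = schedule_loop(OPS, UNITS)
print(ii, sched)  # 2 {'ld': 0, 'add': 2, 'st': 3}
```

In this restricted setting, placement always succeeds at II = ResMII, because each resource has enough slots across the II rows. The retry loop is kept only to show the usual iterative structure. Practical modulo schedulers, for example Rau's iterative modulo scheduling [13], also bound II by recurrences and may evict operations that have already been placed.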

In this paper, three approaches are proposed:

- The level-by-level method schedules operations in loop-level order, starting from the innermost level. It does not let other operations interfere with levels that are already scheduled.
- The flat method schedules operations from different loop levels with the same priority.
- The hybrid method uses the level-by-level mechanism for the innermost level and the flat solution for the other levels.

The methods subsume Huff's modulo scheduling [8] for single loops as a special case. We also break a scheduling constraint introduced in earlier publications, which allows a more compact kernel. The proposed approaches were implemented in the Open64/ORC compiler and evaluated on loop nests from the Livermore, SPEC2000 and NAS benchmarks.
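The three strategies differ mainly in the order in which operations from different loop levels are offered to the scheduler. The sketch below models only that ordering, under stated assumptions. Each operation carries a loop level (1 = outermost) and a priority, both hypothetical. The sketch does not model slot placement, subkernels, or the level-by-level rule that already scheduled levels are protected from interference. The `Op` class and the three function names are illustrative, not taken from the paper's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    level: int     # loop level: 1 = outermost, larger = deeper
    priority: int  # hypothetical priority (e.g. height or slack); higher goes first

def level_by_level(ops):
    """Innermost level first, then outward; priority only orders ops within a level."""
    return sorted(ops, key=lambda o: (-o.level, -o.priority))

def flat(ops):
    """All loop levels compete equally: order by priority alone."""
    return sorted(ops, key=lambda o: -o.priority)

def hybrid(ops):
    """Innermost level first (as in level-by-level), then all outer levels flat."""
    innermost = max(o.level for o in ops)
    inner = [o for o in ops if o.level == innermost]
    outer = [o for o in ops if o.level != innermost]
    return flat(inner) + flat(outer)

# Hypothetical three-level loop nest.
OPS = [Op("a", 1, 5), Op("b", 2, 8), Op("c", 3, 2), Op("d", 3, 4), Op("e", 1, 9)]
for method in (level_by_level, flat, hybrid):
    print(method.__name__, [o.name for o in method(OPS)])
```

With three levels, all three orders differ. Level-by-level yields d, c, b, e, a. Flat yields e, b, a, d, c. Hybrid yields d, c, e, b, a: it keeps the innermost operations first and lets the outer levels compete by priority.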






  1. Allan, V.H., Jones, R.B., Lee, R.M., Allan, S.J.: Software pipelining. ACM Comput. Surv. 27(3), 367–432 (1995)
  2. Carr, S., Ding, C., Sweany, P.: Improving software pipelining with unroll-and-jam. In: Proc. of HICSS 1996, pp. 183–192. IEEE Computer Society, Los Alamitos (1996)
  3. Darte, A., Schreiber, R., Rau, B.R., Vivien, F.: Constructing and exploiting linear schedules with prescribed parallelism. ACM Trans. Des. Autom. Electron. Syst. 7(1), 159–172 (2002)
  4. Douillet, A.: A Compiler Framework for Loop Nest Software-Pipelining. PhD thesis, University of Delaware, Newark, Delaware, USA (2006)
  5. Douillet, A., Gao, G.R.: Register pressure in software-pipelined loop nests: Fast computation and impact on architecture design. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds.) LCPC 2005. LNCS, vol. 4339. Springer, Heidelberg (2006)
  6. Gao, G.R., Ning, Q., Dongen, V.: Extending software pipelining techniques for scheduling nested loops. In: Pingali, K.K., Gelernter, D., Padua, D.A., Banerjee, U., Nicolau, A. (eds.) LCPC 1994. LNCS, vol. 892, pp. 340–357. Springer, Heidelberg (1995)
  7. Govindarajan, R., Altman, E.R., Gao, G.R.: A framework for resource-constrained rate-optimal software pipelining. IEEE Trans. Parallel Distrib. Syst. 7(11), 1133–1149 (1996)
  8. Huff, R.A.: Lifetime-sensitive modulo scheduling. In: Proc. of PLDI 1993, pp. 258–267. ACM Press, New York (1993)
  9. Lam, M.: Software pipelining: an effective scheduling technique for VLIW machines. In: Proc. of PLDI 1988, pp. 318–328. ACM Press, New York (1988)
  10. Llosa, J.: Swing modulo scheduling: A lifetime-sensitive approach. In: Proc. of PACT 1996, p. 80. IEEE Computer Society, Los Alamitos (1996)
  11. Muthukumar, K., Doshi, G.: Software pipelining of nested loops. In: Wilhelm, R. (ed.) CC 2001 and ETAPS 2001. LNCS, vol. 2027, pp. 165–181. Springer, Heidelberg (2001)
  12. Petkov, D., Harr, R., Amarasinghe, S.: Efficient pipelining of nested loops: unroll-and-squash. In: Proc. of IPDPS 2002. IEEE, Los Alamitos (2002)
  13. Rau, B.R.: Iterative modulo scheduling: an algorithm for software pipelining loops. In: Proc. of MICRO 27, pp. 63–74. ACM Press, New York (1994)
  14. Rong, H., Douillet, A., Gao, G.R.: Register allocation for software pipelined multi-dimensional loops. In: Proc. of PLDI 2005, pp. 154–167 (2005)
  15. Rong, H., Douillet, A., Govindarajan, R., Gao, G.R.: Code generation for single-dimension software pipelining of multi-dimensional loops. In: Proc. of CGO 2004, pp. 175–186 (2004)
  16. Rong, H., Tang, Z., Govindarajan, R., Douillet, A., Gao, G.R.: Single-dimension software pipelining for multi-dimensional loops. In: Proc. of CGO 2004, pp. 163–174 (2004)
  17. Wang, J., Gao, G.R.: Pipelining-dovetailing: A transformation to enhance software pipelining for nested loops. In: Gyimóthy, T. (ed.) CC 1996. LNCS, vol. 1060, pp. 1–17. Springer, Heidelberg (1996)
  18. Wolf, M.E., Maydan, D.E., Chen, D.K.: Combining loop transformations considering caches and scheduling. Int. J. Parallel Program. 26(4), 479–503 (1998)
  19. Wood, G.: Global optimization of microprograms through modular control constructs. In: Proc. of MICRO 12, pp. 1–6. IEEE, Los Alamitos (1979)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alban Douillet (1)
  • Hongbo Rong (2)
  • Guang R. Gao (1)

  1. University of Delaware, Newark, USA
  2. Microsoft Corporation, Redmond, USA
