A comparison of modulo scheduling techniques for software pipelining

  • Peter Pfahler
  • Georg Piepenbrock
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1060)


Software pipelining is a well-known and effective technique for generating compact loop schedules for instruction-level parallel computers. This paper presents the results of an experimental evaluation and comparison of different scheduling algorithms that generate software pipelines. We implemented these algorithms in a uniform retargetable compiler environment that can be instantiated by providing target machine descriptions. This environment and a carefully designed benchmark suite enable us to perform a fair comparison of the implemented techniques. We evaluate well-known non-hierarchical and hierarchical schedulers and a hybrid technique developed in our group. Our analysis indicates that scheduling algorithms based on variations of the “classical” non-hierarchical modulo scheduling technique will probably yield the most effective software pipelines.


  • Software Pipelining
  • Instruction Level Parallelism
  • Superscalar Processors
  • VLIW



Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Peter Pfahler (1)
  • Georg Piepenbrock (1)
  1. Universität-GH Paderborn, Paderborn, Germany
