Minimizing Register Requirements of a Modulo Schedule via Optimum Stage Scheduling

International Journal of Parallel Programming

Abstract

Modulo scheduling is an efficient technique for exploiting instruction-level parallelism in a variety of loops, resulting in high-performance code but increased register requirements. We present an approach that schedules the loop operations for minimum register requirements, given a modulo reservation table (MRT). Our method determines optimal register requirements for machines with finite resources and for general dependence graphs. Measurements on a benchmark suite of 1327 loops from the Perfect Club, SPEC-89, and the Livermore Fortran Kernels show that the register requirements decrease by 24.8% on average when applying the optimal stage scheduler to the MRT-schedules of a register-insensitive modulo scheduler.
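To make the notion of register requirements concrete, the sketch below counts how many values are live at once in the steady state of a modulo schedule. It is an illustration only, not the optimal method of the paper. The function name `max_live`, the half-open lifetime convention, and the example cycles are assumptions made here, and the example assumes the dependence constraints allow the consumer to move.

```python
def max_live(lifetimes, ii):
    """Peak number of simultaneously live values in steady state.

    lifetimes: (def_cycle, last_use_cycle) pairs, treated as half-open
               intervals [def_cycle, last_use_cycle) (an assumed convention).
    ii:        initiation interval, i.e. cycles between successive iterations.
    """
    rows = [0] * ii
    for start, end in lifetimes:
        # Iterations overlap every ii cycles, so a value that is live at
        # cycle t occupies a register in row t mod ii of the steady state.
        for t in range(start, end):
            rows[t % ii] += 1
    return max(rows)


II = 2

# Hypothetical schedule: value a is defined at cycle 0 and consumed at
# cycle 5; value b is defined at cycle 1 and consumed at cycle 2.
before = [(0, 5), (1, 2)]

# Move the consumer of a one stage earlier, from cycle 5 to cycle 3.
# The shift is a multiple of II, so the consumer keeps its row (5 % 2 == 3 % 2)
# and the resource usage recorded per row is unchanged.
after = [(0, 3), (1, 2)]

print(max_live(before, II), max_live(after, II))
```

Shifting an operation by a whole number of stages leaves its row modulo II untouched while shortening or lengthening the lifetimes attached to it, which is why register requirements can vary between schedules that share the same row assignment.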

Corresponding author

Correspondence to Alexandre E. Eichenberger.

About this article

Cite this article

Eichenberger, A.E., Davidson, E.S. & Abraham, S.G. Minimizing Register Requirements of a Modulo Schedule via Optimum Stage Scheduling. Int J Parallel Prog 24, 103–132 (1996). https://doi.org/10.1007/BF03356744

  • DOI: https://doi.org/10.1007/BF03356744
