Minimizing Register Requirements of a Modulo Schedule via Optimum Stage Scheduling

Eichenberger, Alexandre E.; Davidson, Edward S.; Abraham, Santosh G.

doi:10.1007/BF03356744

Published: 26 May 2016

Volume 24, pages 103–132 (1996)

International Journal of Parallel Programming

Alexandre E. Eichenberger¹,
Edward S. Davidson¹ &
Santosh G. Abraham²

Abstract

Modulo scheduling is an efficient technique for exploiting instruction-level parallelism in a variety of loops; it produces high-performance code but increases register requirements. We present an approach that schedules the loop operations for minimum register requirements, given a modulo reservation table (MRT). Our method determines optimal register requirements for machines with finite resources and for general dependence graphs. Measurements on a benchmark suite of 1327 loops from the Perfect Club, SPEC-89, and the Livermore Fortran Kernels show that register requirements decrease by 24.8% on average when the optimal stage scheduler is applied to the MRT-schedules produced by a register-insensitive modulo scheduler.
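To make the idea concrete, here is a minimal Python sketch of stage scheduling on a tiny, hypothetical loop. It assumes the modulo reservation table has already fixed each operation's row (its issue cycle modulo the initiation interval II), so the only remaining freedom is each operation's stage s, with issue time t = row + s * II; shifting an operation by whole multiples of II leaves its MRT row, and therefore its resource usage, unchanged. The loop (`ld1`, `ld2`, `mul`, `add`, `st`), its latencies, the lifetime model (a value holds a register from its producer's issue cycle until its last consumer's issue cycle), and the use of MaxLive as the cost are illustrative assumptions. The exhaustive search over a bounded stage range is a stand-in for the paper's optimum method, which this sketch does not reproduce.

```python
from itertools import product

# Hypothetical loop with initiation interval II = 2. The MRT has already
# fixed each operation's row (issue cycle mod II); stage scheduling only
# chooses the stage s, giving issue time t = row + s * II.
II = 2
rows = {"ld1": 0, "ld2": 1, "mul": 1, "add": 0, "st": 1}

# Dependence edges: (producer, consumer, latency, iteration distance).
edges = [
    ("ld1", "mul", 2, 0),
    ("mul", "add", 3, 0),
    ("ld2", "add", 2, 0),
    ("add", "st", 1, 0),
]


def issue_times(stages):
    """Map each operation to its issue cycle for the given stage assignment."""
    return {op: rows[op] + stages[op] * II for op in rows}


def feasible(t):
    """Check every dependence: the consumer (dist iterations later)
    must issue at least `latency` cycles after the producer."""
    return all(t[c] + dist * II - t[p] >= lat for p, c, lat, dist in edges)


def max_live(t):
    """MaxLive of the modulo schedule under the assumed lifetime model:
    a value occupies a register from its producer's issue cycle up to
    (but not including) its last consumer's issue cycle."""
    end = {}
    for p, c, _, dist in edges:
        end[p] = max(end.get(p, t[p]), t[c] + dist * II)
    live = [0] * II  # registers live in each row of the steady state
    for p, last_use in end.items():
        for cycle in range(t[p], last_use):
            live[cycle % II] += 1
    return max(live)


def optimum_stage_schedule(max_stage=4):
    """Exhaustively try stages 0..max_stage and return the feasible
    assignment with the smallest MaxLive (bounded search, toy sizes only)."""
    ops = sorted(rows)
    best_stages, best_regs = None, None
    for combo in product(range(max_stage + 1), repeat=len(ops)):
        if min(combo) != 0:
            continue  # skip assignments that only shift another one in time
        stages = dict(zip(ops, combo))
        t = issue_times(stages)
        if not feasible(t):
            continue
        regs = max_live(t)
        if best_regs is None or regs < best_regs:
            best_stages, best_regs = stages, regs
    return best_stages, best_regs


if __name__ == "__main__":
    asap = {"ld1": 0, "ld2": 0, "mul": 1, "add": 3, "st": 3}
    print("ASAP MaxLive:", max_live(issue_times(asap)))
    print("Optimum:", optimum_stage_schedule())
```

In this toy loop, an as-soon-as-possible placement puts `ld2` in stage 0, so its value occupies a register for five cycles and MaxLive is 6; the search instead delays `ld2` to stage 1, shortening that lifetime to three cycles and lowering MaxLive to 5 without touching any MRT row. Exhaustive enumeration grows exponentially with the number of operations, so it only suits illustrative examples like this one.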

Author information

Authors and Affiliations

Advanced Computer Architecture Laboratory, EECS Department, University of Michigan, 1301 Beal Avenue, Ann Arbor, Michigan, 48109-2122, USA
Alexandre E. Eichenberger & Edward S. Davidson
Hewlett Packard Laboratories, 1501 Page Mill Road, Palo Alto, California, 94304, USA
Santosh G. Abraham

Corresponding author

Correspondence to Alexandre E. Eichenberger.

About this article

Cite this article

Eichenberger, A.E., Davidson, E.S. & Abraham, S.G. Minimizing Register Requirements of a Modulo Schedule via Optimum Stage Scheduling. Int J Parallel Prog 24, 103–132 (1996). https://doi.org/10.1007/BF03356744

Issue Date: April 1996
