Register pipelining: An integrated approach to register allocation for scalar and subscripted variables

  • Evelyn Duesterwald
  • Rajiv Gupta
  • Mary Lou Soffa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 641)

Abstract

Conventional compilers typically ignore the potential benefits of keeping array elements in registers for reuse, in part because the standard data flow analysis techniques used in register allocation are not expressive enough to distinguish among individual array elements. This paper introduces the concept of register pipelining as an integrated approach to register allocation for both scalar and subscripted variables. A register pipeline is a set of registers allocated to the live ranges of array elements inside a loop. By preserving computed array elements in the pipeline stages, reuse is enabled across loop iterations. We present an efficient data flow algorithm that extends the construction of live ranges to array elements. To enable a fair competition for the available registers among the live ranges of subscripted and scalar variables, we develop an integrated version of the standard graph coloring algorithm for register allocation.
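To make the cross-iteration reuse concrete, the following hand-written C sketch contrasts a loop before and after an illustrative register-pipelining style transformation. The function names, the stencil loop, and the three-stage pipeline depth are assumptions chosen for exposition, not code from the paper; the paper's own algorithm derives such pipelines automatically from the live ranges of array elements.

```c
/*
 * Illustrative sketch only: loop, names, and 3-stage depth are assumptions.
 *
 * Original loop: each b element is referenced in three successive
 * iterations (as b[i], b[i-1], and b[i-2]), so it is loaded three times.
 */
void smooth_naive(double *a, const double *b, int n) {
    for (int i = 2; i < n; i++)
        a[i] = b[i] + b[i - 1] + b[i - 2];
}

/*
 * Pipelined version: a 3-stage "register pipeline" (p0, p1, p2) holds the
 * live ranges of b[i], b[i-1], and b[i-2].  Each element of b is loaded
 * exactly once; values are shifted through the pipeline stages at the end
 * of every iteration, realizing the reuse across loop iterations.
 */
void smooth_pipelined(double *a, const double *b, int n) {
    if (n <= 2) return;
    double p2 = b[0];            /* stage holding b[i-2] */
    double p1 = b[1];            /* stage holding b[i-1] */
    for (int i = 2; i < n; i++) {
        double p0 = b[i];        /* stage holding b[i]; the only load */
        a[i] = p0 + p1 + p2;
        p2 = p1;                 /* shift the pipeline for the next iteration */
        p1 = p0;
    }
}
```

In the transformed version each element of b travels through the pipeline stages after its single load, so two of the three array references per iteration are satisfied from registers rather than memory.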


Copyright information

© Springer-Verlag 1992

Authors and Affiliations

  • Evelyn Duesterwald (1)
  • Rajiv Gupta (1)
  • Mary Lou Soffa (1)

  1. Department of Computer Science, University of Pittsburgh, Pittsburgh
