Collective loop fusion for array contraction

  • G. Gao
  • R. Olsen
  • V. Sarkar
  • R. Thekkath
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 757)


In this paper we propose a loop fusion algorithm specifically designed to increase opportunities for array contraction. Array contraction is an optimization that transforms array variables into scalar variables within a loop nest. Compared with array elements, scalar variables have better cache behavior and can be allocated to registers. In past work we investigated loop interchange and loop reversal as optimizations that increase opportunities for array contraction [13]. This paper extends that work to include loop fusion. The fusion method discussed here uses the maxflow-mincut algorithm to cluster loops. Our collective loop fusion algorithm is efficient, and we demonstrate its usefulness for array contraction with a simple example.
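To make the link between fusion and contraction concrete, here is a minimal sketch in Python. It is not taken from the paper: the function names, the arrays a, b, c, and the temporary t are all hypothetical, chosen only to show how fusing two loops can let a temporary array shrink to a scalar.

```python
def unfused(a, b):
    """Two separate loops: the temporary t must be a full n-element array."""
    n = len(a)
    t = [0.0] * n              # array temporary, live across both loops
    for i in range(n):         # loop 1 produces t[i]
        t[i] = a[i] + b[i]
    c = [0.0] * n
    for i in range(n):         # loop 2 consumes t[i]
        c[i] = t[i] * t[i]
    return c


def fused_and_contracted(a, b):
    """One fused loop: t[i] is produced and consumed in the same iteration,
    so the array t can be contracted to a scalar."""
    n = len(a)
    c = [0.0] * n
    for i in range(n):
        t = a[i] + b[i]        # scalar temporary (register candidate)
        c[i] = t * t
    return c


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    b = [0.5, 0.5, 0.5]
    # Both versions compute the same result; only the storage for t differs.
    assert unfused(a, b) == fused_and_contracted(a, b)
    print(fused_and_contracted(a, b))
```

In the unfused version every element of t stays live between the two loops, so contraction is not possible. Once the loops are fused, each value of t is dead by the end of its iteration, which is the situation a fusion algorithm aimed at contraction tries to create.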




  [1] J. R. Allen and K. Kennedy. Vector register allocation. Technical Report TR86-45, Rice University, Houston, TX, December 1986.
  [2] John R. Allen. Dependence Analysis for Subscripted Variables and its Application to Program Transformation. PhD thesis, Rice University, 1983.
  [3] R. Allen and K. Kennedy. Automatic translation of FORTRAN programs to vector form. ACM Transactions on Programming Languages and Systems, 9:491–542, 1987.
  [4] David Callahan, Steve Carr, and Ken Kennedy. Improving register allocation for subscripted variables. Proceedings of the SIGPLAN '90 Conference on Programming Language Design and Implementation, June 1990. White Plains, NY.
  [5] J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9(3):319–349, July 1987.
  [6] Jeanne Ferrante, Vivek Sarkar, and Wendy Thrash. On estimating and enhancing cache effectiveness. Proceedings of the Fourth Workshop on Languages and Compilers for Parallel Computing, August 1991. To appear in Springer Verlag's Lecture Notes in Computer Science series.
  [7] Allen Goldberg and Robert Paige. Stream processing. In 1984 ACM Symposium on Lisp and Functional Programming, pages 53–62, Austin, TX, August 1984.
  [8] Ken Kennedy and Kathryn S. McKinley. Maximizing loop parallelism and improving data locality via loop fusion and distribution. Technical report, Rice University, August 1992. Rice COMP TR92-189.
  [9] D. J. Kuck, R. Kuhn, D. Padua, B. Leasure, and M. Wolfe. Dependence graphs and compiler optimizations. In Proceedings of the Eighth ACM Symposium on Principles of Programming Languages, pages 207–218, January 1981.
  [10] Russell Olsen. Analysis and transformation of loop clusters. Master's thesis, McGill University, Montreal, May 1992.
  [11] Allan K. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Rice University, May 1989. COMP TR89-93.
  [12] Vivek Sarkar. The PTRAN parallel programming system. In B. Szymanski, editor, Parallel Functional Programming Languages and Environments. McGraw-Hill Series in Supercomputing and Parallel Processing, 1990.
  [13] Vivek Sarkar and Guang R. Gao. Optimization of array accesses by collective loop transformations. Proceedings of the 1991 ACM International Conference on Supercomputing, pages 194–205, June 1991.
  [14] R. E. Tarjan. Data Structures and Network Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1983.
  [15] Joe Warren. A hierarchical basis for reordering transformations. Eleventh ACM Principles of Programming Languages Symposium, pages 272–282, January 1984. Salt Lake City, UT.
  [16] Michael E. Wolf and Monica S. Lam. A data locality optimizing algorithm. ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, June 26–28 1991.
  [17] Michael J. Wolfe. Optimizing Supercompilers for Supercomputers. Pitman, London and MIT Press, Cambridge, MA, 1989. In the series, Research Monographs in Parallel and Distributed Computing. Revised version of the author's Ph.D. dissertation, Published as Technical Report UIUCDCS-R-82-1105, University of Illinois at Urbana-Champaign, 1982.

Copyright information

© Springer-Verlag Berlin Heidelberg 1993

Authors and Affiliations

  • G. Gao (1)
  • R. Olsen (1)
  • V. Sarkar (2)
  • R. Thekkath (3)

  1. McGill University, Montréal, Canada
  2. IBM Palo Alto Scientific Center, Palo Alto
  3. University of Washington at Seattle, USA
