Compiler optimizations for massively parallel machines: Transformations on iterative spatial loops
This paper presents a set of compiler optimizations, and strategies for applying them, for a common class of data-parallel loop nests. The arrays updated in the body of each loop nest are assumed to be partitioned into blocks (rectangular blocks, rows, or columns), with each block assigned to a processor.
These optimizations are demonstrated in the context of a FORTRAN-90 compiler, with very encouraging preliminary results. For solving tridiagonal systems by Gaussian elimination, the optimized native code performs two orders of magnitude better than the code produced by the CM-FORTRAN compiler and approaches the performance of the hand-written Connection Machine Scientific Software Library (CMSSL) routine.