International Journal of Parallel Programming, Volume 29, Issue 5, pp 545–581

Optimized Unrolling of Nested Loops

Authors

  • Vivek Sarkar
    • IBM T. J. Watson Research Center
Article

DOI: 10.1023/A:1012246031671

Cite this article as:
Sarkar, V. International Journal of Parallel Programming (2001) 29: 545. doi:10.1023/A:1012246031671

Abstract

Loop unrolling is a well-known loop transformation that has been used in optimizing compilers for over three decades. In this paper, we address the problems of automatically selecting unroll factors for perfectly nested loops, and generating compact code for the selected unroll factors. Compared to past work, the contributions of our work include (i) a more detailed cost model that includes register locality, instruction-level parallelism, and instruction-cache considerations; (ii) a new code generation algorithm that generates more compact code than the unroll-and-jam transformation; and (iii) a new algorithm for efficiently enumerating feasible unroll vectors. Our experimental results confirm the wide applicability of our approach by showing a 2.2× speedup on matrix multiply, and an average 1.08× speedup on seven of the SPEC95fp benchmarks (with a 1.2× speedup for two benchmarks). Larger performance improvements can be expected on processors that have larger numbers of registers and larger degrees of instruction-level parallelism than the processor used for our measurements (PowerPC 604).
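
For readers unfamiliar with the baseline transformation named in the abstract, the following is a minimal sketch of conventional unroll-and-jam applied to matrix multiply. It is not the paper's code generation algorithm or cost model. The function names, the row-major layout, and the fixed unroll vector (2, 2, 1) for the (i, j, k) loops are assumptions chosen purely for illustration.

```c
#include <stddef.h>

/* Reference kernel: C = A * B, with A (n x m), B (m x p), C (n x p),
 * all stored row-major. Names and layout are illustrative assumptions. */
void matmul_ref(size_t n, size_t m, size_t p,
                const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < p; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < m; k++)
                sum += a[i * m + k] * b[k * p + j];
            c[i * p + j] = sum;
        }
}

/* Conventional unroll-and-jam with assumed unroll factors 2 (i), 2 (j),
 * 1 (k). The four accumulators can stay in registers, and each loaded
 * element of A and B is reused twice inside the innermost loop. */
void matmul_unroll_jam(size_t n, size_t m, size_t p,
                       const double *a, const double *b, double *c)
{
    size_t n2 = n - n % 2;  /* largest even bound not exceeding n */
    size_t p2 = p - p % 2;  /* largest even bound not exceeding p */

    for (size_t i = 0; i < n2; i += 2) {
        for (size_t j = 0; j < p2; j += 2) {
            double c00 = 0.0, c01 = 0.0, c10 = 0.0, c11 = 0.0;
            for (size_t k = 0; k < m; k++) {
                double a0 = a[i * m + k],  a1 = a[(i + 1) * m + k];
                double b0 = b[k * p + j],  b1 = b[k * p + j + 1];
                c00 += a0 * b0;  c01 += a0 * b1;
                c10 += a1 * b0;  c11 += a1 * b1;
            }
            c[i * p + j]       = c00;  c[i * p + j + 1]       = c01;
            c[(i + 1) * p + j] = c10;  c[(i + 1) * p + j + 1] = c11;
        }
        /* Remainder column (only runs when p is odd). */
        for (size_t j = p2; j < p; j++) {
            double s0 = 0.0, s1 = 0.0;
            for (size_t k = 0; k < m; k++) {
                s0 += a[i * m + k] * b[k * p + j];
                s1 += a[(i + 1) * m + k] * b[k * p + j];
            }
            c[i * p + j] = s0;
            c[(i + 1) * p + j] = s1;
        }
    }
    /* Remainder row (only runs when n is odd). */
    for (size_t i = n2; i < n; i++)
        for (size_t j = 0; j < p; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < m; k++)
                sum += a[i * m + k] * b[k * p + j];
            c[i * p + j] = sum;
        }
}
```

Even at this small unroll vector, the remainder loops make the transformed routine several times longer than the reference kernel, which gives a sense of why code compactness and instruction-cache effects appear among the concerns listed in the abstract.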

Keywords

loop transformations, loop unrolling, unroll-and-jam, unroll factors

Copyright information

© Plenum Publishing Corporation 2001