Loop quantization or unwinding done right

  • Alexandru Nicolau
Session 4A: Compilers And Restructuring Techniques I
Part of the Lecture Notes in Computer Science book series (LNCS, volume 297)


Loop unwinding is a known technique for reducing loop overhead, exposing parallelism and increasing the efficiency of pipelining. Traditional loop unwinding is limited to the innermost loop in a group of nested loops, and the amount of unwinding is either fixed or has to be specified by the user on a case-by-case basis. In this paper we present a general technique for automatically unwinding multiply nested loops, explain its advantages over other transformation techniques and illustrate its practical effectiveness. Loop Quantization can be beneficial on its own or coupled with other loop transformations (e.g., Doacross).
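The abstract contrasts traditional unwinding (innermost loop only, fixed factor) with unwinding several loops of a nest at once. As a rough illustration only, the sketch below mechanically unwinds *both* loops of a 2-deep nest by fixed factors `ui` and `uj`: it emits the `ui * uj` body copies as straight-line code, so loop control runs once per block instead of once per iteration, and adds remainder loops for extents that are not multiples of the factors. It assumes the body has no loop-carried dependences, so reordering iterations is legal. The paper's technique is automatic and also decides how much to unwind, which this sketch does not attempt. The function name `unwind_2d`, the `{i}`/`{j}` `str.format` placeholder convention and the use of Python source generation are hypothetical choices made for this example.

```python
from typing import List


def unwind_2d(template: str, n: int, m: int, ui: int, uj: int) -> str:
    """Return Python source for `for i < n: for j < m: <template>` unwound ui x uj.

    `template` is a single statement with `{i}` and `{j}` placeholders for the
    loop indices (a hypothetical convention for this sketch). Assumes the body
    has no loop-carried dependences, so reordering iterations is legal.
    """
    ni, nj = n - n % ui, m - m % uj  # extents covered by full ui x uj blocks

    def copy(di: int, dj: int, indent: int) -> str:
        # One replicated body instance with its indices offset by (di, dj).
        return " " * indent + template.format(i=f"(i + {di})", j=f"(j + {dj})")

    lines: List[str] = [
        f"for i in range(0, {ni}, {ui}):",
        f"    for j in range(0, {nj}, {uj}):",
    ]
    # Main block: ui * uj straight-line copies per pass through loop control.
    lines += [copy(di, dj, 8) for di in range(ui) for dj in range(uj)]
    # Column remainder for the current strip of ui rows.
    lines.append(f"    for j in range({nj}, {m}):")
    lines += [copy(di, 0, 8) for di in range(ui)]
    # Row remainder: leftover rows run the original, un-unwound body.
    lines += [f"for i in range({ni}, {n}):", f"    for j in range({m}):", copy(0, 0, 8)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the unwound form of a simple element-wise array update.
    print(unwind_2d("C[{i}][{j}] = A[{i}][{j}] + B[{i}][{j}]", 7, 5, 3, 2))
```

With n = 7, m = 5, ui = 3 and uj = 2, the main nest executes 2 × 2 blocks of six body copies each (rows 0 to 5, columns 0 to 3); the column-remainder loop covers column 4 of those rows, and the row-remainder loop covers row 6, so every one of the 35 iterations runs exactly once.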







Copyright information

© Springer-Verlag Berlin Heidelberg 1988

Authors and Affiliations

  • Alexandru Nicolau, Department of Computer Science, Cornell University, Ithaca
