Advanced loop optimizations for parallel computers

  • Constantine D. Polychronopoulos
Session 4A: Compilers and Restructuring Techniques I
Part of the Lecture Notes in Computer Science book series (LNCS, volume 297)


So far, most work on program dependence analysis has concentrated on compile-time techniques, which are not always accurate and are often conservative. By coupling the compiler's ability to perform elaborate program optimizations with the run-time system's more accurate knowledge of certain program characteristics, we can uncover and exploit even more parallelism in ordinary programs. With run-time dependence checking, certain types of loops that were previously treated as serial can be executed concurrently. This paper presents a run-time dependence checking scheme and a new compiler optimization aimed at parallelizing serial loops. In particular, we present cycle shrinking, a compiler transformation that "shrinks" the dependence distances in serial loops, allowing parts of such loops to execute concurrently. Code reordering for minimizing communication and subscript blocking are also discussed briefly.


Keywords: Dependence Graph; Iteration Space; Nested Loop; Index Expression; True Distance





Copyright information

© Springer-Verlag Berlin Heidelberg 1988

Authors and Affiliations

  • Constantine D. Polychronopoulos
    Center for Supercomputing Research and Development and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, USA
