An algorithm for automatic detection of loop indices for communication overlapping

  • Kazuaki Ishizaki
  • Hideaki Komatsu
  • Toshio Nakatani
IV Compilers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1336)


This paper presents a compiler algorithm that automatically detects the appropriate loop indices of a given nested loop and applies loop interchange and tiling to overlap communication with computation. It also describes a method of generating communication for the tiled loop on distributed memory machines. The algorithm presented here has been implemented in our High Performance Fortran (HPF) compiler, and experimental results have shown its effectiveness on the RISC System/6000 Scalable POWERparallel System.
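The general idea of tiling a loop so that communication overlaps with computation can be illustrated with a minimal sketch. It is not the algorithm from the paper: it skips loop-index detection and loop interchange entirely, and it only simulates a message transfer with a background thread. The names `fetch_remote`, `untiled`, and `tiled_overlap` are hypothetical and chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_remote(remote, lo, hi):
    """Hypothetical stand-in for receiving elements lo..hi-1 from another processor."""
    return remote[lo:hi]


def untiled(local, remote):
    """Baseline: receive all remote data first, then compute."""
    data = fetch_remote(remote, 0, len(remote))
    return [a + b for a, b in zip(local, data)]


def tiled_overlap(local, remote, tile_size):
    """Split the loop into tiles and request tile t+1 while computing tile t."""
    n = len(local)
    bounds = [(lo, min(lo + tile_size, n)) for lo in range(0, n, tile_size)]
    out = [0] * n
    if not bounds:
        return out
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the transfer for the first tile before any computation.
        pending = pool.submit(fetch_remote, remote, *bounds[0])
        for t, (lo, hi) in enumerate(bounds):
            data = pending.result()  # wait for this tile's data to arrive
            if t + 1 < len(bounds):
                # Issue the next transfer so it can proceed during the compute loop.
                pending = pool.submit(fetch_remote, remote, *bounds[t + 1])
            for i in range(lo, hi):  # compute on the current tile
                out[i] = local[i] + data[i - lo]
    return out
```

In this sketch the tile size controls the trade-off: smaller tiles give more chances to hide a transfer behind computation but issue more messages, while a tile as large as the loop degenerates to the baseline.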


Nest Loop; Cache Line; Loop Index; Tile Size; Interprocessor Communication
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Kazuaki Ishizaki (1)
  • Hideaki Komatsu (1)
  • Toshio Nakatani (1)
  1. IBM Tokyo Research Laboratory, Yamato, Kanagawa, Japan
