International Journal of Parallel Programming

Volume 28, Issue 6, pp. 607–631

Index Set Splitting

  • Martin Griebl
  • Paul Feautrier
  • Christian Lengauer

Abstract

There are many algorithms for the space-time mapping of nested loops. Some of them even make the optimal choices within their framework. We propose a preprocessing phase for algorithms in the polytope model, which extends the model and yields space-time mappings whose schedule is, in some cases, orders of magnitude faster. These are cases in which the dependence graph has small irregularities. The basic idea is to split the index set of the loop nests into parts with a regular dependence structure and apply the existing space-time mapping algorithms to these parts individually. This work is based on a seminal idea in the more limited context of loop parallelization at the code level. We elevate the idea to the model level (our model is the polytope model), which increases its applicability by providing a clearer and wider range of choices at an acceptable analysis cost. Index set splitting is one facet in the effort to extend the power of the polytope model and to enable the generation of competitive target code.

Keywords: automatic loop parallelization; scheduling; polytope model



Copyright information

© Plenum Publishing Corporation 2000

Authors and Affiliations

  • Martin Griebl (1)
  • Paul Feautrier (2)
  • Christian Lengauer (1)
  1. Universität Passau, FMI, Passau, Germany
  2. Université de Versailles, PRiSM, Versailles, France
