Optimal tile sizing
Iteration space tiling is a common strategy used by parallelizing compilers to reduce communication overhead. We address the problem of determining the optimal tile size, i.e., the one that minimizes the total execution time of the program, for a particular program schema. We use a realistic model of the architecture that accounts for coprocessors that allow communication and computation to overlap, context-switching times, etc. We show that determining the optimal tile size reduces to a nonlinear optimization problem. We solve this problem analytically and obtain a closed-form solution that involves only architecture and program parameters that are easily determined at compile time, so a compiler can apply it before code generation. Although we solve the problem for a particular schema of programs, our results can be generalized to uniform dependence loops and to certain classes of loop programs with dynamic dependence vectors.
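To make the tile-size trade-off concrete, here is a minimal Python sketch under a deliberately simplified, hypothetical cost model. It is not the model analyzed in this work, which also accounts for coprocessor overlap and context switching. The assumptions: a strip of length n is pipelined across p processors; each tile of width w costs c * w (computation plus per-element transfer) plus a fixed message startup cost beta; communication and computation do not overlap. Small tiles pay many startups, while large tiles lengthen the pipeline fill, so the time T(w) = (n/w + p - 1)(beta + c*w) has an interior minimum at w* = sqrt(n*beta / ((p - 1)*c)). The function names and parameters are illustrative only.

```python
import math


def pipeline_time(w, n, p, beta, c):
    """Hypothetical model: total time of a p-stage pipeline using tiles of width w.

    n    -- length of the tiled (pipelined) dimension on each processor
    p    -- number of processors, i.e. pipeline stages
    beta -- fixed startup cost per message (one message per tile)
    c    -- cost per column of a tile (computation plus data transfer)

    The tile count n / w is treated as a real number (exact when w divides n).
    The last processor finishes p - 1 tile-steps after the first one.
    """
    return (n / w + p - 1) * (beta + c * w)


def optimal_tile_width(n, p, beta, c):
    """Integer tile width in [1, n] minimizing pipeline_time (assumes beta, c > 0)."""
    if p < 2:
        # No pipeline to fill: T(w) = n*beta/w + n*c decreases in w, so use one tile.
        return n
    # Setting dT/dw = -n*beta/w**2 + (p - 1)*c = 0 gives the continuous optimum.
    w_star = math.sqrt(n * beta / ((p - 1) * c))
    # T is convex for w > 0, so the best integer is floor or ceil of w_star,
    # clamped to the valid range of tile widths.
    candidates = {
        min(max(math.floor(w_star), 1), n),
        min(max(math.ceil(w_star), 1), n),
    }
    return min(candidates, key=lambda w: pipeline_time(w, n, p, beta, c))


if __name__ == "__main__":
    n, p, beta, c = 4096, 16, 500.0, 2.0
    w = optimal_tile_width(n, p, beta, c)
    # Brute-force check over every integer tile width.
    brute = min(range(1, n + 1), key=lambda x: pipeline_time(x, n, p, beta, c))
    print(w, brute)
```

Because the modeled time is convex in w, evaluating the two integers around the continuous optimum suffices; the brute-force search in the `__main__` block should agree with the closed-form choice. This mirrors, in a toy setting, the idea of replacing a search over tile sizes with a formula in compile-time parameters.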
Keywords: 2-dimensional discrete nonlinear optimization; communication-computation overlap; dynamic data dependencies; message vectorization; SPMD programs