Affine data mapping with residual communication optimization: Evaluation of heuristics
Minimizing the time spent on communication when mapping affine loop nests onto distributed-memory parallel computers (DMPCs) is a key performance problem, and many authors have addressed it. Not all communications are equivalent: local communications (translations), simple communications (horizontal or vertical ones), and structured communications (broadcasts, gathers, scatters, and reductions) are performed much faster on DMPCs than general affine communications.
Dion, Randriamaro, and Robert have presented a heuristic based on the following strategy: (1) zero out as many nonlocal communications as possible; (2) since a fully communication-local mapping is generally impossible to obtain, try to optimize the residual communications.
The aim of this paper is to evaluate the heuristic of Dion, Randriamaro, and Robert. We first recall the motivation for their approach, then assess its efficiency on classical linear algebra examples.
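To make the classification of communications concrete, the sketch below distinguishes the cheap cases from general affine ones for an access function f(i) = Ai + b. This is a simplified illustration under assumed rules, not the authors' algorithm; the function name `classify_access` and the decision criteria (identity access matrix with a constant offset counts as a translation) are assumptions for exposition.

```python
# Minimal sketch (assumptions, not the paper's exact method): classify the
# communication induced by an affine array access f(i) = A*i + b, assuming
# computation and data follow the same distribution.

def classify_access(A, b):
    """Classify an affine access with integer matrix A and offset vector b."""
    n = len(A)
    # Check whether A is the identity matrix.
    identity = all(A[r][c] == (1 if r == c else 0)
                   for r in range(n) for c in range(n))
    if identity and all(x == 0 for x in b):
        return "no communication"      # perfectly aligned access, e.g. a[i][j]
    if identity:
        return "local (translation)"   # constant shift only, e.g. a[i-1][j]
    return "general affine"            # e.g. a[j][i]; costly on a DMPC

print(classify_access([[1, 0], [0, 1]], [0, 0]))   # no communication
print(classify_access([[1, 0], [0, 1]], [-1, 0]))  # local (translation)
print(classify_access([[0, 1], [1, 0]], [0, 0]))   # general affine
```

The heuristic's first step can then be read as choosing mappings that push as many accesses as possible into the first two categories, leaving only the residual general affine communications to be optimized.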
- 1. Jennifer M. Anderson and Monica S. Lam. Global optimizations for parallelism and locality on scalable parallel machines. ACM SIGPLAN Notices, 28(6):112–125, June 1993.
- 2. Tzung-Shi Chen and Jang-Ping Sheu. Communication-free data allocation techniques for parallelizing compilers on multicomputers. IEEE Transactions on Parallel and Distributed Systems, 5(9):924–938, 1994.
- 3. Alain Darte and Yves Robert. Mapping uniform loop nests onto distributed memory architectures. Parallel Computing, 20:679–710, 1994.
- 4. Michèle Dion, Cyril Randriamaro, and Yves Robert. How to optimize residual communications? Research Report 95-27, École Normale Supérieure de Lyon, 46 Allée d'Italie, 69364 Lyon Cedex 07, France, 1995. Available by anonymous ftp to lip.ens-lyon.fr in the pub/rapports/RR/RR94 directory.
- 5. Michèle Dion and Yves Robert. Mapping affine loop nests: New results. In Bob Hertzberger and Giuseppe Serazzi, editors, High-Performance Computing and Networking, International Conference and Exhibition, volume 919 of Lecture Notes in Computer Science, pages 184–189. Springer-Verlag, 1995. Extended version available as Technical Report 94-30, LIP, ENS Lyon (anonymous ftp to lip.ens-lyon.fr).
- 6. J. R. Evans and E. Minieka. Optimization Algorithms for Networks and Graphs. Marcel Dekker, Inc., 1992.
- 7. Paul Feautrier. Towards automatic distribution. Parallel Processing Letters, 4(3):233–244, 1994.
- 8. Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, second edition, 1989.
- 9. C.-H. Huang and P. Sadayappan. Communication-free hyperplane partitioning of nested loops. In Banerjee, Gelernter, Nicolau, and Padua, editors, Languages and Compilers for Parallel Computing, volume 589 of Lecture Notes in Computer Science, pages 186–200. Springer-Verlag, 1991.
- 10. Kathleen Knobe, Joan D. Lukas, and Guy L. Steele. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8:102–118, 1990.
- 11. Jingke Li and Marina Chen. The data alignment phase in compiling programs for distributed memory machines. Journal of Parallel and Distributed Computing, 13:213–221, 1991.
- 12. Alexis Platonoff. Contribution à la Distribution Automatique des Données pour Machines Massivement Parallèles. PhD thesis, École Nationale Supérieure des Mines de Paris, March 1995.
- 13. W. Shang and Z. Shu. Data alignment of loop nests without nonlocal communications. In Application Specific Array Processors, pages 439–450. IEEE Computer Society Press, 1994.