# Performance comparison of strategies for static mapping of parallel programs

## Abstract

We address the problem of efficiently mapping arbitrary parallel programs onto distributed-memory message-passing parallel computers. An efficient task assignment strategy based on two phases (task clustering and task reassignment) is proposed. This strategy is suitable for applications that can be partitioned into parallel executable tasks, and its design is a trade-off between solution quality and computational complexity. It is evaluated by comparison with representative heuristics from the literature that cover a range of complexity categories: two simple greedy algorithms (Largest Processing Time First and Largest Global Cost First) with low complexity, two iterative heuristics (Simulated Annealing and Tabu Search) with the highest complexity, and a mixed heuristic (Even Distribution and Task Reassignment) with complexity intermediate between the other two. The proposed strategy proves very effective for computations whose communication topology matches well-known regular graph families such as trees, rings and meshes, as well as for arbitrary computations with irregular communication patterns. Its solution quality is better than that produced by the other mixed heuristic and as good as (and sometimes better than) that produced by Simulated Annealing.
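To make the low-complexity baseline concrete, the Largest Processing Time First heuristic mentioned above is the classical greedy list-scheduling rule: tasks are considered in decreasing order of processing time, and each is placed on the currently least-loaded processor. The sketch below is an illustration of that general rule only (function and variable names are ours, and communication costs, which the full mapping problem also weighs, are ignored):

```python
import heapq

def lpt_assign(task_times, num_procs):
    """Greedy Largest Processing Time First sketch: each task, heaviest
    first, goes to the processor with the smallest accumulated load."""
    # Min-heap of (current load, processor id) so the least-loaded
    # processor is always at the top.
    heap = [(0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = {}
    for task, t in sorted(task_times.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)
        assignment[task] = p
        heapq.heappush(heap, (load + t, p))
    return assignment

# Example: five tasks on two processors; loads end up as 10 and 11.
tasks = {"a": 7, "b": 5, "c": 4, "d": 3, "e": 2}
print(lpt_assign(tasks, 2))
```

Because it ignores the communication graph entirely, a heuristic of this kind balances computation well but can scatter heavily communicating tasks across processors, which is exactly the weakness the clustering phase of the proposed strategy addresses.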

## Keywords

Mapping, task assignment, static load balancing, distributed-memory parallel machines, program graphs, heuristics

