Overlapping communication and computation in hypercubes
This paper presents a method for deriving efficient algorithms for hypercubes. The method exploits two features of the underlying hardware: a) the parallelism provided by the multiple communication links of each node, and b) the possibility of overlapping computation and communication, which is available on machines that support an asynchronous communication protocol. The method applies to a generic class of hypercube algorithms, many examples of which appear in the literature for different problems. The paper demonstrates the efficiency of the method on two such problems: FFT and Vector Add. The results show that the reduction in communication overhead is very significant in many cases, and that the algorithms produced by the method are always very close to the optimum in terms of execution time.
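For orientation, the kind of hypercube algorithm the abstract refers to can be sketched as a dimension-exchange sum: in step k, every node exchanges data with its neighbor across dimension k and then combines the two values. The simulation below is only an illustration of this baseline pattern, in which each step uses a single link and computation waits for communication. It is not the method of the paper. The function name `hypercube_vector_add` is hypothetical, and reading "Vector Add" as an element-wise sum of per-node vectors is an assumption.

```python
def hypercube_vector_add(local_vectors):
    """Simulate a dimension-exchange sum on a hypercube of 2**d nodes.

    local_vectors[i] is the vector initially held by node i.
    Returns the vector held by every node after d exchange steps.
    Illustrative baseline only: one link is used per step and there is
    no overlap of communication with computation.
    """
    n = len(local_vectors)
    d = n.bit_length() - 1
    if n == 0 or (1 << d) != n:
        raise ValueError("number of nodes must be a power of two")

    state = [list(v) for v in local_vectors]
    for k in range(d):
        # Communication phase: node i receives the vector of its
        # neighbor across dimension k, i.e. node i XOR 2**k.
        received = [state[node ^ (1 << k)] for node in range(n)]
        # Computation phase: each node adds the received vector to its own.
        state = [
            [a + b for a, b in zip(own, other)]
            for own, other in zip(state, received)
        ]
    return state


if __name__ == "__main__":
    # Eight nodes (a 3-dimensional hypercube), node i holds [i, 2*i].
    vectors = [[i, 2 * i] for i in range(8)]
    result = hypercube_vector_add(vectors)
    print(result[0])
```

In this baseline, d - 1 of the d links of each node are idle in every step, and the addition cannot start until the whole message has arrived. These are the two sources of inefficiency that the abstract says the method targets.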