Hardware mapping of a parallel algorithm for matrix-vector multiplication overlapping communications and computations
The parallelization of numerical algorithms is central to scientific computing, but many aspects of it remain open problems. In particular, the overhead introduced by loading and unloading data degrades efficiency and, in any realistic approach, should be taken into account when estimating performance. The authors of this paper present a way of overcoming the bottleneck of loading and unloading data by overlapping computations and communications in a specific algorithm, matrix-vector multiplication. A mapping of this algorithm onto hardware is also presented in order to demonstrate the parallelization methodology.
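The paper's specific hardware mapping is not reproduced here, but the core idea of hiding data-loading latency behind arithmetic can be sketched generically. The fragment below is a minimal, assumed illustration (not the authors' algorithm): a loader thread stages the next block of matrix rows, standing in for communication, while the main thread multiplies the block already staged. The function name, block size, and double-buffering depth are all assumptions for illustration.

```python
import threading
import queue

def matvec_overlapped(A, x):
    """Block row-wise y = A @ x, overlapping "communication" (staging the
    next block of rows) with computation on the current block.
    Illustrative sketch only; block size and buffering are assumptions."""
    n = len(A)
    block = 2                        # rows staged per step (assumed)
    staged = queue.Queue(maxsize=1)  # one block in flight: double buffering

    def loader():
        # Plays the role of the communication channel: it fetches blocks
        # of A into local storage while the consumer is still computing.
        for i in range(0, n, block):
            staged.put((i, [row[:] for row in A[i:i + block]]))
        staged.put(None)  # sentinel: no more blocks to load

    threading.Thread(target=loader, daemon=True).start()

    y = [0.0] * n
    while True:
        item = staged.get()          # block was loaded during prior compute
        if item is None:
            break
        i, rows = item
        for k, row in enumerate(rows):
            # Computation on block i overlaps the loader fetching block i+1.
            y[i + k] = sum(a * b for a, b in zip(row, x))
    return y
```

On a message-passing machine the loader thread would correspond to non-blocking receives of remote matrix blocks; the queue of depth one captures the double-buffering that makes the overlap possible.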