Improving the vector performance via algorithmic domain decomposition
To use the full potential of a local memory vector computer, algorithms have to comply with its memory hierarchy. Using the IBM 3090 as a paradigm, we give a fairly complete account of its cache storage, which turns out to play a crucial rôle in vector processing. On the basis of these results, we are able to improve the vector performance of algorithms by decomposing the data domain.
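The idea of decomposing the data domain can be illustrated with a minimal sketch. It is not the algorithm of the paper: the function names, the two-pass computation, and the `block` parameter are all hypothetical, chosen only to show the structure. A computation that sweeps the whole domain twice is rearranged so that both sweeps are applied to one subdomain at a time, while that subdomain is small enough to stay in fast storage.

```python
def two_pass_whole(x, y, a, b):
    """Reference version: each pass streams over the entire domain."""
    # Pass 1: intermediate result for every element of the domain.
    t = [a * xi + yi for xi, yi in zip(x, y)]
    # Pass 2: by the time this starts, early parts of t may have left the cache.
    return [b * ti * ti for ti in t]


def two_pass_blocked(x, y, a, b, block=1024):
    """Same result, computed subdomain by subdomain.

    `block` is a hypothetical tuning parameter; on a real machine it would
    be chosen so that one subdomain fits into the cache.
    """
    n = len(x)
    out = [0.0] * n
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Pass 1 restricted to the current subdomain.
        t = [a * x[i] + y[i] for i in range(start, stop)]
        # Pass 2 reuses t immediately, before moving to the next subdomain.
        for k, ti in enumerate(t):
            out[start + k] = b * ti * ti
    return out
```

Both functions perform the same arithmetic in the same order per element, so they return identical results; only the order in which memory is traversed differs. Pure Python will not show the speed-up itself, which depends on the cache behaviour of the target machine.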