Improving the vector performance via algorithmic domain decomposition

  • Helmut Weberpals
Efficient Use Of Vector Processors
Part of the Lecture Notes in Computer Science book series (LNCS, volume 457)

Abstract

To use the full potential of a local memory vector computer, algorithms have to comply with the memory hierarchy. Using the IBM 3090 as a paradigm, we give a fairly complete account of its cache storage, which turns out to play a crucial rôle in vector processing. On the basis of these results, we are able to improve the vector performance of algorithms by decomposing the data domain.
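
The general idea of decomposing the data domain so that it fits the cache can be illustrated with a small sketch. This is not taken from the paper and is not the author's algorithm; it only shows the common blocking pattern under stated assumptions. The function names `passes_whole` and `passes_blocked` and the `block` parameter are hypothetical, and the passes are assumed to be elementwise so that both orderings give the same result.

```python
def passes_whole(data, passes):
    """Apply each pass to the entire array before starting the next one.

    With a large array, every pass streams all elements through the cache again.
    """
    out = list(data)
    for f in passes:
        for i in range(len(out)):
            out[i] = f(out[i])
    return out


def passes_blocked(data, passes, block):
    """Apply all passes to one block before moving on to the next block.

    `block` is a hypothetical tuning parameter; the intent is that one block
    is small enough to stay in cache while all passes run over it.
    """
    out = list(data)
    for start in range(0, len(out), block):
        stop = min(start + block, len(out))
        for f in passes:
            for i in range(start, stop):
                out[i] = f(out[i])
    return out


if __name__ == "__main__":
    data = list(range(10))
    passes = [lambda x: x + 1, lambda x: x * 2]
    # Both orderings compute the same values; only the access pattern differs.
    print(passes_whole(data, passes) == passes_blocked(data, passes, block=4))
```

In pure Python the blocked version will not show a speed-up; the sketch only makes the reordering of the loops explicit. On a machine such as the one the abstract describes, the block size would presumably be chosen from the cache parameters, which this page does not give.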

Copyright information

© Springer-Verlag Berlin Heidelberg 1990

Authors and Affiliations

  • Helmut Weberpals
    • 1
  1. Gesellschaft für wissenschaftliche Datenverarbeitung Göttingen and Institut für Numerische und Angewandte Mathematik der Universität Göttingen, Göttingen, Germany