Fast Sparse Matrix-Vector Multiplication for TeraFlop/s Computers

  • Gerhard Wellein
  • Georg Hager
  • Achim Basermann
  • Holger Fehske
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2565)


Eigenvalue problems involving very large sparse matrices are common in many fields of science. The numerical core of iterative eigenvalue algorithms is generally a matrix-vector multiplication (MVM) with the large sparse matrix. We present three programming approaches for parallel MVM on present-day supercomputers. In addition to a pure message-passing approach, we introduce two hybrid parallel implementations based on the simultaneous use of message-passing and shared-memory programming models. For a modern SMP cluster (HITACHI SR8000), the performance and scalability of the hybrid implementations are discussed and compared with the pure message-passing approach on massively parallel systems (CRAY T3E), vector computers (NEC SX5e) and distributed shared-memory systems (SGI Origin3800).
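
As a rough illustration of the kernel the abstract refers to, the sketch below multiplies a sparse matrix by a vector and then repeats the product block by block over contiguous row ranges, which is one common way to divide the work among processes. It is a minimal serial sketch under stated assumptions, not the implementation described in the paper: the compressed sparse row (CSR) layout, the row-block partitioning, and the names csr_matvec_rows, csr_matvec and split_rows are all chosen here for illustration, and no message-passing or shared-memory library is used.

```python
from typing import List, Sequence, Tuple


def csr_matvec_rows(values: Sequence[float], col_idx: Sequence[int],
                    row_ptr: Sequence[int], x: Sequence[float],
                    row_start: int, row_end: int) -> List[float]:
    """Return rows [row_start, row_end) of y = A x, with A stored in CSR form.

    Assumption: CSR storage (values, col_idx, row_ptr); the abstract does not
    state which sparse format the authors use.
    """
    y_block = []
    for i in range(row_start, row_end):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y_block.append(acc)
    return y_block


def csr_matvec(values: Sequence[float], col_idx: Sequence[int],
               row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Full product y = A x over all rows."""
    return csr_matvec_rows(values, col_idx, row_ptr, x, 0, len(row_ptr) - 1)


def split_rows(n_rows: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split row indices into n_parts contiguous, nearly equal blocks.

    Hypothetical helper: each (start, end) pair stands for the rows one
    process would own in a row-block distribution.
    """
    base, extra = divmod(n_rows, n_parts)
    bounds = []
    start = 0
    for p in range(n_parts):
        end = start + base + (1 if p < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    # 4 x 4 example matrix:
    # [[4, 0, 1, 0],
    #  [0, 3, 0, 0],
    #  [1, 0, 2, 5],
    #  [0, 0, 5, 1]]
    values = [4.0, 1.0, 3.0, 1.0, 2.0, 5.0, 5.0, 1.0]
    col_idx = [0, 2, 1, 0, 2, 3, 2, 3]
    row_ptr = [0, 2, 3, 6, 8]
    x = [1.0, 2.0, 3.0, 4.0]

    print(csr_matvec(values, col_idx, row_ptr, x))

    # Serial emulation of three "processes", each computing its own row block.
    y = []
    for start, end in split_rows(4, 3):
        y.extend(csr_matvec_rows(values, col_idx, row_ptr, x, start, end))
    print(y)
```

In a distributed setting each block would additionally need the entries of x owned by other processes, which is where communication enters; the sketch sidesteps this by giving every block access to the whole vector.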





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Gerhard Wellein (1)
  • Georg Hager (1)
  • Achim Basermann (2)
  • Holger Fehske (3)
  1. Regionales Rechenzentrum Erlangen, Erlangen, Germany
  2. C&C Research Laboratories, NEC Europe Ltd, Sankt Augustin, Germany
  3. Institut für Physik, Universität Greifswald, Greifswald, Germany
