The Journal of Supercomputing, Volume 58, Issue 2, pp 195–205

Analyzing the execution of sparse matrix-vector product on the Finisterrae SMP-NUMA system

  • Juan C. Pichel
  • Juan A. Lorenzo
  • Dora B. Heras
  • Jose C. Cabaleiro
  • Tomás F. Pena


In this paper, the sparse matrix-vector product (SpMV) is evaluated on the FinisTerrae SMP-NUMA supercomputer. The particularities of its architecture make tuning SpMV especially relevant because of their significant impact on performance. First, we have estimated the influence of data and thread allocation. Moreover, because of the indirect and irregular memory access patterns of SpMV, we have also studied the influence of the memory hierarchy on performance. Based on the behavior observed in the study, a set of optimizations specifically tuned for FinisTerrae was successfully applied to SpMV. Noticeable improvements are obtained in comparison with the naïve SpMV implementation.


Keywords: Sparse matrix, NUMA, Thread affinity, Memory hierarchy
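
The indirect and irregular memory access pattern mentioned in the abstract can be illustrated with a minimal sketch of a naïve SpMV kernel. The abstract does not state which storage format or language the authors used, so the compressed sparse row (CSR) layout, the Python language, the function name `spmv_csr`, and the 3×3 example matrix below are all assumptions made for illustration only, not the implementation evaluated in the paper.

```python
from typing import List, Sequence


def spmv_csr(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    values: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A * x for a matrix A stored in CSR form (assumed format)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # values[k] and col_idx[k] are read sequentially, but x is read
            # through col_idx[k]: an indirect, data-dependent access whose
            # locality depends on the sparsity pattern of the matrix.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix (not taken from the paper):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

In this sketch the reads of `x` are the only accesses that are not sequential, which is why the placement of `x` in memory and the behavior of the cache hierarchy are natural things to study for this kernel on a NUMA machine.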





Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Juan C. Pichel (1)
  • Juan A. Lorenzo (2)
  • Dora B. Heras (2)
  • Jose C. Cabaleiro (2)
  • Tomás F. Pena (2)

  1. Galicia Supercomputing Center (CESGA), Santiago de Compostela, Spain
  2. Electronics and Computer Science Dpt., Univ. of Santiago de Compostela, Santiago de Compostela, Spain
