The Journal of Supercomputing, Volume 64, Issue 1, pp 100–109

Performance evaluation of sparse matrix products in UPC

  • Jorge González-Domínguez
  • Óscar García-López
  • Guillermo L. Taboada
  • María J. Martín
  • Juan Touriño


Unified Parallel C (UPC) is a Partitioned Global Address Space (PGAS) language whose popularity has grown in recent years owing to its high programmability and reasonable performance, achieved through efficient exploitation of data locality, especially on hierarchical architectures such as multicore clusters. However, the performance issues that the irregular structure of sparse matrix operations raises in this language have not yet been studied. One of them is the choice of storage format: selecting an adequate format for the sparse matrices can significantly improve the efficiency of the parallel codes. This paper presents an evaluation, using UPC, of the most common sparse storage formats with different implementations of the matrix-vector and matrix-matrix products, which are key kernels in many scientific applications.
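To make the role of the storage format concrete, the following is a minimal sequential sketch of a sparse matrix-vector product over the compressed sparse row (CSR) layout. CSR is chosen here only as a widely used example; the abstract does not name the formats evaluated, and the function names `dense_to_csr` and `spmv_csr` are hypothetical, not taken from the paper or from any UPC library. The sketch is written in Python for brevity and contains no UPC constructs.

```python
# Illustrative sketch only: CSR is an assumed example format, and the
# helper names below are hypothetical (not from the paper).

def dense_to_csr(dense):
    """Convert a dense row-major matrix (list of lists) into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)   # nonzero value
                col_idx.append(j)  # column of that nonzero
        row_ptr.append(len(values))  # end of this row's nonzeros
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A * x for a matrix A stored in CSR."""
    y = [0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read indirectly through col_idx: an irregular access.
            y[i] += values[k] * x[col_idx[k]]
    return y


A = [[4, 0, 0],
     [0, 0, 2],
     [1, 3, 0]]
values, col_idx, row_ptr = dense_to_csr(A)
y = spmv_csr(values, col_idx, row_ptr, [1, 2, 3])
print(y)  # [4, 6, 7]
```

The indirect read `x[col_idx[k]]` is one instance of the irregular access pattern that sparse operations introduce: which elements of `x` a row touches depends on the matrix data rather than on the loop indices alone.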


PGAS, UPC, Sparse products, Performance evaluation



This work was funded by Hewlett-Packard (Project “Improving UPC Usability and Performance in Constellation Systems: Implementation/Extensions of UPC Libraries”), the Ministry of Science and Innovation of Spain (Project TIN2010-16735), the Ministry of Education (FPU Grant AP2008-01578), and the Spanish network CAPAP-H3 (Project TIN2010-12011-E). We gratefully thank CESGA (Galicia Supercomputing Center) for providing access to the Finis Terrae supercomputer.



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Jorge González-Domínguez (1)
  • Óscar García-López (1)
  • Guillermo L. Taboada (1)
  • María J. Martín (1)
  • Juan Touriño (1)
  1. Computer Architecture Group, University of A Coruña, A Coruña, Spain
