Abstract
Unified Parallel C (UPC) is a Partitioned Global Address Space (PGAS) language whose popularity has grown in recent years owing to its high programmability and its reasonable performance, which it achieves through efficient exploitation of data locality, especially on hierarchical architectures such as multicore clusters. However, the performance issues that arise in this language from the irregular structure of sparse matrix operations have not yet been studied. In particular, selecting an adequate storage format for the sparse matrices can significantly improve the efficiency of the parallel codes. This paper presents an evaluation, using UPC, of the most common sparse storage formats with different implementations of the matrix-vector and matrix-matrix products, which are key kernels in many scientific applications.
Acknowledgements
This work was funded by Hewlett-Packard (Project “Improving UPC Usability and Performance in Constellation Systems: Implementation/Extensions of UPC Libraries”), the Ministry of Science and Innovation of Spain (Project TIN2010-16735), the Ministry of Education (FPU Grant AP2008-01578), and the Spanish network CAPAP-H3 (Project TIN2010-12011-E). We gratefully thank CESGA (Galicia Supercomputing Center) for providing access to the Finis Terrae supercomputer.
Cite this article
González-Domínguez, J., García-López, Ó., Taboada, G.L. et al. Performance evaluation of sparse matrix products in UPC. J Supercomput 64, 100–109 (2013). https://doi.org/10.1007/s11227-012-0796-4
DOI: https://doi.org/10.1007/s11227-012-0796-4