Performance evaluation of sparse matrix products in UPC

González-Domínguez, Jorge; García-López, Óscar; Taboada, Guillermo L.; Martín, María J.; Touriño, Juan

doi:10.1007/s11227-012-0796-4

Published: 09 June 2012

Volume 64, pages 100–109, (2013)

The Journal of Supercomputing

Abstract

Unified Parallel C (UPC) is a Partitioned Global Address Space (PGAS) language whose popularity has grown in recent years owing to its high programmability and its reasonable performance, which it achieves by exploiting data locality efficiently, especially on hierarchical architectures such as multicore clusters. However, the performance issues that the irregular structure of sparse matrix operations raises in this language have not yet been studied. One of these issues is the choice of storage format: selecting an adequate format for the sparse matrices can significantly improve the efficiency of the parallel codes. This paper presents an evaluation, using UPC, of the most common sparse storage formats with different implementations of the matrix-vector and matrix-matrix products, which are key kernels in many scientific applications.
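
To make the notion of a sparse storage format concrete, the following is a minimal sketch of a matrix-vector product over the compressed sparse row (CSR) layout. It is an illustration only: the abstract does not name the formats evaluated, so the choice of CSR is an assumption, as are the function names `dense_to_csr` and `csr_matvec` and the use of sequential Python rather than UPC. The optional row range hints at how a block of rows might be assigned to each thread in a parallel version; it does not reproduce the paper's implementations.

```python
def dense_to_csr(dense):
    """Build CSR arrays (values, col_idx, row_ptr) from a dense row-major matrix.

    values  : the nonzero entries, stored row by row
    col_idx : the column index of each stored entry
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x, row_begin=0, row_end=None):
    """Return y = A @ x restricted to rows row_begin .. row_end - 1.

    The row range is a hypothetical stand-in for a block-row distribution:
    each worker would call this function on its own slice of rows.
    """
    if row_end is None:
        row_end = len(row_ptr) - 1
    y = []
    for i in range(row_begin, row_end):
        acc = 0.0
        # Only the stored nonzeros of row i are visited; the accesses to x
        # are indirect, which is the source of the irregular memory pattern.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    dense = [
        [4.0, 0.0, 0.0],
        [0.0, 0.0, 2.0],
        [1.0, 3.0, 0.0],
    ]
    values, col_idx, row_ptr = dense_to_csr(dense)
    x = [1.0, 2.0, 3.0]
    # Full product, then the same product computed as two row blocks.
    print(csr_matvec(values, col_idx, row_ptr, x))
    print(csr_matvec(values, col_idx, row_ptr, x, 0, 2)
          + csr_matvec(values, col_idx, row_ptr, x, 2, 3))
```

In this layout the output rows are independent, so a row-block split needs no synchronization on `y`, while the reads from `x` may touch any position. Whether those reads are local or remote is the kind of locality question a PGAS language exposes.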

Acknowledgements

This work was funded by Hewlett-Packard (Project “Improving UPC Usability and Performance in Constellation Systems: Implementation/Extensions of UPC Libraries”), the Ministry of Science and Innovation of Spain (Project TIN2010-16735), the Ministry of Education (FPU Grant AP2008-01578), and the Spanish network CAPAP-H3 (Project TIN2010-12011-E). We gratefully thank CESGA (Galicia Supercomputing Center) for providing access to the Finis Terrae supercomputer.

Author information

Authors and Affiliations

Computer Architecture Group, University of A Coruña, A Coruña, Spain
Jorge González-Domínguez, Óscar García-López, Guillermo L. Taboada, María J. Martín & Juan Touriño

Corresponding author

Correspondence to Jorge González-Domínguez.

About this article

Cite this article

González-Domínguez, J., García-López, Ó., Taboada, G.L. et al. Performance evaluation of sparse matrix products in UPC. J Supercomput 64, 100–109 (2013). https://doi.org/10.1007/s11227-012-0796-4

Published: 09 June 2012
Issue Date: April 2013
DOI: https://doi.org/10.1007/s11227-012-0796-4
