Balanced CSR Sparse Matrix-Vector Product on Graphics Processors

  • Goran Flegar
  • Enrique S. Quintana-Ortí
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10417)


We propose a novel parallel approach to compute the sparse matrix-vector product (SpMV) on graphics processing units (GPUs), optimized for matrices with an irregular row distribution of the non-zero entries. Our algorithm relies on the standard CSR format to store the sparse matrix, requires an inexpensive pre-processing step, and consumes only a minor amount of additional memory compared with significantly more expensive GPU-specific sparse matrix layouts. In addition, we propose a simple heuristic to determine whether our method or the standard CSR SpMV algorithm should be used for a specific matrix. As a result, our proposal, combined with the standard CSR SpMV, can be adopted as the default choice for the implementation of SpMV in general-purpose sparse linear algebra libraries for GPUs.
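To make the load-balancing idea concrete, the following is a minimal sequential sketch, not the authors' GPU implementation. It assumes that "balanced" means splitting the non-zero entries, rather than the rows, into equal-sized chunks, and that the pre-processing step records the row in which each chunk starts. The names `chunk_start_rows`, `balanced_csr_spmv` and `num_chunks` are hypothetical, and the heuristic for choosing between the two kernels is not reproduced because the abstract does not specify it.

```python
from bisect import bisect_right


def csr_spmv(row_ptr, col_idx, vals, x):
    """Standard CSR SpMV: work is divided by rows, so long rows dominate."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def chunk_start_rows(row_ptr, num_chunks):
    """Assumed pre-processing: find the row holding each chunk's first non-zero."""
    nnz = row_ptr[-1]
    chunk = -(-nnz // num_chunks)  # ceiling division: non-zeros per chunk
    starts = []
    for c in range(num_chunks):
        first = min(c * chunk, nnz)
        # Last row index i such that row_ptr[i] <= first.
        starts.append(bisect_right(row_ptr, first) - 1)
    return chunk, starts


def balanced_csr_spmv(row_ptr, col_idx, vals, x, num_chunks):
    """Work is divided by non-zeros; each chunk models one independent worker."""
    nnz = row_ptr[-1]
    chunk, starts = chunk_start_rows(row_ptr, num_chunks)
    y = [0.0] * (len(row_ptr) - 1)
    for c in range(num_chunks):
        row = starts[c]
        first = min(c * chunk, nnz)
        for k in range(first, min(first + chunk, nnz)):
            # Move to the row that owns entry k, skipping any empty rows.
            while k >= row_ptr[row + 1]:
                row += 1
            # In a parallel setting this update would need to be atomic,
            # because a single row may be shared by several chunks.
            y[row] += vals[k] * x[col_idx[k]]
    return y


# A 4x4 matrix with one dense row, one empty row and two short rows.
row_ptr = [0, 4, 4, 5, 6]
col_idx = [0, 1, 2, 3, 1, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

print(csr_spmv(row_ptr, col_idx, vals, x))
print(balanced_csr_spmv(row_ptr, col_idx, vals, x, num_chunks=3))
```

With three chunks, each worker handles two non-zeros regardless of how they are distributed over rows, whereas the row-based version assigns four of the six non-zeros to the worker that owns the first row. The only extra storage in this sketch is one starting row index per chunk, alongside the unchanged CSR arrays.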


Sparse matrix-vector product; Sparse matrix data layouts; Sparse linear algebra; Performance; GPUs



This work was supported by the CICYT project TIN2014-53495-R of the MINECO and FEDER and the EU H2020 project 732631 “OPRECOMP. Open Transprecision Computing”.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Departamento de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain
