Sparse Matrix-Vector Product

  • Zbigniew KozaEmail author
  • Maciej Matyka
  • Łukasz Mirosław
  • Jakub Poła


The sparse matrix-vector (SpMV) multiplication is one of the key kernels in scientific computing. We present the foundations of its implementation on CUDA- and OpenCL-enabled devices. After introducing the subject, we briefly present three most popular formats: COO, CRS and ELL. They serve as exemplary data structures on which we discuss hardware-related issues associated with efficient SpMV kernel design, such as matrix size, ordering of data, memory boundedness, storage overhead, thread divergence, and coalescence of memory transfers. Next, we present three widely available libraries with stable and validated SpMV kernels: cuSPARSE, CUSP and Paralution. We present and discuss complete codes of several SpMV kernels for both basic SpMV formats and some of its derivatives, including CMRS, and briefly discuss the principles beyond other popular format extensions.


Sparse Matrix Sparse Matrice Column Index Storage Overhead Warp Size 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. 1.
    Bell, N., Garland, M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: SC’09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1–11. ACM, New York (2009). doi:
  2. 2.
    Choi, J.W., Singh, A., Vuduc, R.W.: Model-driven autotuning of sparse matrix-vector multiply on GPUs. SIGPLAN Not. 45(5), 115–126 (2010).
  3. 3.
    Dziekonski, A., Lamecki, A., Mrozowski, M.: A memory efficient and fast sparse matrix vector product on a GPU. Prog. Electromagn. Res. 116, 49–63 (2011).
  4. 4.
    Feng, X., Jin, H., Zheng, R., Hu, K., Zeng, J., Shao, Z.: Optimization of sparse matrix-vector multiplication with variant CSR on GPUs. In: 2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS), pp. 165–172. IEEE, Tainan (2011)Google Scholar
  5. 5.
    Koza, Z., Matyka, M., Szkoda, S., Mirosław, Ł.: Compressed multirow storage format for sparse matrices on graphics processing units. SIAM J. Sci. Comput. 36(2), 219–239 (2014).
  6. 6.
    Monakov, A., Lokhmotov, A., Avetisyan, A.: Automatically tuning sparse matrix-vector multiplication for GPU architectures. In: Patt, Y., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds.) High Performance Embedded Architectures and Compilers. Lecture Notes in Computer Science, vol. 5952, pp. 111–125. Springer, Heidelberg (2010).
  7. 7.
    Mukunoki, D., Takahashi, D.: Optimization of sparse matrix-vector multiplication for CRS format on NVIDIA Kepler architecture GPUs. In: Murgante, B., Misra, S., Carlini, M., Torre, C.M., Nguyen, H.Q., Taniar, D., Apduhan, B.O., Gervasi, O. (eds.) Computational Science and Its Applications – ICCSA 2013. Lecture Notes in Computer Science, vol. 7975, pp. 211–223. Springer, Heidelberg (2013).
  8. 8.
    Su, B.Y., Keutzer, K.: clSpMV: a cross-platform opencl spmv framework on GPUs. In: Proceedings of the International Conference on Supercomputing, ICS ’12 (2012)Google Scholar
  9. 9.
    Vázquez, F., Garzón, E.M., Martınez, J.A., Fernández, J.: Accelerating sparse matrix vector product with GPUs. In: Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering (CMMSE 2009), pp. 1081–1092. CMMSE, Gijón (2009)Google Scholar
  10. 10.
    Vázquez, F., Fernández, J.J., Garzón, E.M.: Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach. Parallel Comput. 38(8), 408–420 (2012). doi:10.1016/j.parco.2011.08.003.
  11. 11.
    Vuduc, R.W.: Automatic performance tuning of sparse matrix kernels. Ph.D. thesis, University of California (2003)Google Scholar
  12. 12.
    Yang, X., Parthasarathy, S., Sadayappan, P.: Fast sparse matrix-vector multiplication on GPUs: implications for graph mining. Proc. VLDB Endowment 4(4), 231–242 (2011)CrossRefGoogle Scholar
  13. 13.
    Yoshizawa, H., Takahashi, D.: Automatic tuning of sparse matrix-vector multiplication for CRS format on GPUs. In: 2012 IEEE 15th International Conference on Computational Science and Engineering, pp. 130–136 (2012). doi:

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Zbigniew Koza
    • 1
    Email author
  • Maciej Matyka
    • 1
  • Łukasz Mirosław
    • 2
  • Jakub Poła
    • 3
  1. 1.Faculty of Physics & AstronomyUniversity of WrocławWrocławPoland
  2. 2.Institute of InformaticsWrocław University of TechnologyWrocławPoland
  3. 3.Institute of PhysicsUniversity of SilesiaKatowicePoland

Personalised recommendations