Multilayer Approach for Joint Direct and Transposed Sparse Matrix Vector Multiplication for Multithreaded CPUs

  • Ivan Šimeček
  • Daniel Langr
  • Ivan Kotenkov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10777)


One of the most common operations executed on modern high-performance computing systems is the multiplication of a sparse matrix by a dense vector within a shared-memory computational node. A strongly related but far less studied problem is joint direct and transposed sparse matrix-vector multiplication, which is widely needed by certain types of iterative solvers. We propose a multilayer approach for joint sparse multiplication that balances the workload of threads. Measurements show that our algorithm is scalable and achieves high computational performance for multiple benchmark matrices that arise from various scientific and engineering disciplines.
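To make the operation concrete, the following is a minimal sketch of joint direct and transposed sparse matrix-vector multiplication over a CSR-stored matrix: a single pass over the nonzeros produces both y = A·x and z = Aᵀ·w. This illustrates only the joint operation the abstract refers to, not the authors' multilayer, thread-balanced scheme; the function name and CSR argument layout are assumptions for illustration.

```python
def joint_spmv_csr(n_rows, n_cols, row_ptr, col_idx, vals, x, w):
    """Compute y = A*x and z = A^T*w in one sweep over a CSR matrix.

    Illustrative sequential sketch (hypothetical helper, not the
    paper's multilayer algorithm): each nonzero A[i][j] = v is read
    once and contributes to both products.
    """
    y = [0.0] * n_rows   # direct product y = A * x
    z = [0.0] * n_cols   # transposed product z = A^T * w
    for i in range(n_rows):
        wi = w[i]
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            v = vals[k]
            acc += v * x[j]   # row i of A times x
            z[j] += v * wi    # column j of A (row j of A^T) times w
        y[i] = acc
    return y, z
```

In a multithreaded setting the y updates parallelize naturally over rows, while the scattered z updates race across threads; balancing and coordinating both kinds of updates is precisely the difficulty the paper's multilayer approach addresses.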


Sparse matrix-vector multiplication · Multithreaded execution · OpenMP · Joint direct and transposed multiplication · Scalability



This research has been supported by CTU internal grant SGS17/215/OHK3/3T/18.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Systems, Faculty of Information Technology, Czech Technical University in Prague, Prague, Czech Republic
