
Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model

Published in: Theory of Computing Systems

Abstract

We study the problem of sparse-matrix dense-vector multiplication (SpMV) in external memory. The task of SpMV is to compute y:=Ax, where A is a sparse N×N matrix and x is a vector. We express sparsity by a parameter k, and for each choice of k consider the class of matrices where the number of nonzero entries is kN, i.e., where the average number of nonzero entries per column is k.
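The computation y := Ax described above can be made concrete with a minimal sketch. Here the sparse matrix is assumed to be given as a list of (row, column, value) triples; this representation and all names are illustrative, not the layouts analyzed in the paper.

```python
def spmv(triples, x, n):
    """Compute y := A x for a sparse n-by-n matrix A given as
    (row, col, value) triples; only the kN nonzeros are touched."""
    y = [0.0] * n
    for i, j, a in triples:
        y[i] += a * x[j]  # each nonzero contributes one multiply-add
    return y


# Example: a 3x3 matrix with kN = 3 nonzeros (k = 1 per column on average).
triples = [(0, 1, 2.0), (2, 0, 3.0), (1, 2, -1.0)]
x = [1.0, 1.0, 1.0]
print(spmv(triples, x, 3))  # [2.0, -1.0, 3.0]
```

In internal memory this loop is trivially linear in the number of nonzeros; the interest of the paper lies in how many I/Os the same computation must incur when A, x, and y do not fit in memory.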

We investigate the worst-case I/O complexity, i.e., the best possible upper bound on the number of I/Os, as a function of k, N and the parameters M (memory size) and B (track size) of the I/O-model. We determine this complexity up to a constant factor for all meaningful choices of these parameters, as long as \(k\leq N^{1-\varepsilon}\), where ε depends on the problem variant. Our model of computation for the lower bound is a combination of the I/O-models of Aggarwal and Vitter, and of Hong and Kung.

We study variants of the problem, differing in the memory layout of A. If A is stored in column major layout, we prove that SpMV has I/O complexity \(\Theta(\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{\max\{k,M\}}\},\,kN\})\) for \(k\leq N^{1-\varepsilon}\) and any constant \(0<\varepsilon<1\). If the algorithm can choose the memory layout, the I/O complexity reduces to \(\Theta ({\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{kM}\},kN\}})\) for \(k\leq\sqrt[3]{N}\). In contrast, if the algorithm must be able to handle an arbitrary layout of the matrix, the I/O complexity is \(\Theta ({\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{M}\},kN\}})\) for \(k\leq N/2\).
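The column major layout discussed above can be sketched with compressed-sparse-column-style arrays, in which the nonzeros of A are stored column by column. The array names below are illustrative assumptions, not notation from the paper.

```python
def spmv_column_major(col_ptr, row_idx, val, x):
    """Compute y := A x for A stored column by column:
    column j occupies val[col_ptr[j]:col_ptr[j+1]], and
    row_idx[p] gives the row of the nonzero val[p]."""
    n = len(col_ptr) - 1
    y = [0.0] * n
    for j in range(n):
        xj = x[j]  # x is read sequentially, one entry per column
        for p in range(col_ptr[j], col_ptr[j + 1]):
            # the writes into y are scattered across rows; intuitively,
            # these scattered updates are what the I/O bounds capture
            y[row_idx[p]] += val[p] * xj
    return y


# The same 3x3 example matrix, laid out column by column:
# column 0: (row 2, 3.0); column 1: (row 0, 2.0); column 2: (row 1, -1.0)
col_ptr = [0, 1, 2, 3]
row_idx = [2, 0, 1]
val = [3.0, 2.0, -1.0]
print(spmv_column_major(col_ptr, row_idx, val, [1.0, 1.0, 1.0]))  # [2.0, -1.0, 3.0]
```

Note that this layout scans A and x in a single sequential pass, while access to y depends on the row pattern of the nonzeros; which layout the algorithm may assume is exactly what distinguishes the three complexity bounds above.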

In the cache oblivious setting we prove that, under the tall cache assumption \(M\geq B^{1+\varepsilon}\), the I/O complexity is \(\mathcal {O}({\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{\max\{k,M\}}\}})\) for A in column major layout.


References

  1. Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)

  2. Arge, L., Miltersen, P.B.: On showing lower bounds for external-memory computational geometry problems. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms and Visualization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 139–159. American Mathematical Society, Providence (1999)

  3. Brodal, G.S., Fagerberg, R.: On the limits of cache-obliviousness. In: Proc. 35th Annual ACM Symposium on Theory of Computing (STOC), pp. 307–315. ACM, San Diego (2003)

  4. Brodal, G.S., Fagerberg, R., Moruz, G.: Cache-aware and cache-oblivious adaptive sorting. In: Proc. 32nd International Colloquium on Automata, Languages, and Programming. Lecture Notes in Computer Science, vol. 3580, pp. 576–588. Springer, Berlin (2005)

  5. Cormen, T.H., Sundquist, T., Wisniewski, L.F.: Asymptotically tight bounds for performing BMMC permutations on parallel disk systems. SIAM J. Comput. 28(1), 105–136 (1999)

  6. Demmel, J., Dongarra, J., Eijkhout, V., Fuentes, E., Petitet, A., Vuduc, R., Whaley, R.C., Yelick, K.: Self-adapting linear algebra algorithms and software. Proc. IEEE 93(2) (2005). Special Issue on Program Generation, Optimization, and Adaptation

  7. Filippone, S., Colajanni, M.: PSBLAS: a library for parallel linear algebra computation on sparse matrices. ACM Trans. Math. Softw. 26(4), 527–550 (2000)

  8. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: Proc. 40th Annual Symposium on Foundations of Computer Science (FOCS), pp. 285–297. IEEE Computer Society, New York (1999)

  9. Hong, J.-W., Kung, H.T.: I/O complexity: the red-blue pebble game. In: Proc. 13th Annual ACM Symposium on Theory of Computing (STOC), pp. 326–333. ACM, New York (1981)

  10. Im, E.J.: Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000

  11. Raz, R.: Multi-linear formulas for permanent and determinant are of super-polynomial size. In: Proc. 36th Annual ACM Symposium on Theory of Computing (STOC), Chicago, IL, USA, pp. 633–641. ACM, New York (2004)

  12. Remington, K., Pozo, R.: NIST sparse BLAS user’s guide. Technical report, National Institute of Standards and Technology, Gaithersburg, MD (1996)

  13. Saad, Y.: Sparsekit: a basic tool kit for sparse matrix computations. Technical report, Computer Science Department, University of Minnesota, June 1994

  14. Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969)

  15. Toledo, S.: A survey of out-of-core algorithms in numerical linear algebra. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms and Visualization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 161–179. American Mathematical Society, Providence (1999)

  16. Vitter, J.S.: External memory algorithms and data structures. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms and Visualization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 1–38. American Mathematical Society, Providence (1999)

  17. Vuduc, R., Demmel, J.W., Yelick, K.A.: The Optimized Sparse Kernel Interface (OSKI) library: user’s guide for version 1.0.1b. Berkeley Benchmarking and OPtimization (BeBOP) Group, 15 March 2006

  18. Vuduc, R.W.: Automatic performance tuning of sparse matrix kernels. PhD thesis, University of California, Berkeley, Fall 2003


Author information

Correspondence to Riko Jacob.

Cite this article

Bender, M.A., Brodal, G.S., Fagerberg, R. et al. Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model. Theory Comput Syst 47, 934–962 (2010). https://doi.org/10.1007/s00224-010-9285-4
