Abstract
We study the problem of sparse-matrix dense-vector multiplication (SpMV) in external memory. The task of SpMV is to compute \(y := Ax\), where \(A\) is a sparse \(N \times N\) matrix and \(x\) is a dense vector of length \(N\). We express sparsity by a parameter \(k\): for each choice of \(k\) we consider the class of matrices with \(kN\) nonzero entries, i.e., with an average of \(k\) nonzero entries per column.
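To fix ideas, the following C sketch (a minimal in-memory illustration; the identifiers csc_spmv, col_ptr, row_idx, and val are ours, not the paper's) computes \(y := Ax\) with \(A\) stored in compressed sparse column form, one concrete realization of the column major layout studied below:

#include <stddef.h>

/* SpMV, y := A x, for a sparse N x N matrix A in compressed sparse
 * column (CSC) form; the kN nonzeros are stored column by column. */
void csc_spmv(size_t N,
              const size_t *col_ptr,  /* length N+1: nonzeros of column j sit at
                                         positions col_ptr[j] .. col_ptr[j+1]-1 */
              const size_t *row_idx,  /* length kN: row index of each nonzero */
              const double *val,      /* length kN: value of each nonzero */
              const double *x,        /* dense input vector, length N */
              double *y)              /* dense output vector, length N */
{
    for (size_t i = 0; i < N; i++)
        y[i] = 0.0;
    /* Stream over the nonzeros in column major order, scattering the
       contribution val[p] * x[j] into the output row row_idx[p]. */
    for (size_t j = 0; j < N; j++)
        for (size_t p = col_ptr[j]; p < col_ptr[j + 1]; p++)
            y[row_idx[p]] += val[p] * x[j];
}

Note the scattered updates to \(y\): executed naively in external memory, each of the \(kN\) updates may cost one I/O, which is exactly the trivial \(kN\) term appearing inside the \(\min\) of the bounds below.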
We investigate the worst-case complexity of this task in external memory, i.e., the best possible upper bound on the number of I/Os, as a function of \(k\), \(N\), and the parameters \(M\) (memory size) and \(B\) (track size) of the I/O-model. We determine this complexity up to a constant factor for all meaningful choices of these parameters, as long as \(k \le N^{1-\varepsilon}\), where \(\varepsilon\) depends on the problem variant. Our model of computation for the lower bounds combines the I/O-models of Aggarwal and Vitter and of Hong and Kung.
We study variants of the problem that differ in the memory layout of \(A\). If \(A\) is stored in column major layout, we prove that SpMV has I/O complexity \(\Theta(\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{\max\{k,M\}}\},\,kN\})\) for \(k \le N^{1-\varepsilon}\) and any constant \(0 < \varepsilon < 1\). If the algorithm can choose the memory layout, the I/O complexity reduces to \(\Theta(\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{kM}\},\,kN\})\) for \(k \le \sqrt[3]{N}\). In contrast, if the algorithm must be able to handle an arbitrary layout of the matrix, the I/O complexity is \(\Theta(\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{M}\},\,kN\})\) for \(k \le N/2\).
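As a sanity check on the column major bound, consider a hypothetical instantiation (our numbers, not an example from the paper): \(N = 2^{30}\), \(k = 8\), \(M = 2^{20}\), \(B = 2^{10}\). Then \(M/B = 2^{10}\) and \(N/\max\{k,M\} = 2^{30}/2^{20} = 2^{10}\), so

\[
\frac{kN}{B}\max\Bigl\{1,\log_{M/B}\frac{N}{\max\{k,M\}}\Bigr\}
= \frac{2^{33}}{2^{10}} \cdot \max\bigl\{1, \log_{2^{10}} 2^{10}\bigr\}
= 2^{23},
\]

roughly \(8.4 \times 10^{6}\) I/Os, far below the \(kN = 2^{33}\) I/Os of the naive algorithm, so here the \(\min\) is attained by the first term.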
In the cache-oblivious setting, we prove that, under the tall-cache assumption \(M \ge B^{1+\varepsilon}\), the I/O complexity is \(\mathcal{O}(\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{\max\{k,M\}}\})\) for \(A\) in column major layout.
References
Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)
Arge, L., Miltersen, P.B.: On showing lower bounds for external-memory computational geometry problems. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms and Visualization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 139–159. American Mathematical Society, Providence (1999)
Brodal, G.S., Fagerberg, R.: On the limits of cache-obliviousness. In: Proc. 35th Annual ACM Symposium on Theory of Computing (STOC), pp. 307–315. ACM, San Diego (2003)
Brodal, G.S., Fagerberg, R., Moruz, G.: Cache-aware and cache-oblivious adaptive sorting. In: Proc. 32nd International Colloquium on Automata, Languages, and Programming. Lecture Notes in Computer Science, vol. 3580, pp. 576–588. Springer, Berlin (2005)
Cormen, T.H., Sundquist, T., Wisniewski, L.F.: Asymptotically tight bounds for performing BMMC permutations on parallel disk systems. SIAM J. Comput. 28(1), 105–136 (1999)
Demmel, J., Dongarra, J., Eijkhout, V., Fuentes, E., Petitet, A., Vuduc, R., Whaley, R.C., Yelick, K.: Self-adapting linear algebra algorithms and software. Proc. IEEE 93(2) (2005). Special Issue on Program Generation, Optimization, and Adaptation
Filippone, S., Colajanni, M.: PSBLAS: a library for parallel linear algebra computation on sparse matrices. ACM Trans. Math. Softw. 26(4), 527–550 (2000)
Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: Proc. 40th Annual Symposium on Foundations of Computer Science (FOCS), pp. 285–297. IEEE Computer Society, New York (1999)
Hong, J.-W., Kung, H.T.: I/O complexity: the red-blue pebble game. In: Proc. 13th Annual ACM Symposium on Theory of Computing (STOC), pp. 326–333. ACM, New York (1981)
Im, E.J.: Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000
Raz, R.: Multi-linear formulas for permanent and determinant are of super-polynomial size. In: Proc. 36th Annual ACM Symposium on Theory of Computing (STOC), Chicago, IL, USA, pp. 633–641. ACM, New York (2004)
Remington, K., Pozo, R.: NIST sparse BLAS user’s guide. Technical report, National Institute of Standards and Technology, Gaithersburg, MD (1996)
Saad, Y.: SPARSKIT: a basic tool kit for sparse matrix computations. Technical report, Computer Science Department, University of Minnesota, June 1994
Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969)
Toledo, S.: A survey of out-of-core algorithms in numerical linear algebra. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms and Visualization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 161–179. American Mathematical Society, Providence (1999)
Vitter, J.S.: External memory algorithms and data structures. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms and Visualization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 1–38. American Mathematical Society, Providence (1999)
Vuduc, R., Demmel, J.W., Yelick, K.A.: The Optimized Sparse Kernel Interface (OSKI) library: user’s guide for version 1.0.1b. Berkeley Benchmarking and OPtimization (BeBOP) Group, 15 March 2006
Vuduc, R.W.: Automatic performance tuning of sparse matrix kernels. PhD thesis, University of California, Berkeley, Fall 2003
Cite this article
Bender, M.A., Brodal, G.S., Fagerberg, R. et al. Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model. Theory Comput Syst 47, 934–962 (2010). https://doi.org/10.1007/s00224-010-9285-4