Abstract
We study the problem of sparse-matrix dense-vector multiplication (SpMV) in external memory. The task of SpMV is to compute \(y := Ax\), where \(A\) is a sparse \(N \times N\) matrix and \(x\) is a dense vector of length \(N\). We express sparsity by a parameter \(k\): for each choice of \(k\) we consider the class of matrices with \(kN\) nonzero entries, i.e., with an average of \(k\) nonzero entries per column.
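To fix ideas, the following C sketch (a minimal in-memory illustration; the identifiers csc_spmv, col_ptr, row_idx, and val are ours, not the paper's) computes \(y := Ax\) with \(A\) stored in compressed sparse column form, one concrete realization of the column major layout studied below:

#include <stddef.h>

/* SpMV, y := A x, for a sparse N x N matrix A in compressed sparse
 * column (CSC) form; the kN nonzeros are stored column by column. */
void csc_spmv(size_t N,
              const size_t *col_ptr,  /* length N+1: nonzeros of column j sit at
                                         positions col_ptr[j] .. col_ptr[j+1]-1 */
              const size_t *row_idx,  /* length kN: row index of each nonzero */
              const double *val,      /* length kN: value of each nonzero */
              const double *x,        /* dense input vector, length N */
              double *y)              /* dense output vector, length N */
{
    for (size_t i = 0; i < N; i++)
        y[i] = 0.0;
    /* Stream over the nonzeros in column major order, scattering the
       contribution val[p] * x[j] into the output row row_idx[p]. */
    for (size_t j = 0; j < N; j++)
        for (size_t p = col_ptr[j]; p < col_ptr[j + 1]; p++)
            y[row_idx[p]] += val[p] * x[j];
}

Note the scattered updates to \(y\): executed naively in external memory, each of the \(kN\) updates may cost one I/O, which is exactly the trivial \(kN\) term appearing inside the \(\min\) of the bounds below.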
We investigate the worst-case complexity of this task in external memory, i.e., the best possible upper bound on the number of I/Os, as a function of \(k\), \(N\), and the parameters \(M\) (memory size) and \(B\) (track size) of the I/O-model. We determine this complexity up to a constant factor for all meaningful choices of these parameters, as long as \(k \le N^{1-\varepsilon}\), where \(\varepsilon\) depends on the problem variant. Our model of computation for the lower bounds combines the I/O-models of Aggarwal and Vitter and of Hong and Kung.
We study variants of the problem that differ in the memory layout of \(A\). If \(A\) is stored in column major layout, we prove that SpMV has I/O complexity \(\Theta(\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{\max\{k,M\}}\},\,kN\})\) for \(k \le N^{1-\varepsilon}\) and any constant \(0 < \varepsilon < 1\). If the algorithm can choose the memory layout, the I/O complexity reduces to \(\Theta(\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{kM}\},\,kN\})\) for \(k \le \sqrt[3]{N}\). In contrast, if the algorithm must be able to handle an arbitrary layout of the matrix, the I/O complexity is \(\Theta(\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{M}\},\,kN\})\) for \(k \le N/2\).
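As a sanity check on the column major bound, consider a hypothetical instantiation (our numbers, not an example from the paper): \(N = 2^{30}\), \(k = 8\), \(M = 2^{20}\), \(B = 2^{10}\). Then \(M/B = 2^{10}\) and \(N/\max\{k,M\} = 2^{30}/2^{20} = 2^{10}\), so

\[
\frac{kN}{B}\max\Bigl\{1,\log_{M/B}\frac{N}{\max\{k,M\}}\Bigr\}
= \frac{2^{33}}{2^{10}} \cdot \max\bigl\{1, \log_{2^{10}} 2^{10}\bigr\}
= 2^{23},
\]

roughly \(8.4 \times 10^{6}\) I/Os, far below the \(kN = 2^{33}\) I/Os of the naive algorithm, so here the \(\min\) is attained by the first term.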
In the cache-oblivious setting, we prove that, under the tall-cache assumption \(M \ge B^{1+\varepsilon}\), the I/O complexity is \(\mathcal{O}(\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{\max\{k,M\}}\})\) for \(A\) in column major layout.
References
Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)
Arge, L., Miltersen, P.B.: On showing lower bounds for external-memory computational geometry problems. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms and Visualization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 139–159. American Mathematical Society, Providence (1999)
Brodal, G.S., Fagerberg, R.: On the limits of cache-obliviousness. In: Proc. 35th Annual ACM Symposium on Theory of Computing (STOC), pp. 307–315. ACM, San Diego (2003)
Brodal, G.S., Fagerberg, R., Moruz, G.: Cache-aware and cache-oblivious adaptive sorting. In: Proc. 32nd International Colloquium on Automata, Languages, and Programming. Lecture Notes in Computer Science, vol. 3580, pp. 576–588. Springer, Berlin (2005)
Cormen, T.H., Sundquist, T., Wisniewski, L.F.: Asymptotically tight bounds for performing BMMC permutations on parallel disk systems. SIAM J. Comput. 28(1), 105–136 (1999)
Demmel, J., Dongarra, J., Eijkhout, V., Fuentes, E., Petitet, A., Vuduc, R., Whaley, R.C., Yelick, K.: Self-adapting linear algebra algorithms and software. Proc. IEEE 93(2) (2005). Special Issue on Program Generation, Optimization, and Adaptation
Filippone, S., Colajanni, M.: PSBLAS: a library for parallel linear algebra computation on sparse matrices. ACM Trans. Math. Softw. 26(4), 527–550 (2000)
Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: Proc. 40th Annual Symposium on Foundations of Computer Science (FOCS), pp. 285–297. IEEE Computer Society, New York (1999)
Hong, J.-W., Kung, H.T.: I/O complexity: the red-blue pebble game. In: Proc. 13th Annual ACM Symposium on Theory of Computing (STOC), pp. 326–333. ACM, New York (1981)
Im, E.J.: Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000
Raz, R.: Multi-linear formulas for permanent and determinant are of super-polynomial size. In: Proc. 36th Annual ACM Symposium on Theory of Computing (STOC), Chicago, IL, USA, pp. 633–641. ACM, New York (2004)
Remington, K., Pozo, R.: NIST sparse BLAS user’s guide. Technical report, National Institute of Standards and Technology, Gaithersburg, MD (1996)
Saad, Y.: SPARSKIT: a basic tool kit for sparse matrix computations. Technical report, Computer Science Department, University of Minnesota, June 1994
Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969)
Toledo, S.: A survey of out-of-core algorithms in numerical linear algebra. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms and Visualization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 161–179. American Mathematical Society, Providence (1999)
Vitter, J.S.: External memory algorithms and data structures. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms and Visualization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 1–38. American Mathematical Society, Providence (1999)
Vuduc, R., Demmel, J.W., Yelick, K.A.: The Optimized Sparse Kernel Interface (OSKI) library: user’s guide for version 1.0.1b. Berkeley Benchmarking and OPtimization (BeBOP) Group, 15 March 2006
Vuduc, R.W.: Automatic performance tuning of sparse matrix kernels. PhD thesis, University of California, Berkeley, Fall 2003
Cite this article
Bender, M.A., Brodal, G.S., Fagerberg, R. et al. Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model. Theory Comput Syst 47, 934–962 (2010). https://doi.org/10.1007/s00224-010-9285-4