Theory of Computing Systems, Volume 47, Issue 4, pp 934–962

Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model

  • Michael A. Bender
  • Gerth Stølting Brodal
  • Rolf Fagerberg
  • Riko Jacob
  • Elias Vicari
Article

Abstract

We study the problem of sparse-matrix dense-vector multiplication (SpMV) in external memory. The task of SpMV is to compute y:=Ax, where A is a sparse N×N matrix and x is a vector. We express sparsity by a parameter k, and for each choice of k consider the class of matrices where the number of nonzero entries is kN, i.e., where the average number of nonzero entries per column is k.
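As a point of reference, the direct way to compute y := Ax from a column major list of nonzero entries can be sketched as follows. This is only an illustrative in-memory sketch, not the paper's I/O-efficient algorithm; the function name `spmv_column_major` and the `(row, col, value)` triple format are assumptions made for this example.

```python
def spmv_column_major(n, entries, x):
    """Compute y = A x for an n x n sparse matrix A.

    `entries` is a list of (row, col, value) triples for the nonzero
    entries of A, assumed to be sorted by column (column major order).
    """
    y = [0.0] * n
    for row, col, value in entries:
        # A and x are read sequentially, but y is written at scattered
        # positions; in external memory each such update can cost an I/O.
        y[row] += value * x[col]
    return y


# A = [[2, 0, 0],
#      [0, 0, 3],
#      [1, 0, 4]] stored column by column.
entries = [(0, 0, 2), (2, 0, 1), (1, 2, 3), (2, 2, 4)]
y = spmv_column_major(3, entries, [1, 2, 3])
```

Because the updates to y follow no particular order, a direct scan like this may need up to one I/O per nonzero entry, that is, on the order of kN I/Os in total.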

We investigate the external worst-case complexity, i.e., the best possible upper bound on the number of I/Os, as a function of k, N, and the parameters M (memory size) and B (track size) of the I/O-model. We determine this complexity up to a constant factor for all meaningful choices of these parameters, as long as \(k\leq N^{1-\varepsilon}\), where ε depends on the problem variant. Our model of computation for the lower bound is a combination of the I/O-models of Aggarwal and Vitter, and of Hong and Kung.

We study variants of the problem, differing in the memory layout of A. If A is stored in column major layout, we prove that SpMV has I/O complexity \(\Theta(\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{\max\{k,M\}}\},\,kN\})\) for \(k\leq N^{1-\varepsilon}\) and any constant \(0<\varepsilon<1\). If the algorithm can choose the memory layout, the I/O complexity reduces to \(\Theta ({\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{kM}\},kN\}})\) for \(k\leq\sqrt[3]{N}\). In contrast, if the algorithm must be able to handle an arbitrary layout of the matrix, the I/O complexity is \(\Theta ({\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{M}\},kN\}})\) for \(k\leq N/2\).
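To get a feel for how the three bounds compare, the expressions inside the Θ can be evaluated numerically. The sketch below drops the constant factors hidden by the Θ-notation and does not check the stated ranges of k, so the numbers are indicative only; all function names are hypothetical and chosen for this example, and M/B > 1 is assumed.

```python
import math


def io_bound(k, N, M, B, denom):
    """Evaluate min{ (kN/B) * max{1, log_{M/B}(N/denom)}, kN }.

    Constant factors hidden by the Theta-notation are ignored.
    Assumes M / B > 1 so that the logarithm base is valid.
    """
    sorting_like = (k * N / B) * max(1.0, math.log(N / denom, M / B))
    return min(sorting_like, k * N)


def column_major_bound(k, N, M, B):
    # A stored in column major layout: denominator max{k, M}.
    return io_bound(k, N, M, B, max(k, M))


def best_layout_bound(k, N, M, B):
    # Layout chosen by the algorithm: denominator k * M.
    return io_bound(k, N, M, B, k * M)


def arbitrary_layout_bound(k, N, M, B):
    # Layout not under the algorithm's control: denominator M.
    return io_bound(k, N, M, B, M)


k, N, M, B = 2**5, 2**40, 2**20, 2**10
best = best_layout_bound(k, N, M, B)
column = column_major_bound(k, N, M, B)
arbitrary = arbitrary_layout_bound(k, N, M, B)
```

For these example parameters the logarithmic factor is 1.5 when the layout can be chosen and 2 in the other two cases, so the freely chosen layout gives the smallest value and the arbitrary layout the largest, with column major in between.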

In the cache oblivious setting we prove that, under the tall cache assumption \(M\geq B^{1+\varepsilon}\), the I/O complexity is \(\mathcal {O}({\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{\max\{k,M\}}\}})\) for A in column major layout.

Keywords

I/O-model External memory algorithms Lower bound Sparse matrix dense vector multiplication 

References

  1. Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)
  2. Arge, L., Miltersen, P.B.: On showing lower bounds for external-memory computational geometry problems. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms and Visualization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 139–159. American Mathematical Society, Providence (1999)
  3. Brodal, G.S., Fagerberg, R.: On the limits of cache-obliviousness. In: Proc. 35th Annual ACM Symposium on Theory of Computing (STOC), pp. 307–315. ACM, San Diego (2003)
  4. Brodal, G.S., Fagerberg, R., Moruz, G.: Cache-aware and cache-oblivious adaptive sorting. In: Proc. 32nd International Colloquium on Automata, Languages, and Programming. Lecture Notes in Computer Science, vol. 3580, pp. 576–588. Springer, Berlin (2005)
  5. Cormen, T.H., Sundquist, T., Wisniewski, L.F.: Asymptotically tight bounds for performing BMMC permutations on parallel disk systems. SIAM J. Comput. 28(1), 105–136 (1999)
  6. Demmel, J., Dongarra, J., Eijkhout, V., Fuentes, E., Antoine Petitet, R.V., Whaley, R.C., Yelick, K.: Self-adapting linear algebra algorithms and software. Proc. IEEE 93(2) (2005). Special Issue on Program Generation, Optimization, and Adaptation
  7. Filippone, S., Colajanni, M.: PSBLAS: a library for parallel linear algebra computation on sparse matrices. ACM Trans. Math. Softw. 26(4), 527–550 (2000)
  8. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: Proc. 40th Annual Symposium on Foundations of Computer Science (FOCS), pp. 285–297. IEEE Computer Society, New York (1999)
  9. Hong, J.-W., Kung, H.T.: I/O complexity: the red-blue pebble game. In: Proc. 13th Annual ACM Symposium on Theory of Computing (STOC), pp. 326–333. ACM, New York (1981)
  10. Im, E.J.: Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000
  11. Raz, R.: Multi-linear formulas for permanent and determinant are of super-polynomial size. In: Proc. 36th Annual ACM Symposium on Theory of Computing (STOC), Chicago, IL, USA, pp. 633–641. ACM, New York (2004)
  12. Remington, K., Pozo, R.: NIST sparse BLAS user’s guide. Technical report, National Institute of Standards and Technology, Gaithersburg, MD (1996)
  13. Saad, Y.: Sparsekit: a basic tool kit for sparse matrix computations. Technical report, Computer Science Department, University of Minnesota, June 1994
  14. Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969)
  15. Toledo, S.: A survey of out-of-core algorithms in numerical linear algebra. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms and Visualization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 161–179. American Mathematical Society, Providence (1999)
  16. Vitter, J.S.: External memory algorithms and data structures. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms and Visualization. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 1–38. American Mathematical Society, Providence (1999)
  17. Vuduc, R., Demmel, J.W., Yelick, K.A.: The Optimized Sparse Kernel Interface (OSKI) library: user’s guide for version 1.0.1b. Berkeley Benchmarking and OPtimization (BeBOP) Group, 15 March 2006
  18. Vuduc, R.W.: Automatic performance tuning of sparse matrix kernels. PhD thesis, University of California, Berkeley, Fall 2003

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Michael A. Bender (1)
  • Gerth Stølting Brodal (2)
  • Rolf Fagerberg (3)
  • Riko Jacob (4)
  • Elias Vicari (5)

  1. Department of Computer Science, Stony Brook University, Stony Brook, USA
  2. MADALGO, Department of Computer Science, Aarhus University, Aarhus, Denmark
  3. Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
  4. Department of Computer Science, Technische Universität München, Munich, Germany
  5. Institute of Theoretical Computer Science, ETH Zurich, Zurich, Switzerland
