A Cache Oblivious Algorithm for Matrix Multiplication Based on Peano’s Space Filling Curve

  • Michael Bader
  • Christoph Zenger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3911)


Cache oblivious algorithms are designed to inherently exploit any kind of cache memory, regardless of its size or architecture. In this article, we discuss a cache oblivious algorithm for matrix multiplication. The elements of the matrices are stored according to a Peano space filling curve. A block recursive approach then leads to an algorithm where memory access to matrix elements is strictly local. Consequently, the algorithm shows several interesting properties with respect to cache performance, prefetching strategies, and even parallelization.
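The two ingredients named in the abstract, a Peano-ordered element layout and block recursion, can be illustrated with the Python sketch below. It assumes square matrices of size 3^k × 3^k and uses one standard digit-based definition of Peano's curve, whose orientation may differ from the layout used in the paper. All function names (`peano_coords`, `to_peano`, `from_peano`, `sub_block`, `peano_matmul`) are hypothetical. Because every 3 × 3 sub-block of a Peano-ordered matrix occupies a contiguous run of the flat array, the recursion only needs to track each block's offset and whether its rows or columns are mirrored. The 27 block products at each level are visited in plain loop order, not in the order the paper chooses to keep memory access strictly local.

```python
# Sketch only: Peano-ordered storage plus block-recursive multiplication.
# Assumes square 3**k x 3**k matrices; all names are hypothetical.

ROOT = (0, 0, 0)  # (offset, flip_rows, flip_cols) describing a whole matrix


def peano_coords(t, k):
    """Map Peano index t (0 <= t < 9**k) to (row, col) on a 3**k x 3**k grid.

    With base-3 digits a1 a2 ... a_2k of t (most significant first),
    row digit j = K^(a2+a4+...+a_{2j-2})(a_{2j-1}) and column digit
    j = K^(a1+a3+...+a_{2j-1})(a_{2j}), where K(a) = 2 - a, so a digit
    is mirrored exactly when the exponent is odd.
    """
    digits = []
    for _ in range(2 * k):
        digits.append(t % 3)
        t //= 3
    digits.reverse()
    row = col = 0
    odd_sum = even_sum = 0  # running sums a1+a3+... and a2+a4+...
    for j in range(k):
        a_odd, a_even = digits[2 * j], digits[2 * j + 1]
        row = 3 * row + (2 - a_odd if even_sum % 2 else a_odd)
        odd_sum += a_odd
        col = 3 * col + (2 - a_even if odd_sum % 2 else a_even)
        even_sum += a_even
    return row, col


def to_peano(M, k):
    """Flatten a nested-list matrix into Peano order."""
    return [M[r][c] for r, c in (peano_coords(t, k) for t in range(9 ** k))]


def from_peano(v, k):
    """Rebuild a nested-list matrix from a Peano-ordered flat list."""
    n = 3 ** k
    M = [[0] * n for _ in range(n)]
    for t, x in enumerate(v):
        r, c = peano_coords(t, k)
        M[r][c] = x
    return M


def sub_block(block, k, r, c):
    """Locate logical sub-block (r, c) of a level-k block stored in Peano order.

    Each of the 9 sub-blocks is a contiguous run of 9**(k-1) elements and is
    again a Peano curve, possibly mirrored in its rows and/or columns.
    """
    off, flip_r, flip_c = block
    x1 = 2 - r if flip_r else r       # block row in storage orientation
    y1 = 2 - c if flip_c else c       # block column in storage orientation
    a2 = 2 - y1 if x1 % 2 else y1     # odd block rows are traversed backwards
    pos = 3 * x1 + a2                 # rank of the sub-block along the curve
    return (off + pos * 9 ** (k - 1), flip_r ^ (a2 % 2), flip_c ^ (x1 % 2))


def peano_matmul(C, A, B, k, c=ROOT, a=ROOT, b=ROOT):
    """Accumulate C += A * B, all three stored as Peano-ordered flat lists."""
    if k == 0:
        C[c[0]] += A[a[0]] * B[b[0]]
        return
    for i in range(3):
        for j in range(3):
            c_ij = sub_block(c, k, i, j)
            for l in range(3):  # naive loop order, not the paper's schedule
                peano_matmul(C, A, B, k - 1, c_ij,
                             sub_block(a, k, i, l), sub_block(b, k, l, j))
```

For example, with k = 2 two 9 × 9 operands are flattened with `to_peano`, multiplied into a zero-initialised list of length 81, and converted back with `from_peano`; the result should match an ordinary triple-loop product. No cache or block size appears as a parameter: the recursion eventually produces blocks small enough for every cache level, which is the sense in which such a scheme is cache oblivious.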


Keywords: Matrix Multiplication, Recursive Call, Block Multiplication, Cache Line, Cache Memory




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Michael Bader
    • Dept. of Informatics, TU München, München, Germany
  • Christoph Zenger
    • Dept. of Informatics, TU München, München, Germany
