A Paradigm for Parallel Matrix Algorithms:

  • David S. Wise
  • Craig Citro
  • Joshua Hursey
  • Fang Liu
  • Michael Rainey
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3648)


A style for programming problems from matrix algebra is developed with a familiar example and new tools, yielding high performance with a couple of surprising exceptions. The underlying philosophy is to use block recursion as the exclusive control structure, down to a 2^p × 2^p base case anyway, where hardware favors iterative style to fill its pipe. Use of Morton-ordered matrices yields excellent locality within the memory hierarchy, including block sharing among distributed computers. The recursion generalizes nicely to an SPMD program where such sharing is the only communication.
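The locality claim for Morton order can be illustrated with a minimal sketch of the index computation. This is not code from the paper: the names `dilate` and `morton_index` are hypothetical, and putting the row bits in the odd bit positions is an assumed convention. The point it shows is that every aligned square block of a Morton-ordered matrix occupies one contiguous range of addresses, so a recursive quadrant can be handed off as a single run of memory.

```python
def dilate(x: int) -> int:
    """Spread the bits of x so that bit k of x lands in bit 2k of the result."""
    result = 0
    bit = 0
    while x:
        result |= (x & 1) << (2 * bit)
        x >>= 1
        bit += 1
    return result


def morton_index(row: int, col: int) -> int:
    """Interleave row and column bits (assumed convention: row bits are odd)."""
    return (dilate(row) << 1) | dilate(col)


def block_offsets(row0: int, col0: int, size: int) -> list[int]:
    """Sorted Morton offsets of the size x size block whose corner is (row0, col0)."""
    return sorted(
        morton_index(row0 + i, col0 + j)
        for i in range(size)
        for j in range(size)
    )
```

For example, `block_offsets(2, 2, 2)` covers the four consecutive offsets 12 through 15, whereas the same block in row-major order would be split across two rows.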

Cholesky factorization of an n × n SPD matrix is used as a simple nontrivial example to expose the paradigm. The program amounts to four functions, two of which are finalizers for the other two. This insight allows final blocks to be shared with inter-node communication ∈ Θ(n^2) for this algorithm ∈ Θ(n^3) flops.
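To make the idea of block recursion as the only control structure concrete, here is a minimal sequential sketch of a textbook block-recursive Cholesky factorization. It is not the authors' four-function program: it uses three hypothetical functions (`cholesky`, `tri_solve`, `mult_sub`), assumes n is a power of two, recurses all the way to 1 × 1 blocks instead of a 2^p × 2^p iterative base case, and stores the matrix as a plain list of rows, not in Morton order.

```python
import math


def mult_sub(a, cr, cc, xr, xc, yr, yc, n):
    """C -= X * Y^T for n x n blocks at (cr, cc), (xr, xc), (yr, yc)."""
    if n == 1:
        a[cr][cc] -= a[xr][xc] * a[yr][yc]
        return
    h = n // 2
    for i in (0, h):
        for j in (0, h):
            for k in (0, h):
                mult_sub(a, cr + i, cc + j, xr + i, xc + k, yr + j, yc + k, h)


def tri_solve(a, br, bc, ld, n):
    """Overwrite block B at (br, bc) with X solving X * L^T = B,
    where L is the lower-triangular diagonal block at (ld, ld)."""
    if n == 1:
        a[br][bc] /= a[ld][ld]
        return
    h = n // 2
    for rr in (br, br + h):                      # the two row halves are independent
        tri_solve(a, rr, bc, ld, h)              # X1 = B1 * L11^-T
        mult_sub(a, rr, bc + h, rr, bc, ld + h, ld, h)   # B2 -= X1 * L21^T
        tri_solve(a, rr, bc + h, ld + h, h)      # X2 = B2 * L22^-T


def cholesky(a, d, n):
    """Overwrite the lower triangle of the n x n diagonal block at (d, d) with L."""
    if n == 1:
        a[d][d] = math.sqrt(a[d][d])
        return
    h = n // 2
    cholesky(a, d, h)                            # L11
    tri_solve(a, d + h, d, d, h)                 # L21 = A21 * L11^-T
    mult_sub(a, d + h, d + h, d + h, d, d + h, d, h)     # A22 -= L21 * L21^T
    cholesky(a, d + h, h)                        # L22


# Usage: factor a small SPD matrix in place; only the lower triangle is meaningful.
a = [[4.0, 2.0, 8.0, 4.0],
     [2.0, 10.0, 7.0, 17.0],
     [8.0, 7.0, 21.0, 19.0],
     [4.0, 17.0, 19.0, 39.0]]
cholesky(a, 0, 4)
lower = [[a[i][j] if j <= i else 0.0 for j in range(4)] for i in range(4)]
```

Each recursive call touches only whole quadrants, which is what lets a distributed version exchange finished blocks and nothing else.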


Keywords: Programming Problem · Base Case · Matrix Algebra · Hard Copy · Cholesky Factorization
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • David S. Wise (1)
  • Craig Citro (1)
  • Joshua Hursey (1)
  • Fang Liu (1)
  • Michael Rainey (1)

  1. Indiana University, Bloomington
