New Data Distribution for Solving Triangular Systems on Distributed Memory Machines

  • Przemysław Stpiczyński
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4699)


This paper presents a new data distribution for triangular matrices that spreads blocks evenly among processes and wastes less memory than the standard block-cyclic data layout used in the ScaLAPACK library for dense matrix computations. A new algorithm for solving triangular systems of linear equations is also introduced. Experiments on a cluster of Itanium 2 processors and on a Cray X1 show that, in some cases, the new method is faster than the corresponding PBLAS routines PSTRSV and PSTRSM.
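To make the memory argument concrete, the following minimal sketch counts how many blocks each process allocates under a standard 2D block-cyclic layout and how many of those lie in the lower triangle. It illustrates only the baseline layout, not the new distribution proposed in the paper. The function names, the 8 x 8 block matrix, and the 2 x 2 process grid are hypothetical, and the sketch assumes that block (i, j) is owned by process (i mod P_r, j mod P_c) with zero source offsets.

```python
from collections import Counter


def block_cyclic_owner(i, j, pr, pc):
    """Grid coordinates of the process owning block (i, j).

    Assumes a 2D block-cyclic layout with zero row/column source offsets.
    """
    return (i % pr, j % pc)


def triangular_block_counts(nb, pr, pc):
    """Count allocated and lower-triangular blocks per process.

    nb is the number of block rows (and block columns) of a square matrix;
    pr x pc is the (hypothetical) process grid.
    """
    allocated = Counter()  # blocks each process stores for the full matrix
    useful = Counter()     # blocks on or below the block diagonal
    for i in range(nb):
        for j in range(nb):
            owner = block_cyclic_owner(i, j, pr, pc)
            allocated[owner] += 1
            if j <= i:
                useful[owner] += 1
    return allocated, useful


if __name__ == "__main__":
    allocated, useful = triangular_block_counts(nb=8, pr=2, pc=2)
    for proc in sorted(allocated):
        print(proc, "allocated:", allocated[proc], "useful:", useful[proc])
```

In this small case every process allocates 16 blocks, but the number of blocks that carry triangular data differs between processes and is always well below 16, which is the kind of wasted storage and uneven load the abstract refers to.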




Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Przemysław Stpiczyński (1)
  1. Department of Computer Science, Maria Curie–Skłodowska University, Pl. M. Curie-Skłodowskiej 1, PL-20-031 Lublin, Poland
