Linear algebra subprograms on shared memory computers: Beyond LAPACK
This paper discusses the implementation of LAPACK routines on a cache-based, shared memory system. The shortcomings of an approach that relies on parallelized BLAS are illustrated for the LU, Cholesky and QR factorizations. An alternative approach to these factorization routines, which exploits explicit parallelism at a higher level in the code, is reported; it provides higher scalability and efficiency in all cases studied. Portability was addressed by using standard Fortran 77 and PCF compiler directives in all codes.
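The general idea of having the factorization routine, rather than the BLAS, own the parallel decomposition can be sketched for a right-looking blocked Cholesky factorization. At each step the diagonal block is factored, and then the panel solve and the trailing-matrix update are split into coarse block tasks that write disjoint rows or columns. This is a minimal sketch under stated assumptions, not the authors' code (they used Fortran 77 with PCF directives). Python's `ThreadPoolExecutor` stands in for the PCF parallel constructs, and because of the GIL it shows the task structure rather than any speedup. The function names, the block size `nb` and the `workers` parameter are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def _factor_diagonal_block(a, k, b):
    """Unblocked Cholesky of the b-by-b diagonal block at (k, k), in place (lower triangle)."""
    for j in range(k, k + b):
        d = a[j][j] - sum(a[j][p] * a[j][p] for p in range(k, j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, k + b):
            s = a[i][j] - sum(a[i][p] * a[j][p] for p in range(k, j))
            a[i][j] = s / a[j][j]


def _solve_panel_rows(a, k, b, rows):
    """Panel task: L21 = A21 * L11^{-T} by forward substitution, for the given rows."""
    for i in rows:
        for j in range(k, k + b):
            s = a[i][j] - sum(a[i][p] * a[j][p] for p in range(k, j))
            a[i][j] = s / a[j][j]


def _update_block_column(a, k, b, cols):
    """Update task: A22 -= L21 * L21^T (lower triangle only), for the given columns."""
    n = len(a)
    for j in cols:
        for i in range(j, n):
            a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k, k + b))


def blocked_cholesky(a, nb=2, workers=4):
    """Right-looking blocked Cholesky A = L L^T; overwrites `a` (list of lists) with L.

    Only the lower triangle of `a` is read. Within each step the tasks write
    disjoint rows (panel) or columns (update), so they can run concurrently.
    """
    n = len(a)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for k in range(0, n, nb):
            b = min(nb, n - k)
            _factor_diagonal_block(a, k, b)  # serial: small, on the critical path
            blocks = [range(r, min(r + nb, n)) for r in range(k + b, n, nb)]
            # list() waits for every task of this phase and re-raises any exception.
            list(pool.map(partial(_solve_panel_rows, a, k, b), blocks))
            list(pool.map(partial(_update_block_column, a, k, b), blocks))
    for i in range(n):  # zero the strict upper triangle so that `a` holds L
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a
```

For example, `blocked_cholesky([row[:] for row in m], nb=2)` returns the lower-triangular factor of a symmetric positive definite matrix `m`. In a parallelized-BLAS version, the same three steps would instead be separate library calls, each parallelized and synchronized internally.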