Abstract
This ScaLAPACK tutorial begins with a brief description of the LAPACK library. The importance of block-partitioned algorithms in reducing the frequency of data movement between different levels of hierarchical memory is stressed. By relying on the Basic Linear Algebra Subprograms (BLAS), it is possible to develop portable and efficient implementations of these algorithms across a wide range of architectures, with emphasis on workstations, vector processors, and shared-memory computers, as has been done in LAPACK.
The ScaLAPACK library, which is a distributed-memory version of LAPACK, is then presented. A key idea in our approach is the use of the Basic Linear Algebra Communication Subprograms (BLACS) as communication building blocks and of a distributed version of the BLAS, the Parallel Basic Linear Algebra Subprograms (PBLAS), as computational building blocks. The features of the BLACS and PBLAS are in turn outlined, and it is shown how these building blocks can be used to construct higher-level algorithms while hiding many details of the parallelism from the application developer. Performance results of ScaLAPACK routines are presented, validating the adoption of the block-cyclic decomposition scheme as a way of distributing block-partitioned matrices that yields well-balanced computations and scalable implementations.
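To make the block-cyclic idea concrete, the following sketch shows the standard one-dimensional block-cyclic index mapping in Python. The function name `owner_and_local` and the parameter names are illustrative, not part of the ScaLAPACK API; the arithmetic itself is the usual mapping, assuming 0-based global indices and the first block placed on process 0.

```python
def owner_and_local(g, nb, p):
    """Map a global (0-based) index g to (owning process, local index),
    given block size nb and p processes, with block 0 on process 0."""
    block = g // nb            # which global block index g falls in
    owner = block % p          # blocks are dealt out round-robin
    local_block = block // p   # position of that block on its owner
    local = local_block * nb + g % nb
    return owner, local

# Example: 8 matrix columns, block size 2, 3 processes.
# Columns 0-1 land on process 0, 2-3 on process 1, 4-5 on process 2,
# then the dealing wraps around: columns 6-7 return to process 0.
for g in range(8):
    print(g, owner_and_local(g, nb=2, p=3))
```

Because consecutive blocks are scattered across all processes, every process keeps roughly the same amount of the remaining matrix as a factorization proceeds, which is the load-balance property the abstract refers to.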
Finally, future directions for the ScaLAPACK library are described and alternative approaches to mathematical libraries are suggested that could integrate ScaLAPACK into efficient and user-friendly distributed systems.
This work was supported in part by the National Science Foundation Grant No. ASC-9005933; by the Defense Advanced Research Projects Agency under contract DAAL03-91-C-0047, administered by the Army Research Office; by the Office of Scientific Computing, U.S. Department of Energy, under Contract DE-AC05-84OR21400; and by the National Science Foundation Science and Technology Center Cooperative Agreement No. CCR-8809615.
© 1996 Springer-Verlag Berlin Heidelberg
Dongarra, J., Petitet, A. (1996). ScaLAPACK tutorial. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds) Applied Parallel Computing Computations in Physics, Chemistry and Engineering Science. PARA 1995. Lecture Notes in Computer Science, vol 1041. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60902-4_20
Print ISBN: 978-3-540-60902-5
Online ISBN: 978-3-540-49670-0