The Architecture of Scientific Software, pp. 193–210

# Formal Methods for High-Performance Linear Algebra Libraries

## Abstract

A colleague of ours, Dr. Timothy Mattson of Intel, once made the following observation: “Literature professors read literature. Computer Science professors should at least occasionally read code.” The point he was making was that in order to write superior prose one needs to read good (and bad) literature. Analogously, it is our thesis that exposure to elegant (and ugly) programs tends to yield the insights which are necessary if one wishes to produce consistently well-written code.

Since the advent of high-performance distributed-memory parallel computing, the need for intelligible code has become ever greater. Developing and maintaining libraries for these architectures is simply too complex for conventional approaches to coding. Applying such approaches anyway has produced an abundance of inefficient, anfractuous code that is difficult to maintain and nigh impossible to upgrade.

Having struggled with these issues for more than a decade, we have arrived at a conclusion that is somewhat surprising to us: the answer is to apply formal methods from Computer Science to the development of high-performance linear algebra libraries. This approach has consistently yielded aesthetically pleasing, coherent code that greatly facilitates performance analysis, intelligent modularity, and the enforcement of program correctness via assertions. Because the technique is completely language-independent, it lends itself equally well to a wide spectrum of programming languages (and paradigms), ranging from C and Fortran to C++ and Java. In this paper, we illustrate our observations by looking at our Formal Linear Algebra Methods Environment (FLAME).
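To make the idea of "enforcing correctness via assertions" concrete, here is a minimal sketch, assuming an unblocked, right-looking LU factorization without pivoting (one of the keywords above). The loop invariant is stated explicitly and checked with an `assert` at the top of every iteration, and the update steps mirror a partitioned "repartition, update" style. The function names `invariant_holds` and `lu_unb`, the list-of-lists matrix type, and the tolerance are hypothetical choices for illustration. This is not FLAME's actual API and not necessarily the algorithm variant the paper derives.

```python
# Hypothetical sketch (not FLAME's API): unblocked, right-looking LU
# factorization without pivoting, with its loop invariant checked by assert.
from typing import List

Matrix = List[List[float]]


def invariant_holds(A_hat: Matrix, A: Matrix, k: int, tol: float = 1e-9) -> bool:
    """Check A_hat == L[:, :k] * U[:k, :] + [[0, 0], [0, A_BR]].

    After k steps, A holds the first k columns of L strictly below the
    diagonal (unit diagonal implied), the first k rows of U on and above it,
    and the updated trailing block A_BR = A[k:, k:].
    """
    n = len(A)
    for i in range(n):
        for j in range(n):
            s = 0.0
            # Sum over the k columns of L / rows of U computed so far;
            # L[i][p] = 0 for p > i and U[p][j] = 0 for p > j.
            for p in range(min(k, i + 1, j + 1)):
                l_ip = 1.0 if i == p else A[i][p]
                s += l_ip * A[p][j]
            if i >= k and j >= k:
                s += A[i][j]  # trailing block, not yet factored
            if abs(s - A_hat[i][j]) > tol:
                return False
    return True


def lu_unb(A_in: Matrix) -> Matrix:
    """Return L (unit lower, below diagonal) and U (upper) packed in one matrix."""
    A_hat = [row[:] for row in A_in]  # original matrix, kept for the checks
    A = [row[:] for row in A_in]      # working copy, overwritten by L and U
    n = len(A)
    for k in range(n):
        # Loop invariant: A_TL = A[:k, :k] is factored and A_BR is updated.
        assert invariant_holds(A_hat, A, k), f"invariant violated at k={k}"
        # Repartition: alpha11 = A[k][k], a21 = column A[k+1:, k],
        # a12t = row A[k, k+1:], A22 = A[k+1:, k+1:].
        alpha11 = A[k][k]
        if alpha11 == 0.0:
            raise ZeroDivisionError(f"zero pivot at k={k}; this sketch does not pivot")
        for i in range(k + 1, n):
            A[i][k] /= alpha11  # a21 := a21 / alpha11
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]  # A22 := A22 - a21 * a12t
    # Postcondition (the invariant with k = n): A_hat == L * U.
    assert invariant_holds(A_hat, A, n), "postcondition violated"
    return A


if __name__ == "__main__":
    print(lu_unb([[4.0, 3.0, 2.0], [8.0, 7.0, 9.0], [4.0, 6.0, 11.0]]))
    # prints [[4.0, 3.0, 2.0], [2.0, 1.0, 5.0], [1.0, 3.0, -6.0]]
```

Checking the invariant costs O(n²k) per iteration, so the asserted version runs in O(n⁴) time. It is meant only as a debugging aid. Python also strips `assert` statements under `python -O`, so the production path pays nothing for them.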

## Keywords

FLAME, linear algebra algorithms, formal methods, LU factorization