Prospectus for the Next LAPACK and ScaLAPACK Libraries

Demmel, James W.; Dongarra, Jack; Parlett, Beresford; Kahan, William; Gu, Ming; Bindel, David; Hida, Yozo; Li, Xiaoye; Marques, Osni; Riedy, E. Jason; Vömel, Christof; Langou, Julien; Luszczek, Piotr; Kurzak, Jakub; Buttari, Alfredo; Langou, Julie; Tomov, Stanimire

doi:10.1007/978-3-540-75755-9_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4699)

Included in the following conference series:

International Workshop on Applied Parallel Computing

Abstract

New releases of the widely used LAPACK and ScaLAPACK numerical linear algebra libraries are planned. Based on an ongoing user survey (www.netlib.org/lapack-dev) and research by many people, we are proposing the following improvements: (1) faster algorithms, including better numerical methods, memory hierarchy optimizations, parallelism, and automatic performance tuning to accommodate new architectures; (2) more accurate algorithms, including better numerical methods and use of extra precision; (3) expanded functionality, including updating and downdating, new eigenproblems, etc., and putting more of LAPACK into ScaLAPACK; and (4) improved ease of use, e.g., via friendlier interfaces in multiple languages. To accomplish these goals we are also relying on better software engineering techniques and contributions from collaborators at many institutions.
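
The abstract does not say how extra precision would be used. One well-known possibility is iterative refinement, in which a linear system is solved in working precision and the residual is then accumulated in higher precision to correct the solution. The sketch below is only an illustration under that assumption, not the libraries' implementation: the helper names (lu_factor, lu_solve, refine) and the test matrix are hypothetical, and exact rational arithmetic from Python's fractions module stands in for extended-precision hardware or software.

```python
from fractions import Fraction


def lu_factor(a):
    """LU factorization with partial pivoting, in double precision."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Choose the largest entry in column k as pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]          # multiplier stored in place of L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b from the factors: forward, then back substitution."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def refine(a, b, steps=3):
    """Iterative refinement with the residual computed in extra precision."""
    n = len(a)
    lu, perm = lu_factor(a)
    x = lu_solve(lu, perm, b)
    for _ in range(steps):
        # r = b - A x, accumulated exactly and rounded to double only once.
        r = [float(Fraction(b[i])
                   - sum(Fraction(a[i][j]) * Fraction(x[j]) for j in range(n)))
             for i in range(n)]
        d = lu_solve(lu, perm, r)         # correction, reusing the factors
        x = [xi + di for xi, di in zip(x, d)]
    return x


# Hypothetical test problem: a 5x5 Hilbert matrix scaled by 2520 so that
# every entry, and the right-hand side, is an exactly representable integer.
n = 5
a = [[2520.0 / (i + j + 1) for j in range(n)] for i in range(n)]
x_true = [1.0, -2.0, 3.0, -4.0, 5.0]
b = [sum(a[i][j] * x_true[j] for j in range(n)) for i in range(n)]

x = refine(a, b)
print(max(abs(xi - ti) for xi, ti in zip(x, x_true)))
```

The factorization is computed once and reused for every correction step, so each refinement step costs only a residual evaluation and two triangular solves.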

Author information

Authors and Affiliations

University of California, Berkeley CA 94720, USA
James W. Demmel, Beresford Parlett, William Kahan, Ming Gu, David Bindel, Yozo Hida, Xiaoye Li, Osni Marques, E. Jason Riedy & Christof Vömel
University of Tennessee, Knoxville TN 37996, USA
Jack Dongarra, Julien Langou, Piotr Luszczek, Jakub Kurzak, Alfredo Buttari, Julie Langou & Stanimire Tomov
Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
Jack Dongarra

Editor information

Bo Kågström, Erik Elmroth, Jack Dongarra, Jerzy Waśniewski

About this paper

Cite this paper

Demmel, J.W. et al. (2007). Prospectus for the Next LAPACK and ScaLAPACK Libraries. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2006. Lecture Notes in Computer Science, vol 4699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75755-9_2

DOI: https://doi.org/10.1007/978-3-540-75755-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75754-2
Online ISBN: 978-3-540-75755-9
eBook Packages: Computer Science, Computer Science (R0)
