Prospectus for the Next LAPACK and ScaLAPACK Libraries

  • James W. Demmel
  • Jack Dongarra
  • Beresford Parlett
  • William Kahan
  • Ming Gu
  • David Bindel
  • Yozo Hida
  • Xiaoye Li
  • Osni Marques
  • E. Jason Riedy
  • Christof Vömel
  • Julien Langou
  • Piotr Luszczek
  • Jakub Kurzak
  • Alfredo Buttari
  • Julie Langou
  • Stanimire Tomov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4699)

Abstract

New releases of the widely used LAPACK and ScaLAPACK numerical linear algebra libraries are planned. Based on an ongoing user survey (www.netlib.org/lapack-dev) and research by many people, we propose the following improvements: faster algorithms, including better numerical methods, memory hierarchy optimizations, parallelism, and automatic performance tuning to accommodate new architectures; more accurate algorithms, including better numerical methods and use of extra precision; expanded functionality, including updating and downdating, new eigenproblems, etc., and putting more of LAPACK into ScaLAPACK; and improved ease of use, e.g., via friendlier interfaces in multiple languages. To accomplish these goals we are also relying on better software engineering techniques and contributions from collaborators at many institutions.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • James W. Demmel (1)
  • Jack Dongarra (2, 3)
  • Beresford Parlett (1)
  • William Kahan (1)
  • Ming Gu (1)
  • David Bindel (1)
  • Yozo Hida (1)
  • Xiaoye Li (1)
  • Osni Marques (1)
  • E. Jason Riedy (1)
  • Christof Vömel (1)
  • Julien Langou (2)
  • Piotr Luszczek (2)
  • Jakub Kurzak (2)
  • Alfredo Buttari (2)
  • Julie Langou (2)
  • Stanimire Tomov (2)
  1. University of California, Berkeley, CA 94720, USA
  2. University of Tennessee, Knoxville, TN 37996, USA
  3. Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
