The Relevance of New Data Structure Approaches for Dense Linear Algebra in the New Multi-Core / Many Core Environments

  • Fred G. Gustavson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4967)

Abstract

For about ten years now, Bo Kågström’s group in Umeå, Sweden, Jerzy Waśniewski’s team at the Danish Technical University in Lyngby, Denmark, and I at IBM Research in Yorktown Heights have been applying recursion and New Data Structures (NDS) to increase the performance of Dense Linear Algebra (DLA) factorization algorithms. John Gunnels and, later still, Jim Sexton, both now at IBM Research, also began working in this area. Over the past three years or so, almost all computer manufacturers have dramatically changed their computer architectures to designs they call Multi-Core (MC). It turns out that traditionally designed DLA libraries such as LAPACK and ScaLAPACK perform poorly on these new architectures. Recent results from Jack Dongarra’s group at the Innovative Computing Laboratory in Knoxville, Tennessee, have shown how to obtain high performance for DLA factorization algorithms on the Cell architecture, an example of an MC processor, but only when NDS were used. In this talk we will give some reasons why this is so.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Fred G. Gustavson, IBM T.J. Watson Research Center, USA
