LU factorization with maximum performances on FPS architectures 38/64 bit

  • A. Corana
  • C. Martini
  • S. Ridella
  • C. Rolando
Session 9A: Algorithms, Architectures And Performance I
Part of the Lecture Notes in Computer Science book series (LNCS, volume 297)

Abstract

A technique for solving dense linear systems is presented that reaches maximum performance on attached processors such as the FPS-120, 5000 and X64, using Fortran with calls to the vector routines.

Starting from Dongarra's LU factorization algorithm, the key idea is to carry out a pseudo-transposition of the lower triangular matrix L (including the main diagonal) about the minor diagonal. The pseudo-transposition allows all the matrix-vector operations involved in LU factorization to be carried out with only stride-1 dot product operations, which, using the TM Auxiliary Memory and the TMDOT routine, can be executed on the FPS processor at maximum speed.
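The layout effect described above can be illustrated with a small sketch. It rests on an assumption: the abstract does not give the exact index mapping, so the code takes "pseudo-transposition about the minor diagonal" to mean reflecting the lower triangle about the anti-diagonal, (i, j) -> (n-1-j, n-1-i), in a column-major (Fortran-style) array. The function names are hypothetical and the code is not the authors' implementation. Under that reading, a row of L, which is stride-n in the original layout, becomes one contiguous (stride-1) run, in reversed order.

```python
def reflect_lower(a, n):
    """Reflect the lower triangle (diagonal included) of a column-major
    n x n matrix about the anti-diagonal. The strict upper triangle is kept.
    ASSUMPTION: this is one possible reading of 'pseudo-transposition'."""
    out = list(a)
    for i in range(n):
        for j in range(i + 1):
            # Column-major index of (row, col) is row + col * n.
            out[(n - 1 - j) + (n - 1 - i) * n] = a[i + j * n]
    return out


def l_row_strided(a, n, i, k):
    """First k entries of row i of L in the original layout (stride n)."""
    return [a[i + j * n] for j in range(k)]


def l_row_contiguous(p, n, i, k):
    """The same k entries in the reflected layout: one stride-1 slice of
    column n-1-i (rows n-k .. n-1), reversed to restore the original order."""
    start = (n - k) + (n - 1 - i) * n
    return p[start:start + k][::-1]


n = 4
a = [float(v) for v in range(n * n)]   # arbitrary test data, column-major
p = reflect_lower(a, n)

# Every leading part of every row of L is recovered from a contiguous slice.
same = all(
    l_row_strided(a, n, i, k) == l_row_contiguous(p, n, i, k)
    for i in range(n)
    for k in range(1, i + 2)
)
print(same)
```

Because the reflection maps the lower triangle onto itself, the upper triangle holding U is left in place, and its columns are already stride-1 in column-major storage.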

Since the algorithm uses only vector instructions, it is fully portable to all FPS 38/64-bit machines and, in general, to all vector computers with a similar memory structure. Furthermore, the algorithm can easily be translated into the new FORTRAN 8X, which will probably become the standard for future SIMD computers for numerical applications.

The algorithm has been implemented on an FPS-100, yielding an asymptotic speed r = 8 MegaFLOPS (the FPS-100 peak performance) and a half-performance length N1/2 = 235. The N1/2 value could be lowered by coding some critical parts in the APAL Assembly Language, at the cost of code portability.
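To give a feel for what these two figures imply, the following sketch evaluates the common two-parameter performance model r(N) = r_inf / (1 + N_half / N). This is an assumption: the abstract reports only the asymptotic speed and the half-performance length and does not state which model or which definition of N was used. The function name and parameter names are hypothetical.

```python
def modelled_rate(n, r_inf=8.0, n_half=235.0):
    """Modelled speed in MFLOPS for problem size n.
    ASSUMPTION: two-parameter model r(n) = r_inf / (1 + n_half / n),
    with r_inf and n_half taken from the figures quoted in the abstract."""
    return r_inf / (1.0 + n_half / n)


for n in (100, 235, 1000, 5000):
    print(n, round(modelled_rate(n), 2))
```

By construction the model returns half of the asymptotic speed, 4 MFLOPS, at N = 235, and approaches 8 MFLOPS only for sizes much larger than N1/2, which is why lowering N1/2 matters for small and medium systems.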

References

  1. Dongarra J.J., Eisenstat S.C.: Squeezing the Most out of an Algorithm in CRAY FORTRAN. ACM Transactions on Mathematical Software, Vol. 10, No. 3, September 1984, pages 219–230.
  2. Dongarra J.J., Du Croz J., Hammarling S., Hanson R.J.: A Proposal for an Extended Set of Fortran Basic Linear Algebra Subprograms. Argonne National Laboratory, Mathematics and Computer Science Division, Technical Memorandum No. 41, December 1984.
  3. Charlesworth A.E.: An Approach to Scientific Array Processing: The Architectural Design of the AP-120B/FPS-164 Family. IEEE Computer, September 1981, pages 18–27.
  4. 5000 FORTRAN 77 Manuals. FPS Technical Publication, 1986.
  5. FPS-5000 APMATH Library Manual. FPS Technical Publication, 1985.
  6. Dongarra J.J.: Performance of Various Computers Using Standard Linear Equations Software in a Fortran Environment. Argonne National Laboratory, Mathematics and Computer Science Division, Technical Memorandum No. 23, May 1985.

Copyright information

© Springer-Verlag Berlin Heidelberg 1988

Authors and Affiliations

  • A. Corana (1)
  • C. Martini (1)
  • S. Ridella (1)
  • C. Rolando (1)
  1. Consiglio Nazionale delle Ricerche, Istituto per i Circuiti Elettronici, Genova, Italy
