
The challenge of portable libraries for high performance machines

  • Sven Hammarling
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 879)

Abstract

NAG has always aimed to make their software available on any type of computer for which there is reasonable demand, which in practice means any computer in widespread use for general purpose scientific computing. The NAG Fortran 77 Library is currently available on more than fifty different machine ranges, and on something like a hundred different compiler versions. Thus portability of the library has always been a prime consideration, but the advent of vector and parallel computers has required us to pay much more careful attention to the performance of the library, and the challenge has been to try to satisfy the sometimes conflicting requirements of performance and portability.

We shall discuss how we have approached the development of portable software for modern shared memory machines, and how we are addressing the problem of distributed memory systems.

Keywords

Numerical software; parallel algorithms; parallel computing; portability



Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Sven Hammarling
    • 1
  1. Numerical Algorithms Group Ltd, Oxford, UK
