
Basics and Practice of Linear Algebra Calculation Library BLAS and LAPACK

The Art of High Performance Computing for Computational Science, Vol. 1

Abstract

In this chapter, we explain the basic architecture and use of the linear algebra libraries BLAS and LAPACK. BLAS and LAPACK carry out vector and matrix operations on computers. They are used by many programs, and their implementations are optimized for the computers they run on. These libraries should be used whenever possible for linear algebra operations, because algorithms taken directly from mathematical theorems in textbooks may be inefficient, and their results may not have sufficient accuracy in practice. Moreover, programming such algorithms is bothersome. However, performance may suffer if you use a non-optimized library: the difference in performance between a non-optimized and an optimized one can be very large, so you should choose the fastest one for your computer. The availability of optimized BLAS and LAPACK libraries has improved remarkably; for example, they are now included in Linux distributions such as Ubuntu. In this chapter, we refer to the libraries for Ubuntu 16.04 so that readers can easily try them out for themselves. Unfortunately, we do not cover GPU implementations for lack of space. However, the basic ideas are the same as those presented in this chapter, so we believe that readers will easily be able to utilize them as well.


Notes

  1.

    Although Carl Friedrich Gauss is sometimes credited with the rediscovery, Isaac Newton, about 100 years earlier, wrote that the textbooks of his day lacked a method for solving simultaneous equations and proceeded to publish one that became widely circulated.

  2.

    There was once a time when the format varied from one manufacturer or vendor to another; when data ceased to be compatible or when a computer was replaced, it was necessary to change the program as well.

  3.

    In multicore environments, programs run in parallel in lightweight processes called “threads.” Because different threads can access the same memory area at the same time, conflicts may occur. As of LAPACK 3.3, all routines are thread safe; such private variables have been removed.
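To illustrate the problem with such private variables, here is a minimal sketch in C (not LAPACK code; the function names and the buffer size are hypothetical). A routine that returns its result through a static buffer is unsafe to call from several threads at once, while one that writes into a caller-supplied workspace, as LAPACK routines do, has no shared state to conflict over.

```c
#include <stddef.h>

/* NOT thread safe: every caller shares the same static buffer, so two
   threads calling this at the same time overwrite each other's result. */
double *scale_unsafe(const double *x, size_t n, double a) {
    static double buf[16];          /* private variable shared by all callers */
    for (size_t i = 0; i < n; i++)
        buf[i] = a * x[i];
    return buf;
}

/* Thread safe: the caller supplies the workspace `work`, so concurrent
   calls with distinct workspaces cannot interfere with one another. */
void scale_safe(const double *x, size_t n, double a, double *work) {
    for (size_t i = 0; i < n; i++)
        work[i] = a * x[i];
}
```

The thread-safe variant pushes the responsibility for allocating storage onto the caller, which is exactly the workspace-argument convention used throughout LAPACK.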

  4.

    It looks very similar to the textbook implementation, except that we operate on sub-matrices instead of individual numbers. This algorithm makes use of the hierarchical structure of the memory cache; it is also suitable for multicore CPUs because each sub-matrix \(C_{pq}\) can be updated independently.
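As a concrete sketch of this blocked scheme (an illustration only, not the actual optimized BLAS code; the dimension N and block size NB are made-up small values), the textbook triple loop is simply repeated at two levels, once over blocks and once within each block:

```c
#define N  8   /* matrix dimension (illustrative value) */
#define NB 4   /* block size (illustrative value)       */

/* Naive textbook triple loop: C += A * B, row-major N x N matrices. */
void gemm_naive(const double *A, const double *B, double *C) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}

/* Blocked version: the outer loops walk over NB x NB sub-matrices.
   Each sub-matrix C_pq is accumulated independently of the others,
   so the working set fits in cache and blocks can go to different cores. */
void gemm_blocked(const double *A, const double *B, double *C) {
    for (int p = 0; p < N; p += NB)          /* block row of C    */
        for (int q = 0; q < N; q += NB)      /* block column of C */
            for (int r = 0; r < N; r += NB)  /* inner block index */
                for (int i = p; i < p + NB; i++)
                    for (int j = q; j < q + NB; j++)
                        for (int k = r; k < r + NB; k++)
                            C[i * N + j] += A[i * N + k] * B[k * N + j];
}
```

Both versions accumulate each entry of C in the same order over k, so they produce identical results; only the memory access pattern changes, which is where the performance comes from.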

  5.

    The situation before 2010 was quite chaotic, because the source code was hidden by vendors.

  6.

    The calculation becomes difficult when the clock frequency changes dynamically, as in the case of Turbo Boost.

  7.

    AVX refers to Intel Advanced Vector Extensions, an extension of the SIMD-type instructions succeeding SSE. Its registers are 256 bits wide and can hold four double-precision values. An addition and a multiplication can each be issued in one clock, so it is possible to perform eight double-precision floating-point operations per clock.
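The arithmetic above can be written out as a small sketch; the core count and clock frequency below are made-up example values, not a measurement of any particular CPU:

```c
/* Theoretical peak floating-point operations per clock for one AVX core:
   a 256-bit register holds four doubles, and one addition plus one
   multiplication can each be issued per clock. */
int avx_flops_per_clock(void) {
    int lanes = 256 / 64;     /* four double-precision lanes per register */
    int ops_per_lane = 2;     /* one add + one mul issued per clock       */
    return lanes * ops_per_lane;  /* = 8 */
}

/* Theoretical peak GFLOPS for a whole chip at a fixed clock frequency
   (the calculation assumes no dynamic clock changes such as Turbo Boost). */
double peak_gflops(int cores, double clock_ghz) {
    return cores * clock_ghz * avx_flops_per_clock();
}
```

For instance, a hypothetical 4-core CPU at a fixed 3.0 GHz would have a theoretical peak of 4 × 3.0 × 8 = 96 GFLOPS; measured GEMM performance is usually compared against this number.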

  8.

    However, it is better to use a Xeon because it has more memory bandwidth, despite it being more expensive than a Core i7.


Author information

Correspondence to Maho Nakata.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Nakata, M. (2019). Basics and Practice of Linear Algebra Calculation Library BLAS and LAPACK. In: Geshi, M. (eds) The Art of High Performance Computing for Computational Science, Vol. 1. Springer, Singapore. https://doi.org/10.1007/978-981-13-6194-4_6


  • DOI: https://doi.org/10.1007/978-981-13-6194-4_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-6193-7

  • Online ISBN: 978-981-13-6194-4

  • eBook Packages: Computer Science, Computer Science (R0)
