Fundamental Kernels


Part of the book series: Scientific Computation (SCIENTCOMP)

Abstract

In this chapter we discuss the fundamental operations that are the building blocks of dense and sparse matrix computations. They are termed kernels because they typically account for most of the computational effort, so their implementation directly impacts the overall efficiency of the computation. They often occur at the lowest level at which parallelism is expressed.
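
To make the notion of a kernel concrete, the following sketch (an illustration added here, not code from the chapter) calls the Level-3 BLAS matrix-matrix product DGEMM, which computes C := alpha*A*B + beta*C, through the standard CBLAS interface. It assumes a cblas.h header and any conforming BLAS library, such as OpenBLAS, is available at link time.

    /* Minimal DGEMM call through the CBLAS interface.
       Build (assumption): cc gemm_demo.c -lopenblas */
    #include <stdio.h>
    #include <cblas.h>

    int main(void) {
        /* 2x2 matrices stored row-major. */
        double A[4] = {1.0, 2.0,
                       3.0, 4.0};
        double B[4] = {5.0, 6.0,
                       7.0, 8.0};
        double C[4] = {0.0, 0.0,
                       0.0, 0.0};

        /* C := 1.0 * A * B + 0.0 * C */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2,        /* m, n, k          */
                    1.0, A, 2,      /* alpha, A, lda    */
                    B, 2,           /* B, ldb           */
                    0.0, C, 2);     /* beta, C, ldc     */

        printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
        return 0;
    }

This prints C = [19 22; 43 50]. Vendor-tuned and autotuned BLAS libraries differ chiefly in how efficiently they implement calls such as this one, which is why the implementation of a handful of kernels can dominate overall performance.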



Author information


Correspondence to Efstratios Gallopoulos.


Copyright information

© 2016 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Gallopoulos, E., Philippe, B., Sameh, A.H. (2016). Fundamental Kernels. In: Parallelism in Matrix Computations. Scientific Computation. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7188-7_2

  • DOI: https://doi.org/10.1007/978-94-017-7188-7_2

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-017-7187-0

  • Online ISBN: 978-94-017-7188-7

  • eBook Packages: Engineering, Engineering (R0)
