Abstract
In this chapter we discuss the fundamental operations that are the building blocks of dense and sparse matrix computations. They are termed kernels because, in most cases, they account for most of the computational effort; consequently, their implementation directly impacts the overall efficiency of the computation. They often occur at the lowest level at which parallelism is expressed.
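To make the notion of a kernel concrete, here is a minimal pure-Python sketch of a few representative operations of this kind. It covers a vector update and an inner product, matrix-vector and matrix-matrix products, and a sparse matrix-vector product with the matrix stored in compressed sparse row (CSR) format. The function names borrow BLAS-style naming (`axpy`, `dot`, `gemv`, `gemm`) for readability only; they are illustrative assumptions, not the BLAS interface and not the chapter's own code. The docstrings note how many flops each kernel performs relative to the data it touches. That ratio is one reason matrix-matrix kernels tend to make better use of memory hierarchies than vector kernels.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # dense matrix stored as a list of rows (illustrative choice)


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Vector kernel: return alpha*x + y (O(n) flops on O(n) data)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x: Vector, y: Vector) -> float:
    """Vector kernel: inner product x^T y (O(n) flops on O(n) data)."""
    return sum(xi * yi for xi, yi in zip(x, y))


def gemv(A: Matrix, x: Vector) -> Vector:
    """Matrix-vector kernel: y = A x, one inner product per row
    (O(n^2) flops on O(n^2) data)."""
    return [dot(row, x) for row in A]


def gemm(A: Matrix, B: Matrix) -> Matrix:
    """Matrix-matrix kernel: C = A B (O(n^3) flops on O(n^2) data).
    The i-p-j loop order walks rows of B and C contiguously."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        Ci = C[i]
        for p in range(k):
            a = A[i][p]
            Bp = B[p]
            for j in range(n):
                Ci[j] += a * Bp[j]  # rank-1 style update of row i of C
    return C


def csr_spmv(values: Vector, col_idx: List[int], row_ptr: List[int], x: Vector) -> Vector:
    """Sparse kernel: y = A x with A in CSR format. The nonzeros of row i are
    values[row_ptr[i]:row_ptr[i+1]], located in columns col_idx[row_ptr[i]:row_ptr[i+1]]."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        s = 0.0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            s += values[p] * x[col_idx[p]]  # indirect access through col_idx
        y[i] = s
    return y


if __name__ == "__main__":
    A = [[4.0, 0.0, 1.0],
         [0.0, 3.0, 0.0],
         [2.0, 0.0, 5.0]]
    x = [1.0, 2.0, 3.0]
    # CSR form of the same A: nonzeros listed row by row.
    values = [4.0, 1.0, 3.0, 2.0, 5.0]
    col_idx = [0, 2, 1, 0, 2]
    row_ptr = [0, 2, 3, 5]
    print(gemv(A, x))                             # [7.0, 6.0, 17.0]
    print(csr_spmv(values, col_idx, row_ptr, x))  # [7.0, 6.0, 17.0]
    print(axpy(2.0, x, [1.0, 1.0, 1.0]))          # [3.0, 5.0, 7.0]
```

The dense and CSR products agree on the same matrix. The sparse version performs work only on the stored nonzeros, but its indirect indexing through `col_idx` makes its memory access pattern less regular than the dense kernels. In practice such loops would be delegated to a tuned library; the sketch only shows what each kernel computes.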
Copyright information
© 2016 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Gallopoulos, E., Philippe, B., Sameh, A.H. (2016). Fundamental Kernels. In: Parallelism in Matrix Computations. Scientific Computation. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7188-7_2
DOI: https://doi.org/10.1007/978-94-017-7188-7_2
Published:
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-7187-0
Online ISBN: 978-94-017-7188-7
eBook Packages: Engineering, Engineering (R0)