Fundamental Kernels

  • Efstratios Gallopoulos
  • Bernard Philippe
  • Ahmed H. Sameh
Chapter
Part of the Scientific Computation book series (SCIENTCOMP)

Abstract

In this chapter we discuss the fundamental operations that are the building blocks of dense and sparse matrix computations. They are termed kernels because in most cases they account for the bulk of the computational effort; their implementation therefore directly impacts the overall efficiency of the computation. It is also at this lowest level that parallelism is most often expressed.
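To make the notion of a kernel concrete, the sketch below (an illustrative example, not taken from the chapter) contrasts a naive triple-loop matrix multiplication with the same product computed through an optimized GEMM kernel, here reached via NumPy's `@` operator, which dispatches to a tuned BLAS. Both compute the identical result; the difference lies entirely in how efficiently the kernel uses the memory hierarchy.

```python
import numpy as np

def matmul_naive(A, B):
    """Naive triple-loop matrix multiply: O(m*n*k) scalar operations
    with poor cache reuse compared to a tuned kernel."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 32))
B = rng.standard_normal((32, 32))

# The optimized kernel (NumPy delegating to BLAS GEMM) computes the
# same product, but orders the arithmetic for cache and vector units.
C_blas = A @ B
assert np.allclose(matmul_naive(A, B), C_blas)
```

Because a routine like GEMM dominates the run time of many dense algorithms, tuning this one kernel (blocking for cache, vectorizing the inner loops) improves every computation built on top of it.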


Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Efstratios Gallopoulos (1)
  • Bernard Philippe (2)
  • Ahmed H. Sameh (3)
  1. Computer Engineering and Informatics Department, University of Patras, Patras, Greece
  2. Campus de Beaulieu, INRIA/IRISA, Rennes Cedex, France
  3. Department of Computer Science, Purdue University, West Lafayette, USA