Experiments in Parallel Matrix Multiplication on Multi-core Systems

  • Joeffrey Legaux
  • Sylvain Jubertie
  • Frédéric Loulergue
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7439)


Matrix multiplication is an application that is both easy to specify and easy to implement naively, yet numerous sophisticated algorithms and highly efficient but complex implementations also exist. In this study we are instead interested in the design/programming overhead relative to the performance benefits. Starting from the naive sequential implementation, we first optimise data accesses, then exploit the vector units of modern processors, and finally propose a parallel version for multi-core architectures. The proposed optimisations are evaluated on several architectures, and the trade-off between software complexity and efficiency is assessed using Halstead metrics.


Keywords: matrix multiplication · memory accesses · SIMD unit · shared-memory parallelism · software metrics




References

  1. Albrecht, A.: Measuring Application Development Productivity. In: IBM Application Development Symposium, pp. 83–92. IBM Press (October 1979)
  2. Cole, M.: Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press (1989)
  3. Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation 9(3), 251–280 (1990)
  4. D'Alberto, P., Nicolau, A.: Adaptive Strassen and ATLAS's DGEMM: a fast square-matrix multiply for modern high-performance systems. In: Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA 2005), p. 45. IEEE Computer Society, Washington, DC (2005), doi:10.1109/HPCASIA.2005.18
  5. Dongarra, J.J., Du Croz, J., Hammarling, S., Duff, I.S.: A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16(1), 1–17 (1990)
  6. Dongarra, J.J., Luszczek, P., Petitet, A.: The LINPACK benchmark: past, present and future. Concurrency and Computation: Practice and Experience 15(9), 803–820 (2003), doi:10.1002/cpe.728
  7. González-Vélez, H., Leyton, M.: A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers. Software: Practice & Experience 40(12), 1135–1160 (2010)
  8. Halstead, M.H.: Elements of Software Science. Operating and Programming Systems Series. Elsevier Science Ltd. (1977)
  9. Javed, N., Loulergue, F.: Parallel Programming and Performance Predictability with Orléans Skeleton Library. In: International Conference on High Performance Computing and Simulation (HPCS), pp. 257–263. IEEE (2011)
  10. Kemerer, C.F.: An empirical validation of software cost estimation models. Commun. ACM 30(5), 416–429 (1987)
  11. McCabe, T.J.: A complexity measure. In: ICSE 1976: Proceedings of the 2nd International Conference on Software Engineering. IEEE Computer Society Press, Los Alamitos (1976)
  12. Peleg, A., Weiser, U.: MMX technology extension to the Intel architecture. IEEE Micro 16(4), 42–50 (1996)
  13. Strassen, V.: Gaussian elimination is not optimal. Numerische Mathematik 13, 354–356 (1969), doi:10.1007/BF02165411
  14. Strey, A., Bange, M.: Performance Analysis of Intel's MMX and SSE: A Case Study. In: Sakellariou, R., Keane, J.A., Gurd, J.R., Freeman, L. (eds.) Euro-Par 2001. LNCS, vol. 2150, pp. 142–147. Springer, Heidelberg (2001)
  15. Touati, S.A.A., Worms, J., Briais, S.: The Speedup Test. Tech. Rep. inria-00443839, INRIA Saclay – Île de France (2010)
  16. Whaley, R.C., Petitet, A., Dongarra, J.J.: Automated empirical optimization of software and the ATLAS project. Parallel Computing 27(1–2), 3–35 (2001); also available as University of Tennessee LAPACK Working Note #147, UT-CS-00-448 (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Joeffrey Legaux (1)
  • Sylvain Jubertie (1)
  • Frédéric Loulergue (1)
  1. LIFO, University of Orléans, France
