A Fast Algorithm for Matrix Multiplication and Its Efficient Realization on Systolic Arrays

Cybernetics and Systems Analysis

Abstract

A new fast matrix multiplication algorithm is proposed whose multiplicative complexity, $W_M \approx 0.437n^3$ multiplication operations, is lower than that of the Winograd algorithm. New optimized systolic array architectures are synthesized through a goal-directed transformation of the algorithm's basic graph. A systolic variant of the Strassen algorithm is presented for the first time.
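
For context on the baseline named in the abstract, the following is a minimal Python sketch of Winograd's 1968 inner-product scheme applied to square matrices. It is not the new algorithm proposed in the article, whose details are not given in this abstract. The function name `winograd_matmul`, the restriction to even matrix order, and the multiplication counter are assumptions made for illustration only.

```python
def winograd_matmul(a, b):
    """Multiply square matrices a and b of even order n using Winograd's
    inner-product identity. Returns (product, scalar multiplication count).
    Illustrative sketch only; names and structure are hypothetical."""
    n = len(a)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even matrix order")
    half = n // 2
    mults = 0

    # Row factors: xi[i] = sum over pairs of a[i][2j] * a[i][2j+1]
    xi = []
    for i in range(n):
        s = 0
        for j in range(half):
            s += a[i][2 * j] * a[i][2 * j + 1]
            mults += 1
        xi.append(s)

    # Column factors: eta[k] = sum over pairs of b[2j][k] * b[2j+1][k]
    eta = []
    for k in range(n):
        s = 0
        for j in range(half):
            s += b[2 * j][k] * b[2 * j + 1][k]
            mults += 1
        eta.append(s)

    # Each entry uses n/2 multiplications instead of n:
    # (a0 + b1) * (a1 + b0) = a0*b0 + a1*b1 + a0*a1 + b0*b1,
    # and the last two terms are removed via the precomputed factors.
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            s = -xi[i] - eta[k]
            for j in range(half):
                s += (a[i][2 * j] + b[2 * j + 1][k]) * (a[i][2 * j + 1] + b[2 * j][k])
                mults += 1
            c[i][k] = s
    return c, mults


if __name__ == "__main__":
    a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    b = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 4, 2, 1], [3, 2, 1, 0]]
    product, count = winograd_matmul(a, b)
    print(product)
    print(count)  # 48 for n = 4, versus 64 for the standard algorithm
```

In this sketch the multiplication count is $n^3/2 + n^2$, which is about $0.5n^3$ for large $n$. That appears to be the figure against which the abstract's $W_M \approx 0.437n^3$ is compared.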

Cite this article

Elfimova, L.D., Kapitonova, Y.V. A Fast Algorithm for Matrix Multiplication and Its Efficient Realization on Systolic Arrays. Cybernetics and Systems Analysis 37, 109–121 (2001). https://doi.org/10.1023/A:1016676318988
