A Fast Algorithm for Matrix Multiplication and Its Efficient Realization on Systolic Arrays

Elfimova, L. D.; Kapitonova, Yu. V.

doi:10.1023/A:1016676318988

Published: January 2001

Cybernetics and Systems Analysis, Volume 37, pages 109–121 (2001)


Abstract

A new fast matrix multiplication algorithm is proposed, which, as compared to the Winograd algorithm, has a lower multiplicative complexity equal to W_M ≈ 0.437n³ multiplication operations. Based on a goal-directed transformation of its basic graph, new optimized architectures of systolic arrays are synthesized. A systolic variant of the Strassen algorithm is presented for the first time.
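The proposed algorithm itself is not reproduced in this preview, so it cannot be shown here. As a point of comparison only, the following is a minimal sketch of the baseline the abstract refers to: matrix multiplication built on Winograd's 1968 inner-product identity, for square matrices of even order n. The function name `winograd_matmul` and the multiplication counter are illustrative assumptions, not taken from the paper. The scheme uses n³/2 + n² scalar multiplications, a leading coefficient of 0.5, which is the figure that W_M ≈ 0.437n³ improves on.

```python
def winograd_matmul(a, b):
    """Multiply two n x n matrices (n even) with Winograd's inner-product identity.

    Returns (product, mults), where mults counts scalar multiplications.
    This is the classical baseline, NOT the algorithm proposed in the paper.
    """
    n = len(a)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even matrix order")
    half = n // 2
    mults = 0

    # Row factors: sum of products of adjacent pairs in each row of a.
    row = [0] * n
    for i in range(n):
        for k in range(half):
            row[i] += a[i][2 * k] * a[i][2 * k + 1]
            mults += 1

    # Column factors: sum of products of adjacent pairs in each column of b.
    col = [0] * n
    for j in range(n):
        for k in range(half):
            col[j] += b[2 * k][j] * b[2 * k + 1][j]
            mults += 1

    # Each entry needs only n/2 multiplications once the factors are known:
    # (x1 + y2)(x2 + y1) - x1*x2 - y1*y2 = x1*y1 + x2*y2.
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = -row[i] - col[j]
            for k in range(half):
                s += (a[i][2 * k] + b[2 * k + 1][j]) * (a[i][2 * k + 1] + b[2 * k][j])
                mults += 1
            c[i][j] = s
    return c, mults


if __name__ == "__main__":
    product, mults = winograd_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, mults)
```

For n = 4 the counter reports 48 multiplications against 64 for the standard row-by-column method; the saving is paid for with extra additions.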

Author information

Authors and Affiliations

Cybernetics Institute, National Academy of Sciences of Ukraine, Kiev, Ukraine
L. D. Elfimova & Yu. V. Kapitonova

About this article

Cite this article

Elfimova, L.D., Kapitonova, Y.V. A Fast Algorithm for Matrix Multiplication and Its Efficient Realization on Systolic Arrays. Cybernetics and Systems Analysis 37, 109–121 (2001). https://doi.org/10.1023/A:1016676318988

Issue Date: January 2001
DOI: https://doi.org/10.1023/A:1016676318988
