Abstract
A new fast matrix multiplication algorithm is proposed that has a lower multiplicative complexity than the Winograd algorithm: W_M ≈ 0.437n^3 multiplication operations. New optimized systolic array architectures are synthesized by a goal-directed transformation of the algorithm's basic graph. A systolic variant of the Strassen algorithm is presented for the first time.
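For context on the baseline named above, the following is a minimal sketch of Winograd's 1968 inner-product scheme for square matrices of even order n. It is not the algorithm proposed in the article, which the abstract does not specify. The function name `winograd_matmul` and the multiplication counter are assumptions introduced for illustration. The sketch precomputes one correction term per row of A and per column of B, so each output entry needs only n/2 multiplications, for a total of n^3/2 + n^2 instead of the n^3 of the standard method.

```python
def winograd_matmul(a, b):
    """Multiply two n x n matrices (n even) with Winograd's 1968 scheme.

    Returns (product, number_of_scalar_multiplications).
    Illustrative sketch only; names are hypothetical, not from the article.
    """
    n = len(a)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even matrix order")
    half = n // 2
    mults = 0

    # Row corrections: sum of products of adjacent pairs in each row of a.
    row = [0] * n
    for i in range(n):
        for k in range(half):
            row[i] += a[i][2 * k] * a[i][2 * k + 1]
            mults += 1

    # Column corrections: sum of products of adjacent pairs in each column of b.
    col = [0] * n
    for j in range(n):
        for k in range(half):
            col[j] += b[2 * k][j] * b[2 * k + 1][j]
            mults += 1

    # Main loop: one multiplication per pair of terms of the inner product.
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = -row[i] - col[j]
            for k in range(half):
                s += (a[i][2 * k] + b[2 * k + 1][j]) * (a[i][2 * k + 1] + b[2 * k][j])
                mults += 1
            c[i][j] = s
    return c, mults
```

For n = 4 the counter reports 4^3/2 + 4^2 = 48 multiplications, against 64 for the standard method. The saving is paid for with extra additions, which is why the comparison in the abstract is stated in terms of multiplicative complexity.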
Cite this article
Elfimova, L.D., Kapitonova, Y.V. A Fast Algorithm for Matrix Multiplication and Its Efficient Realization on Systolic Arrays. Cybernetics and Systems Analysis 37, 109–121 (2001). https://doi.org/10.1023/A:1016676318988