Strassen's 2x2 matrix multiplication algorithm: A conceptual perspective

The main purpose of this paper is pedagogical. Despite its importance, all proofs of the correctness of Strassen's famous 1969 algorithm to multiply two 2x2 matrices with only seven multiplications involve some basis-dependent calculations such as explicitly multiplying specific 2x2 matrices, expanding expressions to cancel terms with opposing signs, or expanding tensors over the standard basis. This makes the proof nontrivial to memorize, and many presentations of the proof avoid showing all the details and leave a significant number of verifications to the reader. In this note we give a short, self-contained, basis-independent proof of the existence of Strassen's algorithm that avoids these types of calculations. We achieve this by focusing on symmetries and algebraic properties. Our proof can be seen as a coordinate-free version of the construction of Clausen from 1988, combined with recent work on the geometry of Strassen's algorithm by Chiantini, Ikenmeyer, Landsberg, and Ottaviani from 2016.


Introduction
The discovery of Strassen's matrix multiplication algorithm [28] was a breakthrough result in computational linear algebra. The study of fast (subcubic) matrix multiplication algorithms initiated by this discovery has become an important area of research (see [3] for a survey and [21] for the currently best upper bound on the complexity of matrix multiplication). Fast matrix multiplication has countless applications as a subroutine in algorithms for a wide variety of problems, see e.g. [7, §16] for numerous applications in computational linear algebra. In practice, algorithms more sophisticated than Strassen's are almost never implemented, but Strassen's algorithm is used for multiplication of large matrices (see [12,25,19] on practical fast matrix multiplication).
The core of Strassen's result is an algorithm for multiplying 2 × 2 matrices with only 7 multiplications instead of 8. It is a bilinear algorithm, which means that it arises from a decomposition of the form
$$XY = \sum_{k=1}^{7} u_k(X)\, v_k(Y)\, W_k \qquad (\star)$$
where $u_k$ and $v_k$ are cleverly chosen linear forms on the space of 2 × 2 matrices and $W_k$ are seven explicit 2 × 2 matrices. Because of this structure it can be applied to block matrices, and its recursive application results in an algorithm for the multiplication of two n × n matrices using $O(n^{\log_2 7})$ arithmetic operations (see [7, §15.2] or [3] for details).
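To make the recursive use of a seven-multiplication scheme concrete, the following minimal Python sketch applies the textbook form of Strassen's seven products to block matrices. The explicit formulas below are the commonly quoted ones and are not the coordinate-free form developed in this paper; the helper names (`add`, `sub`, `split`, `naive`, `strassen`, `counter`) are illustrative, and the sketch assumes square matrices whose size is a power of two.

```python
# Counts scalar multiplications performed at the bottom of the recursion.
counter = {"mults": 0}

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def naive(A, B):
    """Schoolbook product, used only as a reference."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def split(A):
    """Split a matrix of even size into its four blocks."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])

def strassen(A, B):
    """Multiply n x n matrices (n a power of two) with 7 recursive products."""
    n = len(A)
    if n == 1:
        counter["mults"] += 1
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # Seven block products; block multiplication is not commutative,
    # so the order of the factors matters.
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

For n = 2^k the recursion performs 7^k scalar multiplications, which is where the exponent log_2 7 comes from; additions are not counted in this sketch.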
Because of the great importance of Strassen's algorithm, our goal is to understand it on a deep level. In Strassen's original paper, the linear forms $u_k$, $v_k$, and the matrices $W_k$ are given, but the verification of the correctness of the algorithm is left to the reader. Unfortunately, such a description does not yield many further immediate insights.
Shortly after Strassen's paper, Gastinel [14] published a proof of the existence of decomposition (⋆) using simple algebraic transformations that is much easier to follow and verify. Many other papers provide alternative descriptions of Strassen's algorithm or proofs of its existence. Brent [4] and Paterson [26] present the algorithm in a graphical form using 4 × 4 diagrams indicating which elements of the two matrices are used. A more formal version of these diagrams is given by matrices of linear forms, which are used, for example, by Fiduccia [13] (essentially the same proof appears in [29]), Brockett and Dobkin [5] and Lafon [20]. Makarov [22] gives a proof that uses ideas of Karatsuba's algorithm for the efficient multiplication of polynomials. Büchi and Clausen [6] connect the existence of Strassen's algorithm to the existence of special bases of the space of 2 × 2 matrices in which the multiplication table has a specific structure (their results are more general and apply not only to matrix multiplication). Alexeyev [1] describes several algorithms for matrix multiplication as embeddings of the matrix algebra into a 7-dimensional nonassociative algebra with special properties.
Verification of these proofs usually requires simple, but lengthy computations: expansion of explicit decompositions in some basis, multiplication of several matrices or following chains of algebraic transformations in which careful attention to details is required. To obtain a more conceptual proof of the existence of Strassen's algorithm, we do not focus on the explicit algorithm, but on the algebraic properties of the 2 × 2 matrices, their transformations and symmetries of Strassen's algorithm. It is well-known that the decomposition (⋆) is not unique. Given one decomposition, we can obtain another one by applying the identity
$$XY = A^{-1}\left[(AXB^{-1})(BYC^{-1})\right]C$$
with invertible 2 × 2 matrices $A$, $B$, $C$, and using the original decomposition for the product in the square brackets. Alternatively, we can talk about 2 × 2 matrices as linear maps between 2-dimensional vector spaces. Any choice of bases in these vector spaces gives a new bilinear algorithm. De Groote [18] proved that the algorithm with seven multiplications is unique up to these transformations (this result is also announced without proof in [23], see also [24]). Thus, Strassen's algorithm is unique in this sense and there should be a coordinate-free description of this algorithm which does not use explicit matrices. One such description is given in [10] and the proof of its correctness uses the fact that matrix multiplication is the unique (up to scale) bilinear map invariant under the transformations described above. This is a nontrivial fact which requires representation theory to prove. Moreover, the verification of the correctness in [10] is left to the reader.
Symmetries of Strassen's algorithm are also useful for its understanding. Clausen [11] gives a description of Strassen's algorithm in terms of special bases, as in [6], and notices that the elements of these bases form orbits under the action of the symmetric group $S_3$ on the space of 2 × 2 matrices defined via conjugation with specific matrices, i.e., Strassen's algorithm is invariant under this action. Clausen's construction is also described in [7, Ch. 1]. Grochow and Moore [16,17] generalize Clausen's construction to n × n matrices using other finite group orbits. Another symmetry is only apparent in the trilinear representation of the algorithm: the decompositions (⋆) are in one-to-one correspondence with decompositions of the trilinear form $\operatorname{tr}(XYZ)$ of the form
$$\operatorname{tr}(XYZ) = \sum_{k=1}^{7} u_k(X)\, v_k(Y)\, w_k(Z)$$
where $u_k$, $v_k$ and $w_k$ are linear forms. The decomposition corresponding to Strassen's algorithm is then invariant under the cyclic permutation of matrices $X$, $Y$, $Z$. This symmetry is exploited in the proof of Chatelin [9], which uses properties of polynomials invariant under this symmetry. He also notices the importance of a matrix which is related to the $S_3$ symmetry discussed above. The symmetries of Strassen's algorithm are explored in detail in [8,10]. Several earlier publications note their importance [15,27]. The paper [2] explores symmetries of algorithms for 3 × 3 matrix multiplication.
In this paper we provide a proof of Strassen's result which is
• coordinate-free: we do not use explicit matrices, which allows us to focus on the algebraic properties required to prove the correctness of the algorithm. We avoid all tedious explicit calculations, in particular any expansions of expressions and any verification of explicit sign cancellations. Our proof can be seen as a coordinate-free version of Clausen's construction.
• elementary: our proof uses only simple facts from basic linear algebra and does not require knowledge of representation theory. This is also why we do not use tensor language. Proofs from [10] and [17] are based on more complicated mathematics and may offer other insights.
Formally, the result that we prove is the following.

Preliminaries from linear algebra
The trace $\operatorname{tr}(A)$ of a square matrix $A$ is the sum of its diagonal entries. If $\operatorname{tr}(A)$ is zero, then the matrix $A$ is called traceless. Taking the trace of a product of (rectangular) matrices is invariant under cyclic shifts: $\operatorname{tr}(A_1 A_2 \cdots A_k) = \operatorname{tr}(A_2 \cdots A_k A_1)$. As a consequence, the trace of a matrix is invariant under conjugations: $\operatorname{tr}(B^{-1}AB) = \operatorname{tr}(ABB^{-1}) = \operatorname{tr}(A)$. Another implication is that if $u$ is a column vector and $v^T$ is a row vector, then $v^T u = \operatorname{tr}(v^T u) = \operatorname{tr}(uv^T)$.
The characteristic polynomial of a 2 × 2 matrix $A$ is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$. The Cayley-Hamilton theorem says that substituting $A$ for $\lambda$ yields the zero matrix.
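The two facts recalled above, cyclic invariance of the trace and the Cayley-Hamilton theorem for 2 × 2 matrices, can be spot-checked on concrete integer matrices. The following Python sketch is only a sanity check on examples, not a proof; the function names are illustrative.

```python
def matmul(A, B):
    """Product of two (possibly rectangular) matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def cayley_hamilton_residual(A):
    """Return A^2 - tr(A) A + det(A) id for a 2 x 2 matrix A."""
    A2, t, d = matmul(A, A), trace(A), det2(A)
    return [[A2[i][j] - t * A[i][j] + (d if i == j else 0)
             for j in range(2)] for i in range(2)]

# Cyclic invariance for rectangular factors: tr(PQ) = tr(QP).
P = [[1, 2, 3], [4, 5, 6]]          # 2 x 3
Q = [[7, 8], [9, 10], [11, 12]]     # 3 x 2
same_trace = trace(matmul(P, Q)) == trace(matmul(Q, P))

# Row vector times column vector equals the trace of the outer product.
u = [[2], [5]]                      # column vector
vT = [[3, -1]]                      # row vector
scalar_equals_trace = matmul(vT, u)[0][0] == trace(matmul(u, vT))
```

Calling `cayley_hamilton_residual` on any 2 × 2 integer matrix should return the zero matrix.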

Rotational symmetry
In this section we collect some standard facts about rotation matrices. We think of the 2 × 2 matrix $D$ as a rotation of the plane by 120°, but to make our approach work over every field we use a more algebraic definition for $D$.
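Let $D$ have determinant 1 and trace −1, that is, $D$ has characteristic polynomial $\lambda^2 + \lambda + 1$. We assume that $D$ is not a multiple of the identity id (this is implicitly satisfied if the characteristic is not 3). For example, we could choose $D = \begin{pmatrix} 0 & -1 \\ 1 & -1 \end{pmatrix}$, the matrix that cyclically permutes the three vectors $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\begin{pmatrix} -1 \\ -1 \end{pmatrix}$.
Claim 2. $\mathrm{id} + D + D^{-1} = 0$ and $D^3 = \mathrm{id}$. In particular, $D^2 = D^{-1}$, $D^{-2} = D$, and $\operatorname{tr}(D^{-1}) = -1$.
Proof. By the Cayley-Hamilton theorem, $D^2 + D + \mathrm{id} = 0$. Multiplying by $D^{-1}$ gives the first identity, and multiplying by $D - \mathrm{id}$ gives $D^3 - \mathrm{id} = 0$. Taking the trace of the first identity gives $\operatorname{tr}(D^{-1}) = -\operatorname{tr}(\mathrm{id}) - \operatorname{tr}(D) = -2 + 1 = -1$.
For every column vector $u$ define $u^\perp$ as the row vector satisfying the conditions $u^\perp u = 0$ and $u^\perp D u = 1$. If $u$ is not an eigenvector of $D$, then $u$ and $Du$ are linearly independent, so $u^\perp$ is uniquely defined. If, on the other hand, $u$ is an eigenvector of $D$, the two conditions are inconsistent and $u^\perp$ is undefined.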
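We fix a vector $u$ that is not an eigenvector of $D$ and define $u^\perp$ as above. In our example we could choose $u = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, which gives $u^\perp = \begin{pmatrix} 0 & 1 \end{pmatrix}$. A first simple observation relates $u^\perp$ and $(Du)^\perp$:
Claim 3. $(Du)^\perp = u^\perp D^{-1}$.
Proof. We need to verify the two defining properties for $(Du)^\perp$. We have $(u^\perp D^{-1})(Du) = u^\perp u = 0$ and $(u^\perp D^{-1})\,D\,(Du) = u^\perp D u = 1$.
The following observation complements the fact that $u^\perp D u = 1$.
Claim 4. $u^\perp D^{-1} u = -1$.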
Proof. Using Claim 2 we have $\mathrm{id} + D + D^{-1} = 0$ and thus $u^\perp D^{-1} u = -u^\perp u - u^\perp D u$. Since $u^\perp u = 0$ and $u^\perp D u = 1$, the claim follows.
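The definition of $u^\perp$ and the two identities $(Du)^\perp = u^\perp D^{-1}$ and $u^\perp D^{-1} u = -1$ can be checked numerically for the example matrix $D$. The following Python sketch uses exact rational arithmetic; the closed form used in `perp` (solving the two defining equations by Cramer's rule) and all function names are assumptions made for this illustration.

```python
from fractions import Fraction

D = [[0, -1], [1, -1]]        # example matrix: determinant 1, trace -1
D_INV = [[-1, 1], [-1, 0]]    # equals -id - D

def mat_vec(A, v):
    """Matrix times column vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def row_mat(r, A):
    """Row vector times matrix."""
    return [r[0] * A[0][0] + r[1] * A[1][0], r[0] * A[0][1] + r[1] * A[1][1]]

def dot(r, v):
    """Row vector times column vector."""
    return r[0] * v[0] + r[1] * v[1]

def perp(u):
    """Row vector r with r u = 0 and r D u = 1 (u must not be an eigenvector)."""
    Du = mat_vec(D, u)
    delta = u[0] * Du[1] - u[1] * Du[0]   # determinant of the matrix (u | Du)
    if delta == 0:
        raise ValueError("u is an eigenvector of D, so u_perp is undefined")
    return [Fraction(-u[1], delta), Fraction(u[0], delta)]

def check(u):
    """Verify the defining properties and both identities for one vector u."""
    r = perp(u)
    return (dot(r, u) == 0
            and dot(r, mat_vec(D, u)) == 1
            and perp(mat_vec(D, u)) == row_mat(r, D_INV)   # (Du)_perp = u_perp D^-1
            and dot(row_mat(r, D_INV), u) == -1)           # u_perp D^-1 u = -1
```

Over the rationals the example $D$ has no eigenvectors, so `check` is defined for every nonzero rational vector; for instance `check([1, 0])` and `check([2, 3])` both hold.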

Seven multiplications suffice
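In this section we apply our structural insights from Section 3 to prove Theorem 1. We set $M := uu^\perp$. Clearly $\operatorname{tr}(M) = u^\perp u = 0$ and we obtain the following identities (Claim 5) that can be used to simplify products of $M$, $D$, and $D^{-1}$:
$$\begin{aligned} M^2 &= u\,(u^\perp u)\,u^\perp = 0, \\ MDM &= u\,(u^\perp D u)\,u^\perp = M, \\ MD^{-1}M &= u\,(u^\perp D^{-1} u)\,u^\perp = -M, \end{aligned}$$
where in the last line we used Claim 4.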
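The identities $M^2 = 0$, $MDM = M$ and $MD^{-1}M = -M$ for $M = uu^\perp$ can be confirmed on the running example. The sketch below assumes the concrete choice $u = (1, 0)^T$, for which $u^\perp = (0\ 1)$; this choice and the helper names are illustrative assumptions, and any other non-eigenvector $u$ would serve equally well.

```python
D = [[0, -1], [1, -1]]        # example matrix: determinant 1, trace -1
D_INV = [[-1, 1], [-1, 0]]    # equals -id - D
U = [1, 0]                    # assumed choice of u (not an eigenvector of D)
U_PERP = [0, 1]               # satisfies U_PERP . U = 0 and U_PERP . (D U) = 1

def outer(u, r):
    """Column vector u times row vector r, as a 2 x 2 matrix."""
    return [[u[i] * r[j] for j in range(2)] for i in range(2)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def neg(A):
    return [[-a for a in row] for row in A]

M = outer(U, U_PERP)

trace_M = M[0][0] + M[1][1]                 # should vanish
M_squared = matmul(M, M)                    # should be the zero matrix
MDM = matmul(matmul(M, D), M)               # should equal M
MDinvM = matmul(matmul(M, D_INV), M)        # should equal -M
```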
By Claim 2, conjugation with $D$ is a map of order 3 on the vector space of all 2 × 2 matrices, i.e., for any matrix $A$ there is a triple of conjugates $A \mapsto D^{-1}AD \mapsto DAD^{-1} \mapsto A$. Moreover, if $A$ is traceless, then so are its conjugates.
Claim 6. The matrices $M$, $D^{-1}MD$, and $DMD^{-1}$ form a basis of the vector space of traceless matrices.
Proof. Since $M$ is traceless, its conjugates are also traceless. Hence it is enough to prove that $M$, $D^{-1}MD$ and $DMD^{-1}$ are linearly independent.
Since $u$ is not an eigenvector of $D$, the vectors $u$ and $Du$ are linearly independent and thus form a basis of the space of column vectors. The row vectors $u^\perp$ and $u^\perp D^{-1} = (Du)^\perp$ (Claim 3) are orthogonal to $u$ and $Du$, respectively. Therefore they form a basis of the space of row vectors. Thus, the four matrices
$$uu^\perp = M, \quad uu^\perp D^{-1} = MD^{-1}, \quad Duu^\perp = DM, \quad Duu^\perp D^{-1} = DMD^{-1}$$
form a basis of the space of 2 × 2 matrices. The matrices $M$ and $DMD^{-1}$ are contained in this basis. Adding up all four matrices, we get $(\mathrm{id} + D)M(\mathrm{id} + D^{-1})$, which can be simplified to $(-D^{-1})M(-D) = D^{-1}MD$ using Claim 2. Therefore the matrices $M$, $DMD^{-1}$, $D^{-1}MD$ are linearly independent.
Since $D$ and $D^{-1}$ have trace $-1 \neq 0$ (Claim 2), adding $D$ or $D^{-1}$ to the basis in Claim 6 yields two bases for the full space of 2 × 2 matrices: $\{D, M, D^{-1}MD, DMD^{-1}\}$ and $\{D^{-1}, M, D^{-1}MD, DMD^{-1}\}$.
Using the properties $D^2 = D^{-1}$, $D^{-2} = D$ from Claim 2 and $M^2 = 0$ from Claim 5, we can write down the multiplication table with respect to these two bases. We further simplify it using the identities $MDM = M$ and $MD^{-1}M = -M$ from Claim 5.
$$\begin{array}{c|cccc}
\cdot & D^{-1} & M & D^{-1}MD & DMD^{-1} \\ \hline
D & \mathrm{id} & DM & MD & D^{-1}MD^{-1} \\
M & MD^{-1} & 0 & -MD & MD^{-1} \\
D^{-1}MD & D^{-1}M & D^{-1}M & 0 & -D^{-1}MD^{-1} \\
DMD^{-1} & DMD & -DM & DMD & 0
\end{array}$$
Notice that in the body of the table only (scalar multiples of) 7 matrices are used, and the entries are aligned in such a way that two occurrences of the same matrix are either in the same row or in the same column. At this point we are done proving Theorem 1, because the existence of such a pattern gives a simple way to construct a matrix multiplication algorithm as follows.
Proof of Theorem 1. To multiply matrices $X$ and $Y$, represent them in the bases $\{D, M, D^{-1}MD, DMD^{-1}\}$ and $\{D^{-1}, M, D^{-1}MD, DMD^{-1}\}$, respectively:
$$\begin{aligned} X &= x_1 D + x_2 M + x_3 D^{-1}MD + x_4 DMD^{-1}, \\ Y &= y_1 D^{-1} + y_2 M + y_3 D^{-1}MD + y_4 DMD^{-1}. \end{aligned} \qquad (4.1)$$
Note that the $x_i$ are linear forms in the entries of $X$ and the $y_j$ are linear forms in the entries of $Y$. We expand the product $XY$ and group together summands according to the table:
$$\begin{aligned} XY ={}& x_1 y_1\,\mathrm{id} + (x_1 - x_4)\,y_2\,DM + (x_1 - x_2)\,y_3\,MD + (x_1 - x_3)\,y_4\,D^{-1}MD^{-1} \\ &+ x_2\,(y_1 + y_4)\,MD^{-1} + x_3\,(y_1 + y_2)\,D^{-1}M + x_4\,(y_1 + y_3)\,DMD. \end{aligned}$$
This finishes the proof.
Remark. Taking the trace in (4.1) and using the fact that $M$ and its conjugates are traceless, we see that $\operatorname{tr}(X) = x_1 \operatorname{tr}(D) = -x_1$, and $\operatorname{tr}(Y) = -y_1$. Thus the first of the 7 summands is $\operatorname{tr}(X)\operatorname{tr}(Y)\,\mathrm{id}$.
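The construction can be run end to end on the example $D = \begin{pmatrix} 0 & -1 \\ 1 & -1 \end{pmatrix}$. The Python sketch below expands $X$ and $Y$ in the two bases and then forms seven products of a linear form in $X$ with a linear form in $Y$. Two things here are assumptions of this illustration rather than statements taken from the text: the choice $u = (1, 0)^T$ (so that $M = uu^\perp$ has a single nonzero entry), and the particular grouping into seven summands, which was worked out for this sketch from $D^2 = D^{-1}$, $M^2 = 0$, $MDM = M$ and $MD^{-1}M = -M$. The helper `coords` is a generic Gauss-Jordan solver over the rationals and is not part of the paper's argument.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def coords(basis, X):
    """Coordinates of the 2 x 2 matrix X in a basis of four 2 x 2 matrices."""
    flat = lambda A: [Fraction(A[0][0]), Fraction(A[0][1]),
                      Fraction(A[1][0]), Fraction(A[1][1])]
    cols, rhs = [flat(B) for B in basis], flat(X)
    # Augmented 4 x 5 system: one equation per matrix entry.
    rows = [[cols[j][i] for j in range(4)] + [rhs[i]] for i in range(4)]
    for c in range(4):
        p = next(r for r in range(c, 4) if rows[r][c] != 0)
        rows[c], rows[p] = rows[p], rows[c]
        pivot = rows[c][c]
        rows[c] = [v / pivot for v in rows[c]]
        for r in range(4):
            if r != c:
                f = rows[r][c]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[c])]
    return [rows[i][4] for i in range(4)]

ID = [[1, 0], [0, 1]]
D = [[0, -1], [1, -1]]
D_INV = [[-1, 1], [-1, 0]]
M = [[0, 1], [0, 0]]                    # u u_perp for the assumed u = (1, 0)^T
M1 = matmul(matmul(D_INV, M), D)        # D^-1 M D
M2 = matmul(matmul(D, M), D_INV)        # D M D^-1

def seven_mult_product(X, Y):
    """Compute XY using exactly seven products of linear forms in X and in Y."""
    x1, x2, x3, x4 = coords([D, M, M1, M2], X)
    y1, y2, y3, y4 = coords([D_INV, M, M1, M2], Y)
    terms = [
        (x1 * y1, ID),
        ((x1 - x4) * y2, matmul(D, M)),
        ((x1 - x2) * y3, matmul(M, D)),
        ((x1 - x3) * y4, matmul(matmul(D_INV, M), D_INV)),
        (x2 * (y1 + y4), matmul(M, D_INV)),
        (x3 * (y1 + y2), matmul(D_INV, M)),
        (x4 * (y1 + y3), matmul(matmul(D, M), D)),
    ]
    return [[sum(p * W[i][j] for p, W in terms) for j in range(2)]
            for i in range(2)]
```

The only multiplications that mix entries of $X$ with entries of $Y$ are the seven scalar products in `terms`; computing the coordinates and assembling the result involve only linear operations with fixed coefficients, which is what makes the scheme applicable to block matrices.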