1 Introduction

The discovery of Strassen’s matrix multiplication algorithm [28] was a breakthrough result in computational linear algebra. The study of fast (subcubic) matrix multiplication algorithms initiated by this discovery has become an important area of research (see [3] for a survey and [21] for the currently best upper bound on the complexity of matrix multiplication). Fast matrix multiplication has countless applications as a subroutine in algorithms for a wide variety of problems; see e.g. [7, §16] for numerous applications in computational linear algebra. In practice, algorithms more sophisticated than Strassen’s are rarely implemented, but Strassen’s algorithm is used for the multiplication of large matrices (see [13, 19, 25] on practical fast matrix multiplication).

The core of Strassen’s result is an algorithm for multiplying \(2 \times 2\) matrices with only 7 multiplications instead of 8. It is a bilinear algorithm, which means that it arises from a decomposition of the form

$$\begin{aligned} XY = \sum _{k = 1}^7 u_k(X)\, v_k(Y)\, W_k \qquad (\star ) \end{aligned}$$

where \(u_k\) and \(v_k\) are cleverly chosen linear forms on the space of \(2 \times 2\) matrices and \(W_k\) are seven explicit \(2 \times 2\) matrices. Because of this structure it can be applied to block matrices, and its recursive application results in an algorithm for the multiplication of two \(n \times n\) matrices using \(O(n^{\log _2 7})\) arithmetic operations (see [7, §15.2] or [3] for details).
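To make the recursive application concrete, the following minimal Python sketch applies the textbook form of Strassen's seven products (one explicit instance of (\(\star \)), not the coordinate-free construction developed in this paper) to matrices whose size is a power of two. The function name `strassen` and the cutoff-free recursion are illustrative choices; practical implementations switch to the classical product below some threshold size.

```python
import numpy as np

def strassen(A, B):
    """Multiply two n x n matrices (n a power of 2) by recursive use of
    Strassen's seven products in their textbook form; a minimal sketch,
    not an optimized implementation."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])

A = np.random.randint(-5, 5, (8, 8))
B = np.random.randint(-5, 5, (8, 8))
assert np.array_equal(strassen(A, B), A @ B)
```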

Because of the great importance of Strassen’s algorithm, our goal is to understand it on a deep level. In Strassen’s original paper, the linear forms \(u_k\), \(v_k\), and the matrices \(W_k\) are given, but the verification of the correctness of the algorithm is left to the reader. Unfortunately, such a description does not yield many further immediate insights.

Shortly after Strassen’s paper, Gastinel [15] published a proof of the existence of the decomposition (\(\star \)) using simple algebraic transformations that is much easier to follow and verify. Many other papers provide alternative descriptions of Strassen’s algorithm or proofs of its existence. Brent [4] and Paterson [26] present the algorithm in a graphical form using \(4 \times 4\) diagrams indicating which elements of the two matrices are used. These diagrams can be formalized as matrices of linear forms, which are used, for example, by Fiduccia [14] (the same proof appears in [29]), Brockett and Dobkin [5], and Lafon [20]. Makarov [22] gives a proof that uses ideas of Karatsuba’s algorithm for the efficient multiplication of polynomials. Büchi and Clausen [6] connect the existence of Strassen’s algorithm to the existence of special bases of the space of \(2 \times 2\) matrices in which the multiplication table has a specific structure (their results are more general and apply not only to matrix multiplication). Alexeyev [1] describes several algorithms for matrix multiplication as embeddings of the matrix algebra into a 7-dimensional nonassociative algebra with special properties.

Sometimes the clever use of sparsity makes a proof rather short (e.g. [14]), but usually the verification of these proofs requires simple but somewhat lengthy computations: expansion of explicit decompositions in some basis, multiplication of several matrices, or following chains of algebraic transformations in which careful attention to detail is required. To obtain a more conceptual proof of the existence of Strassen’s algorithm, we focus not on the explicit algorithm, but on the algebraic properties of \(2 \times 2\) matrices, their transformations, and the symmetries of Strassen’s algorithm. It is well-known that the decomposition (\(\star \)) is not unique. Given one decomposition, we can obtain another one by applying the identity

$$\begin{aligned} XY = A^{-1} \left[ (A X B^{-1}) (B Y C^{-1}) \right] C \end{aligned}$$

and using the original decomposition for the product in the square brackets. Alternatively, we can talk about \(2 \times 2\) matrices as linear maps between 2-dimensional vector spaces. Any choice of bases in these vector spaces gives a new bilinear algorithm. De Groote [12] proved that the algorithm with seven multiplications is unique up to these transformations (this result is also announced without a proof in [23], see also [24]). Thus, Strassen’s algorithm is unique in this sense and there should be a coordinate-free description of this algorithm which does not use explicit matrices. One such description is given in [10] and the proof of its correctness uses the fact that matrix multiplication is the unique (up to scale) bilinear map invariant under the transformations described above. This is a nontrivial fact which requires representation theory to prove. Moreover, the verification of the correctness in [10] is left to the reader.
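The identity above is easy to check numerically. The following sketch (an illustration added here, not part of the original argument) verifies it for random \(2 \times 2\) matrices; random A, B, C are invertible with probability 1.

```python
import numpy as np

rng = np.random.default_rng(0)
X, Y = rng.random((2, 2)), rng.random((2, 2))
A, B, C = rng.random((2, 2)), rng.random((2, 2)), rng.random((2, 2))
inv = np.linalg.inv

lhs = X @ Y
rhs = inv(A) @ ((A @ X @ inv(B)) @ (B @ Y @ inv(C))) @ C
assert np.allclose(lhs, rhs)   # XY = A^{-1}[(A X B^{-1})(B Y C^{-1})]C
```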

Symmetries of Strassen’s algorithm are also useful for its understanding. Clausen [11] gives a description of Strassen’s algorithm in terms of special bases, as in [6], and notices that the elements of these bases form orbits under the action of the symmetric group \(S_3\) on the space of \(2 \times 2\) matrices defined via conjugation with specific matrices, i.e., Strassen’s algorithm is invariant under this action. Clausen’s construction is also described in [7, Ch.1]. Grochow and Moore [17, 18] generalize Clausen’s construction to \(n \times n\) matrices using other finite group orbits. Another symmetry is only apparent in the trilinear representation of the algorithm: the decompositions (\(\star \)) are in one-to-one correspondence with decompositions of the trilinear form \(\mathop {{\text {tr}}}(XYZ)\) of the form

$$\begin{aligned} \mathop {{\text {tr}}}(XYZ) = \sum _{k = 1}^7 u_k(X) v_k(Y) w_k(Z) \end{aligned}$$

where \(u_k\), \(v_k\) and \(w_k\) are linear forms. The decomposition corresponding to Strassen’s algorithm is then invariant under the cyclic permutation of the matrices X, Y, Z. This symmetry is exploited in the proof of Chatelin [9], which uses properties of polynomials invariant under this symmetry. He also notices the importance of a matrix which is related to the \(S_3\) symmetry discussed above. The symmetries of Strassen’s algorithm are explored in detail in [8, 10]. Several earlier publications note their importance [16, 27]. The paper [2] explores symmetries of algorithms for \(3 \times 3\) matrix multiplication.

In this paper we provide a proof of Strassen’s result which is

  • coordinate-free: we do not use explicit matrices, which allows us to focus on the algebraic properties required to prove the correctness of the algorithm. We avoid all tedious explicit calculations, in particular any expansions of expressions and any verification of explicit sign cancellations. Our proof can be seen as a coordinate-free version of Clausen’s construction.

  • elementary: our proof uses only simple facts from basic linear algebra and does not require knowledge of representation theory. This is also why we do not use tensor language. Proofs from [10] and [18] are based on more complicated mathematics and may offer other insights.

Formally, the result that we prove is the following.

Theorem 1

(Strassen [28]) Fix any field \({\mathbb {F}}\). There exist fourteen linear forms \(u_1,\ldots ,u_7, v_1,\ldots ,v_7 :{\mathbb {F}}^{2 \times 2} \rightarrow {\mathbb {F}}\) and seven matrices \(W_1,\ldots , W_7 \in {\mathbb {F}}^{2 \times 2}\) such that for all pairs of \(2 \times 2\) matrices X and Y the product satisfies

$$\begin{aligned} XY = \sum _{k = 1}^7 u_k(X)\, v_k(Y)\, W_k. \end{aligned}$$

2 Preliminaries from linear algebra

If \(u_1, \ldots , u_n\) and \(v_1, \ldots , v_m\) form bases of the spaces of column vectors \({\mathbb {F}}^{n \times 1}\) and row vectors \({\mathbb {F}}^{1 \times m}\), respectively, then the nm products of the form \(u_i v_j\) form a basis of the space of matrices \({\mathbb {F}}^{n \times m}\).

The trace \(\mathop {{\text {tr}}}(A)\) of a square matrix A is the sum of its diagonal entries. If \(\mathop {{\text {tr}}}(A)\) is zero, then the matrix A is called traceless. The trace of a product of (rectangular) matrices is invariant under cyclic shifts: \(\mathop {{\text {tr}}}(A_1 A_2 \cdots A_n) = \mathop {{\text {tr}}}(A_2 \cdots A_n A_1)\). As a consequence, the trace of a matrix is invariant under conjugation: \(\mathop {{\text {tr}}}(B^{-1}AB) = \mathop {{\text {tr}}}(ABB^{-1}) = \mathop {{\text {tr}}}(A)\). Another implication is that if u is a column vector and \(v^T\) is a row vector, then \(v^T u = \mathop {{\text {tr}}}(v^T u) = \mathop {{\text {tr}}}(u v^T)\).

The characteristic polynomial of a \(2 \times 2\) matrix A is \(\lambda ^2 - \mathop {{\text {tr}}}(A)\lambda + \det (A)\). The Cayley–Hamilton theorem says that substituting A for \(\lambda \) yields the zero matrix.
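Both facts can be confirmed symbolically. The following sketch (an illustration added here, using sympy; the symbol names are arbitrary) checks cyclic invariance of the trace for a product of two matrices and the Cayley–Hamilton identity for a generic \(2 \times 2\) matrix.

```python
import sympy as sp

a, b, c, d, e, f, g, h = sp.symbols('a b c d e f g h')
A = sp.Matrix([[a, b], [c, d]])
B = sp.Matrix([[e, f], [g, h]])

# cyclic invariance of the trace: tr(AB) = tr(BA)
assert sp.expand((A * B).trace() - (B * A).trace()) == 0

# Cayley-Hamilton for 2x2: A^2 - tr(A) A + det(A) id = 0
CH = A**2 - A.trace() * A + A.det() * sp.eye(2)
assert CH.expand() == sp.zeros(2, 2)
```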

3 Rotational symmetry

In this section we collect some standard facts about rotation matrices. We think of the \(2 \times 2\) matrix D as a rotation of the plane by \(120^\circ \), but to make our approach work over every field we use a more algebraic definition for D.

Let D have determinant 1 and trace \(-1\), that is, D has characteristic polynomial \(\lambda ^2 + \lambda + 1\). We assume that D is not a multiple of the identity \(\mathop {{\text {id}}}\) (this is implicitly satisfied if the characteristic is not 3). For example, we could choose \(D = \begin{bmatrix} 0&-1 \\ 1&-1 \end{bmatrix}\), the matrix that cyclically permutes the three vectors \(\begin{pmatrix}1\\ 0\end{pmatrix}\), \(\begin{pmatrix}0\\ 1\end{pmatrix}\), \(\begin{pmatrix}-1\\ -1\end{pmatrix}\).

Claim 2

For the matrix D we have \(D^3 = \mathop {{\text {id}}}\), \(D^{-1} = D^2\), \(D^{-2} = D\). Additionally, D has the following properties: \(\mathop {{\text {id}}}+ D + D^{-1} = 0\) and \(\mathop {{\text {tr}}}(D^{-1})=-1\).

Proof

The characteristic polynomial of D is \(\lambda ^2 + \lambda + 1\). By the Cayley–Hamilton theorem \(D^2 + D + \mathop {{\text {id}}}= 0\). Multiplying by D we obtain \(D+D^2+D^3=0=\mathop {{\text {id}}}+D+D^2\) and hence \(D^3 = \mathop {{\text {id}}}\). Consequently, \(D^{-1} = D^2\) and \(D^{-2} = D\). Using \(D^{-1}=D^2\) we get \(\mathop {{\text {id}}}+ D + D^{-1} = 0\). This implies \(\mathop {{\text {tr}}}(D^{-1}) = - \mathop {{\text {tr}}}(\mathop {{\text {id}}}) - \mathop {{\text {tr}}}(D) = -1\). \(\square \)
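For the example matrix \(D = \begin{bmatrix} 0&-1 \\ 1&-1 \end{bmatrix}\) these properties can be checked directly over the integers. The sketch below is an illustration of Claim 2 for this running example, not part of the proof.

```python
import numpy as np

D = np.array([[0, -1], [1, -1]])        # det 1, trace -1, not a multiple of id
I = np.eye(2, dtype=int)
D_inv = D @ D                           # D^{-1} = D^2, again an integer matrix

assert np.array_equal(D @ D @ D, I)                 # D^3 = id
assert np.array_equal(D @ D_inv, I)                 # D^2 really is the inverse
assert np.array_equal(I + D + D_inv, 0 * I)         # id + D + D^{-1} = 0
assert np.trace(D_inv) == -1                        # tr(D^{-1}) = -1

# D cyclically permutes the vectors (1,0), (0,1), (-1,-1)
v1, v2, v3 = np.array([1, 0]), np.array([0, 1]), np.array([-1, -1])
assert np.array_equal(D @ v1, v2)
assert np.array_equal(D @ v2, v3)
assert np.array_equal(D @ v3, v1)
```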

For every column vector u define \(u^{\perp }\) as the row vector satisfying conditions \(u^{\perp } u = 0\) and \(u^{\perp } D u = 1\). If u is not an eigenvector of D, then u and Du are linearly independent, so \(u^{\perp }\) is uniquely defined. If, on the other hand, u is an eigenvector of D, the two conditions are inconsistent and \(u^{\perp }\) does not exist.

We fix a vector u that is not an eigenvector of D and define \(u^{\perp }\) as above. In our example we could choose \(u=\begin{pmatrix}1\\ 0\end{pmatrix}\), which is not an eigenvector of \(\begin{bmatrix} 0&-1 \\ 1&-1 \end{bmatrix}\).

A first simple observation relates \(u^\perp \) and \((Du)^{\perp }\):

Claim 3

\(u^\perp D^{-1} = (D u)^\perp \).

Proof

We need to verify the two defining properties for \((D u)^\perp \). We have \((u^\perp D^{-1})(D u) = u^\perp u = 0\) and \((u^\perp D^{-1}) D (D u) = u^\perp D u = 1\) as required. \(\square \)

The following observation complements the fact that \(u^\perp D u =1\).

Claim 4

\(u^\perp D^{-1} u=-1\).

Proof

Using Claim 2 we have \(\mathop {{\text {id}}}+D+D^{-1}=0\) and thus

$$\begin{aligned} u^\perp u+u^\perp Du+u^\perp D^{-1}u=0. \end{aligned}$$

Since \(u^\perp u = 0\) and \(u^\perp D u = 1\), the claim follows. \(\square \)
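For the running example \(D\) and \(u=\begin{pmatrix}1\\ 0\end{pmatrix}\), the row vector \(u^\perp\) can be computed by solving the two defining conditions, and Claims 3 and 4 can be verified numerically. The sketch below is an illustration of this computation (the variable names are arbitrary), not part of the proofs.

```python
import numpy as np

D = np.array([[0, -1], [1, -1]])
D_inv = D @ D
u = np.array([[1], [0]])                 # a column vector that is not an eigenvector of D

# u_perp is the row vector with u_perp @ u = 0 and u_perp @ (D @ u) = 1;
# writing both conditions as u_perp @ [u, Du] = [0, 1], we can solve for u_perp
cols = np.hstack([u, D @ u])             # invertible, since u and Du are independent
u_perp = np.array([[0.0, 1.0]]) @ np.linalg.inv(cols)

assert np.allclose(u_perp @ u, 0) and np.allclose(u_perp @ D @ u, 1)
# Claim 3: u_perp D^{-1} satisfies the defining conditions of (Du)^perp
w = u_perp @ D_inv
assert np.allclose(w @ (D @ u), 0) and np.allclose(w @ D @ (D @ u), 1)
# Claim 4: u_perp D^{-1} u = -1
assert np.allclose(u_perp @ D_inv @ u, -1)
```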

4 Seven multiplications suffice

In this section we apply structural properties from Sect. 3 to prove Theorem 1. We set \(M := u u^\perp \). Clearly \(\mathop {{\text {tr}}}(M) = u^\perp u = 0\) and we obtain the following identities that can be used to simplify products of M, D, and \(D^{-1}\):

Claim 5

\(M^2 = 0\) and \(MDM = M\) and \(M D^{-1} M = -M\).

Proof

$$\begin{aligned} M^2 &= (u u^\perp ) (u u^\perp ) = u (u^\perp u) u^\perp = 0,\\ MDM &= (u u^\perp ) D (u u^\perp ) = u (u^\perp D u) u^\perp = u u^\perp = M,\\ MD^{-1}M &= (u u^\perp ) D^{-1} (u u^\perp ) = u (u^\perp D^{-1} u) u^\perp = -u u^\perp = -M, \end{aligned}$$

where in the last line we used Claim 4. \(\square \)
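In the running example, \(M = u u^\perp = \begin{bmatrix} 0&1 \\ 0&0 \end{bmatrix}\), and the identities of Claim 5 can be checked directly. The sketch below is an illustrative verification, not part of the proof.

```python
import numpy as np

D = np.array([[0, -1], [1, -1]]); D_inv = D @ D
u, u_perp = np.array([[1], [0]]), np.array([[0, 1]])   # u_perp u = 0, u_perp D u = 1
M = u @ u_perp                                          # M = [[0, 1], [0, 0]]

assert np.trace(M) == 0
assert np.array_equal(M @ M, np.zeros((2, 2), dtype=int))   # M^2 = 0
assert np.array_equal(M @ D @ M, M)                          # M D M = M
assert np.array_equal(M @ D_inv @ M, -M)                     # M D^{-1} M = -M
```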

By Claim 2, conjugation with D is a map of order 3 on the vector space of all \(2 \times 2\) matrices, i.e. for any matrix A there is a triple of conjugates \(A \mapsto D^{-1}AD \mapsto DAD^{-1} \mapsto A\). Moreover, if A is traceless, then so are its conjugates.

Claim 6

The matrices M, \(D^{-1}MD\), and \(DMD^{-1}\) form a basis of the vector space of traceless matrices.

Proof

Since M is traceless, its conjugates are also traceless. Hence it is enough to prove that M, \(D^{-1}MD\) and \(DMD^{-1}\) are linearly independent.

Since u is not an eigenvector of D, the vectors u and Du are linearly independent and thus form a basis of the space of column vectors. The row vectors \(u^{\perp }\) and \(u^{\perp } D^{-1} = (Du)^{\perp }\) (Claim 3) are orthogonal to u and Du, respectively. Therefore they form a basis of the space of row vectors. Thus, the four matrices

$$\begin{aligned} u \cdot u^\perp = M,\quad u \cdot u^\perp D^{-1} = MD^{-1},\quad D u \cdot u^\perp = DM, \quad D u \cdot u^\perp D^{-1} = DMD^{-1} \end{aligned}$$

obtained as products of these basis vectors form a basis of the space of \(2 \times 2\) matrices. The matrices M and \(DMD^{-1}\) are contained in this basis. Adding up all four matrices, we get \((\mathop {{\text {id}}}+ D) M (\mathop {{\text {id}}}+ D^{-1})\), which can be simplified to \((-D^{-1}) M (-D) = D^{-1}MD\) using Claim 2. Thus \(D^{-1}MD\) is the sum of all four basis matrices; in particular, its expansion has nonzero coefficients on the basis elements \(MD^{-1}\) and DM, so it does not lie in the span of M and \(DMD^{-1}\). Therefore the matrices M, \(DMD^{-1}\), \(D^{-1}MD\) are linearly independent. \(\square \)

Since D and \(D^{-1}\) have trace \(-1 \ne 0\) (Claim 2), adding D or \(D^{-1}\) to the basis in Claim 6 yields two bases for the full space of \(2 \times 2\) matrices: \(\{ D, M, D^{-1}MD, DMD^{-1} \}\) and \(\{ D^{-1}, M, D^{-1}MD, DMD^{-1} \}\).

Using the properties \(D^2 = D^{-1}\), \(D^{-2} = D\) and \(M^2 = 0\) from Claim 2 and Claim 5, we can write down the multiplication table with respect to these two bases. We further simplify it using the identities \(MDM = M\) and \(MD^{-1}M = -M\) from Claim 5.

$$\begin{aligned} \begin{array}{c|cccc} \cdot & D^{-1} & M & D^{-1}MD & DMD^{-1} \\ \hline D & \mathop {{\text {id}}} & DM & MD & D^{-1}MD^{-1} \\ M & MD^{-1} & 0 & -MD & MD^{-1} \\ D^{-1}MD & D^{-1}M & D^{-1}M & 0 & -D^{-1}MD^{-1} \\ DMD^{-1} & DMD & -DM & DMD & 0 \end{array} \end{aligned}$$

(Rows are labelled by the elements of the first basis, used for the left factor; columns by the elements of the second basis, used for the right factor.)
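The table can also be generated mechanically for the running example. The sketch below (an illustration, not part of the proof) computes all sixteen products of basis elements and confirms that, up to sign, only seven distinct nonzero matrices occur.

```python
import numpy as np
from itertools import product

D = np.array([[0, -1], [1, -1]]); Di = D @ D      # Di = D^{-1} = D^2 (integer matrix)
M = np.array([[0, 1], [0, 0]])                    # M = u u^perp for the running example

first_basis  = [D,  M, Di @ M @ D, D @ M @ Di]    # basis used for the left factor X
second_basis = [Di, M, Di @ M @ D, D @ M @ Di]    # basis used for the right factor Y

# collect the nonzero products of basis elements, identifying P with -P
distinct = []
for X, Y in product(first_basis, second_basis):
    P = X @ Y
    if P.any() and not any(np.array_equal(P, Q) or np.array_equal(P, -Q) for Q in distinct):
        distinct.append(P)
print(len(distinct))   # 7: only seven matrices occur in the table, up to sign
```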

Proof of Theorem 1

Notice that in the body of the table only (scalar multiples of) 7 matrices are used, and the entries are aligned in such a way that two occurrences of the same matrix are either in the same row or in the same column. At this point we are done proving Theorem 1, because the existence of such a pattern gives a simple way to construct a matrix multiplication algorithm as follows. To multiply matrices X and Y, represent them in the bases \(\{ D, M, D^{-1}MD, DMD^{-1} \}\) and \(\{ D^{-1}, M, D^{-1}MD, DMD^{-1} \}\), respectively:

$$\begin{aligned} X &= x_1 D + x_2 M + x_3 D^{-1}MD + x_4 DMD^{-1} \\ Y &= y_1 D^{-1} + y_2 M + y_3 D^{-1}MD + y_4 DMD^{-1} \end{aligned} \qquad (4.1)$$

Note that the \(x_i\) are linear forms in the entries of X and the \(y_j\) are linear forms in the entries of Y. We expand the product XY and group together summands according to the table:

$$\begin{aligned} XY ={}& x_1 y_1 \mathop {{\text {id}}} + (x_1 - x_4)\, y_2\, DM + (x_1 - x_2)\, y_3\, MD + (x_1 - x_3)\, y_4\, D^{-1}MD^{-1} \\ &+ x_2 (y_1 + y_4)\, MD^{-1} + x_3 (y_1 + y_2)\, D^{-1}M + x_4 (y_1 + y_3)\, DMD. \end{aligned}$$

Each of the seven summands requires only one multiplication of a linear form in the entries of X with a linear form in the entries of Y, so this is a decomposition of the form (\(\star \)) with seven products.

This finishes the proof. \(\square \)
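The proof translates directly into an algorithm. The following sketch (an illustration built on the running example, not code from the paper; the function name `multiply` is arbitrary) computes the coordinates \(x_i\), \(y_j\) from (4.1), forms the seven products as grouped above, and checks the result against the ordinary matrix product.

```python
import numpy as np

D = np.array([[0, -1], [1, -1]]); Di = D @ D
M = np.array([[0, 1], [0, 0]])
DiMD, DMDi = Di @ M @ D, D @ M @ Di

# change-of-basis matrices: columns are the flattened basis elements
BX = np.column_stack([A.ravel() for A in (D,  M, DiMD, DMDi)])
BY = np.column_stack([A.ravel() for A in (Di, M, DiMD, DMDi)])

def multiply(X, Y):
    """Multiply 2x2 matrices with seven products of linear forms, following the proof."""
    x = np.linalg.solve(BX, X.ravel())   # coordinates x_1..x_4 of X in the first basis
    y = np.linalg.solve(BY, Y.ravel())   # coordinates y_1..y_4 of Y in the second basis
    terms = [
        (x[0] * y[0],          np.eye(2)),      # id
        ((x[0] - x[3]) * y[1], D @ M),          # DM
        ((x[0] - x[1]) * y[2], M @ D),          # MD
        ((x[0] - x[2]) * y[3], Di @ M @ Di),    # D^{-1} M D^{-1}
        (x[1] * (y[0] + y[3]), M @ Di),         # M D^{-1}
        (x[2] * (y[0] + y[1]), Di @ M),         # D^{-1} M
        (x[3] * (y[0] + y[2]), D @ M @ D),      # D M D
    ]
    return sum(c * W for c, W in terms)

rng = np.random.default_rng(1)
X, Y = rng.random((2, 2)), rng.random((2, 2))
assert np.allclose(multiply(X, Y), X @ Y)
```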

Remark

Taking the trace in (4.1) and using the fact that M and its conjugates are traceless, we see that \(\mathop {{\text {tr}}}(X)=x_1 \mathop {{\text {tr}}}(D) = -x_1\), and \(\mathop {{\text {tr}}}(Y)=-y_1\). Thus the first of the 7 summands is \(\mathop {{\text {tr}}}(X)\mathop {{\text {tr}}}(Y)\mathop {{\text {id}}}\).