Keywords

1 Introduction

Many cryptographic schemes build on finite fields as their underlying mathematic structure. In almost all cases, the schemes can be designed without having to specify a concrete representation of the finite field in advance. However, when finally being implemented in practice, one necessarily has to choose a particular representation of the finite field, basically as bit strings. In general, this choice does not influence the security of the scheme, but might well influence the performance of the resulting implementation. In this work we focus on this choice of field representations and derive theoretical results on how to choose an optimal field representation with respect to multiplication with fixed field elements. Before going into details, we elaborate on this setup in the special case of symmetric cryptography.

Symmetric Cryptographic Primitives. Build the back-bone of virtually any secure communication today. Block ciphers and hash functions can be seen as the workhorses in cryptography, used for encrypting and authenticating the largest part of the workload.

Today, we are in the comfortable situation of having at hand a choice of strong block ciphers and hash functions that seem secure against even the strongest adversaries with practically unlimited computational resources. Moreover, those primitives are based on rather well-understood design principles that allow to construct efficient, simple and easy to analyze ciphers. Especially in the case of substitution-permutation (SP) networks, following the seminal ideas of AES [9] and its predecessor SQUARE [8], arguing the security of ciphers against the two most powerful generic attacks, that is differential- and linear attacks [6, 23], became significantly easier. In an SP-network the cipher (or the cryptographic permutation) consists of a number of almost identical rounds, each of which consists of a layer of S-boxes and an \(\mathbb {F}_2\)-linear layer to mix those parts.

One of the most important design strategies for those primitives is the so-called wide-trail strategy, initiated in [7], that aims at lower bounding the number of active S-boxes. Here, for a given linear or differential trail, an S-box is called active if its input-mask (resp. input-difference) is non-zero. The main observations of the wide-trail strategy is that it is actually the linear layer that is to a large extent responsible for the security of the primitive against linear and differential attacks. Moreover, the wide-trail strategy allows a natural decoupling of the design choice for a linear layer and an S-box.

Interestingly, for the linear layer not many general constructions are known. Two basic approaches can be identified. On the one hand, an ad-hoc approach, where lower bounding the number of active S-boxes requires computer-aided tools that search (sometimes heuristically) for optimal trails. This approach is used e.g. for Serpent [5] or Keccak [4]. On the other hand, a code-based approach, where the linear layers are chosen in such a way that they correspond to good (often locally optimal) linear codes. This is, most prominently, the case for AES where a Maximum Distance Separable (MDS) code is implemented via the MixColumns operation.

Even in the theoretically better circumstantiated code-based approach many fundamental questions are left open. Here, when using an MDS matrix for (parts of) the linear layer, the main challenge is to choose an MDS matrix that is most suitable for an efficient implementation. As those MDS matrices are usually defined over a finite field with characteristic two, i.e. \(\mathbb {F}_{2^n}\), one important and so far almost unstudied question is the choice of an \(\mathbb {F}_2\)-basis of \(\mathbb {F}_{2^n}\) and its impact on the implementation efficiency.

From a design point of view, one has to choose a linear layer given as a mapping on \(\mathbb {F}_{2^n}^b\) and an \(\mathbb {F}_2\)-basis of \(\mathbb {F}_{2^n}\) to concretely specify the primitive. This is actually a very natural separation of the design of the cipher and its specification (and thus implementation) on bit level. As nicely explained in [10] by introducing RIJNDAEL-GF this separation is probably most obvious for AES itself, but in principle possible for any cipher. Following [10], the choice of basis is to a large extent independent of the design and the security of the cipher. However, the choice of basis might have a significant impact on the efficiency of the cipher on certain platforms.

For software implementations, depending on the details, the choice of basis is either irrelevant (in e.g. a table-based implementation) or hard to capture (in e.g. a bit-sliced implementation) as the efficiency might depend on the exact instructions offered by a given platform. For hardware implementations, one has to distinguish between a serial implementation or a round-based implementation. As the round-based implementation seems most relevant in practice (cf. [27]), we mainly focus on this use-case here. Surprisingly, compared to a serial hardware implementation, the case of a round-based hardware implementation has attracted less attention so far.

For a round-based hardware implementation, the impact of the choice of basis already becomes apparent when focusing on how to implement the multiplication with one given element \(\alpha \) in \(\mathbb {F}_{2^n}\). For different choices of bases, the efficiency of implementations of the resulting \(\mathbb {F}_2\)-linear mappings differs significantly. Thus, the very fundamental task we study in the first part of the paper is:

For a given element \(\alpha \in \mathbb {F}_{2^n}\) find a basis such that multiplication by \(\alpha \) can be implemented most efficiently. Footnote 1

It is worth pointing out that the related question of how to efficiently multiply two arbitrary field elements has been studied extensively in the past.

While the above question is of independent interest, with potentially very different applications, we use our results for designing efficient linear layers. Thus, in the second part, we will give several constructions of MDS matrices. Echoing the above, the construction of our MDS matrices are independent of the choice of the basis – actually to a large extent independent of the field size as well.

The combination of the first part, i.e. how to choose a basis that allows for an optimal implementation, and the second part, i.e. the construction of MDS matrices, finally results in implementations of MDS matrices that are more efficient for a large variety of parameters than the best matrices discussed so far in literature.

Thus, this application serves as a nice example were an improved understanding on how to choose the field representation immediately leads to improved results. This is even more interesting as the construction of efficient MDS matrices has been an active field of research recently.

1.1 Related Work

In particular the construction of efficient serial MDS matrices is a well-studied subject. Considering serial implementations of MDS matrices is based on the initial idea of Guo, Peyrin, and Poschmann used in the design of PHOTON [13] and later in the block cipher LED [14]. In a nutshell the idea is not to implement an MDS matrix directly, but rather implement a matrix A such that \(A^k\) is MDS for some small k. When considering a hardware implementation, it reduces the chip area if implementing A is significantly cheaper than \(A^k\). The circuit implementing A is then iterated k times, which does not increase its size significantly. This basic idea has been further generalized and improved in a series of subsequent papers. In [24, 30] the authors focus on even more efficient choices for A by considering additive, i.e. \(\mathbb {F}_2\)-linear MDS codes. Their approach uses symbolic computations in order to derive general conditions on how to choose the matrix entries independent of the dimension.

In [31] Xu et al. furthermore took into account the cost of implementing the inverse matrix. At FSE 2014, in [2] Augot and Finiasz improved significantly upon the efficiency of the search algorithm of [24], allowing them to search for MDS matrices of much larger dimension than previously possible.

For a round-based implementation, less work has been done so far. The authors of [27] focus on MDS matrices that have an efficient implementation (in terms of the XOR-count) and put special emphasis on involutory MDS matrices, i.e. MDS matrices that are their own inverse. They derive several constructions and rather efficient search methods for MDS matrices meeting their goals. Very recently, Liu and Sim [21] improved upon some of those results by characterizing equivalences in circulant (and circulant-like) MDS matrices and thus further reduced the search space. In both works, in order to improve the efficiency for a given MDS matrix defined over a finite field, the authors considered different representations of the underlying finite fields by running through all possible irreducible polynomials of the given degree. However, in view of the question of how to choose an optimal basis, this corresponds to investigating only a small subset of all possible bases. Work on investigating the XOR-count distribution for other than the polynomial bases has been done very recently in [25].

Also recently, Li and Wang constructed circulant involutory \(\mathbb {F}_2\)-linear MDS matrices [19]. While it was already known that circulant MDS matrices over a finite field cannot be involutory [15], they have shown their existence in the additive case. Independently, the authors of [21] have shown the existence of left-circulant involutory MDS matrices over finite fields.

1.2 Our Contribution

After fixing our notation and recalling basic facts in Sect. 2, in the first part of the paper we focus on the question on how to find an optimal implementation of the multiplication by a given field element \(\alpha \) (cf. Sect. 3). Here efficiency is measured in terms of the number of XOR operations needed to implement the corresponding binary matrix. Note that this metric differs from the XOR-count used in [27]. In [27] the XOR-count of an \(n\times n\) matrix M was defined as the number of ones in M minus n. However, the number of (additional) ones in a matrix does not necessarily correspond to the number of XOR operations needed for implementation. Thus, while the number of ones in M is certainly an easier to handle metric, in our opinion it is more appropriate to consider the actual number of XOR operations as the efficiency metric. Note that this improved notion was also discussed in [16]. For technical reasons, we focus on the number of XOR operations without temporary registers, i.e. in-place XOR operations. One of our main results in this first part of the paper is, that for a non-trivial element \(\alpha \) one can find a basis such that the resulting matrix can be implemented with one XOR operation if and only if the characteristic polynomial of \(\alpha \) is an irreducible trinomial. Note that an XOR-count equal to one in our notion coincides with the definition of the XOR-count in [27]. The interesting part here is that the condition on the characteristic polynomial is not only sufficient but also necessary. As an immediate consequence, one cannot hope to implement the multiplication by any element \(\alpha \ne 1\) in \(\mathbb {F}_{2^8}^{*}\) with one XOR only. This follows by the above and the well-known fact that there are no irreducible trinomials of degree 8 [28].

We furthermore show that, for any given basis, there are at most two (non-trivial) elements \(\alpha \) and \(\beta \) such that the multiplication with those elements can be implemented with one XOR operation. In fact, \(\beta \) is necessarily the multiplicative inverse of \(\alpha \).

While the weight of the (irreducible) characteristic polynomial of an element \(\alpha \) clearly gives an upper bound of the number of XOR operations needed to implement the corresponding multiplication, we show that this bound is in general not tight in the case were the characteristic polynomial is of weight larger than three.

In particular, for all elements \(\alpha \in \mathbb {F}_{2^n}^*\) with \(n \le 8\) we present an optimal representation such that the multiplication with \(\alpha \) can be implemented with a minimal number of XOR operations. For all those elements \(\alpha \), that are not contained in a proper subfield of \(\mathbb {F}_{2^n}\), the multiplication can be implemented with at most 3 XOR operations (and often with two only). Those results are given in Tables 3, 4, 5, 6 and 7 and cover the cases which are most relevant for symmetric cryptography. Interestingly, and maybe counter-intuitive, multiplication with non trivial elements in a proper subfield turns out to be among the most expensive in all the cases explored here.

Moreover, for all \(n \le 2048\) for which no irreducible trinomial of degree n exists, we present one element \(\alpha \in \mathbb {F}_{2^n}\) such that multiplication by \(\alpha \) requires two XOR operations, cf. Table 8. Those results are proven optimal by the above mentioned necessary and sufficient condition.

In the second part of the paper (cf. Sect. 4) we present several (circulant) matrices. Entries in those matrices are represented as powers of a generic field element \(\alpha \). By symbolically computing all minors, i.e. the determinants of all square submatrices, we derive a list of polynomials in \(\mathbb {F}_2[\alpha ]\). Now, whenever \(\alpha \) is chosen such that it is not a root of any of those polynomials, the matrix is MDS. One nice consequence of this approach is that, as the degree of those polynomials is limited, our matrices are MDS for almost all elements in \(\mathbb {F}_{2^n}\) as soon as n is large enough, i.e. larger than the maximal degree of those polynomials.

Finally, the first and second part are combined in Sect. 4.2 to result in the most efficient MDS matrices in terms of the XOR-count known so far. A summary of our results and comparison with previous work is given in Tables 1 and 2, respectively. The main observation here is that if multiplication by \(\alpha \) can be implemented with t XOR operations, then multiplication by \(\alpha ^{\pm i}\) for \(i \ge 0\) can be implemented with at most \(t\cdot i\) XOR operations.Footnote 2 Thus, by simply minimizing the sum of the (absolute) exponents for our circulant MDS matrices, we immediately reduce the XOR-count.

As an interesting side result, we like to point out that the XOR-count per bit actually decreases with increasing field size.Footnote 3 For example, our \(4\times 4\) MDS matrices have a per bit XOR-count of \(3+\frac{3}{n}\), or \(3+\frac{6}{n}\) in the case that no irreducible trinomial of degree n exists.

Thus, even so reducing the number of XOR operations has already received considerable attention recently, this part nicely shows that our improved understanding of how to choose an optimal basis allows us to easily improve upon known constructions. Note that such improvements are possible independent from which XOR-count definition is used, that is, we were able to improve existing results also in the old XOR-count definition by changing the basis. For example, we found an element in \(\mathbb {F}_{2^8}\) with only 2 additional non-zero entries which directly improves the results of [27].

Finally, in Sect. 5 we give a perspective on non-linear, additive MDS matrices. In particular, we point out that while there exists no \(\alpha \in \mathbb {F}_{2^8}\) (resp. \(\mathbb {F}_{2^{13}}\), \(\mathbb {F}_{2^{16}}\)) which can be implemented with only one XOR operation, there does exist an \(8\times 8\) (resp. \(13\times 13\), \(16\times 16\)) binary matrix, that can be used in place for the multiplication by \(\alpha \) in the above mentioned \(4\times 4\) matrix to result in an additive MDS matrix with reduced cost.Footnote 4 Again, the idea of considering the entries of the matrix as powers of a single field element is beneficial as the conditions for the matrix to be MDS remain basically unchanged.

We conclude the paper by pointing to some interesting questions for future investigations.

2 Preliminaries

If p is a prime, we denote the finite field with p elements by \(\mathbb {F}_p\) and the extension field with \(p^n\) elements by \(\mathbb {F}_{p^n}\), respectively. In this work, we consider binary fields, thus \(p=2\). Although there exists up to isomorphism only one finite field for every possible order, we are interested in the specific representation. For instance, if \(q \in \mathbb {F}_2[x]\) is an irreducible polynomial of degree n, then \(\mathbb {F}_{2^n} \cong \mathbb {F}_{2}[x] / (q)\) where (q) denotes the ideal generated by q. The multiplicative group of some field K is denoted by \(K^{*}\). By the term matrix, we refer to matrices with entries in \(\mathbb {F}_2\). In general, the ring of \(n \times n\) matrices over a field K will be denoted by \({{\mathrm{Mat}}}_n(K)\). The symbol \(\mathbf {0}_n\) will denote the zero matrix and \(I_n\) will be the identity matrix. As a third important type of matrix in \({{\mathrm{Mat}}}_n(\mathbb {F}_2)\), we introduce \(E_{i,j}\) which consist of all zeros except in the i-th row of the j-th column for \(i,j \in \{1,\dots ,n\}\). We denote a block diagonal matrix consisting of d matrix blocks \(A_k\) as \(\bigoplus _{k=1}^d A_k\). By \({{\mathrm{wt}}}(A)\), we denote the number of non-zero entries of a matrix A. Analogously, \({{\mathrm{wt}}}(q)\) denotes the number of non-zero coefficients of a polynomial q.

2.1 Some Basic Facts About Linear Transformations

We next recall some basics about finite fields and matrix representations. For more background the reader is referred to e.g. [20, Sect. 2.5] and [29]. Let \(V \cong K^n\) be a finite-dimensional vector space over the field K. Every linear mapping \(f: V \rightarrow V\) can be described as \(v \mapsto A_Bv\) by a left-multiplication with a matrix \(A_B \in {{\mathrm{Mat}}}_n(K)\). This representation is dependent on the choice of the basis B for V. For instance, if \(B = \{b_1,\dots b_n\}\), the j-th column of \(A_B\) consists of the coefficients \(a_{1,j},\dots ,a_{n,j}\) of \(f(b_j)=\sum _{i=1}^n a_{i,j} b_i\). Thus, changing the basis from B to \(B'\) results in a different matrix representation of f. This transformation is called the change of basis transformation, which is simply a conjugation of \(A_B\). Thus, \(A_{B'} = T A_{B} T^{-1}\) using an invertible matrix T. In this case, \(A_B\) and \(A_{B'}\) are are called similar (resp. permutation-similar if T is a permutation matrix).

There is a natural way of representing the elements in a finite field with characteristic p as vectors with coefficients in \(\mathbb {F}_p\). In the following, we consider the representation of the multiplication by \(\alpha \) by a matrix as described in the following diagram.

figure a

The bijection \(\varPhi _B\) maps elements \(\alpha \in \mathbb {F}_{2^n}\) to its vectorial representation over \(\mathbb {F}_2\) with regard to a basis B (and \(\varPhi _B^{-1}\) vice versa). \(M_{\alpha ,B}\) denotes the \(n \times n\) matrix representing (left-) multiplication by the element \(\alpha \). For different bases B and \(B'\), one can obtain \(M_{\alpha ,B'}\) from \(M_{\alpha ,B}\) by the change of basis transformation, in particular \(M_{\alpha ,B'} = T M_{\alpha ,B} T^{-1}\) for an invertible T. We denote similarity of matrices with the relation symbol \(\sim \), (resp. \(\sim _{\pi }\) for permutation-similarity). The characteristic polynomial of a matrix A is defined as \(\chi _A := \det (\lambda I - A) \in \mathbb {F}_2[\lambda ]\) and the minimal polynomial is denoted by \(m_A\). Recall that the minimal polynomial is the (monic) polynomial p of least degree, such that \(p(A)=\mathbf {0}_n\). It is a well-known fact that the minimal polynomial divides the characteristic polynomial, thus \(\chi _A(A) = \mathbf {0}_n\). As the minimal polynomial and the characteristic polynomial are actually properties of the underlying linear mapping, similar matrices have the same characteristic and the same minimal polynomial.

A special type of matrix, that will play an important role in the following is the companion matrix of a polynomial. For a polynomial

$$\begin{aligned} q = x^n + q_{n-1}x^{n-1} + \dots + q_1x + q_0 \in \mathbb {F}_2[x] \end{aligned}$$

of degree n, the companion matrix of q is defined as

$$\begin{aligned} C_q = \begin{pmatrix} 0 &{} &{} &{} &{} q_0 \\ 1 &{} 0 &{} &{} &{} q_1 \\ &{} \ddots &{} \ddots &{} &{} \vdots \\ &{} &{} 1 &{} 0 &{} q_{n-2} \\ &{} &{} &{} 1 &{} q_{n-1} \end{pmatrix}. \end{aligned}$$

It is known from linear algebra that the characteristic polynomial and the minimal polynomial of \(C_q\) are equal to q itself, i.e. \(\chi _{C_q} = m_{C_q} = q\). In addition, any matrix A is similar to a companion matrix if and only if its characteristic polynomial coincides with its minimal polynomial. In particular, \(C_q\) is exactly the rational canonical form [11, Sect. 12.2] of A in this case.

2.2 The XOR-Count and the Cycle Normal Form

The XOR-count of a field element was already studied in [17, 27]. In the formal definition in [27], an invertible n-dimensional matrix A has an XOR-count of t if and only if A can be written as a permutation matrix with t additional non-zero entries. Formally, \(A = P + \sum _{k=1}^t E_{i_k,j_k}\) and \({{\mathrm{wt}}}(A) = n+t\). Although all matrices of that structure can be implemented with at most t XOR operations (not necessarily without temporary registers), the construction does not contain all possible matrices which are realizable with at most t XOR operations. For instance, there are matrices with three additional non-zero entries such that the result of their defining linear function can be computed with just two additions. As an example, consider

$$\begin{aligned} \begin{pmatrix} 1 &{} 0 &{} 1 \\ 1 &{} 1 &{} 1 \\ 0 &{} 0 &{} 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} v_1 + v_3 \\ (v_1 + v_3) + v_2 \\ v_3 \end{pmatrix}. \end{aligned}$$

In the following, we provide an alternative definition which includes the cases described above.

Definition 1

An invertible matrix A has an XOR-count of t, denoted \({{\mathrm{wt_{\oplus }}}}(A) = t\), if t is the minimal number such that A can be written as

$$\begin{aligned} A = P \prod _{k=1}^t (I+E_{i_k,j_k}) \end{aligned}$$

with \(i_k \ne j_k\) for all k.

Note that if a matrix can be represented in the form \(P \prod _{k=1}^t (I+E_{i_k,j_k})\), the number of factors \((I+E_{i_k,j_k})\) clearly gives an upper bound on the actual XOR-count. It is worth pointing out that the definition above just counts the number of XOR operations without using temporary registers. Those are technically somewhat easier to handle. However, this restriction does not make a difference for matrices with XOR-count less or equal to 2, which we are most concerned about in in the following. In general, allowing temporary registers might well reduce the number of XOR operations needed for an implementation.

Our definition coincides with the one from [27] for the case that \(t=1\), that is, for matrices of XOR-count 1. For other cases, the number of additional non-zero entries can increase. We will often consider \(t=2\) within this work. By evaluating the product, it follows that any A with \({{\mathrm{wt_{\oplus }}}}(A)=2\) is of the form

$$\begin{aligned} A = {\left\{ \begin{array}{ll} P + P(E_{i_1,j_1}+E_{i_2,j_2}) &{} \text { iff } i_2 \ne j_1 \\ P + P(E_{i_1,j_1}+E_{i_2,j_2}+E_{i_1,j_2}) &{} \text { iff } i_2 = j_1. \end{array}\right. } \end{aligned}$$

The XOR-count is invariant under permutation-similarity. Moreover, naturally in the setting not allowing temporary registers, the XOR-count is invariant under taking the inverse. This is summarized and formally proven in the following Lemma and Corollary.

Lemma 1

If \(A \sim _{\pi } A'\), then \({{\mathrm{wt_{\oplus }}}}(A) = {{\mathrm{wt_{\oplus }}}}(A')\).

Proof

Let \(A' = QAQ^{-1}\) where Q is the permutation matrix representing the permutation \(\sigma \in S_n\). Let \(I+E_{i_k,j_k}\) be a factor in the XOR-count representation of \(A = P \prod _{k=1}^t (I+E_{i_k,j_k})\) where \(t={{\mathrm{wt_{\oplus }}}}(A)\). Then the following identity holds:

$$\begin{aligned} (I+E_{i_k,j_k})Q^{-1} = Q^{-1} + E_{i_k,\sigma ^{-1}(j_k)} = Q^{-1}(I+E_{\sigma (i_k),\sigma ^{-1}(j_k)}). \end{aligned}$$

One is able to commute \(Q^{-1}\) to the front before the first factor by proceeding for all of the t factors and finally obtain

$$\begin{aligned} A' = QPQ^{-1}\prod _{k=1}^t(I+E_{\sigma (i_k),\sigma ^{-1}(j_k)}). \end{aligned}$$

It follows that \({{\mathrm{wt_{\oplus }}}}(A') \le {{\mathrm{wt_{\oplus }}}}(A)\). By reverting the above steps we obtain \({{\mathrm{wt_{\oplus }}}}(A) \le {{\mathrm{wt_{\oplus }}}}(A')\).    \(\square \)

Corollary 1

If \({{\mathrm{wt_{\oplus }}}}(A) = t\), then also \({{\mathrm{wt_{\oplus }}}}(A^{-1}) = t\).

Proof

We show that \(A^{-1}\) is permutation-similar to a matrix with an XOR-count of t.

$$\begin{aligned} \left( P \prod _{k=1}^t (I+E_{i_k,j_k}) \right) ^{-1} = \prod _{k=t}^1 (I+E_{i_k,j_k}) P^{-1} \sim _{\pi } P^{-1} \prod _{k=t}^1 (I+E_{i_k,j_k}) \end{aligned}$$

   \(\square \)

Later, we would like to be able to exhaustively search over all matrices with low XOR-count for a given dimension n. Since the number of permutation matrices (which is n!) rapidly increases with n, an exhaustive search will quickly become infeasible if we do not restrict the structure of P. By a well-known fact from combinatorics, one is able to assume P to be in a specific form.

Lemma 2

For any permutation matrix P of dimension n, it is

$$\begin{aligned} P \sim _{\pi } \bigoplus _{k=1}^{d} C_{x^{m_k}+1} \end{aligned}$$

for some \(m_k\) with \(\sum _{k=1}^d m_k = n\) and \(m_1 \ge \dots \ge m_d \ge 1\).

Proof

It is well-known that two permutations with the same cycle type are conjugate [11, Chapter 4.3, Proposition 11]. That is, given the permutations \(\sigma ,\tau \in S_n\) as

figure b

in cycle notation, one can find some \(\pi \in S_n\) such that \(\pi \sigma \pi ^{-1} = \tau \). This \(\pi \) operates as a relabeling of indices.

Let \(\sigma \) in the form above be the permutation defined by P. Now, there exits a permutation \(\pi \) such that \(\pi \sigma \pi ^{-1} = (d_1, 1, 2, \dots , d_1-1)(d_2, d_1+1, d_1+2,\dots ,d_2-1)\dots (d_m, d_{m-1}+1, d_{m-1}+2, \dots , d_{m}-1)\). If Q denotes the permutation matrix defined by \(\pi \), one obtains \(QPQ^{-1}\) in the desired form.    \(\square \)

We say that any permutation matrix of this structure is in cycle normal form. The cycle normal form of P is denoted by C(P). Up to permutation-similarity, we can always assume that the permutation matrix P of a given matrix with XOR-count t is in cycle normal form, as stated in the following corollary.

Corollary 2

$$\begin{aligned} P \prod _{k=1}^t (I+E_{i_k,j_k}) \ \ {\sim }_\pi \ \ C(P) \prod _{k=1}^t (I+E_{\sigma {(i_k)},\sigma ^{-1}{(j_k)}}) \end{aligned}$$

for some permutation \(\sigma \in S_{n}\).

3 Efficient Multiplication in Finite Fields

In this section, we first present some theoretic results towards understanding the structure of matrices \(M_{\alpha ,B}\) representing (left-) multiplication by some finite field element \(\alpha \in \mathbb {F}_{2^n}^{*}\). The parameter B indicates a basis of \(\mathbb {F}_{2^n}\) considered as an n-dimensional vector space over \(\mathbb {F}_2\). The XOR-count of \(M_{\alpha ,B}\) is indeed depending on the choice of the basis B. As described in Corollary 2, we can assume a certain normal form for matrices with an XOR-count of t.

Not every (invertible) matrix is a representation of a field multiplication. For example, an obvious condition for that, is that the multiplicative order of the matrix divides \(2^n-1\). In order to understand exactly which matrices indeed represent multiplication with some field element \(\alpha \), Theorem 1 below gives a characterization that allows to efficiently decide when a given matrix corresponds to multiplication by a field element. The crucial part is the minimal polynomial of \(\alpha \). It is a property of the linear mapping

$$\begin{aligned} f_\alpha : \mathbb {F}_{2^n} \rightarrow \mathbb {F}_{2^n}, \beta \mapsto \alpha \beta \end{aligned}$$

and is invariant under changing the specific representation of \(f_\alpha \) to \(\beta \mapsto M_{\alpha ,B} \beta \).

Theorem 1

Let \(A \in {{\mathrm{Mat}}}_n(\mathbb {F}_{2})\setminus \{\mathbf {0}_n\}\). Then \(A = M_{\alpha ,B}\) for some element \(\alpha \in \mathbb {F}_{2^n}^{*}\) with respect to some basis B if and only if \(m_A\) is irreducible.

Proof

As described in [29], the ring generated by some matrix A defines a field of order \(2^n\) if and only if the characteristic polynomial \(\chi _A\) is irreducible. This is the case since \(\chi _A(A)=0\) and thus A is the root of an irreducible polynomial of degree n. One can see that \(\mathbb {F}_2(A) = \{\sum _{i=0}^{n-1} \alpha _i A^i \mid \alpha _i \in \mathbb {F}_2\}\) since it must contain all sums of powers of A. However, for \(\mathbb {F}_2(A)\) being a field it is not necessary that A has an irreducible characteristic polynomial. It can be possible that A generates a subfield \(\mathbb {F}_{2^m}\) of \(\mathbb {F}_{2^n}\). As we show now, this is the case if and only if the minimal polynomial of \(\alpha \) is irreducible and has degree m.

If \(m_A\) is not irreducible, \(\mathbb {F}_2(A)\) is not a field and thus A cannot represent a field multiplication. Let now \(m_A\) be irreducible. The characteristic polynomial \(\chi _A\) is necessarily a power of \(m_A\), since both of these polynomials share the same irreducible factors. So, \(\chi _A = (m_A)^d\) for some positive integer d. Both d and \(\deg (m_A)\) divide n. Because of the irreducibility of \(m_A\), the rational canonical form of A consists of d blocks of \(C_{m_A}\). Thus, we obtain the similarity

$$\begin{aligned} A \sim \bigoplus _{k=1}^{d} C_{m_A}. \end{aligned}$$

Since \(\chi _{C_{m_A}} = m_A\), the matrix A defines a multiplication with some element in a subfield of \(\mathbb {F}_{2^n}\).    \(\square \)

Note that, any field element \(\alpha \) is, up to its conjugates \(\alpha ,\alpha ^2,{\alpha ^2}^2,\dots ,{\alpha ^2}^{n-1}\), uniquely identified by its minimal polynomial. For every field element \(\alpha \), the minimal polynomial \(m_{\alpha }\) is exactly the minimal polynomial \(m_A\) of a matrix A representing multiplication with \(\alpha \). Furthermore, two matrices \(A, A' \in {{\mathrm{Mat}}}_n(\mathbb {F}_{2})\) with the same irreducible minimal polynomial are similar. Thus, given a matrix A, identifying the element \(\alpha \) such that \(A=M_{\alpha ,B}\) is equivalent to computing the (irreducible) minimal polynomial of A.

The main question is which field elements can be implemented with a minimal number of XOR operations, or in particular, what is the minimal XOR-count for a given (non-trivial) field element \(\alpha \in \mathbb {F}_{2^n}^{*}\). Trivially, multiplication with \(\alpha = 1\) can be implemented with zero additions since \(M_{1,B} = I_n\) for all bases B. On the other hand, if the XOR-count is 0, the element is equal to 1. In a first place, we thus aim for an XOR-count of 1 whenever possible. By a simple observation, this optimal result can be realized if the minimal polynomial of \(\alpha \) is a trinomial of degree n.

Example 1

Let the field with \(2^n\) elements be represented as \(\mathbb {F}_{2^n} = \mathbb {F}_2[x] /(q)\) for an irreducible q of degree n. For the (left-) multiplication with x in the canonical basis \(B = \{1,x,x^2,\dots ,x^{n-1}\}\), it is \(M_{x,B} = C_q\). Thus, \({{\mathrm{wt_{\oplus }}}}(M_{x,B}) = {{\mathrm{wt}}}(q)-2\) and the XOR-count of \(M_{x,B}\) equals 1 if q is a trinomial.

Since our approach is about finding any (non-trivial) element \(\alpha \in \mathbb {F}_{2^n}^{*}\) such that multiplication with \(\alpha \) can be implemented with minimal additions, this fact implies that we cannot hope to improve upon the implementation costs if there exists an irreducible trinomial of degree n. However, for several n, including the interesting case where n is a multiple of 8, there does not exist such a trinomial [28]. The question is what happens for these cases. As one of our main results, we show that the condition on the minimal polynomial is not only sufficient but also necessary.

3.1 Characterizing Elements with Optimal XOR-Count

In this section, we prove the converse of the fact described in Example 1, namely the necessary condition on the minimal (resp. characteristic) polynomial of \(\alpha \) resulting in an XOR-count of 1.

Theorem 2

Let \(\alpha \in \mathbb {F}_{2^n}\). Then there exists a matrix A with \({{\mathrm{wt_{\oplus }}}}(A) = 1\) such that \(A = M_{\alpha ,B}\) for some basis B if and only if \(m_{\alpha }\) is a trinomial of degree n.

Proof

Let \(M_{\alpha ,B}\) represent multiplication by some element \(\alpha \in \mathbb {F}_{2^n}\) with respect to the basis \(B = \{b_1,\dots ,b_n\}\) and let further \({{\mathrm{wt_{\oplus }}}}(M_{\alpha ,B}) = 1\). We show that the characteristic polynomial \(\chi _{M_{\alpha ,B}}\) is a trinomial and coincides with \(m_\alpha \). Since the XOR-count is 1, we can assume w.l.o.g. that \(M_{\alpha ,B} = P + E_{i,j}\) such that \(P = \bigoplus _{k=1}^{l} C_{x^{m_k}+1}\) is in cycle normal form. We first show that \(l = 1\). Suppose \(l >1\), then, depending on \(E_{i,j}\), the matrix \(M_{\alpha ,B}\) is either in upper or lower triangular form consisting of at least two diagonal blocks. Since one of them must be of the form \(C_{x^{m}+1}\), the polynomial \(x^m+1\) must divide the characteristic polynomial \(\chi _{M_{\alpha ,B}}\). Since further \((x+1) \mid (x^m+1)\), the minimal polynomial of \(\alpha \) is necessarily a multiple of \(x+1\). This is a contradiction since \(\alpha \ne 1\) and \(m_\alpha \) must be irreducible. Hence, \(M_{\alpha ,B}\) is permutation-similar to \(C_{x^n+1}+E_{i,j}\). It is further \(i \ne j+1 \mod n\) since otherwise \(M_{\alpha ,B}\) would be singular.

We now investigate how \(\alpha \) operates on the basis elements \(b_k \in B\). Considering the structure of \(M_{\alpha ,B}\), we obtain the following list of equations.

$$\begin{aligned} \alpha b_1&= b_2 \\ \vdots \\ \alpha b_{j-1}&= b_j \\ \alpha b_j&= b_{j+1} + b_i \\ \alpha b_{j+1}&= b_{j+2} \\ \vdots \\ \alpha b_n&= b_1. \end{aligned}$$

By defining \(\gamma := b_{j+1}\), one can express every basis element \(b_k\) as a power of \(\alpha \) multiplied by \(\gamma \). In particular,

$$\begin{aligned} b_{j+k \mod n} = \alpha ^{k-1} \gamma \end{aligned}$$
(1)

for \(k \in \{1,\dots , n\}\). Combining this observation with the identity \(\alpha b_j = b_{j+1} + b_i\), one obtains

$$\begin{aligned} \alpha ^n \gamma = \gamma + \alpha ^t \gamma \end{aligned}$$
(2)

for some exponent \(t\ne 0\). Since \(\gamma \ne 0\), the field element \(\alpha \) is a root of the trinomial \(p = x^n + x^t + 1\). It is left to show that p is exactly the minimal polynomial of \(\alpha \). Suppose that \(m_\alpha = x^m + \sum _{k=0}^{m-1} c_{k}x^{k}\) with constants \(c_k \in \{0,1\}\) and \(m < n\). By multiplying \(m_\alpha (\alpha )\) with \(\gamma \), one obtains

$$\begin{aligned} \alpha ^m \gamma = \sum _{k=0}^{m-1} c_k \alpha ^k \gamma \end{aligned}$$

and thus \(b_{t_m} = \sum _{k=0}^{m-1} c_k b_{t_k}\) for some basis elements \(b_{t_k}\). We are now able to express one basis element \(b_{t_k}\) as a sum of other elements from B which is contradictory to the linear independence of the basis. Hence, \(\deg (m_\alpha ) = n\) and thus \(m_\alpha = p\) which finally proves the theorem.    \(\square \)

Note that the polynomial p is exactly the characteristic polynomial of \(M_{\alpha ,B}\) since it must be a monic multiple of \(m_\alpha \) having degree n. An alternative way of proving that the characteristic polynomial of a matrix \(C_{x^n+1} + E_{i,j}\) is a trinomial is given in Appendix A. As a simple corollary one obtains that any \(\alpha \in \mathbb {F}_{2^n}^{*}\) with an XOR-count of 1 cannot be contained in a proper subfield.

Corollary 3

Let \(\alpha \in \mathbb {F}_{2^n}^{*}\setminus \{1\}\) and let further \(\deg (m_{\alpha }) < n\), indicating that \(\alpha \) lies in a proper subfield of \(\mathbb {F}_{2^n}\). Then, any matrix \(M_{\alpha ,B}\) representing multiplication by a field element \(\alpha \) with respect to some basis B has \({{\mathrm{wt_{\oplus }}}}(M_{\alpha ,B}) > 1\).

This result implies that building MDS layers using a block interleaving construction [1], also called subfield construction in [17], almost always results in suboptimal implementation costs. Note that specific instances of this construction are also implicitly used in the AES, LS-Designs [12] and the hash function Whirlwind [3].

Now let \(\alpha \) be an element with XOR-count 1. From Corollary 1 we know that \(\alpha ^{-1}\) has the same XOR-count. Next, we show that there do not exist any further elements with an XOR-count equal to 1.

Theorem 3

For any given basis B of \(\mathbb {F}_{2^n}\), there exist at most two field elements \(\alpha \) and \(\alpha ^{-1}\) with \({{\mathrm{wt_{\oplus }}}}(M_{\alpha ,B}) = {{\mathrm{wt_{\oplus }}}}(M_{\alpha ^{-1},B})\) = 1.

Proof

Let \(\alpha \in \mathbb {F}_{2^n}^{*}\) with \({{\mathrm{wt_{\oplus }}}}(M_{\alpha ,B}) = 1\) for the basis \(B = \{b_1,\dots ,b_n\}\). We show that for any \(\beta \in \mathbb {F}_{2^n}\) with \({{\mathrm{wt_{\oplus }}}}(M_{\beta ,B}) = 1\) it holds that \(\beta = \alpha ^{\pm 1}\).

Since w.l.o.g. \(M_{\alpha ,B}\) can be assumed to be of the form \(C_{x^n+1}+E_{i,j}\), we know that (1) and (2) hold. We further know that \(M_{\beta ,B}\) is of the form \(P+E_{i',j'}\) and thus there exist \(l,m \in \{1,\ldots ,n\}\) with \(l \ne m\) and \(\beta b_{j+l \mod n} = b_{j+m \mod n}\). Using Eq. (1), we can write where \(s \in \{-(n-1),\ldots ,n-1\}\). We directly see that \(s \ne 0\). It remains to show that \(-1 \le s \le 1\).

Assume \(s \ge 2\). We use Eqs. (1) and (2) to obtain

$$\begin{aligned} \beta b_{j+(n-s+1) \mod n} = \alpha ^n \gamma = \gamma + \alpha ^t \gamma = b_{j+1 \mod n} + b_{j+t+1 \mod n}. \end{aligned}$$

Since \(0<t<n\), it holds that \(b_{j+1 \mod n} \ne b_{j+t+1 \mod n}\) and thus the according column contains an additional 1. For the next column, we have

$$\begin{aligned} \beta b_{j+(n-s+2) \mod n}&=\alpha ^{n+1}\gamma = \alpha \gamma + \alpha ^{t+1} \gamma \\&= {\left\{ \begin{array}{ll} b_{j+2 \mod n} + b_{j+t+2 \mod n}, &{} \text {for } t < n-1\\ b_{j+2 \mod n} + b_{j+1 \mod n} + b_{j \mod n}, &{} \text {for } t = n-1 \end{array}\right. } \end{aligned}$$

Hence, this column also contains at least one additional 1 which is contradictory to the XOR-count of 1.

For \(-s\ge 2\) we can construct the same contradiction by considering \(\beta ^{-1}\).    \(\square \)

We now understand the structure of field elements \(\alpha \) that can be implemented with a single addition. One might think that also for the other cases, the weight of the minimal polynomial of \(\alpha \) strictly lower-bounds XOR-count as \({{\mathrm{wt}}}(m_\alpha ) - 2\). As we will see next, this is not the case.

3.2 Experimental Search for Optimal XOR-Counts

Surprisingly, we often can improve the XOR-count, compared to using the companion matrix for multiplication, if the weight of the minimal polynomial is greater than 3. For instance, if \(m_\alpha \) is an irreducible pentanomial, that is of weight 5, of degree n there often exists a basis B such that \({{\mathrm{wt_{\oplus }}}}(M_{\alpha ,B}) = 2\). Indeed, for all \(n \le 2048\) for which no irreducible trinomial of degree n exists, we found some element \(\alpha \in \mathbb {F}_{2^n}^{*}\) with an XOR-count of 2 for some basis B. For every such dimension, we present an example of such a matrix in Table 8. Thus, for all practically relevant fields, we are able to identify an element such that multiplication can be implemented with one or two XOR operations. By Theorem 2, these results are proven to be optimal.

Moreover, as fields of small size are most interesting for SP-networks, we investigated those in full detail. For the fields \(\mathbb {F}_{2^4}\), \(\mathbb {F}_{2^5}\), \(\mathbb {F}_{2^6}\), \(\mathbb {F}_{2^7}\) and \(\mathbb {F}_{2^8}\) we present the optimal XOR-count for each non-trivial element \(\alpha \) in Tables 3456 and 7, respectively. The main observation is that each element which is not contained in a proper subfield can be implemented with at most 3 additions. Furthermore, whenever an XOR-count of 2 is possible, the minimal polynomial of \(\alpha \) is a pentanomial in all those cases. However, a more thorough characterization of elements with non-optimal XOR-count is left as an open problem (see Sect. 6 for more details).

Those results are based on a search. Since we are only interested in matrices up to similarity (due to the change of basis), we just need to consider all matrices in the normal form described in Corollary 2. This will exhaust all possibilities of similarity classes for a given XOR-count t. In particular, the search space is reduced from \(n!(n(n-1))^{t}\) to only \(p(n)(n(n-1))^t\) where p(n) denotes the number of partitions of n, which is exactly the number of possible cycle normal forms of dimension n. This allows us to exhaustively search over all similarity classes up to \(t=3\) XOR operations for the fields of small size. The key-point here is that, instead of searching for an optimal basis for a given field element, we generated all matrices with small XOR-count and used Theorem 1 in order to check which field element (if any) the given matrix corresponds to.

In order to identify a single lightweight element for larger field sizes, we identified conditions in which cases the characteristic polynomial of a matrix with XOR-count 2 has weight 5, cf. Theorem 4 below. During the search, one only has to check for irreducibility. This allows to compute the results presented in Table 8 extremely fast, that is within a couple of minutes on a standard PC. The proof of Theorem 4 is given in Appendix A.

Theorem 4

Let \(M = C_{x^{n}+1} + E_{i_1,j_1} + E_{i_2,j_2}\) such that the following relations hold:

$$\begin{aligned} i_1 < j_1 \ne n, \ \ \ i_2 > j_2 +1, \ \ \ i_1 \le j_2, \ \ \ i_2 \le j_1, \ \ \ j_1 - (i_1-1) \ne n, \ \ \ n - (j_1 - i_1) \ne i_2 - j_2 \end{aligned}$$

The characteristic polynomial of M is a pentanomial of degree n. In particular

$$\begin{aligned} \chi _M = \lambda ^n + \lambda ^{n+i_1-j_1+i_2-j_2-2} + \lambda ^{n+i_1-j_1-1} + \lambda ^{i_2-j_2-1} + 1. \end{aligned}$$

4 Constructing Lightweight MDS Matrices

Our goal is now to construct lightweight MDS matrices. We use the results obtained in the previous sections and restrict our search to circulant matrices and entries with low XOR-count. This simplifies checking the MDS property and computing an upper bound of the XOR-count of the whole matrix. The complexity of our algorithm enables us to easily search for MDS matrices up to dimension 8. Our construction is generic and works for all finite fields \(\mathbb {F}_{2^m}\) with \(m > b\) for a given bound b.

More precisely, we construct circulant matrices with entries of the form \(\alpha ^{\pm i}\) where \(\alpha \) is an element in \(\mathbb {F}_{2^m}\). Choosing entries of this form enables us to easily upper-bound the XOR-count of the elements since

$$\begin{aligned} {{\mathrm{wt_{\oplus }}}}(x^{\pm k}) \le k {{\mathrm{wt_{\oplus }}}}(x). \end{aligned}$$

This can be easily seen by using Corollary 1 and the fact that \(\alpha ^k\) can be implemented by k times implementing \(\alpha \). We want to keep the size of the finite field over which the matrix is defined generic. Thus, we choose the matrix entries from a subgroup of the field of fractions of the polynomial ring \(\mathbb {F}_2[x]\), denoted \({{\mathrm{Quot}}}(\mathbb {F}_2[x])\). That is, every element is of the form

$$\begin{aligned} \frac{x^s+a_{s-1}x^{s-1}+\dots + a_1x + a_0}{x^t+b_{t-1}x^{t-1}+\dots + b_1x + b_0}. \end{aligned}$$

More precisely, and as mentioned above, we restrict our search to elements from \(\langle x \rangle \) which is the multiplicative subgroup of \({{\mathrm{Quot}}}(\mathbb {F}_2[x])\) generated by x. Our search works by constructing MDS conditions for an \(n \times n\) matrix M with entries in \(\langle x\rangle \). This approach later allows us to substitute the indeterminate x by any \(\alpha \in \mathbb {F}_{2^m}\) that fulfills all of the conditions given below. In this context, we let \(M(\alpha ) \in {{\mathrm{Mat}}}_n(\mathbb {F}_{2^m})\) denote the matrix obtained by substituting x with \(\alpha \in \mathbb {F}_{2^m}\).

We define the weight of some circulant matrix with entries in \(\langle x\rangle \) as the sum of the absolute values of the exponents in its first row, that is, the number of times \(\alpha \) has to be applied per row. Then, for a given dimension, we are interested in finding the lightest matrix M which can be made MDS for as many finite fields as possible. Note that the higher priority here was to find a lightweight matrix. Thus, there might exist matrices which can be made MDS for even more fields, but with a probably higher cost.

MDS Conditions. Note that a matrix is MDS, if and only if all its square submatrices are invertible [22, page 321, Theorem 8]. Thus, given a matrix \(M \in {{\mathrm{Mat}}}_n({{\mathrm{Quot}}}(\mathbb {F}_2[x]))\), we compute the determinants of all square submatrices (called minors) of M in order to check the MDS property. This way one obtains a list of conditions (polynomials in \(\mathbb {F}_{2}\)) for a matrix to be MDS. Since the determinant of a matrix with elements from a field is an element of the field itself, all of these determinants can be represented as the fraction of two polynomials. Thus, M is MDS if and only if the numerator of all minors is non-zero. One can decompose the numerators into their irreducible factors and collect all of them in a set T. This set now defines the MDS conditions. In particular, \(M(\alpha )\) is MDS if and only if \(\alpha \) is not a root of any of these irreducible polynomials in T, that is, iff \(m_\alpha \notin T\). This trivially holds for \(m > \max _{p \in T} \{\deg (p)\}\) and any \(\alpha \in \mathbb {F}_{2^m}\) which is not contained in a proper subfield. In general, if \(\alpha \) is not contained in a proper subfield, the necessary and sufficient condition for the existence of an MDS matrix \(M(\alpha )\) is that not all irreducible polynomials of degree m are contained in T. We note that there exists a value b which lower bounds the field size for which M can always be made MDS. That is, for all \(t > b\), there exists an irreducible polynomial of degree t which is not in T.

4.1 Generic Lightweight MDS Matrices

We now present some results obtained by the approach described above. Given the restrictions, these matrices achieve the smallest weight, i.e. the smallest sum of (absolute) exponents of x. Later, we will use these generic matrices to build concrete instantiations of \(n \times n\) MDS matrices \(M(\alpha )\) for \(n \in \{2,3,\dots ,8\}\) over a finite field \(\mathbb {F}_{2^m}\) with \(m > b\). We note that the given results are not necessarily the only possible constructions with the smallest weight.

We also present the conditions for the matrix to be MDS, that is, the irreducible polynomials that must not be equal to \(m_{\alpha }\). However, since the number of conditions rapidly increases with the dimension of the matrix, we refrain from presenting a complete list for dimensions 6 to 8. Instead, we give the SageMath source code that was used to compute the set T of irreducible polynomials in Listing 1.1.

figure c

\(2 \times 2\) and \(3 \times 3\) matrices. The matrices

$$\begin{aligned} {{\mathrm{circ}}}(1,\alpha ) = \begin{pmatrix} 1 &{} \alpha \\ \alpha &{} 1 \end{pmatrix} \end{aligned}$$

and

$$\begin{aligned} {{\mathrm{circ}}}(1,1,\alpha ) = \begin{pmatrix} 1 &{} 1 &{} \alpha \\ \alpha &{} 1 &{} 1 \\ 1 &{} \alpha &{} 1 \end{pmatrix} \end{aligned}$$

are MDS for all \(\alpha \ne 0, 1\).

\(4 \times 4\) matrices. For \(m > 3\), there exists an \(\alpha \in \mathbb {F}_{2^m}\) such that the matrix \({{\mathrm{circ}}}(1,1,\alpha ,\alpha ^{-2})\) is MDS. More precisely, the matrix is MDS iff \(\alpha \) is not a root of any of the following polynomials:

$$\begin{aligned} x\\ x + 1\\ x^2 + x + 1\\ x^3 + x + 1\\ x^3 + x^2 + 1\\ x^4 + x^3 + x^2 + x + 1 \\ x^5+x^2+1 \end{aligned}$$

\(5 \times 5\) matrices. For \(m > 3\), there exists an \(\alpha \in \mathbb {F}_{2^m}\) such that the matrix \({{\mathrm{circ}}}(1,1,\alpha ,\alpha ^{-2},\alpha )\) is MDS. More precisely, the matrix is MDS iff \(\alpha \) is not a root of any of the following polynomials:

$$\begin{aligned} x\\ x + 1\\ x^2 + x + 1\\ x^3 + x + 1\\ x^3 + x^2 + 1\\ x^4 + x + 1 \\ x^4 + x^3 + 1 \end{aligned}$$

\(6 \times 6\) matrices. For \(m > 5\), there exists an \(\alpha \in \mathbb {F}_{2^m}\) such that the matrix \({{\mathrm{circ}}}(1,\alpha ,\alpha ^{-1},\alpha ^{-2}, 1,\alpha ^{3})\) is MDS.

\(7 \times 7\) matrices. For \(m > 5\), there exists an \(\alpha \in \mathbb {F}_{2^m}\) such that the matrix \({{\mathrm{circ}}}(1,1,\alpha ^{-2}, \alpha , \alpha ^{2}, \alpha ,\alpha ^{-2})\) is MDS.

\(8 \times 8\) matrices. For \(m > 7\), there exists an \(\alpha \in \mathbb {F}_{2^m}\) such that the matrix \({{\mathrm{circ}}}(1,1,\alpha ^{-1},\alpha ,\alpha ^{-1},\alpha ^{3}, \alpha ^{4},\alpha ^{-3})\) is MDS.

4.2 Instantiating Lightweight MDS Matrices

We now combine the efficient multiplication in finite fields from Sect. 3 with our construction of MDS matrices. That is, the presented generic MDS matrices are instantiated with elements \(\alpha \) with low XOR-count.

In a matrix multiplication every element is computed as the sum over multiplications. The according XOR-count was already discussed in [17, 27]. For our matrices, the total number of XOR operations needed per row is upper bounded by

$$\begin{aligned} (n-1)m + w \cdot {{\mathrm{wt_{\oplus }}}}(\alpha ). \end{aligned}$$

Here, \((n-1)m\) XORs are the static part which comes from summing over the multiplication results and w is the weight as defined above. The overhead of \(w \cdot {{\mathrm{wt_{\oplus }}}}(\alpha )\) XORs is needed for multiplying with the single elements. The static part cannot be changed by fast multiplication. Therefore, this overhead is the part that has to be minimized.

The cost per bit for the whole matrix is given by

$$\begin{aligned} \frac{n((n-1)m + w{{\mathrm{wt_{\oplus }}}}(\alpha ))}{nm} = n-1 + \frac{w{{\mathrm{wt_{\oplus }}}}(\alpha )}{m}. \end{aligned}$$

One can notice that it decreases for larger field sizes.

Table 1. Optimal instantiations of the generic MDS matrices for \(2 \le n \le 8\). In each cell, the first entry describes the minimal polynomial of \(\alpha \in \mathbb {F}_2^m\) and the second entry describes the overhead of the instantiated \(n \times n\) matrix \(M(\alpha )\). The trinomial \(x^m+x^a+1\) is denoted by (a) and the pentanomial \(x^m+x^a+x^b+x^c+1\) is denoted by (abc).

For each of the matrices M described in Sect. 4.1, Table 1 presents choices for \(\alpha \) such that \(M(\alpha )\) is MDS. Note that concrete instantiations are only given up to the field size \(m=13\). The reason is that for larger m, all possible \(C_p\) with p as an irreducible degree-m polynomial of weight 3 are valid choices. If no such trinomial exists, one can choose \(M_{\alpha ,B}\) as in Table 8.

Table 2 compares the results presented in this section to the best constructions known so far. It turned out that our construction of the \(4 \times 4\) MDS matrix in \(\mathbb {F}_{2^4}\) is identical to the \(\mathbb {F}_2\)-linear matrix constructed in [19, 21]. We stress that our construction leads to the lightest MDS matrices known, improving the results described in [21, 27] for \(8 \times 8\) MDS matrices in \(\mathbb {F}_{2^4}\) and \(\mathbb {F}_{2^8}\) respectively. This is also the case when considering an unrolled implementation of the serial implementations in [30]. Unrolled variants of their implementations have an XOR-count that is slightly larger than ours. Moreover, and more importantly, the circuit depth is considerably increased due to the optimization with respect to a serial implementation.

Table 2. Comparison of our results with the (non-involutory) \(\mathbb {F}_{2^m}\)-linear MDS matrices from [27, Sect. 6.2], [19, 21] by overhead. a: In these constructions, the XOR-count is measured by counting the number of additional 1’s in the corresponding matrix.

Note that our results in Table 2 are measured by the XOR-count from Definition 1 while the results from [19, 21, 27] use the old XOR-count definition. Additionally to these results, our understanding of how to choose an optimal basis can also be used to improve existing results in the old XOR-count definition. For example, we can represent the \(8 \times 8\) MDS matrix in \(\mathbb {F}_{2^8}\) from [21] with 28 additional ones instead of 30 by change of basis.

5 Generalizing the MDS Property

Here, following e.g. [30], we consider a generalization to additive MDS codes in order to improve efficiency.

There are some dimensions for which no field element with an XOR-count of 1 exists, for instance \(m =8\). However, especially this dimension is very important since lots of block cipher designs are byte oriented. One would wish to have some element \(\alpha \) with \({{\mathrm{wt_{\oplus }}}}(\alpha ) = 1\). A way of solving this problem is to not restrict to field elements. Instead, \(\alpha \) can be chosen to be some other matrix in the ring \(R={{\mathrm{Mat}}}_m(\mathbb {F}_2)\). Given an \(n \times n\) matrix M with elements in \({{\mathrm{Quot}}}(\mathbb {F}_2[x])\), the substitution \(M(\alpha )\) now consists of elements in a commutative ring with unity, which is the subring of R generated by \(\alpha \). In general, given a commutative ring with unity R, one can define the determinant \(\det _R: {{\mathrm{Mat}}}_n(R) \rightarrow R\) in a similar way than for matrices over fields. As described in [18, pp. 212–215], any \(A \in {{\mathrm{Mat}}}_n(R)\) is invertible if and only if \(\det _R(A)\) is a unit in R. We now define the MDS property for matrices over a commutative ring.

Definition 2

Let R be a commutative ring with unity. A matrix \(M \in {{\mathrm{Mat}}}_n(R)\) is MDS if and only if for every \(1\le s \le n\), any \(s \times s\) submatrix of M is invertible.

For checking the MDS property in our case, we use a well-known fact about block matrices.

Theorem 5

(Theorem 1 in [26]). Let K be a field and let R be a commutative subring of \({{\mathrm{Mat}}}_m(K)\) for some integer m. For any matrix \(M \in {{\mathrm{Mat}}}_d(R)\), it is

$$\begin{aligned} \det (M) = \det ({\det }_R(M)), \end{aligned}$$

where \(\det (M)\) is the determinant of M considered as \(M \in {{\mathrm{Mat}}}_{dm}(K)\).

As an implication, \(M(\alpha )\) is MDS if and only if \(p(\alpha )\) is invertible for all \(p \in T\), if and only if \(\det (p(\alpha )) \ne 0\) for all \(p \in T\).

\(2 \times 2\) and \(3 \times 3\) matrices. Given \(M = {{\mathrm{circ}}}(1,x)\) (resp. \(M = {{\mathrm{circ}}}(1,1,x\))), one has to make sure that both x and \(x+1\) are invertible for M to be MDS. This is the case if x is substituted by the companion matrix \(C_{x^m+x+1}\) for \(m \ge 2\). Thus, \(M(C_{x^m+x+1})\) is MDS and each entry has an XOR-count of 1.

\(4 \times 4\) matrices. The MDS conditions are more complex than above. So, we only present some improvements for \(m \in \{8,13,16\}\). The matrix \(M = {{\mathrm{circ}}}(1,1,\alpha ,\alpha ^{-2})\) is MDS for

$$\begin{aligned} \alpha \in \{C_{x^8+x^2+1}, C_{x^{13}+x+1},C_{x^{16}+x+1}\}. \end{aligned}$$

Note that a similar matrix for \(m = 8\) was recently constructed in [19].

6 Conclusion and Open Problems

We presented a study of optimal multiplication bases with respect to the XOR-count. When applied to MDS matrices those lead to very efficient round-based implementations. We expect our results to be applied in other domains as well.

Our investigations leave many possibilities for future research. While we have been able to characterize exactly which field elements can be implemented with one XOR operation only, the general case is still open. For small fields of dimension smaller or equal to eight, we were able to compute the optimal bases with the help of an exhaustive computer search. However, for larger dimensions, this approach turns quickly inefficient and more insight would be needed. As a first step, we conjecture the following statement.

Conjecture 1

If \({{\mathrm{wt_{\oplus }}}}(M_{\alpha ,B}) = 2\), then \(m_\alpha \) is of weight smaller or equal to 5.

Note that the converse of the conjectured statement is (unlike the case of trinomials) wrong. As can be seen in Table 7, there exist a pentanomial of degree 8 which cannot be implemented with two XOR operations only. Beyond that, our intuition is that the larger the weight of the minimal polynomial, the larger the gap between the most efficient multiplication and the efficiency of multiplying by means of the companion matrix. Quantifying and demonstrating such a statement is an interesting and challenging open problem. Another interesting question is to get an improved understanding of how to most efficiently multiply with elements in proper subfields. More specifically, as a generalization of Corollary 3, one may ask the following question.

Question 1

Is the most efficient way to multiply with a subfield element given by multiplying in the subfield d times, where d is the extension degree of the field when viewed as an extension of the subfield. More precisely, given an \(\alpha \in \mathbb {F}_{2^m}^{*} \subset \mathbb {F}_{2^n}^{*}\) in a proper subfield of dimension \(m = \frac{n}{d}\) and let \(M_{\alpha \in \mathbb {F}_{2^m},B'}\) be the multiplication matrix in \(\mathbb {F}_{2^m}\) with an optimal XOR-count. Is \(M_{\alpha \in \mathbb {F}_{2^n},B} =\bigoplus _{k=1}^d M_{\alpha \in \mathbb {F}_{2^m},B'}\) a matrix with the lowest possible XOR-count for multiplication with \(\alpha \in \mathbb {F}_{2^n}\)? In particular, is \({{\mathrm{wt_{\oplus }}}}(M_{\alpha \in \mathbb {F}_{2^n},B}) = d {{\mathrm{wt_{\oplus }}}}(M_{\alpha \in \mathbb {F}_{2^m},B'})\)?

Finally, for MDS matrices, it should be noted that we locally achieve the optimal solution. What would be needed to finally settle the search for lightweight matrices is a global optimal solution. That is for a given dimension, find an MDS matrix that can be implemented with the minimal number of XOR operations.

Finally, when optimizing for software, similar questions can be phrased and investigating solutions that are valid for more than one specific platform is a challenging research topic.