1 Introduction

Block ciphers play an important role in symmetric cryptography, providing a basic tool for encryption. They are (probably) the most scrutinized cryptographic tools, and they are often used as the underlying tool to construct other cryptographic algorithms, whose proofs of security are performed under the assumption that the underlying block cipher is ideal.

The concept of known-key distinguishers was introduced by Knudsen and Rijmen in [20]. In the classical single secret-key setting, the attacker does not know the randomly generated key and aims to recover it or to build a (secret-key) distinguisher that distinguishes the cipher from a random permutation. The security model in known-key attacks is quite different: the attacker knows the randomly drawn key the block cipher operates with and aims to find a structural property of the cipher under the known key, that is, a property which an ideal cipher (roughly speaking, a permutation drawn at random) would not have. For completeness, we mention that a more relaxed version, called the chosen-key distinguisher, can be considered, where the adversary is assumed to have full control over the key. This model was introduced in [5], and has been extended to a related-key attack on the full-round AES-256, while the best chosen-key distinguisher for AES-128 [14] currently present in the literature covers 9 rounds out of 10. In this paper, however, we focus on the known-key model and do not allow or assume related keys.

Since their introduction, known-key attacks have been a major research topic in the symmetric-key community. To give some examples besides AES, known-key distinguishers have been proposed for full PRESENT [9] (one of the most studied lightweight block ciphers, proposed at CHES 2007) and for Feistel networks [29]. This interest is justified by the fact that, even if known-key distinguishers may be considered less relevant than secret-key ones, they still allow one to learn something about the security margin of a cipher. For example, if it is not possible to find distinguishers for a block cipher when the key is given, then one cannot find a distinguisher when the key is secret. Secondly, and more importantly, hash functions can be built from block ciphers, and vice versa. For example, given a hash function, it is always possible to set up a block cipher using the Feistel construction. Conversely, the Davies-Meyer construction or the Miyaguchi-Preneel construction can transform a secure block cipher into a secure compression function. In a hash setting, block cipher security models such as the known-key model (or the chosen-key model) make sense, since in practice the attacker has full access to and control over the internal computations. Finally, an attack in these models exposes a structural flaw of the cipher, whereas it is desirable to work with a primitive that does not have any flaw, even in the security model most generous to the attacker. A classical example is the devastating effect of block cipher weak keys on the security of a compression function [34], even though such weak keys are usually considered a minor flaw for a block cipher if the set of weak keys is small. Therefore, the security notions to consider for a block cipher vary depending on whether the block cipher is used in a hash function setting or not.

Despite this cumulative impact in the symmetric-key community over the last years, known-key attacks have been known to be difficult to formalize since [1] proposed the notion of known-key indifferentiability to capture the security of block ciphers under a known key. In particular, they focus on known-key distinguishers for block ciphers based on idealized primitives such as randomly drawn functions or permutations, that is block ciphers for which the round function looks like an ideal primitive and where the adversary can have access to this underlying ideal primitive. Later on, in [27] the impact of attacks in the known-key model on hash functions is studied.

Citing Knudsen and Rijmen [20], “imagine a block cipher” for which a known-key distinguisher exists, “but where no efficient attacks are known in the traditional black-box model. Should we recommend the use of such a cipher? We do not think so!”

1.1 Known-key distinguishers for AES: the state of the art

The known-key model. In the known-key model, full access to an instance of the encryption function associated with a known random key, and to its inverse, is given. The purpose is to simultaneously control the inputs and the outputs of the primitive, i.e. to achieve input-output correlations that one could not efficiently achieve with inputs and outputs of a perfect random permutation to which one would have oracle access. A formal definition of a known-key distinguisher is provided in Sect. 3, where we propose and describe in detail a generic scenario for known-key distinguishers. We emphasize that all known-key distinguishers currently present in the literature, including the ones presented in this paper, implicitly exploit (and can be described in) the scenario proposed in Sect. 3.2.

Known-key distinguishers for AES. AES and related constructions have served as a benchmark for cryptanalytic techniques since the very introduction of this model by Knudsen and Rijmen [20] with a 7-round result. Subsequently, 8-round results were obtained using truncated differentials [17], which were later improved in [19]. Currently, this last one, which exploits the rebound technique [22] and the so-called “multiple limited-birthday problem”, is the best 8-round known-key distinguisher in the literature. At Asiacrypt 2014, Gilbert [16] found a way to extend an 8-round known-key distinguisher (using a novel representation of AES) into a more intricate 10-round distinguisher, and hence presented for the first time a known-key distinguisher for full AES-128.

All the known-key distinguishers on AES currently present in the literature are briefly recalled in Sect. 3 using the “subspace trail notation” (Footnote 1), recently introduced at FSE/ToSC 2017. In Table 1 we list the known-key distinguishers for AES, including our main results.

On Gilbert’s Approach. As we will describe in more detail in Sect. 3.1, the approach of [16] makes use of a freedom in the known-key model that was actually always there but never spelled out explicitly. In more detail, there is always the role of a “verifier”, in addition to a “shortcut player” and a “generic player”. In [16] the verifier has, perhaps for the first time, some non-negligible computations to do. The details of Gilbert’s approach are such that it is still not possible to simply “peel off” an arbitrary number of rounds; on the contrary, it seems that only the detection of a very specific property (the so-called “uniform distribution property”) could take advantage of computations of the verifier.

1.2 Our contributions

1.2.1 Systematization of known-key distinguisher

First of all, we begin by recapitulating the literature on the concept of known-key distinguisher in Sect. 3. In doing so, we propose and describe a generic scenario for a known-key distinguisher by interpreting it as a game between two players (a “shortcut player” and a “generic player”) that face the same cipher. The idea is that the player who knows the key, namely the “shortcut player”, must be able to generate a set of plaintexts/ciphertexts (that satisfies a required property) faster than the other player who does not know the key (or equivalently, that faces an ideal cipher), namely the “generic player”. Then we recall the known-key distinguishers present in the literature in the above scenario. In particular, we focus on Gilbert’s distinguisher proposed in [16], and we show that it can be explained more easily using the “subspace trail notation” proposed at ToSC/FSE 2017 than using the “twisted representation”.

Table 1 AES known-key distinguishers
Table 2 1st/2nd Conjectures and AES Gilbert’s known-key distinguishers

1.2.2 Analysis of Gilbert’s known-key distinguisher: refuting both conjectures via progress in cryptanalysis

As we already recalled, at Asiacrypt 2014 Gilbert proposed a new known-key distinguisher on full AES-128 [16], by extending an 8-round known-key distinguisher based on the so-called uniform distribution property into a more intricate 10-round distinguisher. In the conclusion of his paper, Gilbert claims that it seems technically difficult to use a stronger property than the uniform distribution one to extend an 8-round known-key distinguisher to a 10-round one:

\(\mathbf {1st}\) Conjecture:

while we do not preclude that the use of the stronger property that several pairs satisfying the differential relation of [17] [i.e. truncated diff. relations exploited by the rebound distinguisher] can be derived might potentially result in a 10-round distinguisher that outperforms the 10-round distinguisher presented above, giving a rigorous proof seems technically difficult.

In particular, he left “the investigation of improved 10-round known-key distinguishers and associated proofs—or even plausible heuristic arguments if rigorous proofs turn out to be too difficult to obtain—as an open issue.”

In this paper, we pick up this challenge, and using a strategy similar to the one proposed by Gilbert in [16], we show how to construct a more efficient 10-round distinguisher by exploiting known-key distinguishers based on truncated differential trails. In particular, we use as a starting point the 8-round known-key distinguisher presented in [19], and we extend it at the end and at the beginning using the strategy proposed by Gilbert. This allows us to set up a 10-round known-key distinguisher for AES (see Sect. 5) with a time complexity of approximately \(2^{50}\).

As one of the main (cryptanalytic) results, in Sect. 6 we show that it is possible to extend our 10-round distinguisher up to 12 rounds. Moreover, exploiting a similar strategy, in Sect. 7 we extend Gilbert’s 10-round distinguisher based on the uniform distribution property up to 12 rounds. These 12-round AES known-key distinguishers provide counterexamples to the claim made in [16] about the (im)possibility of using Gilbert’s technique to extend an 8-round distinguisher by more than 2 rounds:

\(\mathbf {2nd}\) Conjecture:

The reader might wonder whether the technique we used to derive a known-key distinguisher for the 10-round AES from a known-key distinguisher for the 8-round AES does not allow to extend this 8-round known distinguisher by an arbitrary number of rounds. It is easy however to see that the argument showing that 10-round relation \({\mathcal {R}}\) is efficiently checkable does not transpose for showing that the relations over \(r > 10\) rounds one could derive from the 8-round relation by expressing that the r-round inputs and outputs are related by \(r-8 > 2\) outer rounds to intermediate blocks that satisfy the 8-round relation are efficiently checkable.

Our results are summarized in Table 2.

1.2.3 A proposal of a “New” model for known-key distinguishers

The second main contribution is the high level insight that the details of the known-key model need to be changed if we aim to restore the original intent of the known-key model. The reason is that with our new result the difference between the chosen-key model and what is currently thought of as the known-key model is counter-intuitive: As we show it is now possible to have cryptanalytic results on more rounds of AES in the known-key model than in the chosen-key model and this is true for more than a single property. Hence we propose a simple restriction of the verifier in the known-key model to remedy the situation.

Firstly, we emphasize that the goal of this paper is to discuss the validity of Gilbert’s model independently of its (possible) practical applications. In particular, even if Gilbert’s known-key distinguisher leads to statements on more rounds of AES than ever before (without related keys) that seem meaningful, it is not clear whether such statements can become useful in the sense of, e.g., having an impact on the case where a block cipher is used to construct a hash function. This has also been noticed in [16], where it is pointed out that even if the strategy proposed by Gilbert allows one to set up efficient known-key distinguishers, its “impact on the security of [...] AES when used as a known key primitive, e.g. in a hash function construction, is questionable” (see abstract of [16]).

To achieve our goal, under the assumption of the validity of such model, we set up distinguishers based on the truncated differential property instead of the uniform distribution one in Gilbert’s framework, and we show that it is also possible to extend them for up to 12-round AES, that is two rounds beyond the claim given by Gilbert in [16]. Using these results as a starting point, we propose—with more confidence than would be possible without our results—a (new) definition of known-key distinguisher model that rules out Gilbert’s and our attacks proposed in this paper. As our results show, this seems necessary for better capturing the original idea of known-key distinguishers as something “between secret-key model and hash function use-cases”. For this reason, we conclude in Sect. 8, with a discussion of the results and a proposal of a refinement of the known-key model which restores its original intent (in which the role of the verifier gets back to being marginal).

2 Preliminary

2.1 Preliminary—description of AES

The Advanced Encryption Standard [13] is a Substitution-Permutation network that supports key sizes of 128, 192 and 256 bits. The 128-bit plaintext initializes the internal state as a \(4 \times 4\) matrix of bytes that are interpreted as values in the finite field \({\mathbb {F}}_{256}\), defined using the irreducible polynomial \(X^8+ X^4+X^3+X+1\). Depending on the version of AES, \(N_r\) rounds are applied to the state: \(N_r=10\) for AES-128, \(N_r=12\) for AES-192 and \(N_r=14\) for AES-256. An AES round applies four operations to the state matrix:

  • SubBytes (S-Box): applying the same 8-bit to 8-bit invertible S-Box 16 times in parallel on each byte of the state (it provides non-linearity in the cipher);

  • ShiftRows (SR): cyclic shift of each row to the left;

  • MixColumns (MC): multiplication of each column by a constant \(4\times 4\) invertible matrix \(M_{MC}\) (MC and SR provide diffusion in the cipher; see Footnote 2);

  • AddRoundKey (ARK): XORing the state with a 128-bit subkey.

One round of AES can be described as \( R(x) = K \oplus MC\circ SR \circ \text { S-Box} (x).\) In the first round an additional AddRoundKey operation (using a whitening key) is applied, and in the last round the MixColumns operation is omitted. Finally, as we do not use the details of the AES key schedule in this paper, we refer to [13] for a complete description.
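As a rough illustration of the round function \(R\) described above, the following Python sketch implements the four operations on a \(4 \times 4\) list of bytes. The function names, the state layout `s[r][c]` (row r, column c) and the on-the-fly derivation of the S-Box (field inversion followed by the standard affine map) are choices made for this example only, not notation taken from this paper. The whitening key, the last-round variant without MixColumns and the key schedule are deliberately left out.

```python
# Illustrative sketch of one AES round; names and state layout are assumptions.

def gmul(a, b):
    """Multiply a and b in GF(2^8) modulo X^8 + X^4 + X^3 + X + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def gf_inv(a):
    """Return a^254, i.e. the inverse of a for a != 0 (and 0 for a == 0)."""
    r = 1
    for _ in range(254):
        r = gmul(r, a)
    return r


def rotl8(x, k):
    """Rotate the byte x left by k bits."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def _sbox_entry(a):
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [_sbox_entry(a) for a in range(256)]
M_MC = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]


def sub_bytes(s):
    return [[SBOX[b] for b in row] for row in s]


def shift_rows(s):
    # Row r is rotated to the left by r positions.
    return [[s[r][(c + r) % 4] for c in range(4)] for r in range(4)]


def mix_columns(s):
    out = [[0] * 4 for _ in range(4)]
    for c in range(4):
        for r in range(4):
            for k in range(4):
                out[r][c] ^= gmul(M_MC[r][k], s[k][c])
    return out


def add_round_key(s, k):
    return [[s[r][c] ^ k[r][c] for c in range(4)] for r in range(4)]


def aes_round(s, k):
    """R(x) = K xor MC(SR(S-Box(x)))."""
    return add_round_key(mix_columns(shift_rows(sub_bytes(s))), k)
```

For instance, calling `aes_round` on the all-zero state with an all-zero round key returns a state in which every byte equals `0x63`, since the S-Box maps `0x00` to `0x63` and the coefficients of each row of \(M_{MC}\) sum to 1.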

The Notation Used in the Paper. Let x denote a plaintext, a ciphertext, an intermediate state or a key. Then \(x_{i, j}\) with \(i, j \in \{0, ..., 3\}\) denotes the byte in the row i and in the column j. We denote by \(k^r\) the subkey of the r-th round. If only the key of the final round is used, then we denote it by k to simplify the notation. Finally, we denote by R one round of AES, while we denote r rounds of AES by \(R^{r}\). We sometimes use the notation \(R_{K}\) instead of R to highlight the round key K. As a last thing, in this paper we often use the term “partial collision” (or “collision”) when two texts belong to the same coset of a given subspace X.

2.2 Preliminary—subspace trails

Invariant subspace cryptanalysis can be a powerful cryptanalytic tool, and subspace trails [18], introduced at ToSC/FSE 2017, are a recent generalization of it.

Let F denote a round function in an iterative block cipher and let \(V\oplus a\) denote a coset of a vector space V. Then if \(F(V \oplus a) = V \oplus a\) we say that \(V\oplus a\) is an invariant coset of the subspace V for the function F. This concept can be generalized to trails of subspaces.

Definition 1

([18]) Let \((V_1, V_2, ... , V_{r+1})\) denote a set of \(r + 1\) subspaces with \(\dim (V_i) \le \dim (V_{i+1})\). If for each \(i = 1, ..., r\) and for each \(a_i \in V_i^\perp \), there exists \(a_{i+1} \in V_{i+1}^\perp \) such that \(F(V_i \oplus a_i) \subseteq V_{i+1} \oplus a_{i+1}\), then \((V_1, V_2, ... , V_{r+1})\) is a subspace trail of length r for the function F. If all the previous relations hold with equality, the trail is called a constant-dimensional subspace trail.

This means that if \(F^t\) denotes the application of t rounds with fixed keys, then \(F^t(V_1 \oplus a_1) = V_{t+1} \oplus a_{t+1}\). We refer to [18] for more details about the concept of subspace trails. Our treatment here is however meant to be self-contained.

2.2.1 Subspace trails of AES

In this section, we recall the subspace trails of AES presented in [18]. For the following, we only work with vectors and vector spaces over \({\mathbb {F}}_{2^8}^{4 \times 4}\), and we denote by \(\{e_{0,0}, ..., e_{3,3}\}\) the unit vectors of \({\mathbb {F}}_{2^8}^{4 \times 4}\) (e.g. \(e_{i,j}\) has a single 1 in row i and column j). We also recall that given a subspace X, the cosets \(X\oplus a\) and \(X\oplus b\) (where \(a\ne b\)) are equal (that is \(X\oplus a \equiv X \oplus b\)) if and only if \(a \oplus b \in X\).

Definition 2

The column spaces \({\mathcal {C}}_i\) are defined as \({\mathcal {C}}_i = \langle e_{0, i}, e_{1, i}, e_{2, i}, e_{3, i} \rangle \).

For instance, \({\mathcal {C}}_0\) corresponds to the symbolic matrix

$$\begin{aligned} {\mathcal {C}}_0 = \biggl \{ \begin{bmatrix} x_1 &{} 0 &{} 0 &{} 0\\ x_2 &{} 0 &{} 0 &{} 0\\ x_3 &{} 0 &{} 0 &{} 0\\ x_4 &{} 0 &{} 0 &{} 0 \end{bmatrix} \, \biggl | \, \forall x_1, x_2, x_3, x_4 \in {\mathbb {F}}_{2^8} \biggl \} \equiv \begin{bmatrix} x_1 &{} 0 &{} 0 &{} 0\\ x_2 &{} 0 &{} 0 &{} 0\\ x_3 &{} 0 &{} 0 &{} 0\\ x_4 &{} 0 &{} 0 &{} 0 \end{bmatrix} . \end{aligned}$$

Definition 3

The diagonal spaces \({\mathcal {D}}_i\) and the inverse-diagonal spaces \(\mathcal {ID}_i\) are respectively defined as \({\mathcal {D}}_i = SR^{-1}({\mathcal {C}}_i) \equiv \langle e_{0, i }, e_{1, i+1}, e_{2, i+2 }, e_{3, i+3} \rangle \) and \(\mathcal {ID}_i = SR({\mathcal {C}}_i) \equiv \langle e_{0, i }, e_{1, i-1}, e_{2, i-2 }, e_{3, i-3} \rangle \), where the indexes are taken modulo 4.

For instance, \({\mathcal {D}}_0\) and \(\mathcal {ID}_0\) correspond to the symbolic matrices

$$\begin{aligned} {\mathcal {D}}_0 \equiv \begin{bmatrix} x_1 &{} 0 &{} 0 &{} 0\\ 0 &{} x_2 &{} 0 &{} 0 \\ 0 &{} 0 &{} x_3 &{} 0\\ 0 &{} 0 &{} 0 &{} x_4 \end{bmatrix}, \qquad \mathcal {ID}_0 \equiv \begin{bmatrix} x_1 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{}x_2 \\ 0 &{} 0 &{} x_3 &{} 0\\ 0 &{} x_4 &{} 0 &{} 0 \end{bmatrix}. \end{aligned}$$
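The relations \({\mathcal {D}}_i = SR^{-1}({\mathcal {C}}_i)\) and \(\mathcal {ID}_i = SR({\mathcal {C}}_i)\) only concern byte positions, so they can be checked by tracking the set of active positions through ShiftRows. The short Python sketch below does this; the helper names and the (row, column) index convention are assumptions made for the example.

```python
# Active-byte positions (row, column) of the spaces C_i, D_i and ID_i.
# Helper names are illustrative, not taken from the paper.

def column_space(i):
    return {(r, i) for r in range(4)}


def diagonal_space(i):
    return {(r, (i + r) % 4) for r in range(4)}


def inv_diagonal_space(i):
    return {(r, (i - r) % 4) for r in range(4)}


def shift_rows_positions(positions):
    # ShiftRows rotates row r to the left by r, so the byte at (r, c)
    # moves to (r, (c - r) mod 4).
    return {(r, (c - r) % 4) for (r, c) in positions}


for i in range(4):
    # SR(D_i) = C_i, i.e. D_i = SR^{-1}(C_i)
    assert shift_rows_positions(diagonal_space(i)) == column_space(i)
    # SR(C_i) = ID_i
    assert shift_rows_positions(column_space(i)) == inv_diagonal_space(i)
```

Here `diagonal_space(0)` and `inv_diagonal_space(0)` reproduce the positions of the variables in the two symbolic matrices above.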

Definition 4

The i-th mixed spaces \({\mathcal {M}}_i\) are defined as \({\mathcal {M}}_i = MC (\mathcal {ID}_i)\).

For instance, \({\mathcal {M}}_0\) corresponds to the symbolic matrix

$$\begin{aligned} {\mathcal {M}}_0 \equiv \begin{bmatrix} \texttt{0x02} \cdot x_1 &{} x_4 &{} x_3 &{} \texttt{0x03} \cdot x_2 \\ x_1 &{} x_4 &{} \texttt{0x03} \cdot x_3 &{} \texttt{0x02} \cdot x_2 \\ x_1 &{} \texttt{0x03} \cdot x_4 &{} \texttt{0x02} \cdot x_3 &{} x_2 \\ \texttt{0x03} \cdot x_1 &{} \texttt{0x02} \cdot x_4 &{} x_3 &{} x_2 \end{bmatrix}. \end{aligned}$$
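The symbolic matrix of \({\mathcal {M}}_0\) can be re-derived mechanically from \({\mathcal {M}}_i = MC(\mathcal {ID}_i)\). In the sketch below, which is only an illustration, each entry is represented as a pair (coefficient, variable index), so that `(2, 1)` stands for \(\texttt{0x02} \cdot x_1\). The function name and this encoding are assumptions of the example; the variable \(x_{k+1}\) is assumed to sit in row k of \(\mathcal {ID}_i\), as in the symbolic matrix of \(\mathcal {ID}_0\).

```python
# MixColumns matrix of AES (coefficients as integers 1, 2, 3).
M_MC = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]


def mixed_space_symbolic(i):
    """Symbolic form of M_i = MC(ID_i) as (coefficient, variable index) pairs."""
    out = [[None] * 4 for _ in range(4)]
    for k in range(4):
        # In ID_i the variable x_{k+1} sits in row k, column (i - k) mod 4.
        c = (i - k) % 4
        for r in range(4):
            # Each column of ID_i holds a single variable, so every entry of
            # MC(ID_i) is one coefficient times one variable.
            out[r][c] = (M_MC[r][k], k + 1)
    return out


for row in mixed_space_symbolic(0):
    print(row)
```

Each column of \(\mathcal {ID}_i\) contains exactly one variable, which is why every entry of \({\mathcal {M}}_i\) is a single monomial.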

Definition 5

For \(I \subseteq \{0, 1, 2, 3\}\), let \({\mathcal {C}}_I\), \({\mathcal {D}}_I\), \(\mathcal {ID}_I\) and \({\mathcal {M}}_I\) be defined as

$$\begin{aligned} {\mathcal {C}}_I = \bigoplus _{i\in I} {\mathcal {C}}_i, \qquad {\mathcal {D}}_I = \bigoplus _{i\in I} {\mathcal {D}}_i, \qquad \mathcal {ID}_I = \bigoplus _{i\in I} \mathcal {ID}_i, \qquad {\mathcal {M}}_I = \bigoplus _{i\in I} {\mathcal {M}}_i. \end{aligned}$$

As shown in detail in [18]:

  • for any coset \({\mathcal {D}}_I \oplus a\), there exists a unique \(b\in {\mathcal {C}}_I^\perp \) such that \(R({\mathcal {D}}_I \oplus a) = {\mathcal {C}}_I \oplus b\);

  • for any coset \({\mathcal {C}}_I \oplus a\), there exists a unique \(b\in {\mathcal {M}}_I^\perp \) such that \(R({\mathcal {C}}_I \oplus a) = {\mathcal {M}}_I \oplus b\).

Theorem 1

For each I and for each \(a \in {\mathcal {D}}_I^\perp \), there exists one and only one \(b\in {\mathcal {M}}_I^\perp \) such that

$$\begin{aligned} R^{2}({\mathcal {D}}_I \oplus a) = {\mathcal {M}}_I \oplus b. \end{aligned}$$
(1)

We refer to [18] for a complete proof of this theorem. Observe that b depends on a (the constant that defines the initial coset of \({\mathcal {D}}_I\)) and on the secret key k.

Moreover, note that if X is a generic subspace, \(X\oplus a\) is a coset of X and x and y are two elements of the (same) coset \(X\oplus a\), then \(x\oplus y \in X\). It follows that:

Lemma 1

For all x, y and for all \(I\subseteq \{0,1,2,3\}\):

$$\begin{aligned} \text {Prob}(R^{2}(x) \oplus R^{2}(y) \in {\mathcal {M}}_I \, | \, x \oplus y \in {\mathcal {D}}_I ) = 1. \end{aligned}$$
(2)
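Lemma 1 can be checked experimentally. The following Python sketch, written under our own naming and state-layout assumptions (state `s[r][c]`, random independent round keys, S-Box derived from field inversion plus the standard affine map), draws random pairs with \(x \oplus y \in {\mathcal {D}}_I\), applies two AES rounds, and tests membership of the output difference in \({\mathcal {M}}_I\) by applying \(MC^{-1}\) and checking that the result is zero outside \(\mathcal {ID}_I\). It is a sanity check, not a proof.

```python
import random


def gmul(a, b):
    """Multiply a and b in GF(2^8) modulo X^8 + X^4 + X^3 + X + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def gf_inv(a):
    r = 1
    for _ in range(254):  # a^254 is the inverse of a (0 is mapped to 0)
        r = gmul(r, a)
    return r


def rotl8(x, k):
    return ((x << k) | (x >> (8 - k))) & 0xFF


SBOX = []
for a in range(256):
    b = gf_inv(a)
    SBOX.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)

MC = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
MC_INV = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]


def apply_matrix(m, s):
    """Multiply every column of the 4x4 state s by the 4x4 matrix m."""
    out = [[0] * 4 for _ in range(4)]
    for r in range(4):
        for c in range(4):
            for k in range(4):
                out[r][c] ^= gmul(m[r][k], s[k][c])
    return out


def aes_round(s, key):
    s = [[SBOX[b] for b in row] for row in s]                       # SubBytes
    s = [[s[r][(c + r) % 4] for c in range(4)] for r in range(4)]   # ShiftRows
    s = apply_matrix(MC, s)                                         # MixColumns
    return [[s[r][c] ^ key[r][c] for c in range(4)] for r in range(4)]


def random_state(rng):
    return [[rng.randrange(256) for _ in range(4)] for _ in range(4)]


def in_mixed_space(diff, I):
    """True iff diff is in M_I, i.e. MC^{-1}(diff) is zero outside ID_I."""
    d = apply_matrix(MC_INV, diff)
    return all(d[r][c] == 0 for r in range(4) for c in range(4)
               if (r + c) % 4 not in I)


def check_lemma1(I, trials, rng):
    for _ in range(trials):
        k1, k2, x = random_state(rng), random_state(rng), random_state(rng)
        # y differs from x only on the diagonals indexed by I: x ^ y is in D_I.
        y = [[rng.randrange(256) if (c - r) % 4 in I else x[r][c]
              for c in range(4)] for r in range(4)]
        u = aes_round(aes_round(x, k1), k2)
        v = aes_round(aes_round(y, k1), k2)
        diff = [[u[r][c] ^ v[r][c] for c in range(4)] for r in range(4)]
        if not in_mixed_space(diff, I):
            return False
    return True
```

A call such as `check_lemma1({1, 3}, 20, random.Random(2024))` is expected to return `True` for any choice of I, seed and round keys, in line with the probability 1 stated in the lemma.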

As demonstrated in [18], we finally recall that for each \(I, J \subseteq \{0,1,2,3\}\):

$$\begin{aligned} {\mathcal {M}}_I \cap {\mathcal {D}}_J = \{0\} \qquad \text {if and only if} \qquad |I|+|J|\le 4. \end{aligned}$$
(3)
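Relation (3) can also be checked numerically with linear algebra over \({\mathbb {F}}_{2^8}\), using \(\dim ({\mathcal {M}}_I \cap {\mathcal {D}}_J) = \dim {\mathcal {M}}_I + \dim {\mathcal {D}}_J - \dim ({\mathcal {M}}_I + {\mathcal {D}}_J)\). The sketch below is only a sanity check under our own conventions: states are flattened to vectors of length 16 with index `4 * r + c`, and all function names are hypothetical.

```python
from itertools import combinations


def gmul(a, b):
    """Multiply a and b in GF(2^8) modulo X^8 + X^4 + X^3 + X + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def gf_inv(a):
    r = 1
    for _ in range(254):  # a^254 is the inverse of a for a != 0
        r = gmul(r, a)
    return r


MC = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]


def diagonal_generators(J):
    """Unit vectors spanning D_J (positions (r, j + r mod 4))."""
    gens = []
    for j in J:
        for r in range(4):
            v = [0] * 16
            v[4 * r + (j + r) % 4] = 1
            gens.append(v)
    return gens


def mixed_generators(I):
    """Images under MixColumns of the unit vectors spanning ID_I."""
    gens = []
    for i in I:
        for k in range(4):
            c = (i - k) % 4            # e_{k,c} is a generator of ID_i
            v = [0] * 16
            for r in range(4):
                v[4 * r + c] = MC[r][k]
            gens.append(v)
    return gens


def rank(rows):
    """Rank over GF(2^8) by Gaussian elimination."""
    rows = [row[:] for row in rows]
    rk = 0
    for col in range(16):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        inv = gf_inv(rows[rk][col])
        rows[rk] = [gmul(inv, v) for v in rows[rk]]
        for i in range(len(rows)):
            if i != rk and rows[i][col]:
                f = rows[i][col]
                rows[i] = [v ^ gmul(f, p) for v, p in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def intersection_dim(I, J):
    gens = mixed_generators(I) + diagonal_generators(J)
    return 4 * len(I) + 4 * len(J) - rank(gens)


SUBSETS = [s for n in range(1, 5) for s in combinations(range(4), n)]
```

Iterating `intersection_dim(I, J)` over all pairs of non-empty subsets in `SUBSETS` should give dimension 0 exactly when \(|I|+|J|\le 4\).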

Theorem 2

Let \(I, J \subseteq \{0,1,2,3\}\) such that \(|I|+|J|\le 4\). For all \(x\ne y\):

$$\begin{aligned} \text {Prob}(R^{4}(x) \oplus R^{4}(y) \in {\mathcal {M}}_I \, | \, x \oplus y \in {\mathcal {D}}_J ) = 0. \end{aligned}$$
(4)

We remark that all these results can be re-described using a more “classical” truncated differential notation, as formally pointed out in [8, 23]. For example, if two texts \(t^1\) and \(t^2\) are equal except for the bytes in the i-th diagonal (Footnote 3) for each \(i \in I\), then they belong to the same coset of \({\mathcal {D}}_I\). A coset of \({\mathcal {D}}_I\) corresponds to a set of \(2^{32 \cdot |I|}\) texts with |I| active diagonals. Similarly, two texts \(t^1\) and \(t^2\) belong to the same coset of \(\mathcal {ID}_I\) if the bytes of \(t^1 \oplus t^2\) that lie in the i-th anti-diagonal for each \(i \notin I\) are equal to zero. Similar considerations hold for the column space \({\mathcal {C}}_I\) and the mixed space \({\mathcal {M}}_I\).

3 Known-key security of block ciphers

Firstly, we give a formal definition of the known-key distinguisher scenario, recalling the one proposed in [16] by Gilbert as a starting point.

3.1 Definition of known-key distinguisher

Informally, a known-key distinguisher exploits the fact that it is in general harder for an adversary who does not know the key to derive an N-tuple of input blocks of a given block cipher E that is “abnormally correlated” with the corresponding N-tuple of output blocks than for one who knows the secret key. This difficulty is well expressed by the T-intractable definition, expressed by Gilbert as follows:

Definition 6

Let \(E : (K,X) \in \{0, 1\}^k \times \{0, 1\}^n \rightarrow E_K(X) \in \{0, 1\}^n\) denote a block cipher of block size n bits. Let \(N \ge 1\) and \({\mathcal {R}}\) denote an integer and any relation over the set S of N-tuples of n-bit blocks. \({\mathcal {R}}\) is said to be T-intractable relatively to E if, given any algorithm \({\mathcal {A}}\) that is given an oracle access to a perfect random permutation \(\varPi \) of \(\{0, 1\}^n\) and its inverse, it is impossible for \({\mathcal {A}}\) to construct in time \(T^\prime \le T\) two N-tuples \({\mathcal {X}} = (X_i)\) and \({\mathcal {Y}}= (Y_i)\) such that \(Y_i = \varPi (X_i)\), \(i = 1, ..., N\) and \({\mathcal {X}}\, {\mathcal {R}} \,{\mathcal {Y}}\) with a success probability \(p \ge 1/2\) over \(\varPi \) and the random choices of \({\mathcal {A}}\). The computing time \(T^\prime \) of \({\mathcal {A}}\) is measured as an equivalent number of computations of E, with the convention that the time needed for one oracle query to \(\varPi \) or \(\varPi ^{-1}\) is equal to 1. Thus if q denotes the number of queries of \({\mathcal {A}}\) to \(\varPi \) or \(\varPi ^{-1}\), then \(q \le T^\prime \).

Definition 7

Let \(E : (K,X) \in \{0, 1\}^k \times \{0, 1\}^n \rightarrow E_K(X) \in \{0, 1\}^n\) denote a block cipher of block size n bits. A known-key distinguisher \(({\mathcal {R}}, {\mathcal {A}})\) of order \(N \ge 1\) consists of (1) a relation \({\mathcal {R}}\) over the N-tuples of n-bit blocks and (2) an algorithm \({\mathcal {A}}\) that on the input of a k-bit key K produces in time \(T_{{\mathcal {A}}}\), i.e. in a time equivalent to \(T_{{\mathcal {A}}}\) computations of E, an N-tuple \({\mathcal {X}} = (X_i)_{i=1, ..., N}\) of plaintext blocks and an N-tuple \({\mathcal {Y}} = (Y_i)_{i=1,...,N}\) of ciphertext blocks related by \(Y_i = E_K(X_i)\) and by \({\mathcal {X}}\, {\mathcal {R}} \,{\mathcal {Y}}\). The two following conditions must be met:

  • The relation \({\mathcal {R}}\) must be \(T_{{\mathcal {A}}}\)-intractable relatively to E;

  • The validity of \({\mathcal {R}}\) must be efficiently checkable.

To formalize the last requirement, we incorporate the time for checking whether two N-tuples are related by \({\mathcal {R}}\) in the computing time \(T_{{\mathcal {A}}}\) of algorithm \({{\mathcal {A}}}\).

We emphasize that while the algorithm \({\mathcal {A}}\) takes a random key K as input, the relation \({\mathcal {R}}\) satisfied by the N-tuples of input and output blocks constructed by \({\mathcal {A}}\) or \({\mathcal {A}}^\prime \) is the same for all values of K (in other words, it is independent of K) and must be efficiently checkable without knowing K.

Fig. 1 A known-key distinguisher scenario. First, we assume a relationship \({\mathcal {R}}\) is chosen and fixed. Step (1): the secret key is given to the Oracle \(\varPi /\varPi ^{-1}\), to the Shortcut Player \({\mathcal {A}}\) and to the Verifier. Step (2): the Shortcut Player \({\mathcal {A}}\) and the Generic Player \({\mathcal {A}}^{\prime }\) generate the N-tuples that satisfy the required relationship \({\mathcal {R}}\). Step (3): the Verifier receives the N-tuple and checks if \({\mathcal {R}}\) is satisfied or not. The fastest player to generate the N-tuple wins the “game”.

3.2 The known-key distinguisher scenario

To better understand these definitions, we propose and describe in more detail a generic scenario for a known-key distinguisher, which is depicted in Fig. 1. This scenario is composed of five characters, which are a key generator, an oracle, two players and a verifier. We assume that the oracle is instantiated by an ideal cipher \(\varPi \) (Footnote 4) defined as

$$\begin{aligned} \varPi : (k, p) \in \{0,1\}^k \times \{0,1\}^n \rightarrow c = \varPi (k, p) \in \{0,1\}^n \end{aligned}$$

such that \(\varPi (k, \cdot )\) is a permutation for each fixed \(k \in \{0,1\}^k\). Equivalently, \(\varPi \) is chosen uniformly at random among all ciphers with a k-bit key and an n-bit input/output. Moreover, we assume that the verifier knows the details both of E and of \(\varPi \).

After fixing a relation \({\mathcal {R}}\) defined as in Def. 6, the known-key distinguisher scenario can be described as follows:

1st step:

the key generator generates a key, which is given to the oracle and to one of the two players. In the following:

  • “shortcut player” denotes the player who knows the key and faces the encryption scheme E;

  • “generic player” denotes the player who does not know the key and faces the ideal cipher \(\varPi \).

Referring to the previous definitions by Gilbert, the generic player can be identified by the algorithm \({\mathcal {A}}^\prime \), while the shortcut player can be identified by the algorithm \({\mathcal {A}}\);

2nd step:

the two players generate the N-tuple of (plaintexts, ciphertexts) which satisfy the required relation \({\mathcal {R}}\). Since the generic player does not know the key, he must ask the oracle (identified with \(\varPi \) and/or \(\varPi ^{-1}\) in the previous definitions) for the encryption (resp. decryption) of chosen plaintexts (resp. ciphertexts). We stress that this step does not consist only in the generation of (plaintext, ciphertext) pairs, but also includes any computational cost that the player must incur in order to find the N-tuple with the required property;

3rd step:

when a player finds the N-tuple which satisfies the required relation \({\mathcal {R}}\), he sends it to the verifier. The verifier finally checks if (1) the relation \(Y^\prime _i = E_K(X^\prime _i)\) (case of shortcut player) or \(Y^\prime _i = \varPi (X^\prime _i)\) (case of generic player) is satisfied for each i and if (2) the N-tuple satisfies the relation \({\mathcal {R}}\). The first/fastest player who sends the N-tuple with the required property \({\mathcal {R}}\) wins the “game”.

A distinguisher is meaningful if the cost of the generic player (assuming that the cost of one oracle query is equal to the cost of one encryption) to generate the N-tuple is higher than the cost of the shortcut player, when the probability of success is equal for the two players. Equivalently, a distinguisher is meaningful if the probability of the generic player to win the game is lower than the probability of the shortcut player, when the number of (plaintext, ciphertext) pairs that the two players can generate is fixed and equal for both players. In other words, in the first version one considers the computational costs of the two players to generate the N-tuples with a fixed probability of success (equal for both players). In the second version, the computational cost (equivalent to the number of oracle queries for the generic player and the number of N-tuples generated by the shortcut one) is fixed and one considers the probabilities of success of the two players to win the game.

Before going further, we emphasize that the role of the verifier is only to prevent one or both of the two players from cheating. In other words, in the case of honest players, the verifier can be omitted, and the winner of the game is simply the first/fastest player that claims to have found the N-tuple of (plaintexts, ciphertexts) which satisfy the required relation \({\mathcal {R}}\). We highlight that such a verifier is implicitly present in all the distinguishers currently present in the literature.
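The two checks performed by the verifier in the 3rd step can be sketched as follows. The toy 8-bit cipher `toy_encrypt`, the key value `KEY` and the relation `toy_relation` are purely hypothetical stand-ins, chosen so that the example is deterministic; they are unrelated to AES and to the relations studied in this paper. Only the structure of `verify` (first the consistency of each pair with the cipher, then the key-independent relation \({\mathcal {R}}\)) reflects the scenario described above.

```python
def rotl8(x, k):
    """Rotate the byte x left by k bits."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def toy_encrypt(key, x):
    """Hypothetical, deliberately weak 8-bit cipher: XOR the key, then rotate."""
    return rotl8(x ^ key, 3)


def toy_relation(xs, ys):
    """Hypothetical key-independent relation of order N = 2:
    distinct inputs whose XOR, and whose output XOR, has a zero high nibble."""
    if len(xs) != 2 or len(ys) != 2 or xs[0] == xs[1]:
        return False
    return (xs[0] ^ xs[1]) >> 4 == 0 and (ys[0] ^ ys[1]) >> 4 == 0


def verify(encrypt, relation, xs, ys):
    """Verifier: check (1) Y_i = E_K(X_i) for each i, (2) the relation R."""
    if len(xs) != len(ys):
        return False
    if any(encrypt(x) != y for x, y in zip(xs, ys)):
        return False
    return relation(xs, ys)


KEY = 0xA7  # arbitrary example key


def encrypt(x):
    return toy_encrypt(KEY, x)


# A shortcut player exploiting the XOR-linearity of the toy cipher:
# an input difference 0x01 always yields the output difference 0x08.
xs = (0x10, 0x11)
ys = tuple(encrypt(x) for x in xs)
print(verify(encrypt, toy_relation, xs, ys))
```

Note that `toy_relation` never uses the key, matching the requirement that \({\mathcal {R}}\) be checkable without knowing K, whereas the consistency check in `verify` needs access to the cipher instance.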

Verification Step. Both for the distinguishers that we are going to present and for Gilbert’s one, the computational cost of the verification step is not negligible. To clarify, we identify the verification cost (Footnote 5) only as the cost to check that the relation \({\mathcal {R}}\) holds. Thus, in order to compare our distinguishers to the others present in the literature, we define the cost of the distinguisher as the sum of the cost of the verification step (i.e. the cost of the verifier) and of the cost to construct the set of plaintexts/ciphertexts with the required property. For this reason, we assume for the following that a relationship \({\mathcal {R}}\) is efficiently checkable if and only if the computational cost of the verifier is negligible with respect to the players’ ones. This implies that the cost of the distinguisher can be approximated with the computational cost of the shortcut player (since the cost of the other player is always higher in the case of a meaningful distinguisher).

What about the cost of the Generic Player? Since the generic player depends on the oracle to generate the N-tuple (i.e. he cannot work alone to generate it), two possible settings can be analyzed. In the first one, only the number of oracle queries is considered to determine the computational cost of this player, that is, the number of encryptions/decryptions that the generic player requests from the oracle. In the second one, both the number of oracle queries and any other computational cost of the generic player (which is in general not negligible) are considered. Intuitively, this second setting is weaker than the first one, in the sense that a known-key distinguisher in the first setting also works in the second one but not vice versa. In other words, one can expect that the required number N of tuples is in general higher (or at least equal) in the first setting than in the second one.

For the goal of this paper, in the following we limit ourselves to consider only the first scenario.

Role of the Ideal Cipher in the Known-Key Scenario. For completeness, note that the ideal cipher in the previous model can be replaced by the encryption scheme E faced by the shortcut player if such a cipher is a Strong PseudoRandom Permutation (see Appendix A for more details).

4 State of the art of known-key distinguishers for AES-128

Here we review the most relevant distinguishers for AES in the above scenario, with particular attention to the ones proposed by Gilbert in [16]. For simplicity, we assume that the relations \(Y_i = E_K(X_i)\) (case of shortcut player) and \(Y_i = \varPi (X_i)\) (case of generic player) are always satisfied for each i, that is, that the two players do not cheat about these relations.

4.1 Distinguishers based on the Rebound Technique

4.1.1 7- and 8-Round known-key distinguisher & the rebound attack

For the case of the 7- and 8-round known-key distinguishers proposed in [25] and [17], the goal of the two players is to find two pairs of (plaintexts, ciphertexts), i.e. \((p^1, c^1)\) and \((p^2, c^2)\), s.t. (1) the two plaintexts differ only in one fixed diagonal, or equivalently belong to the same coset of \({\mathcal {D}}_i\) for a fixed \(i\in \{0,1,2,3\}\) (i.e. \(p^1 \oplus p^2 \in {\mathcal {D}}_i\)), and (2) the two ciphertexts belong to the same coset of \({\mathcal {M}}_i\) for a fixed \(i\in \{0,1,2,3\}\) (i.e. \(c^1 \oplus c^2 \in {\mathcal {M}}_i\)), which means that they differ only in one fixed anti-diagonal if the final MixColumns operation is omitted.
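As a minimal sketch of what the verifier checks in this game, the two membership tests can be written in a few lines of Python. The indexing is an assumption of this sketch and is not fixed in this section: a state is a 4×4 byte matrix indexed by [row][column], the i-th diagonal consists of the positions (r, (r+i) mod 4), and the i-th anti-diagonal of the positions (r, (i−r) mod 4). The function names are hypothetical, and the ciphertext test covers only the variant in which the final MixColumns is omitted.

```python
# Sketch: check the relation on two (plaintext, ciphertext) pairs for a fixed i.
# Assumed conventions (not taken from the paper): states are 4x4 lists of
# bytes indexed [row][col]; diagonal i = {(r, (r + i) % 4)} and
# anti-diagonal i = {(r, (i - r) % 4)}.

def xor_state(a, b):
    """Byte-wise XOR of two 4x4 states."""
    return [[a[r][c] ^ b[r][c] for c in range(4)] for r in range(4)]

def in_diagonal_space(diff, i):
    """True if diff is zero outside the i-th diagonal (diff in D_i)."""
    return all(diff[r][c] == 0
               for r in range(4) for c in range(4)
               if c != (r + i) % 4)

def in_antidiagonal_space(diff, i):
    """True if diff is zero outside the i-th anti-diagonal (diff in ID_i)."""
    return all(diff[r][c] == 0
               for r in range(4) for c in range(4)
               if c != (i - r) % 4)

def rebound_relation_holds(p1, c1, p2, c2, i):
    """Relation for a fixed i, with the final MixColumns omitted."""
    return (in_diagonal_space(xor_state(p1, p2), i)
            and in_antidiagonal_space(xor_state(c1, c2), i))
```

Both tests only inspect byte positions, so the check needs no knowledge of the key.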

In the above known-key distinguisher setting, the best technique that the shortcut player (i.e. the player who knows the key) can exploit to win the game is the Rebound Attack. The rebound attack is a differential attack and it was proposed in [26] for the cryptanalysis of AES-based hash functions. Since it is a differential attack, one needs a “good” (truncated) differential trail in order to exploit it. Examples of truncated differential trails used for 7- and 8-round AES are depicted in Fig. 2. The rebound attack consists of two phases, called inbound and outbound phase. In the first one, the attacker uses the knowledge of the key to find pairs of texts that satisfy the middle rounds of the truncated differential trail. In the second one, he propagates the solutions found in the first phase in the forward and in the backward directions, and checks if at least one of them satisfies the entire differential trail.

Fig. 2. 7- and 8-round differential paths for AES-128

As proved in [17], for the AES case and using the rebound attack, the shortcut player needs approximately \(2^{48}\) computations in order to find the two (plaintexts, ciphertexts) pairs \((p^1, c^1)\) and \((p^2, c^2)\) with the required properties (besides a memory cost of \(16\times 2^{32} = 2^{36}\) bytes). Instead, in the case of an ideal cipher, the generic player needs approximately \(2^{64}\) operations in order to find them with the same probability.

Fig. 3. 8-round differential characteristic for known-key distinguisher of AES-128

4.1.2 Multiple limited-birthday 8-round known-key distinguisher

An improvement of the previous known-key distinguisher on 8-round AES was proposed in [19]. Using the subspace trail notation, in this modified version of the 8-round known-key distinguisher, the goal of the two players is to find two pairs of (plaintexts, ciphertexts) such that the two plaintexts belong to the same coset of \({\mathcal {D}}_i\) for an arbitrary i and the two ciphertexts belong to the same coset of \({\mathcal {M}}_j\) for an arbitrary j, where i and j are not fixed in advance and it is not required that they are equal (i.e. no condition is imposed on i and j). A concrete example is depicted in Fig. 3. For arbitrary initial and final subspaces, the computational cost of the shortcut player is reduced from \(2^{48}\) to \(2^{44}\) (note that there are 4 different initial and final subspaces \({\mathcal {D}}_i\) and \({\mathcal {M}}_j\), for a total of \(4^2=2^4\) possibilities) while the required memory is still \(2^{32}\), as shown in detail in [19]. In Appendix E.1 we show that the same technique can be exploited to improve the 7-round known-key distinguisher presented in [25].

4.2 Gilbert’s known-key distinguishers

4.2.1 Uniform distribution 8-round known-key distinguisher

Another 8-round known-key distinguisher for AES is based on the uniform distribution property and it was proposed by Gilbert in [16]. In this case, the goal of the two players is to find a set of \(2^{64}\) (plaintext, ciphertext) pairs—that is, \((p^i, c^i)\) for \(i=0, ..., 2^{64}-1\)—such that the bytes of the plaintexts and of the ciphertexts are uniformly distributed:

  • for each \(j, k \in \{0, 1, 2, 3\}\) and for each \(x\in {\mathbb {F}}_{2^8}\), there are \(2^{56}\) plaintexts \(p^i\) for \(i\in I \subseteq \{0, ..., 2^{64}-1\}\) with \(|I|=2^{56}\) that satisfy \(p^i_{j,k} = x\) for all \(i\in I\);

  • for each \(j, k \in \{0, 1, 2, 3\}\) and for each \(x\in {\mathbb {F}}_{2^8}\), there are \(2^{56}\) ciphertexts \(c^i\) for \(i \in I \subseteq \{0, ..., 2^{64}-1\}\) with \(|I|=2^{56}\) that satisfy \(c^i_{j,k} = x\) for all \(i\in I\).
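The two conditions above can be sketched as a direct counting check. The following Python fragment is only an illustration on a toy scale (256 texts instead of \(2^{64}\), so each byte value must occur exactly once per position); the function name and the representation of a text as a 4×4 list of bytes are assumptions of this sketch.

```python
from collections import Counter

def bytes_uniformly_distributed(texts):
    """texts: list of 4x4 byte matrices. True if at every position (j, k)
    each of the 256 byte values occurs exactly len(texts) / 256 times."""
    n = len(texts)
    if n == 0 or n % 256 != 0:
        return False
    target = n // 256
    for j in range(4):
        for k in range(4):
            counts = Counter(t[j][k] for t in texts)
            if len(counts) != 256 or any(v != target for v in counts.values()):
                return False
    return True

# Toy instance: 256 texts, every position runs once through all byte values.
toy = [[[(x + 4 * j + k) % 256 for k in range(4)] for j in range(4)]
       for x in range(256)]
```

The same function applies unchanged to the plaintext side and to the ciphertext side, since both conditions have the same form.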

Using the subspace trail notation, it is possible to re-formulate the goal of the two players as follows: find a set of \(2^{64}\) (plaintext, ciphertext) pairs—that is, \((p^i, c^i)\) for \(i=0, ..., 2^{64}-1\)—such that

  • for each \(I\subseteq \{0,1,2,3\}\) with \(|I|=3\) the plaintexts are uniformly distributed in cosets of the diagonal space \({\mathcal {D}}_I\), or equivalently, for each I with \(|I|=3\) and for each \(a \in {\mathcal {D}}_I^\perp \) there are \(2^{32}\) plaintexts \(p^j\) for \(j\in J \subseteq \{0, ..., 2^{64}-1\}\) with \(|J|=2^{32}\) such that \(p^j \in {\mathcal {D}}_I \oplus a\) for all \(j\in J\);

  • for each \(I\subseteq \{0,1,2,3\}\) with \(|I|=3\) the ciphertexts are uniformly distributed in cosets of the mixed space \({\mathcal {M}}_I\), or equivalently, for each I with \(|I|=3\) and for each \(a \in {\mathcal {M}}_I^\perp \) there are \(2^{32}\) ciphertexts \(c^j\) for \(j \in J \subseteq \{0, ..., 2^{64}-1\}\) with \(|J|=2^{32}\) such that \(c^j \in {\mathcal {M}}_I \oplus a\) for all \(j\in J\).

If the final MixColumns is omitted, an equivalent condition holds on the ciphertexts by replacing the mixed space \({\mathcal {M}}_I\) with the inverse-diagonal one \(\mathcal {ID}_I\). To be more formal:

Definition 8

Consider \(2^{64}\) texts \(t^i \in {\mathbb {F}}_{2^8}^{4\times 4}\) for \(i=0, ..., 2^{64}-1\), and let \(I\subseteq \{0,1,2,3\}\) with \(|I| = 3\) fixed. These \(2^{64}\) texts \(t^i\) are “uniformly distributed” in cosets of \({\mathcal {M}}_I\) if

  • for each coset \({\mathcal {M}}_I \oplus a\) for \(a\in {\mathcal {M}}_I^\perp \), there exists a set \({\mathscr {T}}_a\) of \(2^{32}\) texts \({\mathscr {T}}_a := \{t^j\}_{j=0, ..., 2^{32}-1}\) such that \(t^j\in {\mathcal {M}}_I \oplus a\) for each \(t^j \in {\mathscr {T}}_a\);

  • given sets \({\mathscr {T}}_a\) and \({\mathscr {T}}_b\) just defined for two different cosets \({\mathcal {M}}_I \oplus a\) and \({\mathcal {M}}_I \oplus b\) where \((a\oplus b)\in {\mathcal {M}}_I^\perp \), then \({\mathscr {T}}_a \cap {\mathscr {T}}_b = \emptyset \).

Before going further, we prove that the two previous formulations are equivalent, namely that the bytes of the plaintexts are uniformly distributed if and only if the plaintexts are uniformly distributed in cosets of the diagonal space \({\mathcal {D}}_J\) for each J with \(|J|=3\) (analogous for the ciphertexts). Consider the case in which the plaintexts are uniformly distributed in cosets of the diagonal space \({\mathcal {D}}_J\), which means that for each \(a\in {\mathcal {D}}_{1,2,3}^\perp \equiv {\mathcal {D}}_0\), there are \(2^{32}\) plaintexts \({\hat{p}}^i\) for \(i\in I\subseteq \{0, ..., 2^{64}-1\}\) and \(|I|=2^{32}\) that belong e.g. to the same coset of \({\mathcal {D}}_{1,2,3} \oplus a\) (analogous for the other spaces \({\mathcal {D}}_J\) with \(|J|=3\)). For each fixed \(a\in {\mathcal {D}}_{1,2,3}^\perp \equiv {\mathcal {D}}_0\), this means that

$$\begin{aligned} \forall i \in I, \,\forall j \in \{0,1,2,3\}: \quad {\hat{p}}^i \in {\mathcal {D}}_{1,2,3} \oplus a \quad \text {if and only if} \quad {\hat{p}}^i_{j, j} = a_{j, j}. \end{aligned}$$

Working at byte level, note that for each \(x \in {\mathbb {F}}_{2^8}\) and for each index j, there are \(2^{24}\) different \(a\in {\mathcal {D}}_{1,2,3}^\perp \equiv {\mathcal {D}}_0\) s.t. \(a_{j,j} = x\). It follows that there are \(2^{24} \cdot 2^{32} = 2^{56}\) plaintexts \(p^i\) s.t. \(p^i_{j,j} = x\), which means that the bytes of the given plaintexts are uniformly distributed. To complete the proof, it is sufficient to use a similar strategy and the definition of \({\mathcal {D}}\) in order to prove that, given plaintexts whose bytes are uniformly distributed, they are uniformly distributed in cosets of the diagonal space \({\mathcal {D}}_J\) for each J with \(|J|=3\).

Finally, for the follow-up we highlight that the uniform distribution property implies the balance/zero-sum property (Footnote 6) both on the plaintexts and on the ciphertexts.

The strategy of the shortcut player. Here, we briefly re-propose the best strategy that the shortcut player can use to win the game using the subspace trails notation instead of the Super-SB notation (\(Super\text{- }SB(\cdot ) \equiv \) S-Box \(\circ ARK \circ MC \circ \) S-Box\((\cdot ))\)—see Appendix C for more details. The idea is to start in the middle with a set \({\mathfrak {S}}\) of texts defined as \({\mathfrak {S}} := {\mathcal {D}}_i\oplus {\mathcal {M}}_j \oplus c\) for a constant c, where \(|{\mathfrak {S}}|=2^{64}\). Observe that

$$\begin{aligned} {\mathfrak {S}} = {\mathcal {D}}_i\oplus {\mathcal {M}}_j \oplus c \equiv \bigcup _{b \in {\mathcal {D}}_i \oplus c} {\mathcal {M}}_j\oplus b= \bigcup _{a \in {\mathcal {M}}_j \oplus c} {\mathcal {D}}_i \oplus a, \end{aligned} \tag{5}$$

i.e. the set \({\mathfrak {S}}\) can be re-written as the union of cosets of the space \({\mathcal {D}}_i\) or as the union of cosets of the space \({\mathcal {M}}_j\). The ciphertexts are given by the 4-round encryption of \({\mathfrak {S}}\), and the plaintexts by the 4-round decryption of \({\mathfrak {S}}\).

After encrypting \({\mathfrak {S}}\) for 4 rounds, the texts are uniformly distributed in each coset of \({\mathcal {M}}_I\) of dimension 12 (i.e. \(|I|=3\)). That is, after 4 rounds, each coset of \({\mathcal {M}}_I\) for \(|I|=3\) contains exactly \(2^{32}\) elements. Indeed, by Theorem 2, two elements in the same coset of \({\mathcal {D}}_I\) cannot belong to the same coset of \({\mathcal {M}}_J\) after 4 rounds if \(|I|+|J|\le 4\). Thus, given a coset of \({\mathcal {D}}_i\) with \(|i|=1\), after 4 rounds each element lies in a different coset of \({\mathcal {M}}_J\) for \(|J|=3\). Since a coset of \({\mathcal {D}}_i\) contains \(2^{32}\) elements and since there are exactly \(2^{32}\) cosets of \({\mathcal {M}}_J\), the elements of \({\mathcal {D}}_i \oplus {\mathcal {M}}_j\) are uniformly distributed in each coset of \({\mathcal {M}}_I\). The same happens if one decrypts \({\mathfrak {S}}\) for 4 rounds. In this case, after decrypting \({\mathfrak {S}}\) for 4 rounds, the texts are uniformly distributed in each coset of \({\mathcal {D}}_I\) of dimension 12 (i.e. \(|I|=3\)), that is, each coset of \({\mathcal {D}}_I\) for \(|I|=3\) contains exactly \(2^{32}\) elements.

On the meaningfulness of this distinguisher. What is the minimum number \(N\equiv 2^{64}+M > 2^{64}\) of random (plaintext, ciphertext) pairs s.t. there is a subset of \(2^{64}\) pairs whose bytes are uniformly distributed both on the plaintexts and on the ciphertexts with non-negligible probability?

First of all, note that \(2^{64}\) texts satisfy the uniform distribution on each byte with probability

$$\begin{aligned} p = \left[ \prod _{i=0}^{255} \left( {\begin{array}{c}2^{64} - i \cdot 2^{56}\\ 2^{56}\end{array}}\right) \cdot \left( 2^{-8}\right) ^{2^{64}} \right] ^{16}=\left( \frac{2^{64}!}{\left( 2^{56}!\right) ^{256}} \cdot \left( 2^{-8}\right) ^{2^{64}} \right) ^{16}. \end{aligned}$$

Indeed, consider the following problem. Given N texts and 2 sets, assume that each text belongs to one of the two sets with probability \(2^{-1}\). It follows that the N texts are uniformly distributed among the two sets with prob. \(\left( {\begin{array}{c}N\\ N/2\end{array}}\right) \cdot 2^{-N}\). In a similar way, given \(d\ge 2\) sets, they are uniformly distributed with probability (Footnote 7) \( \prod _{i=0}^{d-1} \left( {\begin{array}{c}N - i \cdot N/d\\ N/d\end{array}}\right) \cdot d^{-N}\).
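The counting formula for d sets can be sanity-checked on tiny parameters. The sketch below evaluates the product of binomials exactly and compares it with a brute-force enumeration of all \(d^N\) assignments; the function names are hypothetical and the parameters are deliberately small, so this illustrates the formula only and not the \(2^{64}\)-text setting.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_uniform(n, d):
    """Probability that n independent uniform texts split evenly over d sets,
    computed as prod_{i<d} C(n - i*n/d, n/d) * d^(-n)."""
    assert n % d == 0
    m = n // d
    ways = 1
    for i in range(d):
        ways *= comb(n - i * m, m)
    return Fraction(ways, d ** n)

def prob_uniform_bruteforce(n, d):
    """Same probability by enumerating all d^n assignments (tiny n only)."""
    m = n // d
    good = sum(1 for a in product(range(d), repeat=n)
               if all(a.count(s) == m for s in range(d)))
    return Fraction(good, d ** n)
```

For instance, with N = 4 texts and d = 2 sets, 6 of the 16 assignments are balanced, so both functions return 3/8.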

Using Stirling’s formula \(n! \simeq n^n \cdot e^{-n} \cdot \sqrt{2 \pi \cdot n}\), this probability is well approximated by

$$\begin{aligned} p = \left( \frac{2^{64}!}{\left( 2^{56}!\right) ^{256}} \cdot \left( 2^{-8}\right) ^{2^{64}} \right) ^{16} \simeq \left( \frac{1}{2^{49} \cdot \pi }\right) ^{128} \cdot \left( 256!\right) ^{-1/2} \simeq 2^{-7328.1} \equiv 2^{-2^{12.84}}. \end{aligned} \tag{6}$$

In other words, given \(2^{64}\) plaintexts whose bytes are uniformly distributed, this represents the probability that the bytes of the corresponding ciphertexts are uniformly distributed (Footnote 8).

Given \(2^{64} + M\) texts, it is possible to construct

$$\begin{aligned} \left( {\begin{array}{c}2^{64} + M\\ 2^{64}\end{array}}\right) \simeq \frac{1}{\sqrt{2\pi \cdot M}} \cdot \left( \frac{2^{64} + M}{M}\right) ^M \end{aligned}$$

different sets of \(2^{64}\) texts (where the approximation is given using Stirling’s formula and by the assumption \(M\ll 2^{64}\)). This number is always bigger than \(p^{-2}\equiv 2^{2^{13.84}}\) for each \(M\ge 2^{12}\). Thus, given \(2^{64} + 2^{12}\) random pairs, there is a good probability to find \(2^{64}\) (plaintext, ciphertext) pairs for which the bytes of the plaintexts and of the ciphertexts are uniformly distributed. It follows that if the cost of the generic player is approximated by the number of oracle queries, then his cost is approximately \(2^{64} + 2^{12} \simeq 2^{64}\) encryptions versus \(2^{64}\) encryptions for the shortcut player.
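The comparison for \(M = 2^{12}\) can be checked with exact integer arithmetic. The sketch below takes the exponent \(7328.1\) of p from (6) as given and only verifies that the number of subsets exceeds \(p^{-2}\); the variable names are hypothetical.

```python
from math import comb

M = 2 ** 12

# C(2^64 + M, 2^64) = C(2^64 + M, M); the second form is cheap to evaluate.
subsets = comb(2 ** 64 + M, M)

# Compare exponents in base 2: p^(-2) = 2^(2 * 7328.1), with p from Eq. (6).
log2_subsets = subsets.bit_length() - 1   # floor(log2(subsets))
log2_p_inv_sq = 2 * 7328.1
enough_subsets = log2_subsets > log2_p_inv_sq
```

Only the floor of the base-2 logarithm is used, which is sufficient because the two exponents differ by a wide margin.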

So, why is this distinguisher meaningful? Instead of focusing on the cost of the two players, the idea is to show that the probability of the generic player to win the game given \(2^{64}\) texts is negligible. To do this, the author of [16] claims that this probability is upper bounded by the probability of the following game: “given \(2^{64}-1\) (plaintext, ciphertext) pairs whose bytes are ‘almost uniform’ (see the definition in the following), find a text for which the bytes of the corresponding \(2^{64}\) texts are uniformly distributed”. Since this probability is upper bounded by \(2^{-127}\) (see proof of Prop. 4 of [16]) and since this second game is (strongly) “related” to the original one, the conclusion follows immediately. For completeness, we emphasize that no formal proof is provided in [16] that supports this second claim. In other words, it is not formally proved that the hardness of this second game implies the hardness of the original game, and/or vice versa.

Finally, we formally define what “almost uniform” means. Consider \(2^{64}-1\) texts \(t^i \in {\mathbb {F}}_{2^8}^{4\times 4}\) for \(i=0, ..., 2^{64} - 2\). We say that the bytes of these \(2^{64}-1\) texts \(t^i\) are “almost uniform” if for each row and column \(j, k = 0, 1, 2, 3\) (1) there exists \(x\in {\mathbb {F}}_{2^8}\) s.t. there are \(2^{56} - 1\) texts that satisfy \(t^i_{j, k} = x\) and (2) for each \(y \in {\mathbb {F}}_{2^8} \setminus \{x\}\), there are \(2^{56}\) texts that satisfy \(t^i_{j, k} = y\). More generally:

Definition 9

Consider \(2^N-d\) texts \(t^i \in {\mathbb {F}}_{2^8}^{4\times 4}\) for \(i=0, ..., 2^N - d-1\) with \(d\ge 1\). The bytes of these \(2^N-d\) texts \(t^i\) are “almost uniform” if for each row and column \(j, k = 0, 1, 2, 3\):

  • there exists a set \(X \equiv \{x_1, ..., x_s \in {\mathbb {F}}_{2^8} \}\) with cardinality \(s\le d\) such that for each \(x_l \in X\) with \(1\le l \le s\) there are \(2^{N-8} - d \le {\hat{s}}_l \le 2^{N-8} - s\) texts that satisfy \(t^i_{j, k} = x_l\) where \(\sum _{l=1}^s {\hat{s}}_l = d\);

  • for each \(y \in {\mathbb {F}}_{2^8} \setminus X\), there are \(2^{N-8}\) texts that satisfy \(t^i_{j, k} = y\).

Note that, given a set of \(2^N\) texts whose bytes are uniformly distributed, the bytes of each subset of \(2^N-d\) texts (for each \(d\ge 1\)) are “almost uniform” w.r.t. the previous definition.

4.2.2 Extension to 10 rounds of AES

The previous distinguisher is the starting point used by Gilbert in order to set up the first 10-round known-key distinguisher for AES. The basic idea is to extend this 8-round distinguisher based on the uniform distribution property by adding one round at the end and one at the beginning. In the known-key distinguisher scenario presented above, the players have to send to the verifier \(2^{64}\) (plaintext, ciphertext) pairs, that is \((p^i, c^i)\) for \(i = 0, ..., 2^{64}-1\), such that:

  1. there exists a key \(k^0\) s.t. the bytes of \(\{R_{k^0}(p^i)\}_i\) are uniformly distributed, or equivalently s.t. the texts \(\{R_{k^0}(p^i)\}_i\) are uniformly distributed among the cosets of \({\mathcal {D}}_I\) for each I with \(|I|=3\);

  2. there exists a key \(k^{10}\) s.t. the bytes of \(\{R_{k^{10}}^{-1}(c^i)\}_i\) are uniformly distributed, or equivalently s.t. the texts \(\{R_{k^{10}}^{-1}(c^i)\}_i\) are uniformly distributed among the cosets of \({\mathcal {M}}_J\) for each J with \(|J|=3\).

We emphasize that it is not required that \(k^0\) and \(k^{10}\) are equal to the “real” subkeys (generated by the key-generator, see before) that define \(E_K(\cdot )\), that is, \(k^r\) can be different from the r-th subkey. In other words, it is only required that such keys exist, and not that they are equal to the real subkeys (Footnote 9). The same assumption holds for all Gilbert-like distinguishers presented in this paper and in the literature. Moreover, in this game, the subkeys \(k^0\) and \(k^{10}\) are assumed to be independent, i.e. no key-schedule holds (arguments are given by Gilbert to show that the same distinguisher is applicable also to the case in which the key-schedule holds; we discuss this topic in detail in the following).

Since uniform distribution implies balance property (vice-versa is not true in general), for the follow-up we highlight that if the plaintexts and the ciphertexts satisfy the previous properties, then they also have the zero-sum property respectively after one round encryption w.r.t. the key \(k^0\) (that is, \(\bigoplus _{i=0}^{2^{64}-1} R_{k^0}(p^i) = 0\)) and after one round decryption w.r.t. the key \(k^{10}\) (that is, \(\bigoplus _{i=0}^{2^{64}-1} R_{k^{10}}^{-1}(c^i) = 0\)).

On the meaningfulness of this distinguisher. What is the probability that given a set of \(2^{64}\) texts there exists a key \({\hat{k}}\) such that the bytes of 1-round encryption (resp. decryption) of such texts are uniformly distributed? Using the calculation proposed for the 8-round distinguisher and since there are \(2^{128}\) different keys, this probability is equal to \(2^{128} \cdot p \simeq 2^{128} \cdot 2^{-7328.1} = 2^{-7200.1} \equiv 2^{-2^{12.81}}\) where p is defined in (6). Similar to the 8-round case, it follows that \(2^{64} + 2^{12} \simeq 2^{64}\) (plaintext, ciphertext) pairs are sufficient to have good probability to win the game.

Again, why is this distinguisher meaningful? Working as for the 8-round case and in order to support this distinguisher, the author of [16] shows that the probability of the generic player to win the game given \(2^{64}\) texts is negligible. To do this, a claim is made that this probability is upper bounded by the probability of the following game: “given \(2^{64}-d\) (plaintext, ciphertext) pairs for \(d\ge 5\), that is, \((p^i, c^i)\) for each \(i=0, ..., 2^{64}-d-1\), with the property that there exist a set of keys \(k^0\) and \(k^{10}\) for which the bytes of \(R_{k^0}(p^i)\) and of \(MC^{-1} \circ R^{-1}_{k^{10}}(c^i)\) (that is, the 1-round encryption of the plaintexts and the 1-round decryption of the ciphertexts) are ‘almost uniform’ distributed, find the remaining d texts for which the bytes of the corresponding \(2^{64}\) texts after 1-round encryption/decryption are uniformly distributed”. Since this probability is upper bounded by \( (2^{128})^2 \times \left( \frac{5^{16}}{2^{128} - 2^{64} + 1}\right) ^3 \simeq 2^{-16.5}\) (see proof of Prop. 6 in [16]) and since this second game is “related” to the original one, the conclusion follows immediately.
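The numerical value of the bound quoted above can be reproduced with exact rational arithmetic. The following sketch only evaluates the stated expression; it does not re-derive it, and the variable names are hypothetical.

```python
from fractions import Fraction
from math import log2

# (2^128)^2 * (5^16 / (2^128 - 2^64 + 1))^3, evaluated exactly.
bound = (2 ** 128) ** 2 * Fraction(5 ** 16, 2 ** 128 - 2 ** 64 + 1) ** 3

# Base-2 logarithm of the bound, taken separately on numerator and denominator
# so that no huge value has to be converted to a float.
log2_bound = log2(bound.numerator) - log2(bound.denominator)
```

The expression is essentially \(5^{48} \cdot 2^{-128}\), which is where the exponent of about \(-16.5\) comes from.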

Strategy of the verifier, of the shortcut player and of the generic one. Since the keys \(k^0\) and \(k^{10}\) for which the relation \({\mathcal {R}}\) is satisfied can be different from the real subkeys, the verifier has no information on the keys for which the relation \({\mathcal {R}}\) is satisfied, and her task is to check if they exist. It follows that one must show that the above conditions are efficiently checkable. The only way to verify these requirements is to find these two subkeys in an efficient way, which is not possible using a brute force attack (\(k^0\) and \(k^{10}\) have 128 bits). Under Gilbert’s assumption that no key-schedule holds, the verifier can work independently on \(k^0\) and \(k^{10}\). Instead of checking all the \(2\cdot 2^{128}=2^{129}\) possible values of \(k^0\) and \(k^{10}\), the idea proposed in [16] is to check uniform distribution working on single columns of \(SR(c^i)\) and of \(SR^{-1}(p^i)\) (the strategy proposed by Gilbert (Footnote 10) is similar to the one proposed in Algorithm 1). In this way, the verifier must guess only 32 bits instead of 128, and she has to repeat this operation 4 times (one for each anti-diagonal/diagonal) for each key. In the following, we discuss a way to improve this procedure by working independently on each byte of \(k^0\) and \(k^{10}\) instead of on an entire anti-diagonal/diagonal. The idea is simply to use the integral attack [11, 21] to filter wrong keys (much) faster.

About the shortcut player (i.e. the one who knows the key), he can construct these \(2^{64}\) (plaintext, ciphertext) pairs using the same strategy just discussed for the 8-round distinguisher (note that in this case the keys \(k^0\) and \(k^{10}\) correspond to the secret sub-keys). As a result, the distinguisher can be considered meaningful (w.r.t. the definition given before) since (1) the probability that the generic player (i.e. the one who does not know the secret key) successfully outputs (input, output) pairs that satisfy the previous properties (both in the input and in the output) is upper bounded by \(2^{-16.5}\) and since (2) the verifier can find the keys \(k^0\) and \(k^{10}\) that satisfy the required property (if they exist) with a computational cost which is smaller than the cost of the two players.

4.2.3 Generic considerations on Gilbert’s 10-round distinguisher

The previous 10-round distinguisher proposed in [16] is different from all the previous distinguishers up to 8 rounds present in the literature. For all distinguishers up to 8-round, the relation \({\mathcal {R}}\) that the N-tuple of (plaintexts, ciphertexts) must satisfy does not involve any operation of the block cipher E. As a consequence, it allows the verifier to check whether the N-tuple of (plaintexts, ciphertexts) satisfy the required relation \({\mathcal {R}}\) without knowing anything of the key. When \({\mathcal {R}}\) does not re-use operations of E, this provides some heuristic evidence that this distinguisher can be considered meaningful.

On the other hand, the previous 10-round distinguisher and the ones that we are going to propose do not satisfy this requirement, i.e. in these cases the relation \({\mathcal {R}}\) involves and re-uses some operations of E. The novelty of Gilbert’s work is not just the possibility to extend the distinguisher up to 10-round AES, but rather the introduction of a new distinguisher model. Requiring the existence of round keys for which the 1-round encryption of the plaintexts (respectively, 1-round decryption of the ciphertexts) satisfies the relation \({\mathcal {R}}\), or in other words considering relations \({\mathcal {R}}\) that depend on some operations of E, makes it possible to set up new distinguishers that penetrate more rounds of the block cipher. For a detailed discussion on the reasons why such known-key distinguishers should not be systematically ruled out as if they were artificial we refer to Sect. 3 of [16].

A variant of Gilbert’s distinguisher. Before going further, we highlight a variant of Gilbert’s distinguisher, which also applies to all our proposed distinguishers present in the paper and which helps to better understand it. Consider the case in which the two players have to send to the verifier the N-tuple that verifies the required relation \({\mathcal {R}}\) together with the subkeys for which such a relation is satisfied. As an example, in the 10-round distinguisher just presented, the players have to send \(2^{64}\) (plaintexts, ciphertexts) pairs \((p^i, c^i)\) and the two subkeys \(k^0\) and \(k^{10}\) such that the bytes of \(R_{k^{0}}(p^i) \) and \(MC^{-1}\circ R_{k^{10}}^{-1}(c^i)\) are uniformly distributed. Thus, since the task of the verifier is to check that the relation \({\mathcal {R}}\) is satisfied only for the keys she received, it follows that her computational cost is negligible. On the other hand, we show in detail in Appendix B that such a variant of the distinguisher is meaningless, since it can be set up for any number of rounds of AES.

4.2.4 Another strategy for the verifier

In order to extend Gilbert’s distinguisher to 12-round AES by exploiting the uniform distribution property, we present another possible strategy that allows to check the existence of keys \(k^0\) and \(k^{10}\) for which the required property \({\mathcal {R}}\) is verified. The goal of the following strategy is not to improve the computational cost of the verifier, but to show the possibility to check the existence of such keys working independently on each byte of the keys instead of combinations of 4 bytes. The idea is simply to first filter wrong key candidates using the integral attack [11, 21]: in this way, the verifier limits herself to check the uniform distribution property only on the keys that satisfy the zero-sum property. In other words, instead of checking directly the uniform distribution property as done in [16], we first filter wrongly guessed keys by checking the zero-sum property.

In more detail, instead of working on 4 bytes of the subkeys \(k^{0}\) and \(k^{10}\) simultaneously (as proposed in [16]), we highlight that it is actually possible to work at byte level, finding \(k^{0}\) and \(k^{10}\) on single bytes (independently of the others) as in a classical integral/zero-sum attack. The idea is to exploit the fact that uniform distribution implies zero-sum property:

1st step/filter: the verifier first looks for subkeys \(k^{0}\) and \(k^{10}\) that satisfy \(\bigoplus _{i=0}^{2^{64}-1} R_{k^{0}}(p^i) = 0\) and \(\bigoplus _{i=0}^{2^{64}-1} R^{-1}_{k^{10}}(c^i) = 0\) working independently on each byte;

2nd step/filter: only for keys that satisfy zero-sum, she then checks if the uniform property is verified, working simultaneously on 4 bytes of the subkeys.

We emphasize that if zero-sum is not satisfied, then also uniform distribution is not satisfied. Moreover, we highlight that the number of subkeys that satisfy zero-sum is very small compared to the number of all possible keys. Indeed, note that since zero-sum is satisfied with prob. \(2^{-128}\) and since there are only \(2^{128}\) keys, on average only one key passes the first step/filter. This also implies that “checking uniform distribution once that zero-sum property is satisfied” has negligible cost compared to the total cost. A pseudo-code of this strategy is proposed in Algorithm 2.
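To make the byte-level first filter concrete, here is a minimal Python sketch on toy data. It assumes, as a simplification that is not stated in this section, that the guessed key byte is XORed into the text byte right before the S-Box, so that the linear layers can be ignored when testing the zero-sum. The AES S-Box is computed from its standard definition (inversion in \({\mathbb {F}}_{2^8}\) followed by the affine map); the function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Inverse in GF(2^8) as a^254, with 0 mapped to 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r if a else 0

def sbox_byte(x):
    """AES S-Box: field inversion followed by the affine transformation."""
    b = gf_inv(x)
    res = 0x63
    for shift in range(5):  # b ^ rotl(b,1) ^ rotl(b,2) ^ rotl(b,3) ^ rotl(b,4)
        res ^= ((b << shift) | (b >> (8 - shift))) & 0xFF
    return res

SBOX = [sbox_byte(x) for x in range(256)]

def zero_sum_key_bytes(byte_values):
    """First filter on one byte position: return all key-byte guesses k
    for which the XOR of S(b ^ k) over all given bytes b is zero."""
    survivors = []
    for k in range(256):
        acc = 0
        for b in byte_values:
            acc ^= SBOX[b ^ k]
        if acc == 0:
            survivors.append(k)
    return survivors
```

Only the guesses returned by `zero_sum_key_bytes` for every byte position would then be passed to the (more expensive) uniform-distribution check of the second step.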

Just for completeness, we mention that using this proposed strategy (Footnote 11), the verification cost is a little smaller than the one given in the original strategy proposed in [16]: approximately \(2\cdot 2^{64}\) vs \(10 \cdot 2^{64}\) table look-ups (Footnote 12), that is \(2^{57.36}\) vs \(2^{59.7}\) ten-round encryptions, assuming 1 S-Box look-up \(\approx \) 1 table look-up. One more time, we emphasize that the goal of this strategy is to show the possibility to work on a single byte of the key independently of the others in order to find \(k^0\) and \(k^{10}\): this allows us in the following to set up a distinguisher on 12-round AES.
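The conversion from table look-ups to ten-round encryptions can be reproduced as follows. The sketch assumes that one 10-round AES-128 encryption costs about 200 S-Box look-ups (16 per round over 10 rounds, plus 40 in the key schedule); this constant is an assumption of the sketch, chosen because it is consistent with the figures quoted above, and the names are hypothetical.

```python
from math import log2

# Assumption of this sketch: one 10-round AES-128 encryption costs about
# 200 S-Box look-ups (16 per round over 10 rounds, plus 40 in the key schedule).
SBOX_LOOKUPS_PER_ENCRYPTION = 16 * 10 + 40

def lookups_to_encryptions_log2(log2_lookups):
    """Convert a look-up count (given as log2) into log2 of 10-round encryptions."""
    return log2_lookups - log2(SBOX_LOOKUPS_PER_ENCRYPTION)

proposed = lookups_to_encryptions_log2(log2(2) + 64)    # 2 * 2^64 look-ups
original = lookups_to_encryptions_log2(log2(10) + 64)   # 10 * 2^64 look-ups
```

Under this assumption, `proposed` and `original` agree with the exponents 57.36 and 59.7 up to rounding.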

4.3 Statistical integral distinguisher with multiple structures

Finally, we mention for completeness that at ACISP 2017 the distinguishers proposed by Gilbert in [16] have been improved by T. Cui, L. Sun, H. Chen and M. Wang [10]. In that paper, the authors turn both the 8- and 10-round Gilbert’s distinguishers into “statistical integral ones” [32] with the goal to reduce the data/time complexity.

5 New 10-round distinguisher of AES—Full AES-128

Using the same strategy proposed by Gilbert in [16], we set up our 10-round distinguisher by extending the 8-round one presented in [19] and recalled in Sect. 4.1.2 both at the beginning and at the end. In the above defined known-key distinguisher scenario, the players have to send to the verifier \(n\ge 64\) different tuples of (plaintext, ciphertext) pairs, that is \(\{(p^1_i, c^1_i), (p^2_i, c^2_i) \}\) for \(i = 0, ..., n-1\), s.t.:

  1. there exists a key \(k^0\) s.t. for each tuple there exists j for which the two plaintexts belong to the same coset of \({\mathcal {D}}_j\) after one round, that is

    $$\begin{aligned} \exists \, k^0 \quad \text {s.t.} \quad \forall i \in \{ 0, ..., n-1\}, \quad \exists j \in \{ 0, ..., 3 \} \quad \text {s.t.} \quad R_{k^0}(p^1_i) \oplus R_{k^0}(p^2_i) \in {\mathcal {D}}_j; \end{aligned}$$

  2. there exists a key \(k^{10}\) s.t. for each tuple there exists l for which the two ciphertexts belong to the same coset of \({\mathcal {M}}_l\) one round before, that is

    $$\begin{aligned} \exists \, k^{10} \quad \text {s.t.} \quad \forall i \in \{ 0, ..., n-1\}, \quad \exists l \in \{ 0, ..., 3 \} \quad \text {s.t.} \quad R^{-1}_{k^{10}}(c^1_i) \oplus R^{-1}_{k^{10}}(c^2_i) \in {\mathcal {M}}_l. \end{aligned}$$

We stress that the keys \(k^{0}\) and \(k^{10}\) must be equal for all the tuples. In other words, if there exist two different tuples \((c_0, c_1)\) and \((c_2, c_3)\) such that \(R^{-1}_k(c_0) \oplus R^{-1}_k(c_1) \in {\mathcal {M}}_l\) and \(R^{-1}_{{\tilde{k}}}(c_2) \oplus R^{-1}_{{\tilde{k}}}(c_3) \in {\mathcal {M}}_{{\tilde{l}}}\) for two different keys \(k\ne {\tilde{k}}\), then the above defined relation \({\mathcal {R}}\) is not satisfied. Note that without this requirement on the keys \(k^{0}\) and \(k^{10}\), it is extremely easy to construct tuples such that the two ciphertexts belong to the same coset of \({\mathcal {M}}_l\) one round before. Indeed, as we are going to show, given two ciphertexts \(c^1\) and \(c^2\), on average there exist \(4\cdot (2^8)^4 = 2^{34}\) different keys k such that \(R^{-1}_k(c^1) \oplus R^{-1}_k(c^2) \in {\mathcal {M}}_l\) for a certain l. As a result, it is straightforward to construct n different tuples that satisfy the above defined relationship \({\mathcal {R}}\) without any condition on the key k.

We also observe that the claim “the transposition of our technique to the 8-round distinguisher of [17] does not allow to derive a valid 10-round distinguisher” made in [16] is justified only when just \(n=1\) tuple of pairs is used and/or no assumption on the key k is made. In other words, the above defined relation \({\mathcal {R}}\), for which we consider \(n\ge 64\) different tuples of pairs of texts, together with the requirement of uniqueness of the key k allows to extend the 8-round distinguisher of [17] exploiting the same strategy proposed in [16].

Key-schedule vs Independent Subkeys. Before we go on, it is also important to emphasize that no condition on the keys \(k^0\) and \(k^{10}\) is imposed, except that they exist and they are equal for all the tuples. That is, we do not require that these keys are equal to the real secret subkeys. The same consideration holds also for the next distinguishers presented in this paper, and for the 10-round distinguisher presented by Gilbert in [16].

Moreover, as in [16], two possible scenarios can be considered and studied:

  1. no key-schedule holds, i.e. \(k^{0}\) and \(k^{10}\) are independent;

  2. AES key-schedule among \(k^{0}\) and \(k^{10}\).

Intuitively, the second case (i.e. with key schedule) is harder than the first one (i.e. without key schedule) for the generic player, since a further property must be verified. In other words, the time required by this player to generate the tuples for the second scenario is not smaller than for the first one, that is the probability of success in the second scenario is not higher than in the first one.

In the following, we limit ourselves to consider the case of independent subkeys. To justify this choice, we recall the strategy adopted by Gilbert in [16] to set up his 10-round distinguisher. First he considers the case of AES with independent subkeys (denoted by \(\hbox {AES}^\star _{10}\)), and he presents a 10-round known-key distinguisher for \(\hbox {AES}^\star \). Then, he simply observes that this known-key distinguisher on \(\hbox {AES}^\star _{10}\) “is obviously applicable without any modification to \(\hbox {AES}_{{10}}\), i.e. the full AES-128” (see [16, Sect. 4.2 - page 221]). Using the same argument, we can conclude that our distinguisher can also be applied to real AES, i.e. to the case in which the key schedule holds. Indeed, as we are going to highlight in the following, nothing changes for the shortcut player (i.e. the one who knows the key), while this scenario is more complicated for the generic player, since a further condition on \(k^0\) and \(k^{10}\) (that is, the key schedule) is imposed.

About the “Number \({\mathbf {n}}\) of Different Tuples of (plaintext, ciphertext) Pairs”. In the following we present the distinguisher in the case of independent subkeys. To obtain a suitable value for n, we consider the best strategy that the generic player can adopt to win the game.

A value of n is suitable when the computational cost of the generic player is higher than the one of the other player. To find such a value, one has to consider the numbers of oracle-queries done by the two players (and potentially any further cost of the generic player). In particular, if only the number of oracle-queries is taken into account, then n must be equal to or greater than 8, which implies that the computational cost for the shortcut player is of \(2^{47}\) and for the generic player is (approximately) of \(2^{48.9}\). In order to make the advantage of the shortcut player more significant, we have chosen an (arbitrary) value of \(n=64\), which implies a cost for the shortcut player of \(2^{50}\) computations and (approximately) of \(2^{65.6}\) computations for the generic player.

5.1 The verifier

Given n tuples, the verifier has to check the existence of keys \(k^{0}\) and \(k^{10}\) as defined previously. Since the subkeys are independent (no key schedule is considered), the idea is simply to work independently on the plaintexts (in order to find \(k^{0}\)) and on the ciphertexts (in order to find \(k^{10}\)).

Let’s work for simplicity on the ciphertexts (analogous for the plaintexts). The idea is to find the key \(k^{10}\) (if it exists) using the low-data truncated differential attack (see Footnote 13) on 3-round AES-128 presented in [18]. In the following, we briefly recall this attack, suitably modified with respect to the one presented in [18] due to the different scope of this work. In particular, here we describe the attack on 3 rounds presented in [18] as an attack on a single round.

Truncated differential attack [18]. Consider three texts in the same coset of \({\mathcal {M}}_i\) for \(i \in \{0,1,2,3\}\) and the corresponding ciphertexts after one round, that is \((p^j, c^j)\) for \(j=1,2,3\) where \(c^j = R(p^j)\) and \(p^j \in {\mathcal {M}}_i \oplus a\) for an arbitrary (fixed) \(a \in {\mathcal {M}}_i^\perp \). The goal of the attack is to find the key k such that the ciphertexts belong to the same coset of \({\mathcal {M}}_i\) one round before, that is k has to satisfy the following condition:

$$\begin{aligned} R^{-1}_k(c^1) \oplus R^{-1}_k(c^2) \in {\mathcal {M}}_i \qquad \text {and} \qquad R^{-1}_k(c^1) \oplus R^{-1}_k(c^3) \in {\mathcal {M}}_i. \end{aligned}$$
(7)

For simplicity, we assume that the final MixColumns operation is omitted (otherwise one simply switches the final MixColumns and the final AddRoundKey operation, as usual in the literature).

Since each column of \({\mathcal {M}}_i\) depends on different and independent variables, the idea of the attack is to work independently on each column of \({\mathcal {M}}_i\) (and so of \(SR^{-1}(k)\)), and to exploit the relationships that hold among the bytes that lie in the same column of \({\mathcal {M}}_i\).

Without loss of generality, we assume \(i=0\) and we present the attack only for the first column of \(SR^{-1}(k)\) (analogous for the others). As shown in [18], the conditions (7) are fulfilled if the bytes of the first column of \(SR^{-1}(k)\) satisfy the following relations:

$$\begin{aligned} s^h_{0,0} = \texttt{0x02} \cdot s^h_{1,3}, \qquad s^h_{2,2} = s^h_{1,3}, \qquad s^h_{3,1} = \texttt{0x03} \cdot s^h_{1,3}, \end{aligned}$$
(8)

where \(s^h_{i,j} =\) S-\(\hbox {Box}^{-1}(c^1_{i,j} \oplus k_{i,j}) \oplus \) S-\(\hbox {Box}^{-1}(c^h_{i,j} \oplus k_{i,j})\) for \(h=2,3\). For each value of \(k_{1,3}\) (\(2^8\) possible values in total), the idea is to find the values of \(k_{0,0}\), \(k_{2,2}\) and \(k_{3,1}\) that satisfy the previous relationships. On average, \(2^8\) combinations of these four bytes (i.e. one for each possible value of \(k_{1,3}\)) satisfy the relations (8) for each pair of the ciphertexts. In other words, given two texts \(c^1\) and \(c^2\), on average there are \((2^{8})^4 = 2^{32}\) keys k for which the condition \(R^{-1}_k(c^1) \oplus R^{-1}_k(c^2) \in {\mathcal {M}}_i\) is satisfied (as already mentioned before). The idea is to test them using the second pair of ciphertexts: on average, only the right combination passes the test. The same procedure is used for the other columns of \(SR^{-1}(k)\).
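The filtering step for one column can be illustrated with a small brute-force sketch. It is not the optimized procedure of [18]: the helper names (`gf_mul`, `build_sbox`, `column_candidates`, `toy_pair`) and the toy data are assumptions made for this example, the final round is modelled as c = S-Box(x) ⊕ k with MixColumns omitted, and the four bytes are ordered as (0,0), (1,3), (2,2), (3,1).

```python
import random

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def build_sbox():
    """Build the AES S-Box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x

def diff(c1, c2, k):
    """s = S-Box^{-1}(c1 xor k) xor S-Box^{-1}(c2 xor k) for a single byte."""
    return INV_SBOX[c1 ^ k] ^ INV_SBOX[c2 ^ k]

def satisfies(c1, c2, key):
    """Check relations (8) for one pair and one guess (k00, k13, k22, k31)."""
    s = [diff(a, b, k) for a, b, k in zip(c1, c2, key)]
    return s[0] == gf_mul(2, s[1]) and s[2] == s[1] and s[3] == gf_mul(3, s[1])

def column_candidates(c1, c2):
    """All (k00, k13, k22, k31) that satisfy relations (8) for one pair."""
    found = []
    for k13 in range(256):
        d = diff(c1[1], c2[1], k13)
        targets = (gf_mul(2, d), d, gf_mul(3, d))  # for (0,0), (2,2), (3,1)
        per_byte = [
            [k for k in range(256) if diff(c1[i], c2[i], k) == t]
            for i, t in zip((0, 2, 3), targets)
        ]
        for k00 in per_byte[0]:
            for k22 in per_byte[1]:
                for k31 in per_byte[2]:
                    found.append((k00, k13, k22, k31))
    return found

def toy_pair(rng, key):
    """Toy ciphertext pair whose pre-S-Box difference is (2d, d, d, 3d)."""
    d = rng.randrange(1, 256)
    x1 = [rng.randrange(256) for _ in range(4)]
    delta = (gf_mul(2, d), d, d, gf_mul(3, d))
    x2 = [a ^ b for a, b in zip(x1, delta)]
    c1 = [SBOX[x] ^ k for x, k in zip(x1, key)]
    c2 = [SBOX[x] ^ k for x, k in zip(x2, key)]
    return c1, c2
```

Calling `column_candidates` on a pair produced by `toy_pair` returns a list that contains the key used to build the pair. Keeping only the candidates for which `satisfies` also holds on a second pair mirrors the test with the second pair of ciphertexts described above.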

The total computational cost of the attack is well approximated by the cost of the first phase, that is by the cost to find (on average) the \(2^8\) combinations of \(k_{0,0}, ..., k_{3,1}\) that satisfy (8) for the first column—similar for the others (the cost of checking them with the second pair of texts is negligible). In particular, the computational cost of this attack using 3 chosen plaintexts can be approximated by \(2^{17.1}\) S-Box look-ups (and negligible memory cost), or approximately \(2^{11.6}\) table look-ups and a memory cost of \(16 \times 2^{12} = 2^{16}\) using a precomputation phase. We refer to [18] for all the details.

Finally, we emphasize that the same attack works exactly in the same way also in the decryption direction (chosen ciphertexts attack) with the same complexity. In this case, the idea is to look for a key such that the corresponding plaintexts belong to the same coset of \({\mathcal {D}}_i\) after one round (see [18] for details).

figure c

The verifier strategy. The verifier simply applies the previous strategy in order to find key \(k^{10}\) (analogous for \(k^0\)). First of all, given a single tuple, there exist on average \(4\cdot (2^{8})^4 = 2^{34}\) keys of the final round such that the two ciphertexts belong to the same coset of \({\mathcal {M}}_l\) one round before for a certain \(l\in \{0,1,2,3\}\). Since the probability that two keys \(k^{10}\) are equal is \(2^{-128}\) and given n tuples, the probability that at least one key \(k^{10}\) exists (for which the previous requirements are satisfied) is given by:

$$\begin{aligned} 2^{34n}\cdot 2^{-128(n-1)} = 2^{-94\cdot n + 128}. \end{aligned}$$

By this preliminary analysis, it is already possible to deduce that the number of tuples should be at least 2 (i.e. \(n\ge 2\)). Indeed, for \(n=1\) such a key always exists (which implies that using a random tuple it is possible to win the game), while for \(n=2\) the probability that such key exists for two random tuples is only \(2^{-60}\).
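The exponent in the estimate above is easy to tabulate. The following sketch (the function name and default parameters are chosen here for illustration) evaluates the base-2 logarithm of \(2^{34n}\cdot 2^{-128(n-1)}\); a non-negative value means that a common key is expected to exist.

```python
def log2_common_key_estimate(n, keys_per_tuple_log2=34, key_bits=128):
    """log2 of 2^(34 n) * 2^(-128 (n - 1)) = 2^(-94 n + 128)."""
    return keys_per_tuple_log2 * n - key_bits * (n - 1)

for n in (1, 2, 8, 64):
    print(n, log2_common_key_estimate(n))
```

For \(n=1\) the value is 34 (a key always exists), while for \(n=2\) it is already -60, in agreement with the text.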

Thus, assume that the verifier receives \(n\ge 2\) tuples. Given the first tuple and working independently on each column as described before, the attacker finds \(2^8\) combinations for each column of \(SR^{-1}(k)\) and checks them immediately with the second tuple. Since she repeats this attack for each possible \({\mathcal {M}}_l\) (i.e. 4 times), the cost of this step is of \(4\cdot 2^{17.1}=2^{19.1}\) S-Box look-ups. In this way, the verifier finds on average only one key (if it exists). If at least one possible key is found using two tuples, she simply checks if the other \(n-2\) tuples satisfy the relation \({\mathcal {R}}\) for this found key. The cost of this operation is well approximated by \(2 \cdot 16 = 2^{5}\) S-Box look-ups for each tuple (note that she must decrypt one round for two ciphertexts).

In conclusion, since the verifier applies the same attack on the plaintexts and on the ciphertexts, given \(n\ge 2\) tuples, the cost of the verifier is well approximated by \(2 \times [2^{19.1} + (n-2) \cdot 2^5]\) S-Box look-ups, that is approximately \(2^{12.5}\) 10-round encryptions if \(n \ll 2^{14}\).
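As a rough check of this estimate, the sketch below evaluates the formula for a given n. The conversion factor of 20 look-ups per AES round is an assumption made here because it reproduces the figure \(2^{12.5}\); the function name is likewise only illustrative.

```python
import math

def verifier_cost_10_rounds(n, lookups_per_round=20, rounds=10):
    """Return (log2 S-Box look-ups, log2 equivalent 10-round encryptions)."""
    # Key recovery on both sides, plus one cheap check per remaining tuple.
    sbox_lookups = 2 * (2 ** 19.1 + (n - 2) * 2 ** 5)
    encryptions = sbox_lookups / (lookups_per_round * rounds)
    return math.log2(sbox_lookups), math.log2(encryptions)

print(verifier_cost_10_rounds(64))
```

For \(n=64\) the per-tuple checks add almost nothing to the \(2^{20.1}\) look-ups of the key-recovery phase.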

5.2 The shortcut player

The shortcut player can simply use the rebound attack described in [19] and in Sect. 4.1.2 for the known-key distinguisher on 8 rounds in order to find the n tuples that satisfy the above defined relation \({\mathcal {R}}\). Indeed, it is straightforward to prove that all the properties are satisfied, since for each tuple the two plaintexts belong to the same coset of \({\mathcal {D}}_i\) (for a certain i) after 1-round encryption (w.r.t. the real subkey \(k^0\)) and the two ciphertexts belong to the same coset of \({\mathcal {M}}_l\) (for a certain l) after 1-round decryption (w.r.t. the real subkey \(k^{10}\)) by construction. Since the computational cost to build one tuple is of \(2^{44}\) encryptions, the cost to construct n tuples is well approximated (see Footnote 14) by \(n\cdot 2^{44}\).

Before going further, we mention that the same strategy works even if a key-schedule is considered (that is, if the subkeys are not independent), since the texts generated in the previous way satisfy the required property w.r.t. the real subkeys \(k^{0}\) and \(k^{10}\).

5.3 The generic player

Here we analyze and present the (intuitively) best strategy that the generic player can use in order to find n tuples with the required properties, and the corresponding computational cost. Intuitively, the best strategy for this player is to choose tuples for which the condition on the plaintexts is fulfilled with probability 1. Then, the player asks the oracle for the corresponding ciphertexts. The idea is to check if there exists a subkey \(k^{10}\) and n tuples such that the two ciphertexts of each of these n tuples belong to the same coset of \({\mathcal {M}}_l\) one round before. We recall that it is not necessary that the subkey for which this condition is satisfied is the real one (similarly for the plaintexts). This process is repeated until the n tuples of pairs of texts that satisfy the required relation \({\mathcal {R}}\) are found.

Working on the Plaintexts. To do this, the generic player fixes a random key \({\hat{k}}\) and computes, for a certain \(j \in \{0, ..., 3\}\) and for a random \(a\in {\mathcal {D}}_j^\perp \), the following set:

$$\begin{aligned} D_a := R^{-1}_{{\hat{k}}} ({\mathcal {D}}_j \oplus a). \end{aligned}$$
(9)

The idea is to choose plaintexts in the set \(D_a\) we just defined. In this way, the property on the plaintexts is (obviously) satisfied. The corresponding ciphertexts are simply obtained by oracle-queries. Since the cardinality of a coset of \({\mathcal {D}}_j\) is \(2^{32}\), the computation of a set \(D_a\) requires \(16\cdot 2^{32}=2^{36}\) S-Box look-ups for each coset \({\mathcal {D}}_j \oplus a\). Note that if the player needs more than \(2^{32}\) (plaintext, ciphertext) pairs, he simply chooses another \(a^\prime \in {\mathcal {D}}_j^\perp \) (and/or another j) and, using the same key \({\hat{k}}\), he computes the corresponding set \(D_{a^\prime }\) defined as before. We emphasize that the player must always use the same key \({\hat{k}}\) to compute these sets, in order to fulfill the property on the plaintexts.

Working on the Ciphertexts. As we have already seen, given a single tuple there exist on average \(2^{34}\) keys such that the two ciphertexts belong to the same coset of \({\mathcal {M}}_j\) one round before. To set up a meaningful distinguisher, a value of n is suitable if the number of oracle-queries (hence, the cost) of the generic player is higher than the cost of the shortcut player. By previous observations, given a set of n tuples, the probability that at least a key exists for which the property on the ciphertexts is satisfied is \(2^{-94n+128}\). Thus, the idea is to estimate the number of (plaintext, ciphertext) pairs that this player has to generate in order to win the game (that is, in order to find with high probability n tuples with the required property). If this number is higher than \(2^{44} \cdot n\) (for a fixed n), then the other player wins the game.

Since each set \(D_a\) defined as before contains \(2^{32}\) different plaintexts, it is possible to construct approximately \(2^{63}\) different couples \(\{(p^1, c^1), (p^2, c^2)\}\). Given t different sets \(D_a\), it is possible to construct \(s = 2^{63} \cdot t\) different couples. It follows that one can construct approximately

$$\begin{aligned} \left( {\begin{array}{c}s\\ n\end{array}}\right) \approx \frac{s^n}{n!} \end{aligned}$$

different sets of n different tuples (i.e. n different couples \(\{(p^1, c^1), (p^2, c^2)\}\)), where the previous approximation holds for \(n \ll s\). Since the probability that a set of n tuples satisfies the above defined relation \({\mathcal {R}}\) is \(2^{-94n+128}\), the generic player must consider at least s different couples such that \(s^n / n! \simeq 2^{94n-128}\) or equivalently

$$\begin{aligned} s \simeq 2^{94-\frac{128}{n}} \cdot (n!)^{\frac{1}{n}}. \end{aligned}$$
(10)

By this formula, for \(n=8\) this player has to consider approximately \(2^{79.9}\) different tuples, or equivalently \(2^{48.9}\) (plaintext, ciphertext) pairs (that is, \(2^{16.9}\) initial different sets \(D_a\)). In other words, given \(2^{16.9}\) initial different sets \(D_a\), it is possible to construct approximately \(2^{16.9} \cdot 2^{63} = 2^{79.9}\) different couples, that is approximately \(2^{624}\) different sets of 8 tuples. Since each one of these sets satisfies the required properties with probability \(2^{-94\cdot 8 + 128} =2^{-624}\), he has a good probability to find 8 different tuples that satisfy the required relation. The cost to generate these \(2^{48.9}\) (plaintexts, ciphertexts) pairs is of \(2^{48.9}\) oracle-queries (with the assumption 1 oracle-query \(\simeq \) 1 encryption), while the cost to generate these 8 tuples for the shortcut player is of \(8\cdot 2^{44} = 2^{47}\) (which is smaller). Since the cost of the generic player is higher than the cost of the shortcut player for each n s.t. \(n\ge 8\), we finally choose an (arbitrary) value of \(n=64\) in order to make the advantage of the shortcut player more significant.
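The numbers quoted above follow directly from formula (10). The sketch below (function names are chosen for this example) evaluates it in base-2 logarithms and compares the query count of the generic player with the cost \(n\cdot 2^{44}\) of the shortcut player.

```python
import math

def generic_player_cost(n):
    """Return log2 of (couples s, sets D_a, oracle queries) from formula (10)."""
    log2_factorial = math.lgamma(n + 1) / math.log(2)
    log2_s = 94 - 128 / n + log2_factorial / n
    log2_sets = log2_s - 63       # each set D_a yields about 2^63 couples
    log2_pairs = log2_sets + 32   # each set D_a contains 2^32 plaintexts
    return log2_s, log2_sets, log2_pairs

def shortcut_player_cost(n):
    """log2 of n * 2^44 encryptions."""
    return math.log2(n) + 44

def smallest_meaningful_n(limit=64):
    """First n for which the generic player needs more queries than the shortcut cost."""
    for n in range(2, limit + 1):
        if generic_player_cost(n)[2] > shortcut_player_cost(n):
            return n
    return None

for n in (8, 64):
    print(n, [round(v, 1) for v in generic_player_cost(n)], shortcut_player_cost(n))
```

Under this oracle-query-only accounting, the first n for which the generic player is more expensive is 8, which matches the threshold stated in the text.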

Finally, in the case in which a key-schedule holds, the generic player has to repeat the previous strategy until the subkeys \(k^{0}\) and \(k^{10}\)—for which the texts satisfy the required property \({\mathcal {R}}\)—satisfy the key-schedule. Since a further property must be satisfied, the game becomes harder for the generic player (while, as we have seen before, nothing changes for the shortcut player).

6 New 12-round distinguisher of AES

As one of the major contributions of this paper, in this section we present the first known-key distinguisher for 12-round AES. This distinguisher is obtained by extending the previous 10-round distinguisher both at the end and at the beginning, or equivalently by extending two times at the end and at the beginning the 8-round known-key distinguisher presented in [19] and in Sect. 4.1.2. We highlight that this result provides a counterexample to the claims made in [16].

In the known-key distinguisher scenario, the players have to send to the verifier \(n\ge 2^{38}\) different tuples of (plaintext, ciphertext) pairs, that is \(\{(p^1_i, c^1_i), (p^2_i, c^2_i) \}\) for \(i = 0, ..., n-1\), such that:

  1. there exist keys \(k^{0},k^{1}\) s.t. for each tuple there exists j for which the two plaintexts belong to the same coset of \({\mathcal {D}}_j\) after two rounds, that is

    $$\begin{aligned} \exists \, k^0,k^1 \, \text { s.t. } \, \forall i \in \{ 0, ..., n-1\}, \quad \exists j \in \{ 0, ..., 3\} \, \text { s.t. } \, R_{k^0, k^1}^2(p^1_i) \oplus R_{k^0, k^1}^2(p^2_i) \in {\mathcal {D}}_j; \end{aligned}$$

  2. there exist keys \(k^{11}, k^{12}\) s.t. for each tuple there exists l for which the two ciphertexts belong to the same coset of \({\mathcal {M}}_l\) two rounds before, that is

    $$\begin{aligned} \exists k^{11}, k^{12} \text { s.t. } \forall i \in \{ 0, ..., n-1\}, \, \, \exists l \in \{ 0, ..., 3\} \text { s.t. } R_{k^{11}, k^{12}}^{-2}(c^1_i) \oplus R_{k^{11}, k^{12}}^{-2}(c^2_i) \in {\mathcal {M}}_l; \end{aligned}$$

where \(R_{k^0, k^1}^2(\cdot ) = R_{k^1}(R_{k^0}(\cdot ))\) and \(R_{k^{11}, k^{12}}^{-2}(\cdot ) = R^{-1}_{k^{11}}(R^{-1}_{k^{12}}(\cdot ))\).

As for the known-key distinguisher for 10-round AES, here we limit ourselves to present a known-key distinguisher for 12-round of AES with independent subkeys (that is, no key schedule is considered). However, using similar argumentation as before, we claim that the same distinguisher can be applied to the case in which the key schedule holds (in this case, nothing changes for the shortcut player, while the challenge becomes much harder for the other player).

The strategy used by the players and by the verifier is very similar to the one presented for the 10-round distinguisher in the case of no key-schedule. For this reason, we refer to the previous section for all the details, and we limit ourselves here to highlight the major differences.

6.1 The shortcut and the generic player

Exactly as before, the shortcut player can generate n tuples of texts with the required properties exploiting the Rebound attack, for a cost of \(n\cdot 2^{44}\) computations.

The generic player exploits a strategy similar to the one proposed for the 10-round distinguisher with no key-schedule. First he fixes random keys \({\hat{k}}^0, {\hat{k}}^1\) and \({\hat{k}}^{12}\), and using the keys \({\hat{k}}^0\) and \({\hat{k}}^1\), he computes the set \(D_a = R^{-1}_{{\hat{k}}^0}(R^{-1}_{{\hat{k}}^1}({\mathcal {D}}_j \oplus a))\). Similar to the previous case, the idea is to work with plaintexts in the same set \(D_a\). He then gets the corresponding ciphertexts by oracle-queries, and he simply decrypts them using the key \({\hat{k}}^{12}\). Then, using the same strategy proposed for the 10-round distinguisher, he can construct n tuples that satisfy the relation \({\mathcal {R}}\), that is he is able to find n tuples of texts for which a common key \(k^{11}\) exists such that the requirement on the ciphertexts is satisfied.

By a calculation analogous to the previous one, \(n\ge 8\) tuples are sufficient to set up a meaningful distinguisher when only the number of oracle-queries is considered.

6.2 The verifier

When the verifier receives the n tuples, she can work as in the case of the 10-round distinguisher in order to check if the required properties are satisfied or not. First of all, since there is no key schedule, the verifier can work independently on \(k^{0}, k^{1}\) (that is on the plaintexts) and on \(k^{11}, k^{12}\) (that is on the ciphertexts). Similarly to the previous case, also for this 12-round distinguisher the idea is to exploit a truncated differential key-recovery attack to find (if they exist) the four keys \(k^{0}, k^{1}\) and \(k^{11}, k^{12}\). Such truncated differential attack—presented on 4-round AES in [18] and recalled in Appendix D—is obtained by extending the attack presented in the previous section for the 10-round case.

In the following, we limit ourselves to work on the ciphertexts (analogous for the plaintexts). Given the first tuple and using the strategy described in Appendix D, the verifier first guesses eight bytes of the final subkey \(k^{12}\) (two diagonals) and partially decrypts the texts. Exploiting the same strategy proposed for the 3-round key-recovery attack described in Sect. 5.1, she finds \(2^{34}\) values for eight bytes of \(k^{11}\), for a total of \(2^{34}\cdot 2^{64} = 2^{98}\) candidates for eight bytes of \(k^{11}\) and of \(k^{12}\). Then, she eliminates wrong candidates by testing them against the other tuples of texts; to reduce the computational cost, she can work independently on each column of \(k^{11}\). Since the probability that the subkeys \(k^{11}\) and \(k^{12}\) satisfy the required property for another tuple of texts is \(4\cdot 2^{-32} = 2^{-30}\), using four other tuples the verifier finds approximately only one pair of subkeys \(k^{11}\) and \(k^{12}\) for which the property on the ciphertexts is satisfied (note that \(2^{98} \cdot (2^{-30})^4 = 2^{-22}\), so wrong candidates are expected to be discarded). The cost of this step is of \(2^{76}\) table look-ups (using the pre-computation phase); see Appendix D or [18] for details. The remaining eight bytes of \(k^{11}\) and of \(k^{12}\) can be found in a similar way.

As a result, given 5 different tuples, the total cost for this attack is approximately of \(4\cdot 2^{76} = 2^{78}\) table look-ups (using the pre-computation phase). When the verifier has found a possible candidate for the four subkeys, she checks that also the other \(n-5\) tuples satisfy the relation \({\mathcal {R}}\) for the found keys. In conclusion, given \(n\ge 5\) tuples, the total cost for the verifier can be approximated at \(2\cdot (2^{78} + 2^6 \cdot (n-5))\) table look-ups in order to find the four required subkeys. If \(n \ll 2^{72}\), then the computational cost of the verifier is approximately of \(2^{71.1}\) twelve-round encryptions.
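The bookkeeping of this verification step can be summarized as follows. The sketch uses hypothetical function names, and the conversion of 20 table look-ups per round is an assumption made here because it reproduces the figure \(2^{71.1}\).

```python
import math

def surviving_candidates_log2(extra_tuples, candidates_log2=98, pass_prob_log2=-30):
    """log2 of the expected number of wrong (k^11, k^12) candidates left."""
    return candidates_log2 + extra_tuples * pass_prob_log2

def verifier_cost_12_rounds_log2(n, lookups_per_round=20, rounds=12):
    """log2 of the verifier cost in 12-round encryptions, for n >= 5 tuples."""
    # Key recovery on plaintext and ciphertext side, then a check per extra tuple.
    lookups = 2 * (2 ** 78 + 2 ** 6 * (n - 5))
    return math.log2(lookups / (lookups_per_round * rounds))

print(surviving_candidates_log2(4))
print(round(verifier_cost_12_rounds_log2(2 ** 38), 1))
```

With four additional tuples the expected number of surviving wrong candidates drops to \(2^{-22}\), and the total cost stays close to \(2^{71.1}\) encryptions as long as \(n \ll 2^{72}\).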

6.3 About the “number \({\mathbf {n}}\) of different tuples of (plaintext, ciphertext) pairs”

Due to the previous analysis, the distinguisher can be considered meaningful for \(n\ge 8\). However, if \(n=8\) then the cost of the shortcut player (\(2^{47}\) computations) is much smaller than the cost of the verifier (\(2^{71.1}\) computations), which is not consistent with the given definition of known-key distinguisher (see Sect. 3.1). Indeed, by definition the verification cost must be less than the cost of the shortcut player (and so less than the cost of the generic player). In order to fulfill this condition, it is sufficient to choose a number of tuples n that satisfies the condition \(n\cdot 2^{44} \gg 2^{71.1}\) (and \(n\ll 2^{72}\)). It follows that a good (arbitrary) choice for this distinguisher (see Footnote 15) could be \(n \ge 2^{38}\).

In conclusion, to win the game, the two players have to send \(2^{38}\) tuples of (plaintext, ciphertext) pairs with the required properties. The cost for the shortcut player is of \(2^{82}\) computations, while the verification cost is of \(2^{71.1}\) computations.
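The constraint that drives the choice of n can be checked numerically. In this sketch the constants are taken from the text, while the function names are introduced only for illustration.

```python
SHORTCUT_COST_PER_TUPLE_LOG2 = 44  # one tuple via the rebound attack
VERIFIER_COST_LOG2 = 71.1          # 12-round encryptions, valid for n << 2^72

def shortcut_cost_log2(n_log2):
    """log2 of n * 2^44 for n = 2^n_log2 tuples."""
    return n_log2 + SHORTCUT_COST_PER_TUPLE_LOG2

def margin_over_verifier_log2(n_log2):
    """Positive when the shortcut player costs more than the verifier."""
    return shortcut_cost_log2(n_log2) - VERIFIER_COST_LOG2

for n_log2 in (3, 38):
    print(n_log2, shortcut_cost_log2(n_log2), margin_over_verifier_log2(n_log2))
```

For \(n=8\) the margin is negative (the verifier would be more expensive than the shortcut player), whereas for \(n=2^{38}\) the shortcut player's \(2^{82}\) computations exceed the verification cost by a factor of roughly \(2^{10.9}\).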

7 Extending Gilbert’s distinguisher to 12-round AES

In this section, we show that Gilbert’s 10-round distinguisher can be extended to 12-round AES still exploiting the uniform distribution property. The main argument made in [16] against such an extension concerns the impossibility of efficiently checking the relationship \({\mathcal {R}}\) when more than a single round is added at the beginning (resp. at the end) of the 8-round distinguisher (see Footnote 16). To solve this problem, we make use of the verification strategy proposed in Sect. 4.2.4.

In the following, we first formally define the 12-round distinguisher based on the uniform distribution property and, after showing that \({\mathcal {R}}\) is efficiently checkable, we prove that this new 12-round distinguisher is meaningful using the same argumentation proposed in [16] for the 10-round case.

7.1 Gilbert’s distinguisher (based on uniform distribution property) on 12-round AES

Using Gilbert’s 10-round distinguisher as a starting point, a formal definition of the 12-round known-key distinguisher based on the uniform distribution property is given in the following. In the known-key distinguisher scenario, the players have to send to the verifier \(n\ge 4\) different sets of \(2^{64}\) (plaintext, ciphertext) pairs, that is \((p^j_i, c^j_i)\) for \(i = 0, ..., 2^{64}-1\) and \(j = 0, ..., n-1\), such that

  1. there exist keys \(k^0,k^1\) such that for all \(j = 0, ..., n-1\) the texts \(\{R_{k^1}(R_{k^0}(p^j_i))\}_i\) are uniformly distributed among the cosets of \({\mathcal {D}}_I\) for each \(I\subseteq \{0,1,2,3\}\) with \(|I|=3\), or equivalently s.t. for all \(j = 0, ..., n-1\) the bytes of the texts \(\{R_{k^1}(R_{k^0}(p^j_i))\}_i\) are uniformly distributed;

  2. there exist keys \(k^{11}, k^{12}\) such that for all \(j = 0, ..., n-1\) the texts \(\{R^{-1}_{k^{11}}(R^{-1}_{k^{12}}(c^j_i))\}_{i}\) are uniformly distributed among the cosets of \({\mathcal {M}}_J\) for each \(J\subseteq \{0,1,2,3\}\) with \(|J|=3\), or equivalently s.t. for all \(j = 0, ..., n-1\) the bytes of the texts \(\{R^{-1}_{k^{11}}(R^{-1}_{k^{12}}(c^j_i))\}_i\) are uniformly distributed.

As for Gilbert’s distinguisher, we assume that all subkeys are independent, that is no key-schedule holds. However, due to the same argumentation given before, we recall that the same distinguisher works exactly in the same way also in the case in which a key-schedule holds. Moreover, we emphasize that the keys \(k^0, k^1, k^{11}\) and \(k^{12}\) for which the previous properties are satisfied must be the same for all n sets of \(2^{64}\) texts. In other words, given n sets s.t. the first set satisfies the property of uniform distribution for some keys \(k^0, k^1\), the second for some other keys \({\hat{k}}^0, {\hat{k}}^1\) and so on, then the required properties can not be considered fulfilled.

As we are going to show, this distinguisher is meaningful for \(n\ge 2\): we choose \(n=4\) in order to make the advantage of the shortcut player more significant. Using the same strategy proposed for the 8-round distinguisher and for the 10-round one, the cost of the shortcut player to construct the \(n = 4\) sets of \(2^{64}\) (plaintext, ciphertext) pairs with the required properties is of \(4\cdot 2^{64} = 2^{66}\) encryptions. In the following, we show that (1) the verification cost is smaller than \(2^{66}\) encryptions and that (2) the probability of victory of the generic player using \(2^{66}\) oracle queries is negligible.

7.2 The verifier

The main problem in order to extend Gilbert’s 10-round distinguisher regards the verification process. For this reason, here we analyze this problem in detail. In order to set up the verification process in an efficient way, we exploit the strategy that we proposed in Sect. 4.2.4:

  • in the first step, the verifier filters wrong keys by checking zero-sum property;

  • in the second step, the verifier checks if the uniform distribution property is satisfied only for those keys for which the balance property holds.
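The two-step idea can be sketched on toy data as follows. This is only an illustration: the function names are hypothetical, the texts are assumed to be already partially decrypted under a guessed key, and the toy sets are far smaller than the \(2^{64}\) texts used by the distinguisher. The filter is sound because byte-wise uniform distribution implies the zero-sum property, while the converse does not hold.

```python
import random
from collections import Counter

def is_zero_sum(texts):
    """True if, at every byte position, the XOR over all texts is zero."""
    acc = [0] * 16
    for t in texts:
        for i, b in enumerate(t):
            acc[i] ^= b
    return not any(acc)

def is_byte_uniform(texts):
    """True if, at every byte position, each of the 256 values occurs equally often."""
    n = len(texts)
    if n == 0 or n % 256:
        return False
    for i in range(16):
        counts = Counter(t[i] for t in texts)
        if len(counts) != 256 or set(counts.values()) != {n // 256}:
            return False
    return True

def two_step_check(texts):
    """Cheap zero-sum filter first; the uniformity test runs only if it passes."""
    return is_zero_sum(texts) and is_byte_uniform(texts)

def toy_uniform_set(rng, copies=2):
    """Toy set of 256 * copies 16-byte texts with uniformly distributed bytes."""
    columns = []
    for _ in range(16):
        col = list(range(256)) * copies
        rng.shuffle(col)
        columns.append(col)
    return [bytes(col[r] for col in columns) for r in range(256 * copies)]
```

A set built with `toy_uniform_set` passes both steps, whereas two identical texts pass the zero-sum filter but fail the uniformity test; this is why the second step is still needed for the keys that survive the first one.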

Integral attack. Consider the case in which the final MixColumns operation is omitted (if it is not omitted, it is sufficient to swap the final MixColumns and the final AddRoundKey) and the case in which the attacker looks for keys \(k^{11}, k^{12}\) (analogous for \(k^0,k^1\)). Using a classical integral attack with the extension at the end (see [13] for all the details) the verifier guesses 5 bytes of the keys, that is 4 bytes (i.e. one anti-diagonal) of the last subkey \(k^{12}\) and 1 byte of the subkey \(k^{11}\). She first uses the 4 bytes of the last key to partially decrypt one round, and then she simply checks that the zero-sum property holds two rounds before (working on each byte independently of the others).

The complete pseudo-code of this attack is given in Algorithm 4, where the verifier looks for 4 bytes of the last subkey \(k^{12}\) and (simultaneously) for 4 bytes of the subkey \(k^{11}\). Working on 4 bytes of \(k^{11}\), the zero-sum property is satisfied with prob. \(2^{-32}\). Since the verifier tests \(2^{32}\) values of 4 bytes of \(k^{12}\) and \(2^{32}\) of 4 bytes of \(k^{11}\), we expect that using a single set (i.e. \(n=1\)) of \(2^{64}\) texts, \(2^{64} \cdot 2^{-32} =2^{32}\) combinations of these 8 bytes pass this first test (analogous for the other 4 combinations of 8 bytes). The idea is to test these \(2^{32}\) combinations using a second set of texts (thus, \(n\ge 2\)). It follows that on average only 1 combination of these 8 bytes of the keys passes the test. Using the same procedure, the verifier is able to find keys \(k^{12}\) and \(k^{11}\) for which the zero-sum property is satisfied. Finally, as for the 10-round case, the idea is to work as described in Algorithm 1 in order to check that the uniform distribution property is satisfied.

Computational Cost of the Verifier. First of all, given 2 sets of \(2^{64}\) texts the verifier must compute the array A as defined in Algorithm 4 and Algorithm 1. The cost of this step is given by \(2 \cdot 2^{64}\) table look-ups (note that all these arrays can be computed and stored at the same time).

In order to compute the arrays B of Algorithm 4, the verifier must compute \(2^{32} \cdot 2^{32} \cdot 4 = 2^{66}\) S-Box look-ups and \(2^{32} \cdot 2^{32} \cdot 2 = 2^{65}\) table look-ups. Given the arrays B, the cost to find the candidates for 4 bytes of the key \(k^{12}\) and 4 bytes of key \(k^{11}\) is given by

$$\begin{aligned} \underbrace{2^{32}}_{\text { due to } k^{12}} \cdot \underbrace{ 4\cdot 2^8}_{\text { due to } k^{11}} \cdot 2^8 \cdot 2 = 2^{51} \text { S-Boxes} \end{aligned}$$

and \(2^{51}\) table look-ups. Assuming that the cost of 1 S-Box look-up is equivalent to the cost of 1 table look-up (see Footnote 17), the total cost of the verifier to compute Algorithm 4 and to find the entire keys (i.e. \(k^0, k^1\) and \(k^{11}, k^{12}\)) is given by \(2 \cdot 4 \cdot (2^{66} + 2^{65} + 2 \cdot 2^{51}) \simeq 2^{69.6}\) table look-ups, or equivalently \(2^{61.8}\) 12-round encryptions.
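The look-up count can be re-derived with a few lines of arithmetic. The sketch below uses names chosen for this example and reads the factor \(2 \cdot 4\) as two sides (plaintexts and ciphertexts) times four groups of key bytes, which is an interpretation of the formula rather than a statement taken from the text.

```python
import math

def candidate_search_log2():
    """log2 of 2^32 * (4 * 2^8) * 2^8 * 2, the S-Box count of the search step."""
    return math.log2(2 ** 32 * (4 * 2 ** 8) * 2 ** 8 * 2)

def key_recovery_lookups_log2():
    """log2 of 2 * 4 * (2^66 + 2^65 + 2 * 2^51) table look-ups."""
    arrays_b = 2 ** 66 + 2 ** 65   # S-Box plus table look-ups for the arrays B
    search = 2 * 2 ** 51           # 2^51 S-Box plus 2^51 table look-ups
    return math.log2(2 * 4 * (arrays_b + search))

print(candidate_search_log2(), round(key_recovery_lookups_log2(), 1))
```

The construction of the arrays B dominates: the search over the key candidates contributes only \(2^{52}\) of the roughly \(2^{66.6}\) look-ups per group.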

For the (few) candidates of the key that satisfy zero-sum, the verifier tests the uniform distribution property. Given n sets of texts, the total cost is well approximated by \(2^{69.6} + n\cdot 2^{64}\) table look-ups (note that the cost to check the uniform distribution property is basically negligible w.r.t. the cost of finding the keys for which the zero-sum is satisfied), that is \(2^{61.8} + n \cdot 2^{56.1}\) 12-round encryptions. It follows that the cost of the verifier is lower than the costs of the two players (which is higher than \(n\cdot 2^{64}\) encryptions).

Partial-sum attack. Finally, we mention that the verifier can use the “Partial-Sum” key-recovery attack [15, 30] in order to find the keys \(k^0,k^1\) and \(k^{11}, k^{12}\) (much) faster than using the integral attack described before. On the other hand, we highlight that this is out of the scope of the paper (see Footnote 18). Indeed, as we are going to show, the verification cost is already smaller than the costs of the two players using the integral attack, which means that the distinguisher can be set up using a classical integral attack for the verification phase.


7.3 The generic player: on the meaningfulness of this distinguisher

The last problem that we have to face regards the cost of the generic player. In particular, we have to show that the shortcut player can generate the n sets of \(2^{64}\) texts (with the required property) in a more efficient way than the generic player.

What is the probability that, given a set of \(2^{64}\) texts, there exist keys \({\hat{k}}^1\) and \({\hat{k}}^2\) such that the bytes of those texts after 2-round encryptions (resp. decryptions) are uniformly distributed? By a calculation similar to the one in Sect. 4.2, and since there are \((2^{128})^2 = 2^{256}\) different keys, this probability is approximately \(2^{256} \cdot p \simeq 2^{256} \cdot 2^{-7328.1} = 2^{-7072.1} \approx 2^{-2^{12.78}}\), where p is defined in (6).

More generally, given \(2^{64}\) random pairs of texts, the probability that keys \(k^0, k^1\) and \(k^{11}, k^{12}\) exist for which the bytes of the plaintexts/ciphertexts are uniformly distributed after 2-round encryption/decryption is \(2^{512} \cdot 2^{-7328.1} = 2^{-6816.1} \approx 2^{-2^{12.73}}\). Thus, similar to the 8-round case, it follows that \(2^{64} + 2^{12} \simeq 2^{64}\) random (plaintext, ciphertext) pairs are sufficient to have good probability to win the game (Footnote 19). In other words, \(n\cdot (2^{64} + 2^{12}) \simeq n\cdot 2^{64}\) oracle queries with random plaintexts/ciphertexts are sufficient for the generic player in order to win the game (note that, as done in [16], this number does not take into account the cost to find the required pairs of texts).
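The two exponents quoted above follow from adding the number of key bits to \(\log_2 p\). As a quick sanity check, the sketch below redoes that arithmetic in Python; the constant `LOG2_P` is simply the value \(-7328.1\) quoted from (6), and the function name is a hypothetical helper introduced here for illustration.

```python
from math import log2

LOG2_P = -7328.1  # log2 of p from (6), as quoted in the text


def existence_exponent(key_bits):
    """Return (e, log2(e)) with 2^key_bits * p = 2^(-e) = 2^(-2^(log2 e))."""
    e = -(key_bits + LOG2_P)
    return e, log2(e)


for key_bits in (256, 512):
    e, double_exp = existence_exponent(key_bits)
    print(f"2^{key_bits} * p = 2^-{e:.1f} = 2^-(2^{double_exp:.3f})")
```

The double exponents agree with the quoted \(2^{-2^{12.78}}\) and \(2^{-2^{12.73}}\) up to rounding in the last digit.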

Thus, why is this distinguisher meaningful? As for the 8- and the 10-round cases, instead of focusing on the cost of the players, we use arguments similar to those proposed by Gilbert and show that the probability of the generic player to win the game given \(n \ge 2\) sets of \(2^{64}\) texts is negligible.

To do this, we claim that this probability is upper bounded by the probability of winning the following “related” game. Assume \(n=2\) and consider 2 sets of \(2^{64}-d\) (plaintext, ciphertext) pairs for \(d\ge 5\), that is \((p^i, c^i)\) for each \(i=0, ..., 2^{64}-d-1\), with the following property: there is a set of keys \(k^0, k^1\) and \(k^{11}, k^{12}\) (which can correspond to the set of the entire keys) such that, for each one of the two sets, the bytes of \(R_{k^1}\circ R_{k^0}(p^i)\) and of \(R^{-1}_{k^{11}} \circ R^{-1}_{k^{12}}(c^i)\) (that is, the 2-round encryption of the plaintexts and the 2-round decryption of the ciphertexts) are “almost uniform” w.r.t. the definition given before. The goal of the player is to find \(2\cdot d\) texts such that the bytes of the \(2^{64}\) texts of each set after 2-round encryption/decryption are uniformly distributed. The conclusion follows immediately since, as we are going to show, this probability is upper bounded by \(2^{-25}\) and since this second game is “related” to the original one (as assumed in [16]).

More formally, and based on the same argumentation proposed by Gilbert (see the proof of Prop. 4 of [16]), it is possible to prove the following statement.

Proposition 1

For any oracle algorithm \({\mathcal {A}}\) that makes \(\le N=2\cdot 2^{64} = 2^{65}\) oracle queries to a perfect random permutation \(\varPi \) or \(\varPi ^{-1}\) of \(\{0, 1\}^{128}\), the probability that \({\mathcal {A}}\) outputs \(n\ge 2\) sets, each a \(2^{64}\)-tuple of pairs \((X_i, Y_i)\) for \(i=0, ..., 2^{64}-1\), that satisfy \(Y_i =\varPi (X_i)\) and also satisfy \({\mathcal {R}}\) defined previously is upper bounded by \(\left( {\begin{array}{c}10\\ 5\end{array}}\right) \times 2^{512} \times \left( \frac{5^{16}}{2^{128} - (2^{64}-5)} \right) ^6 \approx 2^{-25}\).

Proof

If at least one of the N pairs \((X_i, Y_i)\) output by \({\mathcal {A}}\) does not result from a query \(X_i\) to \(\varPi \) or a query \(Y_i\) to \(\varPi ^{-1}\), then the probability that \(Y_i = \varPi (X_i)\) holds for this pair, and consequently the success probability of \({\mathcal {A}}\), is upper bounded by \(\frac{1}{2^{128} - (N - 1)}\). So from now on we only consider the opposite case, i.e. all the \((X_i, Y_i)\) result from queries to \(\varPi \) or \(\varPi ^{-1}\).

As we have already said, a set of N texts is uniformly distributed if any subset is “almost” uniformly distributed w.r.t. the definition given before. Following the same argumentation provided by Gilbert for the 10-round case, we consider 2 sets of \(2^{64}-5\) (plaintext, ciphertext) pairs which are “almost” uniformly distributed for a set of keys \(k^0, k^1\) and \(k^{11}, k^{12}\) after 2-round encryption/decryption, and we study the probability of the generic player to find the remaining \(2 \cdot 5 = 10\) pairs such that there exist keys \(k^0, k^1\) and \(k^{11}, k^{12}\) for which the bytes of the 2 corresponding sets of \(2^{64}\) texts are uniformly distributed after 2-round encryption/decryption.

As shown in [16, Prop. 6], for each one of the two sets the probability that 5 pairs satisfy this condition is upper bounded by \(\left( \frac{5^{16}}{2^{128} - (2^{64}-5)} \right) ^3\). Moreover, observe that the player makes 10 oracle queries, which can be divided into \(\left( {\begin{array}{c}10\\ 5\end{array}}\right) = 252\) different sets of 5 elements. Since the four subkeys (\(2^{512}\) possible values in total) are considered to be independent and must be equal for the two sets, one gets the claimed upper bound on the total probability, namely \(\left( {\begin{array}{c}10\\ 5\end{array}}\right) \times 2^{512} \times \left( \frac{5^{16}}{2^{128} - (2^{64}-5)} \right) ^6 \approx 2^{-25}\). \(\square \)

The same strategy applies for each \(n\ge 2\). In particular, if \(n=4\) the probability becomes

$$\begin{aligned}\left( {\begin{array}{c}20\\ 5\end{array}}\right) \times \left( {\begin{array}{c}15\\ 5\end{array}}\right) \times \left( {\begin{array}{c}10\\ 5\end{array}}\right) \times 2^{512} \times \left( \frac{5^{16}}{2^{128} - (2^{64}-5)} \right) ^{12} \approx 2^{-544.7}.\end{aligned}$$
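The two numerical bounds (\(n=2\) in Proposition 1 and \(n=4\) above) can be re-evaluated with exact integer binomials and base-2 logarithms. The Python sketch below does this; the function name is hypothetical, and the general pattern for the binomial factors (\(\binom{5j}{5}\) for \(j = 2, ..., n\)) is an extrapolation from the two instances given in the text, not a formula stated by the authors.

```python
from math import comb, log2

D = 5            # missing pairs per set, as in the proof of Proposition 1
KEY_BITS = 512   # four independent 128-bit subkeys k^0, k^1, k^11, k^12


def generic_bound_log2(n):
    """log2 of the upper bound on the generic player's success for n sets."""
    # Ways to split the 5n fresh queries into groups of 5: C(10,5) for n = 2,
    # C(20,5) * C(15,5) * C(10,5) for n = 4 (pattern assumed for other n).
    log2_splits = sum(log2(comb(D * j, D)) for j in range(2, n + 1))
    # Per-set factor (5^16 / (2^128 - (2^64 - 5)))^3, cf. [16, Prop. 6].
    log2_per_set = 3 * (16 * log2(D) - log2(2**128 - (2**64 - D)))
    return log2_splits + KEY_BITS + n * log2_per_set


for n in (2, 4):
    print(f"n = {n}: bound is about 2^{generic_bound_log2(n):.1f}")
```

Each additional set multiplies the bound by a factor of roughly \(2^{-272}\) before the binomial correction, so the success probability of the generic player drops very quickly with n.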

8 Discussion of results and proposal for a “New” model

In this paper, we improve all the known-key distinguishers (or present conjectures for such known-key distinguishers) currently present in the literature for AES from 7 up to 10 rounds of AES and we set up the first known-key distinguishers on 12 rounds of AES, by extending distinguishers based on truncated differential trails and uniform distribution property (using the technique proposed by Gilbert in [16]).

In order to extend Gilbert’s distinguisher based on the uniform distribution property up to 12-round AES, we propose a different strategy that can be used by the verifier in order to check that the required relation \({\mathcal {R}}\) holds, and we present a formal proof which is based on the same argumentation proposed by Gilbert in order to justify the 8- and the 10-round distinguishers presented in [16]. For our new distinguishers using truncated-differential properties the situation is different: the problem of formally proving that no generic attack is better than those conjectured before remains open.

8.1 A “New” model: “Classical” known-key distinguisher

Taking a step back from the concrete results, what we also showed is that the gap between the known-key model and the chosen-key model may be even larger. Among the possibilities to remedy this counter-intuitive situation, we propose to define a new model that better captures the desire to have something “in-between” the chosen-key and the known-key model. Our proposal is to distinguish between the “classical” known-key distinguisher, where the verifier can directly verify the relation \({\mathcal {R}}\) on the plaintexts and ciphertexts without guessing any key material, and the “Gilbert” known-key distinguisher.

In particular, characterizing a meaningful (or non-trivial) known-key distinguisher for a concrete cipher E remains an open problem. Informally, a known-key distinguisher can be considered meaningful if the relation \({\mathcal {R}}\) (that defines it) has no “obvious connection” with the specification of E and is independent of the value of the key.

More generally, the relation \({\mathcal {R}}\) should not “extensively” re-use the operations that define E. Indeed, note that if one considers a relation \({\mathcal {R}}\) that depends on the details of the internal primitives of \(E_K(\cdot )\), then any concrete implementable cipher (like the AES instantiated by a known key) can be trivially distinguished from an ideal cipher. For instance, consider the following straightforward distinguishability attack. Assume the goal is to decide whether an oracle is instantiated by a cipher \(E_K(\cdot )\) or by an ideal cipher \(\varPi (K, \cdot )\) under a known/chosen key K. Given a query X, one gets Y (which can be \(Y = E_K(X)\) or \(Y=\varPi (K, X)\)). Since the details of \(E_K(\cdot )\) and the key K are known, one can simply compute \(Y^\prime = E_K(X)\). If \(Y^\prime = Y\), one can conclude that the oracle is instantiated by \(E_K(\cdot )\).
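The trivial attack just described can be made concrete on a toy scale. The sketch below is only an illustration under loud assumptions: `toy_cipher` is a hypothetical 8-bit keyed permutation invented here (it is not AES and has no security), and the ideal cipher is simulated by a seeded random permutation.

```python
import random

BLOCK = 256  # toy 8-bit block size, for illustration only


def toy_cipher(key, x):
    """Hypothetical keyed permutation on 8-bit blocks (not a real cipher)."""
    return (5 * (x ^ key) + 3) % BLOCK


def sample_ideal_cipher(key, seed):
    """Stand-in for Pi(K, .): one random permutation per (seed, key)."""
    perm = list(range(BLOCK))
    random.Random(seed * BLOCK + key).shuffle(perm)
    return lambda x: perm[x]


def trivial_distinguisher(oracle, key, queries):
    """Output 1 iff every oracle answer Y equals the recomputed Y' = E_K(X)."""
    return int(all(oracle(x) == toy_cipher(key, x) for x in queries))


key = 0x3A
queries = range(16)
real_oracle = lambda x: toy_cipher(key, x)
ideal_oracle = sample_ideal_cipher(key, seed=1)
print("real oracle :", trivial_distinguisher(real_oracle, key, queries))
print("ideal oracle:", trivial_distinguisher(ideal_oracle, key, queries))
```

The distinguisher succeeds only because it re-implements E itself, which is exactly why relations of this kind are regarded as meaningless.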

“Classical” Known-Key Distinguisher: About the Relation \({\mathcal {R}}\). In order to achieve our goal, let us first introduce a set \({\mathfrak {D}}\) of distinguishers D defined as follows:

\({\mathfrak {D}}\) Set of Distinguishers:

\({\mathfrak {D}}\) denotes the set of all distinguishers D for which the relation \({\mathcal {R}}\) (that defines it) has no “obvious connection” with the specification of E (e.g. the details of the S-Box or/and of the round-constants etc.) and is independent of the value of the key.

For a concrete example, note that a distinguisher that exploits the relation \(X {\mathcal {R}} Y\) defined as \(Y = E_K(X)\) (as the one presented before) does not belong to \({\mathfrak {D}}\). Indeed, one has to know e.g. the details of the S-Box in order to check the relation \({\mathcal {R}}\) previously defined. The same happens for Gilbert-like distinguishers such as [10, 16] and the ones introduced in this paper. Instead, a distinguisher that exploits the relation \((X_1, X_2) {\mathcal {R}} (Y_1, Y_2)\) defined as \(X_1 \oplus X_2 \in {\mathcal {X}}\) and \(Y_1 \oplus Y_2\in {\mathcal {Y}}\) for particular subspaces \({\mathcal {X}}\) and \({\mathcal {Y}}\) (equivalently, \(X_1\) and \(X_2\) are equal in certain bits/bytes/words, and similarly for \(Y_1\) and \(Y_2\)) belongs to \({\mathfrak {D}}\), since such a relation does not exploit any detail of \(E(\cdot )\). Moreover, note that in this last case, \(X = (X_1, X_2)\) and \(Y = (Y_1, Y_2)\) satisfy two independent properties, namely the fact that \(X_1 \oplus X_2 \in {\mathcal {X}}\) is independent from the fact that \(Y_1 \oplus Y_2\in {\mathcal {Y}}\) (for every non-trivial \(E(\cdot )\)).

A possible way to formally define the set \({\mathfrak {D}}\) is to fix the relation \({\mathcal {R}}\) in advance, as done e.g. in [27]. In particular, consider a cipher \(E_K(\cdot ): {\mathbb {F}}_2^N \rightarrow {\mathbb {F}}_2^N\) for a certain \(N \in {\mathbb {N}}\). In [27], the authors restrict themselves to the class of known-key distinguishers whose relation \({\mathcal {R}}\) is defined as follows: given \(\varphi , \psi \subseteq \{0, 1, ..., N-1\}\), the players have to send to the verifier n different (plaintext, ciphertext) pairs \((p_i, c_i)\) for \(i=0, ..., n-1\) such that

$$\begin{aligned} \text {Bit}_\varphi \bigl (p_0 \oplus ... \oplus p_{n-1}\bigr ) = 0 \qquad \text { and } \qquad \text {Bit}_\psi \bigl (c_0 \oplus ... \oplus c_{n-1}\bigr ) = 0 \end{aligned}$$
(11)

where \(\hbox {Bit}_\chi (x)\) outputs a string consisting of all bits of x whose index is in \(\chi \subseteq \{0, 1, ..., N-1\}\). This relation covers several known-key distinguishers in the literature, including the zero-sum one initially proposed by Knudsen and Rijmen in [20] (recalled in Appendix C.1) and the ones based on the truncated differential/rebound attack proposed in [17, 19, 25] (recalled in Sect. 4.1.1-4.1.2). Moreover, it satisfies the definition previously given, since such a relation is independent of the specification of E. On the other hand, not all known-key distinguishers in the literature can be simply described using the property (11), as for example the known-key distinguisher on 8-round AES based on the uniform distribution property proposed in [16] (and recalled in Sect. 4.2.1) or the ones based on linear cryptanalysis (e.g. [9]).
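To make relation (11) concrete, the following Python sketch checks it on texts represented as integers. It is a minimal illustration, not the verifier of [27]: the function names are hypothetical, bit index 0 is assumed to be the least significant bit, and the check that each \(c_i\) is really the encryption of \(p_i\) under the known key is left out.

```python
from functools import reduce
from operator import xor


def bits(x, indices):
    """Bit_chi(x): the bits of x at the given indices (index 0 = LSB here)."""
    return tuple((x >> i) & 1 for i in sorted(indices))


def satisfies_relation_11(pairs, phi, psi):
    """Check relation (11) on n different (plaintext, ciphertext) pairs."""
    if len(set(pairs)) != len(pairs):
        return False  # the n pairs must be different
    p_sum = reduce(xor, (p for p, _ in pairs), 0)
    c_sum = reduce(xor, (c for _, c in pairs), 0)
    # Both XOR sums must vanish on the selected bit positions.
    return not any(bits(p_sum, phi)) and not any(bits(c_sum, psi))


pairs = [(0b0001, 0x80), (0b0010, 0x00), (0b0011, 0x80)]
print(satisfies_relation_11(pairs, phi={0, 1}, psi={7}))
```

Note that the check uses only XORs and bit selections on the texts, so it is independent of the specification of E, in line with the informal requirement on \({\mathfrak {D}}\).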

As a result, the problem of formalizing, with a proper mathematical definition, the set \({\mathfrak {D}}\) of all distinguishers D for which the relation \({\mathcal {R}}\) (that defines it) has no “obvious connection” with the specification of E is still open for future research.

“Classical” known-key distinguisher indifferentiability. With this in mind, we can define what “classical known-key indifferentiability” is:

Definition 10

Let \(E : \{0,1\}^k \times \{0, 1\}^n \rightarrow \{0,1\}^n\) be a block cipher (where \((K, p) \mapsto c = E(K, p) = E_K(p)\)), and let \(\varPi \) be an ideal block cipher. Let \(D\in {\mathfrak {D}}\) be a distinguisher with oracle access to a permutation and its inverse, and returning a single bit. The “classical known-key indifferentiability” (class)Inf-KK advantage of D is defined as

$$\begin{aligned} Adv^{\mathrm {(class)Inf\text{- }KK}}(D) =&\bigl | Prob\bigl [ K \xleftarrow []{\$} \{0,1\}^k; D^{E_K(\cdot ), E_K^{-1}(\cdot )}(K) = 1 \bigr ] \\&- Prob\bigl [ K \xleftarrow []{\$} \{0,1\}^k; D^{\varPi (K, \cdot ), \varPi ^{-1}(K, \cdot )}(K) = 1 \bigr ] \bigr |. \end{aligned}$$

For integers \(q_D\) and t, the (class)Inf-KK advantage of E is defined as

$$\begin{aligned} Adv^{\mathrm {(class)Inf\text{- }KK}}(q_D, t) = \max _{D\in {\mathfrak {D}}} Adv^{\mathrm {(class)Inf\text{- }KK}}(D) \end{aligned}$$

where the maximum is taken over all distinguishers (for which the relations \({\mathcal {R}}\) that define them have no “obvious connection” with the specification of E) making at most \(q_D\) oracle queries and running in time at most t. E is \((q, t, \varepsilon )\) (class)Inf-KK if \(Adv^{\mathrm {(class)Inf\text{- }KK}}(q_D, t)\le \varepsilon \).
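The advantage in Definition 10 can be estimated empirically on a toy scale, which may help to see how the two probabilities are sampled. The sketch below is an illustration only and rests on several assumptions: `toy_cipher` is a hypothetical and deliberately weak 8-bit keyed permutation, the ideal cipher is simulated by a fresh random permutation per trial, and the distinguisher uses a relation of the form \(X_1 \oplus X_2 \in {\mathcal {X}}\), \(Y_1 \oplus Y_2\in {\mathcal {Y}}\) that never refers to the internals of the cipher. Whether such a relation formally lies in \({\mathfrak {D}}\) is left open, as in the text.

```python
import random

BLOCK = 256  # toy 8-bit block, illustration only


def toy_cipher(key, x):
    """Hypothetical, deliberately weak keyed permutation (not a real cipher)."""
    return (5 * (x ^ key) + 3) % BLOCK


def distinguisher(enc, q=16):
    """Output 1 iff Y1 xor Y2 has low bit 1 for q pairs with X1 xor X2 = 1."""
    return int(all((enc(x) ^ enc(x ^ 1)) & 1 for x in range(0, 2 * q, 2)))


def estimate_advantage(trials=200, seed=0):
    """Monte Carlo estimate of |Pr[D = 1 | E_K] - Pr[D = 1 | Pi(K, .)]|."""
    rng = random.Random(seed)
    real_hits = ideal_hits = 0
    for _ in range(trials):
        key = rng.randrange(BLOCK)           # K drawn uniformly at random
        real_hits += distinguisher(lambda x: toy_cipher(key, x))
        perm = list(range(BLOCK))            # stand-in for Pi(K, .)
        rng.shuffle(perm)
        ideal_hits += distinguisher(lambda x: perm[x])
    return abs(real_hits - ideal_hits) / trials


print("estimated advantage:", estimate_advantage())
```

For the toy cipher the relation holds for every key, because the low bit of the output is an affine function of the low bit of the input, while for a random permutation each pair satisfies it with probability close to 1/2.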

We point out that this definition is similar to the “known-key indifferentiability” one (denoted \(\mathrm {Inf\text{- }KK}\)) proposed in [1, Def. 1]. There, the authors consider known-key distinguishers for block ciphers based on idealized primitives such as randomly drawn functions or/and permutations, e.g. an Even-Mansour construction \(EM_r : \{0,1\}^k \times \{0,1\}^n \rightarrow \{0,1\}^n\) defined as

$$\begin{aligned} EM_r(K, x) := k_r \oplus \pi (... \, \pi (k_1 \oplus \pi (x\oplus k_0)) \, ...) \end{aligned}$$

for a fixed and ideal permutation \(\pi \) on n bits, where \(k_0, ... ,k_r\) denote the round keys derived from the master key K using some key-schedule.
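As an illustration of the construction, the sketch below implements \(EM_r\) on 8-bit blocks. It is a toy instance under stated assumptions: the ideal permutation \(\pi \) is replaced by a seeded random permutation, the round keys are supplied directly instead of being derived by a key-schedule, and the function name is hypothetical.

```python
import random


def make_even_mansour(pi, round_keys):
    """Return x -> k_r xor pi(... pi(k_1 xor pi(x xor k_0)) ...)."""
    k0, later_keys = round_keys[0], round_keys[1:]

    def encrypt(x):
        state = x ^ k0
        for k in later_keys:
            state = k ^ pi[state]   # one round: apply pi, then add a round key
        return state

    return encrypt


n_bits = 8
pi = list(range(2**n_bits))
random.Random(0).shuffle(pi)        # stand-in for the ideal permutation pi

# r = 2 rounds with hand-picked round keys k_0, k_1, k_2 (no key-schedule).
em = make_even_mansour(pi, [0x1F, 0xA2, 0x5C])
print(em(0x00), em(0x01))
```

Since each step is either a key addition or an application of \(\pi \), the resulting map is a permutation of the block space for every choice of round keys.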

However, here we point out the most important differences between these two definitions:

  1. in Def. 1 of [1], the randomness is due to the fixed and ideal permutation \(\pi \) that defines \(EM_r(K, \cdot )\); conversely, note that in our definition the first probability does not contain any randomness, but there is a time complexity involved in D;

  2. since the “round function” \(\pi \) that defines \(EM_r(K, \cdot )\) is ideal in [1, Def. 1], the problem of formalizing the set of distinguishers \({\mathfrak {D}}\) (previously discussed) does not arise in this case (roughly speaking, since \(\pi \) is a random permutation, it is not possible to have any “obvious connection” between it and the relation \({\mathcal {R}}\) that defines the distinguisher). On the other hand, the security against known-key distinguishers as defined in [1, Def. 1] is meaningless from a practical point of view, since in practice we deal with ciphers with fixed and known/public round functions (as for the AES case). From this point of view, our work is more practically oriented.

“Classical” Known-Key Distinguisher: Some (useful) Properties/Considerations. Finally, we point out that

  • if a cipher is a Strong PseudoRandom Permutation, then the ideal cipher in the (class)Inf-KK definition can be replaced by the encryption scheme instantiated with an unknown secret key;

  • if a cipher is (class)Inf-KK secure, then it is also SPRP secure.

More formally:

Definition 11

Let \(E : \{0,1\}^k \times \{0, 1\}^n \rightarrow \{0,1\}^n\) be a block cipher (where \((K, p) \mapsto c = E(K, p) = E_K(p)\)), and let \(\varPi \) be an ideal block cipher. Let D be a distinguisher with oracle access to a permutation and its inverse, and returning a single bit. The (Strong PseudoRandom Permutation) SPRP-advantage of D is defined as

$$\begin{aligned} Adv^{\mathrm {SPRP}}(D)= & {} \bigl | Prob\bigl [K \xleftarrow []{\$} \{0,1\}^k; D^{E_K(\cdot ), E_K^{-1}(\cdot )} = 1 \bigl ] \\&- Prob\bigl [K \xleftarrow []{\$} \{0,1\}^k; D^{\varPi (K, \cdot ), \varPi ^{-1}(K, \cdot )} = 1 \bigl ] \bigl |. \end{aligned}$$

For integers \(q_D\) and t, the SPRP-advantage of E is defined as

$$\begin{aligned} Adv^{\mathrm {SPRP}}(q_D, t) = \max _D Adv^{\mathrm {SPRP}}(D) \end{aligned}$$

where the maximum is taken over all distinguishers making at most \(q_D\) oracle queries and running in time at most t. E is a \((q, t, \varepsilon )\)-SPRP if \(Adv^{\mathrm {SPRP}}(q_D, t)\le \varepsilon \).

Proposition 2

Let \(E : \{0,1\}^k \times \{0, 1\}^n \rightarrow \{0,1\}^n\) be a block cipher which satisfies the SPRP (“Strong Pseudo-Random Permutation”) definition. Then, E is \((q, t, \varepsilon )\) (class)Inf-KK if and only if

$$\begin{aligned} \max _{D \in {\mathfrak {D}}} \bigl | Prob\bigl [ K \xleftarrow []{\$} \{0,1\}^k; D^{E_K(\cdot ), E_K^{-1}(\cdot )}(K) = 1 \bigl ] - Prob\bigl [K \xleftarrow []{\$} \{0,1\}^k; D^{E_K(\cdot ), E_K^{-1}(\cdot )} = 1 \bigl ] \bigl | \le \varepsilon \end{aligned}$$

where the maximum is taken over all distinguishers making at most \(q_D\) oracle queries and running in time at most t.

Proposition 3

If \(E : \{0,1\}^k \times \{0, 1\}^n \rightarrow \{0,1\}^n\) is (class)Inf-KK secure, then it also satisfies the SPRP (“Strong Pseudo-Random Permutation”) definition.

Informally, if it is not possible to distinguish \(E_K(\cdot )\) from \(\varPi \) when the key K is known, then it is not possible to distinguish them when the key is secret. The converse is not true in general. For example, the best secret-key distinguishers on AES (independent of the key) cover 5/6 rounds [24, 28], while the best known-key distinguisher covers 8 rounds (12 if one allows Gilbert’s strategy).