Abstract
\(\mathsf {Gimli}\) is a family of cryptographic primitives (both a hash function and an AEAD scheme) that has been selected for the second round of the NIST competition for standardizing new lightweight designs. The candidate \(\mathsf {Gimli}\) is based on the permutation \(\mathsf {Gimli}\), which was presented at CHES 2017. In this paper, we study the security of both the permutation and the constructions that are based on it. We exploit the slow diffusion in \(\mathsf {Gimli}\) and its internal symmetries to build, for the first time, a distinguisher on the full permutation of complexity \(2^{64}\). We also provide a practical distinguisher on 23 out of the full 24 rounds of \(\mathsf {Gimli}\) that has been implemented. Next, we give (full-state) collision and semi-free start collision attacks on \(\mathsf {Gimli}\)-Hash, reaching, respectively, up to 12 and 18 rounds. On the practical side, we compute a collision on 8-round \(\mathsf {Gimli}\)-Hash. In the quantum setting, these attacks reach 2 more rounds. Finally, we perform the first study of linear trails in \(\mathsf {Gimli}\), and we find a linear distinguisher on the full permutation.
Introduction
\(\mathsf {Gimli}\) is a cryptographic permutation that was published at CHES 2017 [4]. It is also the core primitive of a submission to the NIST lightweight cryptography project [5], one of the 32 candidates that made it to the second round. It is intended to run well on a vast variety of platforms and contexts, from powerful processors supporting vector instructions to side-channel-protected hardware.
A cryptographic permutation is a versatile primitive which is easily used to construct a hash function (as originally intended for this type of object [7]). It was later shown that such permutations can also be used to build authenticated ciphers [10], pseudorandom number generators [9], etc. In all such structures, the security of the cryptographic function relies on the properties of the permutation. In particular, it is assumed in the underlying security proofs that the permutation used behaves like a permutation picked uniformly at random, apart of course from the existence of a compact implementation, a property which should not be expected from a random object.
By definition, a cryptographic permutation does not have a key. Thus, we cannot define its security level using a game that relies on distinguishing a random permutation from a keyed instance with a random key. Still, since it should behave like a permutation picked uniformly at random, we can assess its security level by trying to identify properties that hold for the permutation studied but which should not be expected for one picked uniformly at random. In this context, cryptanalysts can reuse approaches originally intended for block cipher cryptanalysis (e.g., differential attacks [11]). In fact, given that no key material is involved, we can also borrow techniques from hash function cryptanalysis such as rebound attacks [29].
The aim is usually then to obtain inputs of the permutation satisfying a certain property using an algorithm which is more efficient than the generic one, i.e., the one that would work on a random permutation.
Our Contributions. In this paper, we complete the original security analysis of the designers of \(\mathsf {Gimli}\) by targeting both the permutation on its own, and the NIST candidate \(\mathsf {Gimli}\)-Hash. Our results on the permutation are summarized in Fig. 1 (plain lines). In order to account for the different costs of the generic attacks, we divided the logarithm of the time complexity of our distinguishers by the logarithm of the time complexity of the corresponding generic distinguisher. In Fig. 1, a distinguisher is valid if the ratio is under 1.0. Previous attacks from the literature are represented with dotted lines. The complexities of all our attacks (including those against the hash function) are given in Table 1, along with all the results from the literature we are aware of.
Our main result is a distinguisher of the full 24-round permutation with a cost of \(2^{64}\), while a similar generic distinguisher has a cost of \(2^{96}\). We also propose a distinguisher on 23 rounds that is practical, with a cost of \(2^{32}\), and has been successfully implemented. These distinguishers exploit internal symmetries that are encouraged by the round function. The 23-round distinguisher could be extended by 1 round for free if the rounds were shifted.^{Footnote 1}
Using similar guess-and-determine ideas, we increase to 12 the number of rounds susceptible to collision attacks on \(\mathsf {Gimli}\)-Hash. A reduced-round version of this attack has been implemented. In the quantum setting, we obtain collisions up to 14 rounds. We also build semi-free start collisions, i.e., we show how to find one internal state value and two different messages (thus not affecting the capacity part) that provide a collision on the capacity after applying the permutation. This attack is more efficient than a generic one for 18 rounds classically and up to 20 quantumly. As a side note, these results provide a new example where quantum attacks reach more rounds than classical ones, much like in [25]. We also find a state-recovery attack on the authenticated encryption \(\mathsf {Gimli}\)-Cipher (which leads to a key-recovery) up to 12 rounds, with only 3 blocks of data.
In addition, we provide the first extensive study of the linear properties of the round function of \(\mathsf {Gimli}\). We design a full-round linear distinguisher and study faster differential-linear distinguishers on reduced-round variants.
Our implementations (23-round distinguisher, reduced-round collision attack, search for linear trails) are available at this URL.^{Footnote 2}
Differences with [18]. This article is an extended version of the paper “New Results on Gimli: Full-Permutation Distinguishers and Improved Collisions” which appeared in the proceedings of ASIACRYPT 2020 [18]. Our new contributions are the state-recovery attacks explored in Sect. 6 and the full-round linear distinguisher on \(\mathsf {Gimli}\) in Sect. 7. The rest of the paper (e.g., the distinguishers of Sect. 3 and the collision attacks of Sect. 4) is unchanged with respect to the ASIACRYPT version.
Organization of the paper. In Sect. 2 we describe the \(\mathsf {Gimli}\) permutation and primitive, as well as previously known results. Section 3 provides the new distinguishers exploiting the internal symmetries, which make it possible to distinguish the full permutation and to build practical distinguishers up to 23 rounds. Section 4 presents improved collision and semi-free start collision attacks, and Sect. 5 their quantum counterparts. In Sect. 6, we use similar methods to perform state-recovery attacks on reduced-round versions of the AE scheme \(\mathsf {Gimli}\)-Cipher. Section 7 presents our new results regarding statistical distinguishers, with new linear trails and differential-linear attacks. We conclude the paper in Sect. 8 with a summary, a discussion on the impact of our results and a proposal for a tweak that would mitigate their reach.
Preliminaries
In this section we describe the \(\mathsf {Gimli}\) permutation and we provide an overview of previous cryptanalysis results. The \(\mathsf {Gimli}\)-Hash function is described directly in Sect. 4.
We adopt the following notations in this paper: \(\ll , \gg , \lll , \ggg \) represent, respectively, shift left, shift right, rotate left and rotate right operations. x, y, z will denote elements of \(\mathbb {F}_2^{32}\). SP is the 96-bit SP-Box. We denote \(x_i\) the \((i \mod 32)^{th}\) bit of x (\(x_{33} = x_{1}\)) with \(x_0\) least significant (rightmost). We denote the output of the SP-Box as \(\mathrm {SP}(x,y,z) = (x', y', z')\) and \(\mathrm {SP}^2(x,y,z) = (x'',y'',z'')\).
The \(\mathsf {Gimli}\) Permutation
State Structure. We denote by S the 384-bit \(\mathsf {Gimli}\) state, which is the concatenation of 4 columns of 96 bits, that we denote A, B, C, D, where A is column number 0, and D is column number 3. Each column is cut into three 32-bit words x, y, z which are denoted e.g., \(A_x, A_y, A_z\). Thus, the state is a \(4 \times 3 \times 32\) parallelepiped. We will speak of the x lane to denote the sequence or concatenation of words \(A_x, B_x, C_x, D_x\).
SP-Box. The only nonlinear operation in \(\mathsf {Gimli}\) is the SP-Box, which is applied column-wise. On input x, y, z, it updates the three words as follows:

1. Rotate x and y: \(x \leftarrow x \lll 24, y \leftarrow y \lll 9\).

2. Perform the following nonlinear operations in parallel (shifts are used rather than rotations):

\(x \leftarrow x \oplus (z \ll 1) \oplus ((y \wedge z) \ll 2)\),

\(y \leftarrow y \oplus x \oplus ((x \vee z) \ll 1)\),

\(z \leftarrow z \oplus y \oplus ((x \wedge y) \ll 3)\).

3. Swap x and z: \((x,z) \leftarrow (z,x)\).
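The three steps above can be transcribed almost literally. The following Python sketch is only an illustration of the description given here, not the reference implementation; the names `rotl32` and `sp_box` are ours, and words are handled as Python integers masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def rotl32(v, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((v << n) | (v >> (32 - n))) & MASK32


def sp_box(x, y, z):
    """Apply the 96-bit SP-Box to one column (x, y, z) of 32-bit words."""
    # Step 1: rotate x and y.
    x = rotl32(x, 24)
    y = rotl32(y, 9)
    # Step 2: the three updates read the same rotated (x, y, z), i.e. they
    # are computed in parallel; plain shifts drop the bits pushed out.
    new_x = (x ^ (z << 1) ^ ((y & z) << 2)) & MASK32
    new_y = (y ^ x ^ ((x | z) << 1)) & MASK32
    new_z = (z ^ y ^ ((x & y) << 3)) & MASK32
    # Step 3: swap x and z.
    return new_z, new_y, new_x
```

Feeding single-bit inputs is a quick way to see the slow diffusion: for instance, a lone bit in z only reaches bit 0 of x' and bit 1 of y' and z'.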
Rounds. \(\mathsf {Gimli}\) applies a sequence of 24 rounds numbered from 24 down to 1 inclusive. Each round applies an SP-Box layer, then performs a swap (every two rounds, either a “big swap” or a “small swap” as shown in Algorithm 1) and a constant addition (every four rounds). The constant at round i, if there is one, will be denoted \(\mathsf {rc}_i\) in what follows. In \(\mathsf {Gimli}\) we have: \(\mathsf {rc}_i = \texttt {0x9e377900} \oplus i\). Note that all the attacks studied in this paper are independent of the choice of round constants.
An algorithmic depiction of full \(\mathsf {Gimli}\) is given in Algorithm 1 and it is depicted in Fig. 10, where each wire represents a word.
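As a rough companion to this description, the round structure can be sketched in Python as follows. Several details are assumptions taken from the public \(\mathsf {Gimli}\) specification rather than from the text above: the state is a list of 12 words with `state[c]`, `state[4 + c]`, `state[8 + c]` holding the x, y, z words of column c; the small swap (rounds with \(i \bmod 4 = 0\)) exchanges \(A_x \leftrightarrow B_x\) and \(C_x \leftrightarrow D_x\); the big swap (rounds with \(i \bmod 4 = 2\)) exchanges \(A_x \leftrightarrow C_x\) and \(B_x \leftrightarrow D_x\); and the constant is XORed into \(A_x\). The function name `gimli` and its parameters are ours.

```python
MASK32 = 0xFFFFFFFF


def rotl32(v, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((v << n) | (v >> (32 - n))) & MASK32


def sp_box(x, y, z):
    """SP-Box on one column: rotate, nonlinear layer, swap x and z."""
    x, y = rotl32(x, 24), rotl32(y, 9)
    new_x = (x ^ (z << 1) ^ ((y & z) << 2)) & MASK32
    new_y = (y ^ x ^ ((x | z) << 1)) & MASK32
    new_z = (z ^ y ^ ((x & y) << 3)) & MASK32
    return new_z, new_y, new_x


def gimli(state, first_round=24, last_round=1):
    """Apply rounds first_round down to last_round; returns a new list."""
    s = list(state)
    for r in range(first_round, last_round - 1, -1):
        for c in range(4):  # SP-Box layer, column by column
            s[c], s[4 + c], s[8 + c] = sp_box(s[c], s[4 + c], s[8 + c])
        if r % 4 == 0:      # small swap: A_x <-> B_x, C_x <-> D_x
            s[0], s[1], s[2], s[3] = s[1], s[0], s[3], s[2]
        elif r % 4 == 2:    # big swap: A_x <-> C_x, B_x <-> D_x
            s[0], s[1], s[2], s[3] = s[2], s[3], s[0], s[1]
        if r % 4 == 0:      # round constant rc_r on A_x
            s[0] ^= 0x9E377900 ^ r
    return s
```

With this interface, `gimli(s, 23, 1)` would correspond to the 23-round variant \(\mathsf {Gimli}(23, 1)\) and `gimli(s, 21, 14)` to \(\mathsf {Gimli}(21, 14)\), up to the treatment of the final swap.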
Boolean Description of the SP-Box. Now we give a full description of the SP-Box using Boolean functions:

for \(x'\):
$$\begin{aligned} {\left\{ \begin{array}{ll} x'_0 = y_{23} + z_0\\ x'_1 = y_{24} + z_1 \\ x'_2 = y_{25} + z_2 \\ x'_i = y_{i-9} + z_i + x_{i+5} y_{i-12}, &{} 3 \le i \le 31 ~, \end{array}\right. } \end{aligned} \tag{1}$$
for \(y'\):
$$\begin{aligned} {\left\{ \begin{array}{ll} y'_0 = x_8 + y_{23}\\ y'_i = x_{i+8} + y_{i-9} + x_{i+7} + z_{i-1} + x_{i+7} z_{i-1}, &{} 1 \le i \le 31 ~, \end{array}\right. } \end{aligned} \tag{2}$$
and for \(z'\):
$$\begin{aligned} {\left\{ \begin{array}{ll} z'_0 = x_8\\ z'_1 = x_9 + z_0\\ z'_i = x_{i+8} + z_{i-1} + y_{i-11} z_{i-2}, &{} 2 \le i \le 31 ~. \end{array}\right. } \end{aligned} \tag{3}$$
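The Boolean description can be cross-checked against the word-level definition of the SP-Box. The sketch below evaluates the three systems bit by bit, reading every index modulo 32 (so that, e.g., the term \(y_{i-9}\) at \(i = 0\) is \(y_{23}\)) and dropping the terms that a left shift pushes in as zeros. The helper names `bit` and `sp_box_bits` are ours, and the comparison on random inputs is a sanity check, not a proof.

```python
import random

MASK32 = 0xFFFFFFFF


def rotl32(v, n):
    return ((v << n) | (v >> (32 - n))) & MASK32


def sp_box(x, y, z):
    """Word-level SP-Box: rotate, nonlinear layer, swap x and z."""
    x, y = rotl32(x, 24), rotl32(y, 9)
    new_x = (x ^ (z << 1) ^ ((y & z) << 2)) & MASK32
    new_y = (y ^ x ^ ((x | z) << 1)) & MASK32
    new_z = (z ^ y ^ ((x & y) << 3)) & MASK32
    return new_z, new_y, new_x


def bit(w, i):
    """Bit (i mod 32) of w, bit 0 being the least significant."""
    return (w >> (i % 32)) & 1


def sp_box_bits(x, y, z):
    """Evaluate the Boolean description of x', y', z' one bit at a time."""
    xo = yo = zo = 0
    for i in range(32):
        xb = bit(y, i - 9) ^ bit(z, i)
        if i >= 3:  # quadratic term comes from a shift by 3
            xb ^= bit(x, i + 5) & bit(y, i - 12)
        yb = bit(x, i + 8) ^ bit(y, i - 9)
        if i >= 1:  # OR written as a + b + ab over F_2, shifted by 1
            a, b = bit(x, i + 7), bit(z, i - 1)
            yb ^= a ^ b ^ (a & b)
        zb = bit(x, i + 8)
        if i >= 1:
            zb ^= bit(z, i - 1)
        if i >= 2:  # quadratic term comes from a shift by 2
            zb ^= bit(y, i - 11) & bit(z, i - 2)
        xo |= xb << i
        yo |= yb << i
        zo |= zb << i
    return xo, yo, zo


rng = random.Random(2020)
samples = [tuple(rng.getrandbits(32) for _ in range(3)) for _ in range(1000)]
all_match = all(sp_box(*s) == sp_box_bits(*s) for s in samples)
```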
Description of the SP\(^{\mathbf{2}}\)-Box. If \(x'_0 = y_{23} + z_0\) as in Equation (1) then it naturally holds that \(x''_0 = y'_{23} + z'_0\) and thus we can use Equations (2) and (3) to get the full formula. Here we write some of them:
The 2-round probability-1 linear relation \(x''_0 + y''_0 + z''_0 = x_8\) follows.
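For completeness, this relation can be checked directly from the first lines of Equations (1), (2) and (3), applied once to the primed words and once to the original ones. All sums are over \(\mathbb {F}_2\), so repeated terms cancel:
$$\begin{aligned} x''_0 + y''_0 + z''_0&= (y'_{23} + z'_0) + (x'_8 + y'_{23}) + x'_8 \\&= z'_0 \\&= x_8 ~. \end{aligned}$$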
Previous Work
We provide here a brief overview of the main previous third-party cryptanalysis results against either the permutation or the NIST candidate \(\mathsf {Gimli}\). Notice that all previously considered cryptanalyses were classical attacks, while in this paper, we will also give quantum attacks on reduced-round \(\mathsf {Gimli}\)-Hash. Let us point out that no search of linear trails was done prior to our work.
Zero-sum permutation distinguishers on 14 rounds. In [15], Cai, Wei, Zhang, Sun and Hu present a zero-sum distinguisher on 14 rounds of \(\mathsf {Gimli}\). This distinguisher uses the inside-out technique and improves by one round the integral distinguishers given by the designers.
Structural permutation distinguisher on 22.5 rounds. In [24], Hamburg proposed the first third-party cryptanalysis of the \(\mathsf {Gimli}\) permutation, providing distinguishers on reduced-round versions of the permutation. This analysis does not depend on the details of the SP-Box, and is based only on the slow diffusion of \(\mathsf {Gimli}\). Thus, it follows a similar path as the distinguishers of Sect. 3. In his work, Hamburg defines a PRF with 192-bit input x and 192-bit key k that computes \(F(k, x) = \mathrm {trunc}_{192}(\mathsf {Gimli}(k \Vert x))\). He gives a distinguishing attack in time \(2^{64}\) for 15.5 rounds (omitting the final swap), and a key-recovery attack on F when using 22.5 rounds of \(\mathsf {Gimli}\), precisely rounds 25 to 2.5 (omitting again the final swap). This attack runs in time \(2^{138.5}\) with a memory requirement of \(2^{129}\), which is faster than the expected \(2^{192}\), and thus shows that 22.5-round \(\mathsf {Gimli}\) behaves differently than what could be expected from a random permutation.
Hamburg’s attacks are based on a meet-in-the-middle approach, exploiting the slow diffusion by tabulating some of the values that are passed from an SP-Box to another. The 15.5-round distinguisher relies on a table of size \(2^{64}\), and the 22.5-round attack on a table of size \(2^{128}\). None of these attacks are practical.
ZID Permutation Distinguishers. In an independent and simultaneous work posted on ePrint [34], Liu, Isobe, and Meier present a “hybrid zero-internal differential” (ZID) distinguisher on full \(\mathsf {Gimli}\), which extends a ZID distinguisher of previous unpublished work. The basic ZID distinguisher happens to be what we call an internal symmetry distinguisher, where states with symmetries are produced in the input and in the output of a reduced-round variant of \(\mathsf {Gimli}\). A “hybrid” one adds a limited birthday-like property (which is absent from our distinguishers). The steps that they take are, however, different from ours, as this distinguisher only spans 14 rounds. Compared with our analysis in Sect. 3, they will actually start from a much more constrained middle state, which limits the number of rounds by which one can extend the distinguisher afterward (or significantly increases the complexity). In contrast, we complete the middle state in multiple successive steps, each step ensuring that more rounds will be later covered. The ZID distinguisher targets 18 rounds of \(\mathsf {Gimli}\) with a negligible complexity. After the publication of our results, and personal communications, the authors updated their ePrint report [34]. They proposed to modify our distinguisher of Sect. 3 to reduce the complexity from \(2^{64}\) to \(2^{52}\).
Collisions and Preimages on Gimli-Hash. In [39], Zong, Dong and Wang study \(\mathsf {Gimli}\) among other candidates of the competition. They present a 6-round collision attack on \(\mathsf {Gimli}\)-Hash of complexity \(2^{113}\), using a 6-round differential characteristic where the input and output differences are active only in the rate. This differential characteristic was invalidated in [33].
In [32, 34] and [33], Liu, Isobe and Meier give collision and preimage attacks on reduced-round \(\mathsf {Gimli}\)-Hash. Their attacks rely on divide-and-conquer methods, exploiting the lack of diffusion between the columns, as did Hamburg, but they also rely on SP-Box equations in order to attack the hash function itself. These equations are different from those that we will solve in Sect. 4, and they mostly relate the input and outputs of a single SP-Box, whereas we study directly two SP-Boxes. Their analysis is also much more precise, since they prove running times of solving these equations.
After giving a meet-in-the-middle generic preimage attack of time and memory complexity \(2^{128}\), which sets a bound against the sponge construction used in \(\mathsf {Gimli}\)-Hash, they give practical preimage attacks on 2-round \(\mathsf {Gimli}\)-Hash and practical collision attacks on 3-round \(\mathsf {Gimli}\)-Hash. They give a collision attack on 5-round \(\mathsf {Gimli}\)-Hash with a time complexity \(2^{65}\) and a second preimage attack with time complexity \(2^{96}\). They give in [34] a preimage attack on 5-round \(\mathsf {Gimli}\)-Hash. In [33], they give a semi-free start collision attack on 8 rounds and a state-recovery attack on the AE scheme for 9 rounds.
On the Notion of Distinguisher
Some of our results are “distinguishers” targeting the \(\mathsf {Gimli}\) permutation itself. However, as there is a unique instance of this permutation, what is the meaning of “distinguishing” it from a random permutation?
As far as primitives are concerned, distinguishers are normally defined for keyed algorithms, and work as follows (in the case of a block cipher). First, an n-bit block cipher \(E_{k}\) is instantiated with a key k picked uniformly at random from the relevant set, and a permutation P of \(\mathbb {F}_2^{n}\) is picked uniformly at random from the set of such permutations. Then, the attacker is given black-box access to both algorithms and their task is to figure out which is which with a success probability greater than 1/2.
This notion of distinguisher breaks down when we look at cryptographic permutations. Indeed, since there is no key, the permutation \(\pi \) used is picked from a set of size 1, and thus, the attacker could simply query, for instance, P(0) and, if it is equal to \(\pi (0)\), win the game with overwhelming probability.
At the same time, it is also obvious that cryptographic permutations must satisfy specific requirements in order for them to be valid building blocks for secure hashing or AEAD. To take a trivial example: using the notion of distinguisher outlined above, it is not much harder to distinguish the Keccak permutations from random than it is to distinguish the identity from random. And yet, instantiating a sponge with each of these permutations yields hash functions with vastly different security levels.
As we can see, the problem of defining a distinguisher for a fixed permutation is in fact close to that of defining one for an open-key cipher, i.e., a block cipher instantiation for which the key is known. A first notion to capture the meaning of “distinguisher” in this context was proposed by Gilbert in [19], namely that of T-intractable relation. Informally, a T-intractable relation \({\mathcal {R}}\) is such that it is infeasible to find a set \(X = (x_{0},\ldots ,x_{i})\) of inputs and a corresponding set \(Y = (P(x_{0}), \ldots , P(x_{i}))\) of outputs such that \(X~{\mathcal {R}}~Y\) in time less than T if P is a permutation picked uniformly at random. Of course, the definition of \({\mathcal {R}}\) cannot be chosen arbitrarily; otherwise, we could simply define \({\mathcal {R}}\) to be such that \(x ~{\mathcal {R}}~ E_{k}(x)\). For a keyed primitive, the relation should not depend on the key.
As a consequence, it is natural to rely on linear structures to define a relationship \({\mathcal {R}}\) that may be of interest when assessing the security level of a cryptographic permutation. The limited birthday [20, 26], originally intended for hash functions, is an approach in this direction. The idea is to find pairs of inputs \((x,x')\) such that the pair \((x \oplus x', H(x) \oplus H(x'))\) lives in a vector space of a high dimension. If these pairs can be found faster than with a classical birthday search, then it is a distinguisher.
In this paper, we propose another type of distinguisher whereby we generate an input x such that (x, P(x)) lives in a specific affine space. As with the limited birthday, if this can be done faster than with a brute-force approach, then it can be seen as a distinguisher.
More generally, we consider that a distinguisher for a permutation is an algorithm that can return (tuples of) input/output pairs in a specific affine space that is more efficient than the generic algorithm. As the aim of a cryptographic permutation is to yield complex and nonlinear relations between its input and output, we claim that this approach is the correct one to assess the security of a keyless cryptographic permutation.
Internal Symmetry Distinguishers against \(\mathsf {Gimli}\)
In this section, we present new distinguishers on the \(\mathsf {Gimli}\) permutation. Our distinguishers improve upon the best previously known ones, reaching the full 24-round permutation. They are practical on 23 rounds and have been implemented. The results presented in this section do not exploit the specifics of the SP-Box: they would work equally well if the SP-Box was replaced with a permutation picked uniformly at random. Like all the other analyses presented in this paper, they do not depend on the values of the round constants.
Our distinguishers rely on internal symmetries. The general idea consists in identifying a specific form of symmetry (formally, a vector space) that is preserved by the round function under some circumstances, and then trying to craft an input for the permutation such that this symmetry traverses all the rounds so that the output has the same type of property.
In our case, we formalize the symmetry using the notion of 2-identical states.
Definition 1
(2-identical states) A state S is 2-identical if \(B = D\), if \(A = C\), or if one of these properties holds up to a swap and a constant addition.
Our internal symmetries distinguisher aims at finding a 2-identical input that is mapped to a 2-identical output. Since there are 96 bits of constraint, a generic algorithm returning such an input should run in time \(2^{96}\) by evaluating the permutation on a set of inputs satisfying the property until the output matches it by chance. Our aim is to find more efficient algorithms in the case of \(\mathsf {Gimli}\).
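To make the 96-bit constraint concrete, here is a minimal Python predicate for the two plain cases of the definition, \(A = C\) or \(B = D\). It deliberately ignores the variants "up to a swap and a constant addition", and it assumes a 12-word state list in which column c holds the words `state[c]`, `state[4 + c]`, `state[8 + c]`; the function names are ours.

```python
def columns(state):
    """Split a 12-word state into columns A, B, C, D as (x, y, z) tuples."""
    return [(state[c], state[4 + c], state[8 + c]) for c in range(4)]


def is_2_identical_plain(state):
    """True if A = C or B = D (plain cases only, no swap or constant)."""
    a, b, c, d = columns(state)
    return a == c or b == d


# Each equality ties three 32-bit words together, i.e. 96 bits. A generic
# search fixes the input symmetry for free and waits for the output
# equality to hold by chance, hence about 2**96 permutation calls.
GENERIC_LOG2_COST = 3 * 32
```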
This definition is similar to the one used in [16]. In fact, an internal symmetry distinguisher can be seen as a stronger variant of a limited birthday distinguisher of the type used in [16]. Indeed, we can build a limited birthday pair using our distinguisher: by producing a pair of inputs \(S, S'\) satisfying the internal symmetry property, we obtain \(S \oplus S' \in V_{in}\) and \(\varPi (S) \oplus \varPi (S') \in V_{out}\). Further, since the converse is not true, an internal symmetry distinguisher is strictly stronger.
From now on, \(S^{i}\) denotes the \(\mathsf {Gimli}\) state before round i.
23-round Practical Distinguisher
We design an internal symmetry distinguisher on 23 rounds of \(\mathsf {Gimli}\), that is represented in Fig. 2, running in time equivalent to \(2^{32}\) evaluations of \(\mathsf {Gimli}\) on average. Algorithm 2 starts from a symmetric state in the middle and completes the state \(S^{11}\) in three steps. Each step assigns a value to more words of the state and ensures that the 2-identical symmetry property traverses more rounds.
Each step of Algorithm 2 requires evaluating a few SP-Boxes \(2^{32}\) times (we do not even need to evaluate the inverse SP-Box). The total amount of computations is smaller than \(2^{32}\) evaluations of 23-round \(\mathsf {Gimli}\). Notice also that the algorithm uses only a small amount of memory. Our implementation of Algorithm 2 ran in less than one hour on a regular laptop.
The time complexity of the algorithm can be computed as follows: \(8\times 2^{32}\) SP-Box evaluations for the first step, \(8\times 2^{32}\) for the second and \(16\times 2^{32}\) for the third, meaning a total of \(8\times 2^{32}+8\times 2^{32}+16\times 2^{32}=32 \times 2^{32} = 2^{37}\) SP-Box evaluations, which is less than \(2^{32}\) evaluations of 23-round \(\mathsf {Gimli}\) (each of them consisting essentially of 92 SP-Box evaluations). This complexity is to be compared to that of the generic algorithm for obtaining our internal symmetry property, which costs \(2^{96}\).
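The count can be reproduced with a few lines of arithmetic. The sketch below sums the per-step figures 8, 8 and 16 (each times \(2^{32}\)) and converts the total into 23-round \(\mathsf {Gimli}\) evaluations, taking 23 rounds of 4 SP-Boxes each and ignoring the cost of swaps and constants; the variable names are ours.

```python
from math import log2

# SP-Box evaluations in the three steps of Algorithm 2.
sp_evals = (8 + 8 + 16) * 2**32

# One 23-round evaluation applies 4 SP-Boxes per round.
sp_per_gimli23 = 23 * 4

# Total work expressed in 23-round Gimli evaluations.
gimli23_equiv = sp_evals / sp_per_gimli23

log2_sp_evals = log2(sp_evals)            # 37.0
log2_gimli23_equiv = log2(gimli23_equiv)  # roughly 30.5, below 32
```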
Below, we provide an example of input-output pair that we obtained, with a 2-identical input S that remains 2-identical after \(\mathsf {Gimli}(23, 1)\):
Distinguisher on full \(\mathsf {Gimli}\) and Extensions
Here, we will describe how to extend the 23-round distinguisher to the full \(\mathsf {Gimli}\) permutation, and even to more rounds. All these results are summarized in Fig. 1 from Sect. 1. An extension of our distinguisher to the full \(\mathsf {Gimli}\) is a trivial matter. Indeed, after running Algorithm 2, we obtain a 2-identical input state \(S^{23} = A^{23}, B^{23}, C^{23}, D^{23}\) with \(A^{23} = C^{23}\). Then, if \(B^{23}_x = D^{23}_x\), which is a 32-bit condition, the state remains 2-identical after the inverse round 24. By repeating the previous procedure \(2^{32}\) times, we should find an input value that verifies the output property. The generic complexity of finding a 2-identical input that generates a 2-identical output is still \(2^{96}\). Thus, full \(\mathsf {Gimli}\) can be distinguished in time less than \(2^{32+32}=2^{64}\) full \(\mathsf {Gimli}\) evaluations, and constant memory.
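The origin of the 32-bit condition is easiest to see by running round 24 forwards. In the sketch below (which assumes the same state layout as the public specification: column c is `state[c]`, `state[4 + c]`, `state[8 + c]`, round 24 uses the small swap \(A_x \leftrightarrow B_x\), \(C_x \leftrightarrow D_x\), and the constant goes into \(A_x\)), any state with \(A = C\) before round 24 leaves the round with \(B_x = D_x\) and with equal y and z words in columns A and C. Conversely, a state \(S^{23}\) can only have such a symmetric preimage if \(B^{23}_x = D^{23}_x\). The function name `round24` is ours.

```python
import random

MASK32 = 0xFFFFFFFF


def rotl32(v, n):
    return ((v << n) | (v >> (32 - n))) & MASK32


def sp_box(x, y, z):
    """SP-Box on one column: rotate, nonlinear layer, swap x and z."""
    x, y = rotl32(x, 24), rotl32(y, 9)
    new_x = (x ^ (z << 1) ^ ((y & z) << 2)) & MASK32
    new_y = (y ^ x ^ ((x | z) << 1)) & MASK32
    new_z = (z ^ y ^ ((x & y) << 3)) & MASK32
    return new_z, new_y, new_x


def round24(state):
    """Round 24: SP-Box layer, small swap, constant addition on A_x."""
    s = list(state)
    for c in range(4):
        s[c], s[4 + c], s[8 + c] = sp_box(s[c], s[4 + c], s[8 + c])
    s[0], s[1], s[2], s[3] = s[1], s[0], s[3], s[2]
    s[0] ^= 0x9E377900 ^ 24
    return s


rng = random.Random(1)
w = [rng.getrandbits(32) for _ in range(12)]
w[2], w[6], w[10] = w[0], w[4], w[8]   # force A = C before round 24

out = round24(w)
same_yz = out[4] == out[6] and out[8] == out[10]   # A_y = C_y, A_z = C_z
same_bx_dx = out[1] == out[3]                      # B_x = D_x
```

Since a 32-bit equality holds with probability \(2^{-32}\) for each output of the 23-round procedure, repeating that procedure about \(2^{32}\) times gives the \(2^{32} \times 2^{32} = 2^{64}\) estimate.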
An interesting question is: how many rounds of a \(\mathsf {Gimli}\)-like permutation can we target? The distinguisher works mainly because the diffusion in \(\mathsf {Gimli}\) is somewhat slow. Thus, a possible fix would be to increase the number of swaps, for example by having one in each round instead of every two rounds. An attack exploiting this behavior that worked previously for r rounds would now a priori work for r/2 rounds only. Of course, the details of the SP-Box could allow further improvement of these results given that a single iteration would now separate the swaps rather than a double.
Extending to 28 Rounds. It is trivial to adapt this distinguisher to an extended version of \(\mathsf {Gimli}\) with more rounds. The 2-identicality of \(S^0\) is preserved after one round since the next round would apply only an SP-Box layer and a small swap. Similarly, the 2-identicality of \(S^{24}\) is preserved after 3 more inverse rounds since the next swap operation is a big swap which exchanges data between A and C only. Thus, our practical distinguisher works against \(\mathsf {Gimli}(23, 0)\) (a 24-round version of \(\mathsf {Gimli}\) shifted by one round), and our extended distinguisher works against \(\mathsf {Gimli}(27, 0)\) (a 28-round version of \(\mathsf {Gimli}\)).
Classical Collisions on Reduced-Round \(\mathsf {Gimli}\)-Hash
In this section, we describe collision attacks on \(\mathsf {Gimli}\)-Hash when it is instantiated with a round-reduced variant of \(\mathsf {Gimli}\). Table 2 summarizes our results.
The \(\mathsf {Gimli}\)-Hash Function
This function is built using the \(\mathsf {Gimli}\) permutation in a sponge construction [8], represented in Fig. 3.
\(\mathsf {Gimli}\)-Hash (Algorithm 5) initializes the \(\mathsf {Gimli}\) state to the all-zero value. The message is padded and split into blocks of 128 bits, which corresponds to the rate \(r=128\); each block is XORed into the first 128 bits of the state between two applications of the permutation. Once all the padded message blocks are processed, a 32-byte hash is generated by outputting 16 bytes of the internal state, applying the permutation once more, and outputting 16 additional bytes. In \(\mathsf {Gimli}\)-Hash, the rate part is formed of the words \(A_x, B_x, C_x, D_x\) and the capacity part of \(A_{y,z}, B_{y,z}, C_{y,z}, D_{y,z}\).
We will consider two kinds of collision attacks:

Full-state collision attacks: we will build pairs of two-block messages \(M_0, M_1\) and \(M_0', M_1'\) such that the states obtained after absorbing them are equal again. Thus, one can append any sequence of message blocks after this and obtain the same hash.

Semi-free start collision attacks: we will build pairs of (384-bit) states \(S, S'\) such that S differs from \(S'\) only in a single x, and after r rounds of \(\mathsf {Gimli}\), \(\pi (S)\) and \(\pi (S')\) differ only in a single x as well. This does not yield a collision on the hash function, as we would need to choose the value of the initial state; however, it represents a vulnerability that may be used in the context of the \(\mathsf {Gimli}\) modes of operation. For example, in \(\mathsf {Gimli}\)-Cipher, the initial state contains a key of 256 bits and a nonce of 128 bits which is put in the x values. Then, each block of plaintext is handled in the same way as in \(\mathsf {Gimli}\)-Hash. Thus, by XORing the right values before and after \(\pi \), one can create a key, a nonce and a pair of messages which yield the same tags.
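Both notions rest on the same observation: a difference that is confined to the rate can be cancelled by the next absorbed block, because the block is simply XORed into the x words. The toy Python sketch below illustrates this last step only; the states are arbitrary made-up values, padding is ignored, and the layout (rate words at `state[0..3]`) and the name `absorb` are our assumptions.

```python
def absorb(state, block):
    """XOR a 128-bit block (four 32-bit words) into the rate A_x..D_x."""
    s = list(state)
    for c in range(4):
        s[c] ^= block[c]
    return s


# Two hypothetical states with the same capacity (last 8 words) that
# differ in a single rate word.
capacity = list(range(100, 108))
s1 = [0x11111111, 2, 3, 4] + capacity
s2 = [0x22222222, 2, 3, 4] + capacity

# Pick any block for the first state, then choose the block for the
# second state so that the rate difference is cancelled.
m = [0xDEADBEEF, 0, 0, 0]
m_prime = [m[0] ^ s1[0] ^ s2[0], 0, 0, 0]

full_state_collision = absorb(s1, m) == absorb(s2, m_prime)
```

The hard part of the attacks, of course, is reaching two states whose difference is limited to the rate after the reduced-round permutation.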
SP-Box Equations and How to Solve Them
All collision attacks in this section exploit the slow diffusion of \(\mathsf {Gimli}\) and the simplicity of the SP-Box (contrary to the distinguishers on the permutation, which worked regardless of the SP-Box used). In this section, we describe a series of “double SP-Box equations”; solving them will be the main building block of our attacks. We define the following equations.
Number of Solutions. Except for Equation (7), all these equations have on average, when the inputs are drawn uniformly at random, a single solution. However, the variance on the number of solutions depends on the equation considered. For example, only approx. \(6.2\%\) of inputs to Equation (8) have a solution, and they have on average 82.4 solutions each. Equation (10) gives a little more than 1.5 solutions. This variance is not a problem for us, as long as we can produce efficiently all solutions of the equations, which remains the case. In order to simplify our presentation, we will proceed as if Equations (8), (9) and (10) always gave exactly a single solution for each input.
Solving the Equations. We use an off-the-shelf SAT solver [38]. In some cases, more time seems to be spent building the SAT instance than solving it, and we believe that our current implementation is highly unoptimized.
The solver allows us to retrieve all solutions of a given equation (we treat Equation (7) differently because it has on average \(2^{32}\) of them). Let us consider the average time to produce a solution when random inputs are given. On a standard laptop, this time varies between approximately 0.1 milliseconds (Equation (8)) and 1 millisecond (Equation (10)). This difference mainly stems from the fact that Equation (8) often has no solutions and that the solver quickly finds a counterexample, while Equation (10) practically always has solutions that must be found.
On the same computer, an evaluation of the full \(\mathsf {Gimli}\) permutation (not reduced-round) takes about 1 microsecond, so there is approximately a factor 1000 between computing \(\mathsf {Gimli}\) and solving a double SP-Box equation.
We consider that all equations have approximately the same complexity and introduce a factor \(t_e\) that expresses the time taken to solve them in number of evaluations of \(\mathsf {Gimli}\) or a reducedround version (depending on the studied case).
Practical 8-round Collision Attack
We consider 8 rounds of \(\mathsf {Gimli}\), e.g., rounds 21 to 14 included, and name \(\mathsf {Gimli}\)(21, 14) this reduced-round permutation. We omit the last swap, because it has no effect (it only swaps x values). The situation is represented in Fig. 4. As before, we name \(S^i\) the partial state immediately before round i.
Algorithm 3 finds on average a single solution, with any input state. There is some variance on the number of solutions, that is induced by the SPBox equations, but it is small in practice. Furthermore, we can eliminate the memory requirement by solving Equation (7) for many input random states. Starting from a given state, it suffices to apply one more \(\mathsf {Gimli}\) permutation with a random message block, in order to rerandomize the input.
Remark that if we omit the second step, then we already have a semi-free-start collision attack, because we can reconstruct the inputs \(C^{21}\) and \(D^{21}\) immediately from the middle.
Practical Application: first step. In our practical computations, we considered rounds 21 to 14 included. We solved step 1, starting from 0, 0, 0, 0 and using a random message \(m_1, 0, 0, 0\) to randomize the first block. We also solved at the same time the two Equations (10) that enabled us to go back to \(A^{17}_x, B^{17}_x\).
We had to produce \(15582838652 \simeq 2^{33.86}\) solutions for Equation (7) until we found a solution for Step 1 and for both equations. We verified experimentally that each solution for Equation (7) yielded on average a solution for the final equation. We obtained in total 5 solutions (Table 3). There are two different solutions for \(A^{15}_x \oplus \mathsf {rc}_{16}\), which yield two and three solutions, respectively, for \(B^{17}_x\). The total computation ran in less than 5000 core-hours. It was easy to run on many concurrent processes as this algorithm is trivial to parallelize.
Practical Application: second step. We solved step 2, that is, looking for \(C^{21}_x\), \(D^{21}_x\) that lead to one of the pairs \(A^{17}_x, B^{17}_x\). This step was much faster than the previous one, although it ought to have the same complexity: this is because we paid in step 1 the probability to find a solution (twice) in Equation (10), while in step 2 we benefited from having 5 different possible solutions. We found two solutions: \(C^{21}_x, D^{21}_x = \texttt {819b1392}, \texttt {9f4d3233}\) and \(C^{21}_x, D^{21}_x = \texttt {aa9f6f2d}, \texttt {3a6e613a}\).
Putting both Steps Together. With these solutions, we built two collisions on 8-round \(\mathsf {Gimli}(21, 14)\). We start from \(m_1, 0, 0, 0\), then after one round, we inject the values \(A^{21}_x, B^{21}_x, C^{21}_x, D^{21}_x\) and \(A'^{21}_x, B^{21}_x, C^{21}_x, D^{21}_x\), respectively, in the rate; then we obtain two states that differ only on the x-coordinate of the third column (not the first, due to a big swap), and we inject two different blocks to cancel out this difference, obtaining the same state. The full state then collides, and we can append any message block that we want. The two collisions are given in Table 4.
Extending the Attack. Remark that the first step can be extended to span any number of \(SP^2\)-boxes. However, each time we add two more rounds, there is one more branch coming from the B, C, D states which has to match an expected value, so we add a factor \(2^{32}\) in complexity. Since \(t_e \ll 2^{32}\), we can do that twice before meeting the bound \(2^{128}\). Thus, a collision on 12-round \(\mathsf {Gimli}\)Hash can be built in time \(2^{96} \times 4 \times t_e\).
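The round count of this extension follows from simple bookkeeping on exponents. The following is a minimal sketch, assuming the cost model stated in the text (a search space of \(2^{32}\) for 8 rounds, one extra factor \(2^{32}\) per two added rounds, a generic bound of \(2^{128}\)); the function name is hypothetical and the small factor \(4 \times t_e\) is deliberately ignored.

```python
def max_collision_rounds(base_rounds=8, base_log2=32, step_log2=32, bound_log2=128):
    """Largest round count whose search-space exponent stays strictly below the bound.

    Assumption: each extension adds two rounds and multiplies the cost by 2^step_log2.
    """
    rounds, log2_cost = base_rounds, base_log2
    while log2_cost + step_log2 < bound_log2:
        rounds += 2
        log2_cost += step_log2
    return rounds, log2_cost


rounds, log2_cost = max_collision_rounds()
print(f"{rounds} rounds with a search space of 2^{log2_cost}")
```

With the default parameters the loop extends the attack twice, which matches the 12-round figure given above.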
Semi-free Start Collisions on Reduced-round Gimli
We will now design semi-free start collision attacks based on the same principle. This time, our goal is to obtain two input states \(S, S'\) that differ only in the rate (in practice, only in \(A_x\)) and such that after applying a reduced-round \(\mathsf {Gimli}\), the output states differ only in the rate (the x values). They can also be seen as finding one state and two pairs of 2-block messages such that after inserting both messages we obtain a collision. The previous “first step” remains the same, with an extension to whichever number of rounds we are targeting. The “second step” is changed, because we can now choose completely the columns B, C, D, e.g., by starting from the middle instead of having to choose only the input rate.
Doing this allows us to reach 4 rounds more for the same cost as before, as outlined in Fig. 5 and Algorithm 4. We can then append new rounds as before, reaching 16 rounds classically in time \(2^{96} \times 10 \times t_e\).
Another Improvement using Precomputations. We are going to gain a factor \(2^{32}\) using \(2^{64} \times t_e\) precomputations and a table of size \(2^{64}\). This way, we can attack two more rounds. Indeed, once we have computed the first step, the two branches \(C_x^{17}\) and \(A_x^{13}\) contain arbitrary fixed values. Then, when we try to find the right C, we could have a table that, for all \(C^{15}_y, C^{15}_z\), gives all input-output values for \(C^{17}\) and \(C^{14}\), and we could directly use this table to match the values \(C^{15}_x\) and \(D^{15}_x\) that come from D (instead of having to make a guess of \(C_z^{15}\)).
Let us fix \(C_x^{17} = A_x^{13} = 0\). Thus, we repeat step 1 in Algorithm 4 a total of \(2^{64}\) times in order to have \(C_x^{17} = A_x^{13} = 0\). Step 1 now costs \(2^{96} \times t_e\).
The table that we precompute shall contain: for each \(x', x''\), all values (on average 1) of \(y', z'\) such that \(SP^2(0, *, *) = x', y', z'\) and \(SP^2(x'', y', z') = 0, *, *\).
Now, in Algorithm 4, for each guess of \(B_{y,z}^{19}\), and for each guess of \(D_{y,z}^{19}\), we can find the value of C that matches all the fixed branches in time 1, using this table. Thus, we can repeat this \(2^{96}\) times, extending the attack by 6 rounds.

Step 1 costs \(2 \times 2^{96} \times t_e\) (we solve only 2 equations most of the time, before aborting if the wanted zeros do not appear).

The table costs \(2^{64} \times t_e\), which is negligible.

Step 2 costs \(2^{96} \times 5 \times t_e\), since it is the same as before, and we only need forward computation of SPBoxes to check if the full path is correct.
Note that we can get rid of the term \(t_e\) if we use a memory of size \(2^{96}\) to store the solutions of the SPBox equations. In that case, the overall time complexity is slightly below \(2^{96}\) evaluations of \(\mathsf {Gimli}{}\), since fewer SPBoxes are evaluated in each step than in the full primitive.
Better Quantum Collision Attacks
In this section, we explain how our attacks can be extended in the quantum setting, where even more rounds can be broken. We want to emphasize that, as our goal is simply to determine a security margin, we will not go into the details of the implementation of these attacks as quantum algorithms. We will only show how to use well-known building blocks of quantum computing in order to build these new attacks, and show why they perform better than the corresponding generic quantum attacks. At this point, we assume that the reader is familiar with the basics of quantum computing that are covered in textbooks such as [36]. We define quantum algorithms in the quantum circuit model. The circuit starts with a set of qubits (elementary quantum systems) initialized to a basis state and applies quantum operations. The state of the system lies in a Hilbert space of dimension \(2^n\) if there are n qubits. Quantum operations are linear operators of this space, and a quantum circuit is built from such elementary operators, called quantum gates. The result of a quantum computation is accessed through measurement of the qubits, which destroys their state.
The cryptanalytic algorithms that we consider in this section do not require any form of query to a black-box, since we want only to build a collision on the hash function. Thus, they do not require any more specific model (e.g., the Q2 model used in some works in quantum cryptanalysis).
Tools, Model and Complexity Estimates
Most of the collision attacks presented in this section rely on an exhaustive search. For example, consider the 8-round attack of Algorithm 3. Both steps are exhaustive searches in spaces of size \(2^{32}\) that contain on average a single solution:

In the first step, we find \(A^{21}_x\) such that, after solving a sequence of SPBox equations, a 32-bit condition is met: the first equation finds \(A'^{21}_x\) such that there is a collision in x after two SPBoxes, the second equation finds \(A^{19}_x\) such that there is a collision in x after two SPBoxes, etc., and the final 32-bit condition is that \(A'^{13}_{z}\) and \(A^{13}_{z}\) must collide.

In the second step, we find the good \(C_x^{21}\) by guessing it and trying to match with a 32-bit condition.
Quantumly, Grover’s algorithm [23] speeds up exhaustive search quadratically. Amplitude Amplification [13] is a powerful generalization which applies to any pair \({\mathcal {A}}, \chi \) such that:

\({\mathcal {A}}\) is a quantum algorithm without measurements (a unitary and reversible operation), that takes no input and produces an output \(x \in X\).

\(\chi ~: X \rightarrow \{0,1\}\) is a function that decides whether \(x \in X\) is a “good” output of \({\mathcal {A}}\) (\(\chi (x) = 1\)) or a “failure” of \({\mathcal {A}}\), such that \(\chi \) can also be implemented as a quantum algorithm.
Theorem 1
(Amplitude Amplification [13], informal) Let \({\mathcal {A}}\) be a quantum algorithm without measurements that succeeds with probability p and \(O_\chi \) be a quantum algorithm that tests whether an output of \({\mathcal {A}}\) is a failure or not. Then there exists a quantum algorithm that finds a good output of \({\mathcal {A}}\) using \(O(\sqrt{1/p})\) calls to \({\mathcal {A}}\) and \(O_\chi \).
Quantum Embeddings. Any classical algorithm admits a quantum embedding, that is, a quantum algorithm that returns the same results. Note that this is not a trivial fact, because a quantum algorithm without measurement is reversible.
Definition 2
Let \({\mathcal {A}}\) be a randomized algorithm with no input. A quantum embedding for \({\mathcal {A}}\) is a quantum algorithm \({\mathcal {A}}'\) that has no input, and the distribution over the possible outcomes of \({\mathcal {A}}'\) (after measurement) is the same as the distribution over possible outcomes of \({\mathcal {A}}\).
This quantum embedding admits similar time and space complexities, where classical elementary operations (logic gates) are replaced by quantum gates and classical bits by qubits. Generic time-space trade-offs have been studied in [3, 28, 31], but precise optimizations are required in practice, where the bulk of the work comes from making the computation reversible. As we just want to compare costs with quantum generic attacks, the following fact will be useful.
Remark 1
The ratio in time complexities is approximately preserved when embedding classical algorithms into quantum algorithms.
For example, if a classical algorithm has a time complexity equivalent to 1000 evaluations of \(\mathsf {Gimli}\), we can consider that the corresponding quantum embedding has a time complexity equivalent to 1000 quantum evaluations of \(\mathsf {Gimli}\). In all quantum attacks, we will give quantum time complexities relative to the cost of quantumly evaluating \(\mathsf {Gimli}\). In order to use Amplitude Amplification (Theorem 1), we simply need to define classical randomized algorithms for \({\mathcal {A}}\) and \(O_\chi \).
Example
We take the example of the classical 8-round collision attack. Both steps run in classical time \(2^{32} \times 4 \times t_e\) by running \(2^{32}\) iterates of a randomized algorithm of time complexity \(4 \times t_e\). Using Amplitude Amplification, we obtain a corresponding quantum algorithm with time complexity approximately \(2^{16} \times 4 \times t_{qe}\), where \(t_{qe}\) is the time to solve quantumly an SPBox equation, relative to the cost of a quantum implementation of \(\mathsf {Gimli}\). As we remarked above, we can approximate \(t_{qe} \simeq t_e\).
This approximation comes from different factors:

a small constant factor \(\frac{\pi }{2}\) which is inherent to quantum search.

the trade-offs between time and space in the detailed implementations of the primitive and its components. Let us simply notice that \(\mathsf {Gimli}\), compared to other primitives that have been studied in this setting, e.g., AES [27], seems fairly easy to implement using basic quantum computing operations. In the example of AES, the most costly component is the S-Box [27], and \(\mathsf {Gimli}\) has no such component.
We are mainly interested in the security margin, and these approximations will be sufficient for us to determine whether a given algorithm runs faster or slower than the corresponding quantum generic attack. Thus, we will write that the quantum 8-round attack on \(\mathsf {Gimli}\)Hash runs in time \(\simeq 2^{16} \times 4 \times t_e\).
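The way these estimates are derived can be sketched in a few lines. This is only an illustration of the bookkeeping, assuming (as in the text) that Amplitude Amplification takes the square root of the number of iterations, leaves the cost of one iteration unchanged, and that small constant factors are ignored; the function name is hypothetical and \(t_e\) is left out of the exponents.

```python
import math


def quantum_search_log2_cost(iterations_log2, inner_cost_log2):
    """log2 cost after a quadratic speedup on the number of iterations only.

    Assumption: constant factors of the quantum search are ignored.
    """
    return iterations_log2 / 2 + inner_cost_log2


# 8-round attack: 2^32 iterations, each costing 4 * t_e (t_e is kept symbolic,
# so only the factor 4 enters the exponent).
inner = math.log2(4)
classical_log2 = 32 + inner
quantum_log2 = quantum_search_log2_cost(32, inner)
print(classical_log2, quantum_log2)  # exponents of the factors multiplying t_e
```

The two printed exponents correspond to \(2^{32} \times 4\) and \(2^{16} \times 4\), each still to be multiplied by \(t_e\).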
Quantum Collision Bounds and Quantum Attacks
The best quantum generic attack for finding collisions depends on the computational model, more precisely, on the cost assigned to quantum-accessible memory. Different choices are possible, which are detailed, e.g., in [25]. In short, the overall cost of quantum collision search depends on the cost that is assigned to quantum hardware.
In this paper, we will simply consider the most conservative setting, where quantum memory is free. Note that this actually makes our attacks overall less efficient, since the generic algorithm is the most efficient possible (and they will also work in the other settings). In this situation, the best collision search algorithm is by Brassard, Høyer and Tapp [14]. It will find a collision on \(\mathsf {Gimli}\)Hash in approximately \(2^{256/3} \simeq 2^{85.3}\) quantum evaluations of \(\mathsf {Gimli}\), using a quantum-accessible memory of size \(2^{85.3}\).
Quantum collision attacks reaching more rounds than classical ones. In [25], Hosoyamada and Sasaki initiated the study of dedicated quantum attacks on hash functions. They remarked that quantum collision search does not benefit from a square-root speedup (it goes from roughly \(2^{n/2}\) to \(2^{n/3}\) with the BHT algorithm, and the gain is even smaller in more constrained models of quantum hardware), while some collision-finding procedures may have a better speedup, say, quadratic. Thus:

there may exist quantum collision attacks such that the corresponding classical algorithm is not an attack (it gets worse than the generic bound);

the quantum security margin of hash functions for collision attacks is likely to be smaller than the classical one.
Hosoyamada and Sasaki studied differential trails in the hash functions AES-MMO and Whirlpool. Although our attacks are based on a different framework, we show that similar findings apply for \(\mathsf {Gimli}\).
Quantum Collision Attacks on \(\mathsf {Gimli}\)
We assume that \(t_e < 2^{20}\), hence solving an equation costs less than evaluating reduced-round \(\mathsf {Gimli}\) \(2^{20}\) times, which is suggested by our computations, and should hold in the quantum setting as well.
Full-state collisions. By adding another 32-bit condition in the classical 12-round collision attack, we obtain a procedure which runs classically in time \(4 \times 2^{128} \times t_e\), which is too high. However, using Amplitude Amplification, we obtain a procedure that runs in quantum time \(\simeq 4 \times 2^{64} \times t_e\) and reaches 14 rounds, with less complexity than the quantum collision bound.
Semi-free start collisions. We can extend the 18-round semi-free start collision attack in the same way. Building the table will still cost a time \(2^{64}\). This table must be stored in a classical memory with quantum random access. The first step goes from \(2 \times 2^{96} \times t_e\) classically to approximately \(2 \times 2^{48} \times t_e\) quantumly. The second step does as well. Thus, adding a 32-bit condition enables us to attack 20 rounds in quantum time \(2^{64} \times 4 \times t_e\).
State-Recovery Attacks on Gimli-Cipher
In this section, we study state-recovery attacks on \(\mathsf {Gimli}\)Cipher. \(\mathsf {Gimli}\)Cipher uses the Duplex mode, where message blocks are XORed in the same place as in \(\mathsf {Gimli}\)Hash. The goal of a state-recovery attack is to recover the complete internal state, including the capacity. Once this is done, the Duplex is invertible and the key can also be recovered.
Since there are 256 bits of key in \(\mathsf {Gimli}\)Cipher, a meaningful state-recovery attack can have a complexity up to \(2^{256}\), although it will not necessarily contradict the security claims of [5], which go only up to 128 bits of security.
The current best attack is from [33], targeting 9 rounds in time \(2^{192}\) and memory \(2^{190}\).
Generic Principle. We will target reduced-round variants of the permutation, starting, as before, from an intermediate round in order to leverage the round-dependent linear layer. Due to the Duplex mode, the value of the rate (the x words \(A_x, B_x, C_x, D_x\)) is known before and after each call to the permutation. Starting from any nonce, we inject zero messages and observe the results. Let \(S = (A, B, C, D)\) be the current state. Let \(\varPi \) be the reduced-round permutation. Let \(S' = \varPi (S)\) and \(S'' =\varPi (S')\). This means that we know \(S_x = (A_x, B_x, C_x, D_x)\), \(S_x'\) and \(S_x''\).
Given \(S_x\) and \(S_x'\), there are on average \(2^{128}\) possibilities for \(S'\). Given \(S_x'\) and \(S_x''\), there are also \(2^{128}\) possibilities for \(S''\). Thus, our goal is to produce these two lists of (at least) \(2^{128}\) values and to find a collision between them. We expect a single collision to occur.
Strategy for 8 rounds. As we have seen before, the double SPBox equations allow us to relate the inputs and outputs of a double SPBox. Thus, it makes sense to consider each double SPBox individually, and to focus on the number of values that it can take. Without any constraint, it can take \(2^{96}\) values. If we constrain a word in input or output, it can take \(2^{64}\) values, and so on.
We write 8 rounds of \(\mathsf {Gimli}\) as in Fig. 6, where single-word constraints are put on the beginning and the end. Since each double SPBox takes at most \(2^{96}\) values, we can start by writing down a list of these values for each double SPBox. We will then merge these lists together progressively. If we merge two lists of size \(2^{\ell }\) and \(2^{\ell '}\), and if the double SPBoxes have 2 words in common, then we obtain a list of size \(2^{\ell + \ell ' - 2\times 32}\).
The goal is to obtain a list of states that will match all the input and output constraints. This is graphically intuitive: in Fig. 6, we will progressively merge the nodes. When we merge two nodes of labels i and j, we obtain a node of label \(\max (i+j - m,0)\) where m is the number of edges between them. All outgoing edges of i and j are copied to the new node. We will end with a single node, that represents a list of possible states. The time and memory complexities of this procedure are roughly equal to the biggest list size encountered, that is, in \(\log _2\), 32 times the biggest label that was put on a node during the process.
Starting from Fig. 6, our strategy for 8 rounds is to first merge the nodes pairwise, obtaining Fig. 7.
Next, we merge A, B, C, D into single nodes and obtain a label 4. That is, thanks to the input and output constraints, we have only \(2^{128}\) choices for each column separately. Next, there are 4 edges between A and B, 4 edges between C and D, so we merge into \(2^{128}\) choices for A, B and C, D separately. There are 4 edges between the remaining two nodes, so we merge into a list of \(2^{128}\) possible states.
This gives a state-recovery attack of time and memory complexity \(2^{128}\) (up to a small constant factor). Note that at this point, we can assume that the double SPBox has been tabulated, and so, we do not need to solve equations anymore.
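The label bookkeeping of this merging strategy can be replayed mechanically. The sketch below is illustrative only: it assumes the merge rule "two nodes of labels i and j joined by m edges give a node of label max(i + j - m, 0)" and the final merges described above (four column nodes of label 4, with 4 edges between the merged pairs); the names `merge_label` and `WORD_BITS` are hypothetical.

```python
WORD_BITS = 32  # one label unit corresponds to one 32-bit word


def merge_label(i, j, shared_edges):
    """Label of the node obtained by merging two nodes linked by shared_edges edges."""
    return max(i + j - shared_edges, 0)


# Column nodes after the first merging phase (label 4 each, as stated in the text).
columns = {"A": 4, "B": 4, "C": 4, "D": 4}

ab = merge_label(columns["A"], columns["B"], 4)  # 4 edges between A and B
cd = merge_label(columns["C"], columns["D"], 4)  # 4 edges between C and D
full = merge_label(ab, cd, 4)                    # 4 edges between the two halves

biggest_label = max(list(columns.values()) + [ab, cd, full])
log2_complexity = WORD_BITS * biggest_label
print(f"biggest label {biggest_label}, complexity about 2^{log2_complexity}")
```

The biggest label met along the way is 4, which gives the \(2^{128}\) time and memory estimate of the 8-round attack.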
More Rounds. The complexity rapidly increases with the number of rounds. Once we represent them as in Fig. 6 and Fig. 7, we find that our merging strategies are rather limited.
As soon as we move to 10 rounds, the merging process produces nodes of label 6, that is, lists of size \(2^{192}\) (see Fig. 8). This seems to correspond to the 9-round attack of [33], written differently (note that [33] starts from the first round of \(\mathsf {Gimli}\), and thus, there is a swap immediately after the first SPbox).
With a minor modification in the merging process, we obtain the same complexity for 12 rounds, where the final step gives Fig. 9.
With more rounds, that is, adding another layer of double SPBoxes, all the merging strategies that we tried produced nodes of label at least 8 (that is, a complexity \(2^{256}\)). We conjecture that this method cannot perform better.
Statistical Analyses of \(\mathsf {Gimli}\)
Linear Cryptanalysis
This section aims to provide the first analysis of the linear properties of the \(\mathsf {Gimli}\) permutation and its components. For this purpose, we used a bit-oriented mixed integer linear programming (MILP) model of the state transformations of \(\mathsf {Gimli}\) constructed according to [1]. The resulting optimization problems were then solved with the SCIP solver [21, 22] to search for linear trails with optimal correlation.
Using this tool, we provide a rudimentary analysis of the linear approximation table of the double \(\mathsf {Gimli}\) SPbox, and we construct effective linear distinguishers for up to the full 24 rounds of the \(\mathsf {Gimli}\) permutation.
Linear trails of the (double) SPbox. We begin by studying the linear trails of the SPBox. Since the \(\mathsf {Gimli}\) permutation mainly uses the composition of the SPBox with itself, we focus on the “double” SPBox SP\(^2\).
Let us consider that we apply the double SPbox to \(A = (x,y,z)\) to obtain \(A'' = (x'',y'',z'') = SP^2(x,y,z)\). We are interested in correlated linear approximations, that is, masks \(\alpha = (\alpha _x, \alpha _y, \alpha _z)\) and \(\beta = (\beta _x, \beta _y, \beta _z)\) for which the correlation \(2 \Pr _A(\alpha \cdot A = \beta \cdot A'') - 1\)
is as large (in absolute value) as possible. From Sect. 2.1 we already know that the relationship \(x_8 + x_0'' + y_0'' + z_0'' = 0\) always holds. This is a linear trail of the double SPbox with correlation 1, and it is the only one.
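The probability-1 relation can be checked numerically. The following sketch is an illustration, not the authors' tooling: it assumes the SPBox as defined in the \(\mathsf {Gimli}\) specification (x rotated by 24, y rotated by 9, shifts by 1, 2 and 3), which is not restated in this section, and it applies the SPBox twice with no swap in between. All function names are hypothetical.

```python
import random

MASK = 0xFFFFFFFF


def rotl(v, r):
    """Rotate a 32-bit word to the left by r bits."""
    return ((v << r) | (v >> (32 - r))) & MASK


def sp_box(x, y, z):
    """One application of the SPBox on a column (x, y, z), per the Gimli specification."""
    x = rotl(x, 24)
    y = rotl(y, 9)
    new_z = (x ^ (z << 1) ^ ((y & z) << 2)) & MASK
    new_y = (y ^ x ^ ((x | z) << 1)) & MASK
    new_x = (z ^ y ^ ((x & y) << 3)) & MASK
    return new_x, new_y, new_z


def relation_holds(x, y, z):
    """Check x_8 + x''_0 + y''_0 + z''_0 = 0 (over GF(2)) for the double SPBox."""
    x2, y2, z2 = sp_box(*sp_box(x, y, z))
    return (((x >> 8) ^ x2 ^ y2 ^ z2) & 1) == 0


random.seed(2021)
samples = [tuple(random.getrandbits(32) for _ in range(3)) for _ in range(10000)]
print(all(relation_holds(*s) for s in samples))
```

The relation can also be seen by hand: after one SPBox, the sum of the three least significant output bits equals the input bit \(z_0\), and the new \(z_0\) equals the input bit \(x_8\).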
An automated MILP-based search for linear trails of correlation \(2^{-1}\) and \(2^{-2}\) shows that there exist at least 41 trails of the former kind and 572 of the latter, but this is not an exhaustive count. Although these approximations probably only account for a very small fraction of the possible ones, a more thorough study of the distribution of the different correlation values among all the trails would be of interest.
We have found no signs of significant linear-hull effects (that is, of different highly biased linear trails with the same input and output masks, as shown in [37]) within these linear approximations of the double SPbox, although since we have not considered every interesting linear trail, they might still exist for trails of lower correlation.
Some linear trails of round-reduced \(\mathsf {Gimli}\). In order to provide some linear trails for reduced-round \(\mathsf {Gimli}\), we first focus on trails with only one active SPBox in each round, or more specifically, with masks which only cover one column in each round. They do not provide an upper bound on the correlation of more general trails, but we still think they could be of interest, and this restriction greatly limits the search space.
More specifically, we consider linear trails on powers of the SPbox such that the mask for the x word is zero every two rounds. This means that the mask is unaffected by the big and small swaps, and these trails easily translate into trails for the reducedround \(\mathsf {Gimli}\) construction with the same correlation.
We first look at iterative linear trails for the double SPbox so that both the input and output masks have the x word set to zero, which means they can be easily extended and the correlation is computed using the piling-up lemma from [35]. We find that the optimal correlation is \(2^{-26}\), and this is the (maybe not unique) associated trail:
As this trail is iterative, we can construct 2l-round trails with correlation \(2^{-26l}\).
Next, we provide a similar iterative trail for four rounds with correlation \(2^{-47}\), though we were unable to check optimality and other trails with larger correlation might exist within the same restrictions:
With this, we can construct trails of 4l rounds with correlation \(2^{-47l}\).
We have also been able to construct an eight-round iterative trail with correlation \(2^{-67}\), which allows us to construct trails of 8l rounds with correlation \(2^{-67l}\):
We also performed some additional experiments to see whether the linear hull of the linear approximation based on these input and output masks has a larger linear potential than the one suggested by this single linear trail. We found that, in fact, there are 8 trails with correlation \(2^{-67}\), 32 trails with correlation \(2^{-68}\), 128 trails with correlation \(2^{-69}\), 466 trails with correlation \(2^{-70}\), and 1527 trails with correlation \(2^{-71}\). This gives the linear approximation an estimated linear potential (ELP, see [37]) of at least \(8 \cdot 2^{-134} + 32 \cdot 2^{-136} + 128 \cdot 2^{-138} + 466 \cdot 2^{-140} + 1527 \cdot 2^{-142} \simeq 2^{-128.8}\).
For comparison, the ELP if we consider a single trail would be \(2^{-67 \cdot 2} = 2^{-134}\). Additionally, when we consider iterations of this approximation, it is possible that even more trails appear.
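The ELP lower bound is the sum of the squared correlations of the trails found, and can be recomputed from the trail counts. This is a minimal sketch, assuming the counts listed above and reading each correlation as a negative power of two (a trail of exponent e contributes \(2^{-2e}\)); the names `trail_counts` and `log2_elp` are hypothetical.

```python
import math

# Number of trails found for each correlation 2^(-e), keyed by the exponent e.
trail_counts = {67: 8, 68: 32, 69: 128, 70: 466, 71: 1527}


def log2_elp(counts):
    """log2 of the sum of squared trail correlations (a lower bound on the ELP)."""
    total = sum(n * 2.0 ** (-2 * e) for e, n in counts.items())
    return math.log2(total)


print(round(log2_elp(trail_counts), 2))   # linear hull with the listed trails
print(round(log2_elp({67: 1}), 2))        # a single trail of correlation 2^-67
```

The gain over the single-trail estimate is a little more than 5 bits per 8-round iteration.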
For a small number of rounds, iterative trails are far from optimal. We now construct some good trails for up to six rounds. We start with an optimal four-round trail with correlation \(2^{-16}\):
We attempt to extend this trail at the top. There are no approximations for the double SPbox for which the output mask is the input mask of \(\varGamma _4\) and so that the input mask has the x word set to zero. However, by removing the last condition we can add two rounds with a \(2^{-16}\) correlation:
This gives us a six-round linear trail with correlation \(2^{-32}\).
We now aim to construct an effective 24-round distinguisher for the full permutation based on the 8-round iterative linear trail. To this end, we find a prolongation of this linear trail by seven rounds at the top, which includes a swap. We obtain a 24-round linear trail with correlation \(2^{-189}\). Furthermore, because of the linear hull properties of the iterative linear trail, we know that the ELP of the linear approximation using the same input and output linear masks is at least \(2^{-367.6}\).
In general, we provide the linear trails for up to 24 rounds of \(\mathsf {Gimli}\) shown in Table 5. We should note that we have not proven the optimality of these trails (in fact we consider it quite likely that they are not), as we have focused on a very specific subfamily of trails and we have not even shown optimality within that family for more than four rounds.
Our 24-round linear approximation can be used to mount a distinguishing attack on the \(\mathsf {Gimli}\) permutation, which also works for the block cipher built with the Even–Mansour construction. It is possible to reduce the complexity slightly by using multiple linear cryptanalysis, as in [12]. The time complexity is equal to the data complexity, which is a number of known plaintext-ciphertext pairs inversely proportional to the ELP of \(2^{-367.6}\). By considering the same approximation but in the four columns, we can mount a multiple attack with an increase in the capacity by a factor of four.
The problem of finding a linear approximation whose input and output masks lie on the x lane (that is, on the rate of the \(\mathsf {Gimli}\)Hash and \(\mathsf {Gimli}\)Cipher constructions) remains open.
Differential-Linear Cryptanalysis
We now consider differential-linear cryptanalysis, a technique that combines a differential trail and a linear trail built independently. The differential-linear distinguishers of this section have the advantage of smaller complexities than the linear distinguishers presented above.
We use the approach of Leurent [30] where we actually split the cipher in three parts, with a differential trail in the top part \(E_\top \), a linear trail in the bottom part \(E_\bot \), and an experimental evaluation of the bias in the middle part. This gives a more accurate evaluation of the complexity. More precisely, we consider

a differential trail \(\delta _\text {in} \rightarrow \delta _\text {out}\) for \(E_\top \) with probability \(p = \Pr _X\big (E_\top (X) \oplus E_\top (X \oplus \delta _\text {in}) = \delta _\text {out}\big )\).

an experimental bias b from \(\delta _\text {out}\) to \(\alpha \) for the middle part.

a linear trail \(\alpha \rightarrow \beta \) for \(E_\bot \) with correlation \(c = 2 \Pr _Y(\alpha \cdot Y = \beta \cdot E_\bot (Y)) - 1\).
If the three parts are independent then we can estimate the bias of the differential-linear distinguisher as \(p \cdot b \cdot c^2\).
Therefore, the complexity of the distinguisher is about \(2/(p^2 b^2 c^4)\).
In \(\mathsf {Gimli}\), there are no keys, so the assumption of independence does not hold, but experiments show that the computed bias is close to the reality. In practice, the best results are obtained when \(\delta _\text {out}\) and \(\alpha \) have a low Hamming weight [30].
Differential Trail. We start by picking a trail that mainly follows the one given by the designers [4] with slight changes to optimize it for our number of rounds. We chose a trail with a difference pattern \(\delta _\text {out}\) with two active bits. A differential trail over 5 rounds with probability \(p=2^{-28}\) is given in Table 6. We considered trade-offs between the different phases, and it never seems to be worth it to propagate the trail any further.
Experimental Bias. Starting from the target difference pattern \(\delta _\text {out}\) at round 19, we experimentally evaluate the bias after a few rounds with all possible masks \(\alpha \) with a single active bit. Concretely, we choose the state at random, build the second state by adding \(\delta _\text {out}\) and observe the bias a few rounds later.
The most useful results are on the least significant bit \(z_0\) of the last word, where the probability of having a difference is smaller than 1/2. After computing 8 rounds, the probability of having an active difference on this bit in round 12 is \(\frac{1}{2} - 2^{-6.2}\), a correlation of \(b = 2^{-5.2}\). After 9 rounds, at the end of round 11, there is a correlation of \(b = 2^{-16.9}\). These probabilities are large enough to be experimentally significant after the \(2^{40}\) trials we have made.
Linear Trail. We use assisted tools to find good linear trails, starting from the mask corresponding to \(z_0\). The diffusion is not the same depending on whether we start after round 12 or 11, so we show the best 3-round linear approximation for both cases. We find a correlation c of \(2^{-17}\) and \(2^{-16}\), respectively, see Table 7.
Complexity of the distinguishers. We can combine the trails in different ways to obtain distinguishers on 15, 16 or 17 rounds (starting from round 24).
 15 rounds:

We use 5 rounds for \(E_\top \), 8 rounds for the middle part, 2 rounds for \(E_{\bot }\). The corresponding complexity is \(2/(p^2 b^2 c^4) = 2 \times 2^{2 \times 28} \times 2^{2 \times 5.2} \times 2^{4 \times 5} = 2^{87.4}\).
 16 rounds:

We use 5 rounds for \(E_\top \), 9 rounds for the middle part, 2 rounds for \(E_{\bot }\). The corresponding complexity is \(2/(p^2 b^2 c^4) = 2 \times 2^{2 \times 28} \times 2^{2 \times 16.9} \times 2^{4 \times 5} = 2^{110.8}\).
 17 rounds:

We use 5 rounds for \(E_\top \), 9 rounds for the middle part, 3 rounds for \(E_{\bot }\). The corresponding complexity is \(2/(p^2 b^2 c^4) = 2 \times 2^{2 \times 28} \times 2^{2 \times 16.9} \times 2^{4 \times 16} = 2^{154.8}\).
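The three complexities can be recomputed from the exponents of p, b and c. This is a small sketch under the estimate \(2/(p^2 b^2 c^4)\), assuming \(p = 2^{-28}\), \(b = 2^{-5.2}\) or \(2^{-16.9}\), and \(c = 2^{-5}\) or \(2^{-16}\) as used in the three cases above; the function and variable names are hypothetical.

```python
def dl_log2_complexity(p_log2, b_log2, c_log2):
    """log2 of 2 / (p^2 * b^2 * c^4), given log2 of p, b and c (all negative)."""
    return 1 - 2 * p_log2 - 2 * b_log2 - 4 * c_log2


# rounds -> (log2 p, log2 b, log2 c)
cases = {
    15: (-28, -5.2, -5),
    16: (-28, -16.9, -5),
    17: (-28, -16.9, -16),
}

for rounds, (p, b, c) in cases.items():
    print(rounds, round(dl_log2_complexity(p, b, c), 1))
```

Going from 16 to 17 rounds is the most expensive step because the correlation c enters the complexity with a fourth power.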
Those distinguishers can be used when the \(\mathsf {Gimli}\) permutation is used to build a block cipher with the Even–Mansour construction. Such a cipher should ensure a birthday bound security of up to \(2^{192}\) queries, so the generic attack is less efficient than our differential-linear distinguisher if the number of rounds of \(\mathsf {Gimli}\) is reduced to 17 (or fewer). Further improvement should be possible with the partitioning technique of [30], but we leave this to future work.
Conclusion
A common point of the results presented in this paper is that they exploit the relatively slow diffusion between the columns of the \(\mathsf {Gimli}\) state. This issue has trivial causes: swaps are effectively the identity for 256 out of the 384 bits of the internal state, and occur only every second round. Thus, the \(\mathsf {Gimli}\) SPBox is always applied twice, except at the first and last rounds. This means that the permutation can be viewed as an SPN with only 12 rounds, and with very simple linear layers. Meanwhile, the double SPBox is a rather simple function, and some of our attacks rely crucially on solving efficiently equations that relate its inputs and outputs.
Though our results do not pose a direct threat to the \(\mathsf {Gimli}\) NIST candidate, lowcomplexity fullround distinguishers on the permutation or reducedround attacks for a high proportion of the rounds (specially when not predicted by the designers) have been considered in some cases as an issue worth countering by proposing a tweak, as can be seen, for instance, in the modification proposed by the Spook team [2] to protect against the cryptanalysis results from [16]. In September 2020, after the results of [18] were made public, the NIST offered the submitters of secondround algorithms to propose status updates. In their document [6], the designers of \(\mathsf {Gimli}\) acknowledged the collision attacks of Sect. 4 but dismissed the distinguishers of Sect. 3, and did not introduce any change to their specification. Later on, \(\mathsf {Gimli}\) was not accepted among the finalists of the competition.
The \(\mathsf {Gimli}\) designers studied other linear layers instead of the swaps, such as an MDS matrix or the linear transformation from SPARX [17], and they found some advantages in proving security against various types of attacks. On the other hand, they also found it unclear whether these advantages would outweigh the costs. We believe our results shed some light in this direction: the other variants that were considered seem a priori to be stronger with regard to our analysis, though an extensive study should be performed.
We believe the symmetry distinguishers might still be improved by exploiting the properties of the SP-Box, which we have not done yet.
In order to mitigate the attacks based on internal symmetries and guess-and-determine methods (including our distinguishers on the permutation), a simple fix would be to perform a swap at each round instead of every second round. This would, however, require a renewed cryptanalysis effort.
Notes
This behavior appears because the linear layer of \(\mathsf {Gimli}\) is round dependent.
Note that the formulas given on page 15 of the specification of \(\mathsf {Gimli}\) are erroneous. In the line \(z_n' \leftarrow z_n + y_n + (x_{n-3} \wedge z_{n-3})\), \(z_{n-3}\) should be replaced by \(y_{n-3}\), and \(x_j \wedge z_j\) must be replaced by \(x_j \wedge y_j\) in the subsequent formulas.
References
[1] A. Abdelkhalek, Y. Sasaki, Y. Todo, M. Tolba, A.M. Youssef, MILP modeling for (large) s-boxes to optimize probability of differential characteristics. IACR Trans. Symm. Cryptol. 2017(4), 99–129 (2017)
[2] D. Bellizia, F. Berti, O. Bronchain, G. Cassiers, S. Duval, C. Guo, G. Leander, G. Leurent, I. Levi, C. Momin, O. Pereira, T. Peters, F.-X. Standaert, B. Udvarhelyi, F. Wiemer, Spook: Sponge-based leakage-resistant authenticated encryption with a masked tweakable block cipher. IACR Trans. Symm. Cryptol. 2020(S1), 295–349 (2020)
[3] C.H. Bennett, Time/space tradeoffs for reversible computation. SIAM J. Comput. 18(4), 766–776 (1989)
[4] D.J. Bernstein, S. Kölbl, S. Lucks, P.M.C. Massolino, F. Mendel, K. Nawaz, T. Schneider, P. Schwabe, F.-X. Standaert, Y. Todo, B. Viguier, Gimli: A cross-platform permutation. In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 299–320. Springer, Heidelberg (Sep 2017)
[5] D.J. Bernstein, S. Kölbl, S. Lucks, P.M.C. Massolino, F. Mendel, K. Nawaz, T. Schneider, P. Schwabe, F.-X. Standaert, Y. Todo, B. Viguier, Gimli. Submission to the NIST Lightweight Cryptography project. Available online https://csrc.nist.gov/CSRC/media/Projects/LightweightCryptography/documents/round1/specdoc/gimlispec.pdf. (2019)
[6] D.J. Bernstein, S. Kölbl, S. Lucks, P.M.C. Massolino, F. Mendel, K. Nawaz, T. Schneider, P. Schwabe, F.-X. Standaert, Y. Todo, B. Viguier, Gimli: NIST LWC second-round candidate status update. Available online https://csrc.nist.gov/CSRC/media/Projects/lightweightcryptography/documents/round2/statusupdatesep2020/gimli_update.pdf. (2020)
[7] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, Sponge functions. In: ECRYPT hash workshop (2007)
[8] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, On the indifferentiability of the sponge construction. In: Smart, N.P. (ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 181–197. Springer, Heidelberg (Apr 2008)
[9] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, Sponge-based pseudorandom number generators. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 33–47. Springer, Heidelberg (Aug 2010)
[10] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, Duplexing the sponge: Single-pass authenticated encryption and other applications. In: Miri, A., Vaudenay, S. (eds.) SAC 2011. LNCS, vol. 7118, pp. 320–337. Springer, Heidelberg (Aug 2012)
[11] E. Biham, A. Shamir, Differential cryptanalysis of DES-like cryptosystems. Journal of Cryptology 4(1), 3–72 (1991)
[12] A. Biryukov, C. De Cannière, M. Quisquater, On multiple linear approximations. In: Franklin, M. (ed.) CRYPTO 2004. LNCS, vol. 3152, pp. 1–22. Springer, Heidelberg (Aug 2004)
[13] G. Brassard, P. Hoyer, M. Mosca, A. Tapp, Quantum amplitude amplification and estimation. Contemporary Mathematics 305, 53–74 (2002)
[14] G. Brassard, P. Høyer, A. Tapp, Quantum cryptanalysis of hash and claw-free functions. In: Lucchesi, C.L., Moura, A.V. (eds.) LATIN 1998. LNCS, vol. 1380, pp. 163–169. Springer, Heidelberg (Apr 1998)
[15] J. Cai, Z. Wei, Y. Zhang, S. Sun, L. Hu, Zero-sum distinguishers for round-reduced Gimli permutation. In: Mori, P., Furnell, S., Camp, O. (eds.) Proceedings of the 5th International Conference on Information Systems Security and Privacy, ICISSP 2019, Prague, Czech Republic, February 23-25, 2019. pp. 38–43. SciTePress (2019)
[16] P. Derbez, P. Huynh, V. Lallemand, M. Naya-Plasencia, L. Perrin, A. Schrottenloher, Cryptanalysis results on Spook - bringing full-round Shadow-512 to the light. In: Micciancio, D., Ristenpart, T. (eds.) CRYPTO 2020, Part III. LNCS, vol. 12172, pp. 359–388. Springer, Heidelberg (Aug 2020)
[17] D. Dinu, L. Perrin, A. Udovenko, V. Velichkov, J. Großschädl, A. Biryukov, Design strategies for ARX with provable bounds: Sparx and LAX. In: Cheon, J.H., Takagi, T. (eds.) ASIACRYPT 2016, Part I. LNCS, vol. 10031, pp. 484–513. Springer, Heidelberg (Dec 2016)
[18] A. Flórez-Gutiérrez, G. Leurent, M. Naya-Plasencia, L. Perrin, A. Schrottenloher, F. Sibleyras, New results on Gimli: full-permutation distinguishers and improved collisions. In: Moriai, S., Wang, H. (eds.) ASIACRYPT 2020, Part I. LNCS, vol. 12491, pp. 33–63. Springer, Heidelberg (Dec 2020)
[19] H. Gilbert, A simplified representation of AES. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014, Part I. LNCS, vol. 8873, pp. 200–222. Springer, Heidelberg (Dec 2014)
[20] H. Gilbert, T. Peyrin, Super-Sbox cryptanalysis: Improved attacks for AES-like permutations. In: Hong, S., Iwata, T. (eds.) FSE 2010. LNCS, vol. 6147, pp. 365–383. Springer, Heidelberg (Feb 2010)
[21] A. Gleixner, M. Bastubbe, L. Eifler, T. Gally, G. Gamrath, R.L. Gottwald, G. Hendel, C. Hojny, T. Koch, M.E. Lübbecke, S.J. Maher, M. Miltenberger, B. Müller, M.E. Pfetsch, C. Puchert, D. Rehfeldt, F. Schlösser, C. Schubert, F. Serrano, Y. Shinano, J.M. Viernickel, M. Walter, F. Wegscheider, J.T. Witt, J. Witzig, The SCIP Optimization Suite 6.0. Technical report, Optimization Online (July 2018), http://www.optimizationonline.org/DB_HTML/2018/07/6692.html
[22] A. Gleixner, M. Bastubbe, L. Eifler, T. Gally, G. Gamrath, R.L. Gottwald, G. Hendel, C. Hojny, T. Koch, M.E. Lübbecke, S.J. Maher, M. Miltenberger, B. Müller, M.E. Pfetsch, C. Puchert, D. Rehfeldt, F. Schlösser, C. Schubert, F. Serrano, Y. Shinano, J.M. Viernickel, M. Walter, F. Wegscheider, J.T. Witt, J. Witzig, The SCIP Optimization Suite 6.0. ZIB-Report 18-26, Zuse Institute Berlin (July 2018), http://nbnresolving.de/urn:nbn:de:0297zib69361
[23] L.K. Grover, A fast quantum mechanical algorithm for database search. In: 28th ACM STOC. pp. 212–219. ACM Press (May 1996)
[24] M. Hamburg, Cryptanalysis of 22 1/2 rounds of Gimli. Cryptology ePrint Archive, Report 2017/743 (2017), https://eprint.iacr.org/2017/743
[25] A. Hosoyamada, Y. Sasaki, Finding hash collisions with quantum computers by using differential trails with smaller probability than birthday bound. In: Canteaut, A., Ishai, Y. (eds.) EUROCRYPT 2020, Part II. LNCS, vol. 12106, pp. 249–279. Springer, Heidelberg (May 2020)
[26] M. Iwamoto, T. Peyrin, Y. Sasaki, Limited-birthday distinguishers for hash functions - collisions beyond the birthday bound can be meaningful. In: Sako, K., Sarkar, P. (eds.) ASIACRYPT 2013, Part II. LNCS, vol. 8270, pp. 504–523. Springer, Heidelberg (Dec 2013)
[27] S. Jaques, M. Naehrig, M. Roetteler, F. Virdia, Implementing grover oracles for quantum key search on AES and LowMC. In: Canteaut, A., Ishai, Y. (eds.) EUROCRYPT 2020, Part II. LNCS, vol. 12106, pp. 280–310. Springer, Heidelberg (May 2020)
[28] E. Knill, An analysis of Bennett’s pebble game. CoRR arXiv:abs/math/9508218 (1995)
[29] M. Lamberger, F. Mendel, M. Schläffer, C. Rechberger, V. Rijmen, The rebound attack and subspace distinguishers: Application to Whirlpool. Journal of Cryptology 28(2), 257–296 (2015)
[30] G. Leurent, Improved differential-linear cryptanalysis of 7-round Chaskey with partitioning. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016, Part I. LNCS, vol. 9665, pp. 344–371. Springer, Heidelberg (May 2016)
[31] R.Y. Levin, A.T. Sherman, A note on Bennett’s time-space tradeoff for reversible computation. SIAM J. Comput. 19(4), 673–677 (1990)
[32] F. Liu, T. Isobe, W. Meier, Preimages and collisions for up to 5-round Gimli-Hash using divide-and-conquer methods. Cryptology ePrint Archive, Report 2019/1080 (2019), https://eprint.iacr.org/2019/1080
[33] F. Liu, T. Isobe, W. Meier, Automatic verification of differential characteristics: Application to reduced Gimli. In: Micciancio, D., Ristenpart, T. (eds.) CRYPTO 2020, Part III. LNCS, vol. 12172, pp. 219–248. Springer, Heidelberg (Aug 2020)
[34] F. Liu, T. Isobe, W. Meier, Exploiting weak diffusion of Gimli: A full-round distinguisher and reduced-round preimage attacks. Cryptology ePrint Archive, Report 2020/561 (2020), https://eprint.iacr.org/2020/561
[35] M. Matsui, Linear cryptanalysis method for DES cipher. In: Helleseth, T. (ed.) EUROCRYPT’93. LNCS, vol. 765, pp. 386–397. Springer, Heidelberg (May 1994)
[36] M.A. Nielsen, I.L. Chuang, Quantum information and quantum computation. Cambridge: Cambridge University Press 2(8), 23 (2000)
[37] K. Nyberg, Linear approximation of block ciphers (rump session). In: Santis, A.D. (ed.) EUROCRYPT’94. LNCS, vol. 950, pp. 439–444. Springer, Heidelberg (May 1995)
[38] M. Soos, K. Nohl, C. Castelluccia, Extending SAT solvers to cryptographic problems. In: Kullmann, O. (ed.) Theory and Applications of Satisfiability Testing - SAT 2009, 12th International Conference, SAT 2009, Swansea, UK, June 30 - July 3, 2009. Proceedings. Lecture Notes in Computer Science, vol. 5584, pp. 244–257. Springer (2009)
[39] R. Zong, X. Dong, X. Wang, Collision attacks on round-reduced Gimli-Hash/Ascon-Xof/Ascon-Hash. Cryptology ePrint Archive, Report 2019/1115 (2019), https://eprint.iacr.org/2019/1115
Acknowledgements
The authors would like to thank all the members of the cryptanalysis party meetings for many useful comments and discussions; in particular, many thanks to Anne Canteaut, Virginie Lallemand and Thomas Fuhr for many interesting discussions over previous versions of this work. Thanks to Donghoon Chang for finding some mistakes and inaccuracies, including an error in a 32-round version of our distinguisher. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 714294, acronym QUASYModo).
Additional information
Communicated by Tetsu Iwata
This article is an extended version of the paper “New Results on Gimli: FullPermutation Distinguishers and Improved Collisions” which appeared in the proceedings of ASIACRYPT 2020 [18].
This work was carried out while André Schrottenloher was at Inria.
Appendix
SP-Box Inverse
The SP-Box is a bijective operation, but its inverse is difficult to write (and it is never used).

1. Swap x and z.

2. Perform (see Footnote 3):
$$\begin{aligned} x_0&\leftarrow x_0' \\ y_0&\leftarrow y_0' + x_0' \\ z_0&\leftarrow z_0' + x_0' + y_0' \\ x_1&\leftarrow x_1' + z_0 \\ y_1&\leftarrow y_1' + x_1' + z_0 + (x_0 \vee z_0) \\ z_1&\leftarrow z_1' + y_1' + x_1' + z_0 + (x_0 \vee z_0) \\ x_2&\leftarrow x_2' + z_1 + (y_0 \wedge z_0) \\ y_2&\leftarrow y_2' + x_2' + z_1 + (y_0 \wedge z_0) + (x_1 \vee z_1) \\ z_2&\leftarrow z_2' + y_2' + x_2' + z_1 + (y_0 \wedge z_0) + (x_1 \vee z_1) \\ \forall \, 3 \le i \le 31, \quad x_i&\leftarrow x_i' + z_{i-1} + (y_{i-2} \wedge z_{i-2}) \\ y_i&\leftarrow y_i' + x_i + (x_{i-1} \vee z_{i-1}) \\ z_i&\leftarrow z_i' + y_i + (x_{i-3} \wedge y_{i-3}) \end{aligned}$$

3. Rotate x and y: \(x_i = x_{(i+24) \bmod 32}\) and \(y_i = y_{(i+9) \bmod 32}\).
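The three steps above translate directly into code. The following sketch assumes the SP-Box as written in the reference implementation of the \(\mathsf {Gimli}\) specification, 32-bit words with bit indices 0 to 31, and treats bits with a negative index as zero. The function names are hypothetical. Bits are recovered from least to most significant, since bit i of each word only depends on bits of lower index that have already been recovered.

```python
MASK = 0xFFFFFFFF


def rotl(v: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((v << n) | (v >> (32 - n))) & MASK


def rotr(v: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits (0 < n < 32)."""
    return ((v >> n) | (v << (32 - n))) & MASK


def sp_box(x: int, y: int, z: int):
    """Forward SP-Box on one column, following the reference code."""
    x = rotl(x, 24)
    y = rotl(y, 9)
    new_z = (x ^ (z << 1) ^ ((y & z) << 2)) & MASK
    new_y = (y ^ x ^ ((x | z) << 1)) & MASK
    new_x = (z ^ y ^ ((x & y) << 3)) & MASK
    return new_x, new_y, new_z


def bit(v: int, i: int) -> int:
    """Bit i of v, with bits of negative index defined as 0."""
    return (v >> i) & 1 if i >= 0 else 0


def sp_box_inverse(x_out: int, y_out: int, z_out: int):
    """Inverse SP-Box, computed bit by bit as in the three steps above."""
    # Step 1: swap x and z.
    xp, yp, zp = z_out, y_out, x_out
    # Step 2: recover the bits of x, y, z from index 0 upwards.
    x = y = z = 0
    for i in range(32):
        xi = bit(xp, i) ^ bit(z, i - 1) ^ (bit(y, i - 2) & bit(z, i - 2))
        yi = bit(yp, i) ^ xi ^ (bit(x, i - 1) | bit(z, i - 1))
        zi = bit(zp, i) ^ yi ^ (bit(x, i - 3) & bit(y, i - 3))
        x |= xi << i
        y |= yi << i
        z |= zi << i
    # Step 3: undo the rotations of x and y.
    return rotr(x, 24), rotr(y, 9), z
```

A round-trip check such as sp_box_inverse(*sp_box(x, y, z)) == (x, y, z) on random words is a convenient way to test both directions together.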
GimliHash
Representation of Full Gimli
See Fig. 10.
Cite this article
FlórezGutiérrez, A., Leurent, G., NayaPlasencia, M. et al. Internal Symmetries and Linear Properties: Fullpermutation Distinguishers and Improved Collisions on Gimli. J Cryptol 34, 45 (2021). https://doi.org/10.1007/s0014502109413z
DOI: https://doi.org/10.1007/s0014502109413z
Keywords
 \(\mathsf {Gimli}\)
 Symmetries
 Symmetric cryptanalysis
 Full-round distinguisher
 Collision attacks
 Linear approximations