Design Strategies for ARX with Provable Bounds: Sparx and LAX
Abstract
We present, for the first time, a general strategy for designing ARX symmetric-key primitives with provable resistance against single-trail differential and linear cryptanalysis. The latter has been a long-standing open problem in the area of ARX design. The wide-trail design strategy (WTS), which is at the basis of many S-box based ciphers, including the AES, is not suitable for ARX designs due to the lack of S-boxes in the latter. In this paper we address this limitation by proposing the long trail design strategy (LTS), a dual of the WTS that is applicable (but not limited) to ARX constructions. In contrast to the WTS, which prescribes the use of small and efficient S-boxes at the expense of heavy linear layers with strong mixing properties, the LTS advocates the use of large (ARX-based) S-Boxes together with sparse linear layers. With the help of the so-called long-trail argument, a designer can bound the maximum differential and linear probabilities for any number of rounds of a cipher built according to the LTS.
To illustrate the effectiveness of the new strategy, we propose Sparx, a family of ARX-based block ciphers designed according to the LTS. Sparx has 32-bit ARX-based S-boxes and has provable bounds against differential and linear cryptanalysis. In addition, Sparx is very efficient on a number of embedded platforms. Its optimized software implementation ranks in the top 6 of the most software-efficient ciphers along with Simon, Speck, Chaskey, LEA and RECTANGLE.
As a second contribution we propose another strategy for designing ARX ciphers with provable properties, that is completely independent of the LTS. It is motivated by a challenge proposed earlier by Wallén and uses the differential properties of modular addition to minimize the maximum differential probability across multiple rounds of a cipher. A new primitive, called LAX, is designed following those principles. LAX partly solves the Wallén challenge.
Keywords
ARX, Block ciphers, Differential cryptanalysis, Linear cryptanalysis, Lightweight, Wide-trail strategy
1 Introduction
ARX, standing for Addition/Rotation/XOR, is a class of symmetric-key algorithms designed using only the following simple operations: modular addition, bitwise rotation and exclusive-OR. In contrast to S-box-based designs, where the only non-linear elements are the substitution tables (S-boxes), ARX designs rely on modular addition as the only source of non-linearity. Notable representatives of the ARX class include the stream ciphers Salsa20 [1] and ChaCha20 [2], the SHA-3 finalists Skein [3] and BLAKE [4] as well as several lightweight block ciphers such as TEA, XTEA [5], etc. Dinu et al. recently reported [6] that the most efficient software implementations on small processors belonged to ciphers from the ARX class: Chaskey-cipher [7] by Mouha et al., Speck [8] by the American National Security Agency (NSA) and LEA [9] by the South Korean Electronic and Telecommunications Research Institute.^{1}
For the mentioned algorithms, the choice of using the ARX paradigm was based on three observations^{2}. First, getting rid of the table look-ups associated with S-Box based designs increases the resilience against side-channel attacks. Second, this design strategy minimizes the total number of operations performed during an encryption, allowing particularly fast software implementations. Finally, the computer code describing such algorithms is very small, making this approach especially appealing for lightweight block ciphers where the memory requirements are the harshest.
Despite the widespread use of ARX ciphers, the following problem has remained open up until now.
Open Problem
Is it possible to design an ARX cipher that is provably secure against single-trail differential and linear cryptanalysis by design?
To the best of our knowledge, there has only been one attempt at tackling this issue. In [10] Biryukov et al. have proposed several ARX constructions for which it is feasible to compute the exact maximum differential and linear probabilities over any number of rounds. However, these constructions are limited to 32-bit blocks. The general case of this problem, addressing any block size, has still remained without a solution.
More generally, the formal understanding of the cryptographic properties of ARX is far less satisfying than that of, for example, S-Box-based substitution-permutation networks (SPN). Indeed, the wide-trail strategy [11] (WTS) and the wide-trail argument [12] provide a way to design S-box based SPNs with provable resilience against differential and linear attacks. It relies on bounding the number of active S-Boxes in a differential (resp. linear) trail and deducing an upper bound on the best expected differential (resp. linear) probability.
Our Contribution. We propose two different strategies to build ARX-based block ciphers with provable bounds on the maximum expected differential and linear probabilities, thus providing a solution to the open problem stated above.
The first strategy is called the Long Trail Strategy (LTS). It borrows the idea of counting the number of active S-Boxes from the wide-trail argument but the overall principle is actually the opposite of the wide-trail strategy as described in [11]. While the WTS dictates the spending of most of the computational resources in the linear layer in order to provide good diffusion between small S-boxes, the LTS advocates the use of large and comparatively expensive S-Boxes in conjunction with cheaper and weaker linear layers. We formalize this method and describe the long-trail argument that can be used to bound the differential and linear trail probabilities of a block cipher built using this strategy.
Using this framework, we build a family of lightweight block ciphers called Sparx. All three instances in this family can be entirely specified using only three operations: addition modulo \(2^{16}\), 16-bit rotations and 16-bit XOR. These ciphers are, to the best of our knowledge, the first ARX-based block ciphers for which the probabilities of both differential and linear trails are bounded. Furthermore, while one may think that these provable properties imply a performance degradation, we show that it is not the case. On the contrary, Sparx ciphers have very competitive performance on lightweight processors. In fact, the most lightweight version, Sparx-64/128, is in the top 3 for 16-bit microcontrollers according to the classification method presented in [6].
Finally, we propose the LAX construction, where bit rotations are replaced with a more general linear permutation. The bounds on the differential probability are expressed as a function of the branching number of the linear layer. We note that the key insight behind this construction has been published in [13], but its realization has been left as a challenge.
Outline. First, we introduce the notations and concepts used throughout the paper in Sect. 2. In Sect. 3, we describe how an ARX-based cipher with provable bounds can be built using an S-Box-based approach and how the method used is a particular case of the more general Long Trail Strategy. Section 4 contains the specification of the Sparx family of ciphers, the description of its design rationale and a discussion about the efficiency of its implementation on microcontrollers. The LAX structure is presented in Sect. 5. Finally, Sect. 6 concludes the paper.
2 Preliminaries
We use \(\mathbb{F}_2\) to denote the set \(\{0,1\}\). Let \(f: \mathbb{F}_2^n \rightarrow \mathbb{F}_2^n\), \((a,b) \in \mathbb{F}_2^n \times \mathbb{F}_2^n\) and \(x \in \mathbb{F}_2^n\). We denote the probability of the differential trail \((a \overset{d}{\rightarrow} b)\) by \(\text{Pr}[f(x) \oplus f(x \oplus a) = b]\) and the correlation of the linear approximation \((a \overset{\ell}{\rightarrow} b)\) by \(\big(2~\text{Pr}[a \cdot x = b \cdot f(x)] - 1\big)\), where \(y \cdot z\) is the scalar product of y and z.
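The two quantities just defined can be computed exhaustively for small n. The following is a minimal sketch, not taken from the paper: the toy function `toy` (addition of a constant modulo \(2^4\)) and all function names are assumptions chosen only to illustrate the definitions.

```python
def differential_probability(f, n, a, b):
    """Pr over x in F_2^n that f(x) XOR f(x XOR a) == b."""
    hits = sum(1 for x in range(1 << n) if f(x) ^ f(x ^ a) == b)
    return hits / (1 << n)


def parity(v):
    """Parity of the bits of v, i.e. the scalar product result in F_2."""
    return bin(v).count("1") & 1


def linear_correlation(f, n, a, b):
    """2 * Pr[a.x == b.f(x)] - 1 over x in F_2^n."""
    agree = sum(1 for x in range(1 << n)
                if parity(a & x) == parity(b & f(x)))
    return 2 * agree / (1 << n) - 1


N = 4


def toy(x):
    """Hypothetical 4-bit example: addition of the constant 5 modulo 2^4."""
    return (x + 5) % (1 << N)
```

Flipping the most significant bit of the input of `toy` always flips the most significant bit of its output, so the differential \(8 \rightarrow 8\) holds with probability 1, which illustrates why a single modular addition is a weak non-linear component on its own.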
In an iterated block cipher, not all differential (respectively linear) trails are possible. Indeed, they must be coherent with the overall structure of the round function. For example, it is well known that a 2-round differential trail for the AES with fewer than 5 active S-Boxes is impossible. To capture this notion, we use the following definition.
Definition 1 (Valid Trail)
Let f be an n-bit permutation. A trail \(a_0 \rightarrow \ldots \rightarrow a_r\) for r rounds of f is a valid trail if \(\text{Pr}[a_i \rightarrow a_{i+1}] > 0\) for all i in \([0, r-1]\). The set of all valid r-round differential (respectively linear) trails for f is denoted \(\mathcal{V}_{\delta}(f)^{r}\) (resp. \(\mathcal{V}_{\ell}(f)^{r}\)).
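The quantities MEDCP and MELCC bounded in the rest of the paper are not spelled out in this section. The following is a sketch of how they are usually defined, assuming that the maximum is taken over the valid trails of Definition 1 and that probabilities and correlations are averaged over independent round keys; the symbols \(\Delta_i\) and \(\Gamma_i\) are introduced here only for illustration.

\[
\textsc{MEDCP}(f^{r}) = \max_{(\Delta_0 \rightarrow \ldots \rightarrow \Delta_r) \in \mathcal{V}_{\delta}(f)^{r}} \; \prod_{i=0}^{r-1} \text{Pr}[\Delta_i \overset{d}{\rightarrow} \Delta_{i+1}],
\qquad
\textsc{MELCC}(f^{r}) = \max_{(\Gamma_0 \rightarrow \ldots \rightarrow \Gamma_r) \in \mathcal{V}_{\ell}(f)^{r}} \; \prod_{i=0}^{r-1} \big| 2~\text{Pr}[\Gamma_i \overset{\ell}{\rightarrow} \Gamma_{i+1}] - 1 \big|.
\]

Under this reading, both quantities are products of per-round terms, which is what later allows the bound for a long chain of S-Box calls to be compared with the product of the bounds for shorter chains.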
As designers, we strive to provide upper bounds for both \(\textsc{MEDCP}(f^{r})\) and \(\textsc{MELCC}(f^{r})\). Doing so allows us to compute the number of rounds of f needed in a block cipher for the probability of all trails to be too low to be usable. In practice, we want \(\textsc{MEDCP}(f^{r}) \ll 2^{-n}\) and \(\textsc{MELCC}(f^{r}) \ll 2^{-n/2}\) where n is the block size.
While this strategy is the best known, the following limitations must be taken into account by algorithm designers.
 1. The quantities \(\textsc{MEDCP}(f^{r})\) and \(\textsc{MELCC}(f^{r})\) are relevant only if we make the Markov assumption, meaning that the differential and linear probabilities are independent in each round. This would be true if the subkeys were picked uniformly and independently at random but, as the master key has a limited size, it is not the case.
 2. These quantities are averages taken over all possible keys: it is not impossible that there exists a weak key and a differential trail T such that the probability of T is higher than \(\textsc{MEDCP}(f^{r})\) for this particular key. The same holds for the linear probability.
 3. These quantities deal with unique trails. However, it is possible that several differential trails share the same input and output differences, thus leading to a higher probability for said differential transition. This so-called differential effect can be leveraged to decrease the data complexity of a differential attack. The same holds for linear attacks where several approximations may form a linear hull.
Still, this type of bound is the best that can be achieved in a generic fashion (to the best of our knowledge). In particular, this is the type of bound provided by the wide-trail argument used in the AES.
3 ARX-Based Substitution-Permutation Network
In this section, we present a general design strategy for building ARX-based block ciphers borrowing techniques from SPN design. The general idea is to build an SPN with ARX-based S-boxes instead of with S-boxes based on look-up tables (LUT). The proofs for the bound on the MEDCP and MELCC are inspired by the wide-trail argument introduced in the design of the AES [12]. However, because of the use of large S-Boxes, the method used relies on a different type of interaction between the linear and non-linear layers. We call the corresponding design strategy the long trail strategy. It is quite general and could also be applied in other contexts, e.g. for non-ARX constructions.
First, we present possible candidates for the ARX-based S-Box and, along the way, identify the likely reason behind the choice of the rotation constants in SPECK-32. Then, we describe the long trail strategy in more detail. Finally, we present two different algorithms for computing a bound for the MEDCP and MELCC of block ciphers built using an LT strategy. We also discuss how to ensure that the linear layer provides sufficient diffusion.
3.1 ARX-Boxes
Definition 2 (ARX-box)
An ARX-box is a permutation on m bits (where m is much smaller than the block size) which relies entirely on addition, rotation and XOR to provide both non-linearity and diffusion. An ARX-box is a particular type of S-Box.
Table 1. Maximum expected differential characteristic probabilities (MEDCP) and maximum expected absolute linear characteristic correlations (MELCC) of Marx-2 and Speckey (\(\log_2\) scale); r is the number of rounds.
|         | r                          | 1     | 2      | 3      | 4      | 5       | 6       | 7       | 8       | 9       | 10      |
| Marx-2  | \(\textsc{MEDCP}(M^{r})\)  | \(0\) | \(-1\) | \(-3\) | \(-5\) | \(-11\) | \(-16\) | \(-22\) | \(-25\) | \(-29\) | \(-35\) |
|         | \(\textsc{MELCC}(M^{r})\)  | \(0\) | \(0\)  | \(-1\) | \(-3\) | \(-5\)  | \(-8\)  | \(-10\) | \(-13\) | \(-15\) | \(-17\) |
| Speckey | \(\textsc{MEDCP}(S^{r})\)  | \(0\) | \(-1\) | \(-3\) | \(-5\) | \(-9\)  | \(-13\) | \(-18\) | \(-24\) | \(-30\) | \(-34\) |
|         | \(\textsc{MELCC}(S^{r})\)  | \(0\) | \(0\)  | \(-1\) | \(-3\) | \(-5\)  | \(-7\)  | \(-9\)  | \(-12\) | \(-14\) | \(-17\) |
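To make the notion of an ARX-box concrete, the following sketch shows one possible keyed ARX round on a 32-bit word split into two 16-bit halves. It assumes that Speckey is the round function of Speck32 (rotation amounts 7 and 2) with the round key XORed into the full word before each round; these constants and the function names are assumptions of this sketch and are not stated in this section.

```python
MASK16 = 0xFFFF


def rotl16(v, r):
    """Rotate a 16-bit value left by r bits."""
    return ((v << r) | (v >> (16 - r))) & MASK16


def rotr16(v, r):
    """Rotate a 16-bit value right by r bits."""
    return ((v >> r) | (v << (16 - r))) & MASK16


def arx_round(x, y):
    """One ARX round on the 16-bit halves (x, y); constants 7 and 2 are assumed."""
    x = (rotr16(x, 7) + y) & MASK16   # rotation, then addition modulo 2^16
    y = rotl16(y, 2) ^ x              # rotation, then XOR
    return x, y


def arx_round_inv(x, y):
    """Inverse of arx_round, showing that the round is a permutation."""
    y = rotr16(y ^ x, 2)
    x = rotl16((x - y) & MASK16, 7)
    return x, y


def keyed_arx_box(x, y, round_keys):
    """XOR a 32-bit round key (kx, ky), then apply one ARX round, for each key."""
    for kx, ky in round_keys:
        x, y = arx_round(x ^ kx, y ^ ky)
    return x, y
```

The inverse function is included only to make explicit that such a round is a permutation on 32 bits, as Definition 2 requires.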
3.2 Naive Approaches and Their Limitations
A very simple method to build ARX-based ciphers with provable bounds on MEDCP and MELCC is to use an SPN structure where the S-boxes are replaced by ARX operations for which we can compute the MEDCP and MELCC. This is indeed the strategy we follow but care must be taken when actually choosing the ARX-based operations and the linear layer.
Let us for example build a 128-bit block cipher with an S-Box layer consisting of one iteration of Speckey on each 32-bit word and with an MDS linear layer, say a multiplication with the MixColumns matrix with elements in \(GF(2^{32})\) instead of \(GF(2^8)\). The MEDCP bound of such a cipher, computed using a classical wide-trail argument, would be equal to 1! Indeed, there exist probability-1 differentials for 1-round Speckey so that, regardless of the number of active S-Boxes, the bound would remain equal to 1. Such an approach is therefore not viable.
As the problem identified above stems from the use of 1-round Speckey, we now replace it with 3-round Speckey where the iterations are interleaved with the addition of independent round keys. The best linear and differential probabilities are no longer equal to 1, meaning that it is possible to build a secure cipher using the same layer as before provided that enough rounds are used. However, such a cipher would be very inefficient. Indeed, the MDS bound imposes that 5 ARX-boxes are active every 2 rounds, so that the MEDCP bound is equal to \(p_d^{5r/2}\) where r is the number of rounds and \(p_d = 2^{-3}\) is the best differential probability of the ARX-box (3-round Speckey, see Table 1). To push the bound below \(2^{-128}\) we need at least 18 SPN rounds, meaning 54 parallel applications of the basic ARX-round! We will show that, with our alternative approach, we can obtain the same bounds with far fewer rounds.
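The round count quoted above follows from a short calculation, sketched here. The function name and parameters are hypothetical; the only inputs are the figures from the text (5 active ARX-boxes every 2 rounds, \(\log_2 p_d = -3\) for 3-round Speckey, and a target of \(2^{-128}\)).

```python
import math


def spn_rounds_needed(log2_pd, target_log2, active_per_two_rounds=5):
    """Smallest r such that (active_per_two_rounds / 2) * r * log2_pd <= target_log2."""
    per_round = active_per_two_rounds / 2 * log2_pd   # log2 bound gained per SPN round
    return math.ceil(target_log2 / per_round)


rounds = spn_rounds_needed(log2_pd=-3, target_log2=-128)
arx_rounds = rounds * 3   # each ARX-box is 3 rounds of Speckey
```

With 17 rounds the bound is \(2^{-127.5}\), which is still above the target, so 18 SPN rounds (54 ARX rounds) are needed.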
3.3 The Long Trail Design Strategy
Informed by the shortcomings of the naive design strategies described in the previous section, we devised a new method to build ARX-based primitives with provable linear and differential bounds. It is based on the following observation.
Observation 1
For example, if we look at Speckey, the MEDCP for 3 rounds is \(2^{-3}\) and that of 6 rounds is \(2^{-13}\), which is far smaller than \((2^{-3})^2 = 2^{-6}\) (see Table 1). Similarly, the MELCC for 3 rounds is \(2^{-1}\) and after 6 rounds it is \(2^{-7} \ll (2^{-1})^2\).
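The comparison in this example can be repeated for every entry of Table 1. The sketch below copies the Speckey rows of Table 1 (in \(\log_2\) scale) and measures how much smaller the bound for one chain of \(q \cdot r\) rounds is than the product of q bounds for r rounds; the function name `long_trail_gain` is hypothetical.

```python
# log2 values for Speckey, copied from Table 1 (key = number of rounds).
SPECKEY_LOG2_MEDCP = {1: 0, 2: -1, 3: -3, 4: -5, 5: -9,
                      6: -13, 7: -18, 8: -24, 9: -30, 10: -34}
SPECKEY_LOG2_MELCC = {1: 0, 2: 0, 3: -1, 4: -3, 5: -5,
                      6: -7, 7: -9, 8: -12, 9: -14, 10: -17}


def long_trail_gain(table, r, q):
    """log2 gap between q separate r-round chains and one chain of q*r rounds."""
    return q * table[r] - table[q * r]


medcp_gain = long_trail_gain(SPECKEY_LOG2_MEDCP, 3, 2)   # 2*(-3) - (-13)
melcc_gain = long_trail_gain(SPECKEY_LOG2_MELCC, 3, 2)   # 2*(-1) - (-7)
```

A positive gain means that one uninterrupted chain gives a smaller bound than the same number of rounds split into independent pieces, which is the effect the long trail strategy exploits.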
Bound on the number of active 8-bit S-Boxes in a differential (or linear) trail for the AES.
| # Rounds         | 1 | 2 | 3 | 4  | 5  | 6  | 7  | 8  | 9  | 10 |
| # Active S-Boxes | 1 | 5 | 9 | 25 | 26 | 30 | 34 | 50 | 51 | 55 |
Definition 3 (Long Trail)
We call Long Trail (LT) an uninterrupted sequence of calls to an ARX-box interleaved with key additions. No difference can be added into the trail from the outside. Such trails can happen for two reasons.
 1. A Static Long Trail occurs with probability 1 because one output word of the linear layer is an unchanged copy of one of its input words.
 2. A Dynamic Long Trail occurs within a specific differential trail because one output word of the linear layer consists of the XOR of one of its input words with a non-zero difference and a function of words with a zero difference. In this way the output word of the linear layer is again equal to the input word as in a Static LT, but here this effect has been obtained dynamically.
Definition 4 (Long Trail Strategy)
The Long Trail Strategy is a design guideline: when designing a primitive with a rather weak but large S-Box (say, an ARX-based permutation), it is better to foster the existence of long trails rather than to have maximum diffusion in each linear layer.
This design principle has an obvious caveat: although slow, diffusion is necessary! Unlike the WTS, in this context it is better to trade some of the power of the diffusion layer in favor of facilitating the emergence of long trails.
The long trail strategy is a method for building secure and efficient ciphers using a large but weak S-Box S such that we can bound the MEDCP (and MELCC) of several iterations of \(x \mapsto S(x \oplus k)\) with independent round keys. In this paper, we focus on the case where S consists of ARX operations but this strategy could have broader applications such as, as briefly discussed above, the design of block ciphers operating on large blocks using the AES round function as a building block.
The long trail approach minimizes the amount of resources spent in the linear layer and does spend most of the resources on large S-Boxes. Still, as discussed in the next section, the method used to bound the MEDCP and MELCC in the long trail strategy is heavily inspired by the one used in the wide trail strategy. Instead of spending most of the resources on large S-boxes, the wide trail strategy aims at designing the round transformation(s) such that there are no trails with a low bundle weight. In ciphers designed by the wide trail strategy, a relatively large amount of resources is spent in the linear step to provide high multiple-round diffusion.
Long Trail-Based Bounds. In what follows we only discuss differential long trails for the sake of brevity. Linear long trails are treated identically.
Definition 5 (Truncated LT Decomposition)
Consider a cipher with a round function operating on w words. A truncated differential trail is a sequence of values of \(\{0,1\}^w\) describing whether an S-Box is active at a given round. The LT Decomposition of a truncated differential trail is obtained by grouping together the words of the differential trails into long trails and then counting how many active long trails of each length are present. It is denoted \(\{t_i\}_{i \ge 1}\) where \(t_i\) is equal to the number of truncated long trails with length i.
Example 1
Consider a 64-bit block cipher using a 32-bit S-Box, one round of Feistel network as its linear layer and 4 steps without a final linear layer. Consider the differential trail \((\delta^L_0, \delta^R_0) \rightarrow (\delta^L_1, \delta^R_1) \rightarrow (0, \delta^R_2) \rightarrow (\delta^L_3, 0)\) (see Fig. 3 where the zero difference is dashed). Then this differential trail can be decomposed into 3 long trails represented in black, blue and red: the first one has length 1 and \(\delta_0^R\) as its input; the second one has length 2 and \(\delta_0^L\) as its input; and the third one has length 3 and \(\delta_1^L\) as its input, so that the LT decomposition of this trail is \(\{t_1 = 1, t_2 = 1, t_3 = 1\}\). Using the terminology introduced earlier, the first two trails are Static LT, while the third one is Dynamic LT.
Theorem 1 (Long Trail Argument)
Proof
Let \(\varDelta_{i, s} \overset{d}{\rightarrow} \varDelta_{j, s+1}\) denote any differential trail occurring at the S-Box level in one step, so that the S-Box with index i at step s sees the transition \(\varDelta_{i, s} \overset{d}{\rightarrow} \varDelta_{j, s+1}\). By definition of a long trail, we have in each long trail a chain of differential trails \(\varDelta_{i_0, s_0} \overset{d}{\rightarrow} \varDelta_{i_1, s_0+1} \overset{d}{\rightarrow} \ldots \overset{d}{\rightarrow} \varDelta_{i_t, s_0+t}\) which, because of the lack of injection of differences from the outside, is a valid trail for t iterations of the S-Box. This means that the probability of any differential trail following the same sequence of S-boxes as in this long trail is upper-bounded by \(\textsc{MEDCP}(S^{t})\). We simply bound the product by the product of the bounds to derive the theorem. \(\square\)
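The proof above bounds each long trail of length t by \(\textsc{MEDCP}(S^{t})\) and multiplies these bounds. As a minimal sketch, assuming the resulting bound has the product form \(\prod_{i} \textsc{MEDCP}(S^{i})^{t_i}\) for an LT decomposition \(\{t_i\}\), it can be evaluated in \(\log_2\) scale as follows. The table is the Speckey MEDCP row of Table 1; the function name and the `rounds_per_call` parameter (number of Speckey rounds in one S-Box call) are hypothetical.

```python
# log2 MEDCP of r rounds of Speckey, copied from Table 1 (key = r).
SPECKEY_LOG2_MEDCP = {1: 0, 2: -1, 3: -3, 4: -5, 5: -9,
                      6: -13, 7: -18, 8: -24, 9: -30, 10: -34}


def long_trail_bound(decomposition, table, rounds_per_call=1):
    """log2 of the product over i of MEDCP(S^i)^(t_i), for {i: t_i}."""
    return sum(t * table[i * rounds_per_call]
               for i, t in decomposition.items())


# LT decomposition of Example 1: one long trail of each length 1, 2 and 3.
example_1 = {1: 1, 2: 1, 3: 1}
one_round_sbox = long_trail_bound(example_1, SPECKEY_LOG2_MEDCP)
three_round_sbox = long_trail_bound(example_1, SPECKEY_LOG2_MEDCP,
                                    rounds_per_call=3)
```

With a 3-round S-Box the three long trails cover 3, 6 and 9 Speckey rounds, giving \(-3 - 13 - 30 = -46\) in \(\log_2\) scale, whereas counting six independent 3-round S-Boxes would only give \(6 \times (-3) = -18\).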
3.4 Choosing the Linear Layer: Bounding the MEDCP and MELCC while Providing Diffusion
In order to remain as general as possible, in this section we do not consider the details of a specific S-Box but instead we focus on fleshing out design criteria for the linear layer. All the information for the S-Box that is necessary to follow the explanation is the MEDCP and MELCC of its r-fold iterations including the key additions, e.g. the data provided in Table 1 for our ARX-box candidates.
As the linear layers we consider may be weaker than is usual when designing an SPN, it is also crucial that we ensure that ciphers built using such a linear layer are not vulnerable to integral attacks [18], in particular those based on the division property [19]. Incidentally, this gives us a criterion quantifying the diffusion provided by several steps of the cipher.
In this section, we propose two methods for bounding the MEDCP and MELCC of several steps of a block cipher. The first one is applicable to any linear layer but is relatively inefficient, while the second one works only for a specific subset of linear layers but is very efficient.
When considering truncated differential trails, it is hard to bound the probability of the event that differences in two or more words cancel each other in the linear layer i.e. the event that a Dynamic LT occurs. Therefore, for simplicity we assume that such cancellations happen for free i.e. with probability 1. Due to this simplification, we expect our bounds to be higher (i.e. looser) than the tight bounds. In other words, we underestimate the security of the cipher. Note that we also exclude the cases where the full state at some round has zero difference as the latter is impossible due to the cipher being a permutation.
Algorithms for Bounding MEDCP and MELCC of a Cipher. In this subsection we propose generic approaches that do not depend on the number of rounds per step. In fact, to fully avoid the confusion between rounds and steps in what follows we shall simply refer to SPN rounds. The generic procedure is as follows.
 1. Enumerate all possible truncated trails composed of active/inactive S-boxes.
 2. Find an optimal decomposition of each trail into long trails (LT).
 3. Bound the probability of each trail using the product of the MEDCP (resp. MELCC) of all active long trails, i.e. by applying the Long Trail Argument (see Theorem 1) on the corresponding optimal trail decomposition.
 4. The maximum bound over all trails is the final upper bound.
This approach is feasible only for a small number of rounds, because the number of trails grows exponentially. The algorithm is based on a recursive dynamic programming approach and has time complexity \(O(wr^2)\), where w is the number of S-Boxes applied in parallel in each S-Box layer and r is the number of rounds.
As noted, the most complicated step in the above procedure is finding an optimal decomposition of a given truncated trail into long trails. The difficulty arises from the so-called branching: a situation in which a long trail may be extended in more than one way. Recall that our definition of LT (cf. Definition 3) relies on the fact that there is no linear transformation on a path between two S-Boxes in a LT. The only transformations allowed are some XORs. Therefore, branching happens only when some output word of the linear layer receives two or more active input words without modifications. In order to cut off the branching effect (and thus to make finding the optimal decomposition of a LT feasible), we can put some additional linear functions that will modify the contribution of (some of) the input words. Equivalently, when choosing a linear layer we simply do not consider layers which cause branching of LTs. As we will show later, this restriction has many advantages.
To simplify our study of the linear layer, we introduce a matrix representation for it. In a block cipher operating on w words, a linear layer may be expressed as a \(w \times w\) block matrix. We will denote zero and identity submatrices by 0 and 1 respectively and unspecified (arbitrary) submatrices by L. This information is sufficient for analyzing the high-level structure of a cipher. Using this notation, the linear layers to which we restrict our analysis have matrices where each column has at most one 1.
For the special subset of linear layers outlined above, we present an algorithm for obtaining MEDCP and MELCC bounds that is based on a dynamic programming approach. Since there is no LT branching, any truncated trail consists of disjoint sequences of active S-Boxes. By Observation 1, we can treat each such sequence as a LT to obtain an optimal decomposition. Because of this simplification, we can avoid enumerating all trails by grouping them in a particular way.
We proceed round by round and maintain a set of best trails up to an equivalence relation, which is defined as follows. For all S-Boxes at the current last round s, we assign a number, which is equal to the length of the LT that covers this S-Box, or zero if the S-Box is not active. We say that two truncated trails for s steps are equivalent if the tuples consisting of those numbers (current round s and length of LT) are the same for both trails. This equivalence captures the possibility to replace some prefix of a trail by an equivalent one without breaking the validity of the trail or its LT decomposition. The total probability, however, can change. The key observation here is that from two equivalent trails we can keep only the one with the highest current probability. Indeed, if the optimal truncated trail for all r rounds is an extension of the trail for s rounds with lower probability, we can take the first s rounds from the trail with higher probability without breaking anything and obtain a better trail, which contradicts the assumed optimality.
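The round-by-round procedure described above can be sketched for the simplest case, a two-word Feistel-like linear layer \((x_0, x_1) \mapsto (x_1 \oplus L(x_0), x_0)\) with L invertible. This is a simplified illustration and not the pseudocode of the full version [20]: the state is the tuple of long trail lengths, only the best value per equivalence class is kept, cancellations are assumed to be free, and chains longer than the 10 rounds covered by Table 1 are split into pieces (an assumption of this sketch that relies on independent round keys). All names are hypothetical.

```python
# log2 MEDCP of r rounds of Speckey from Table 1 (index = r, index 0 = empty chain).
LOG2_MEDCP = [0, 0, -1, -3, -5, -9, -13, -18, -24, -30, -34]


def chain_bound(rounds):
    """log2 bound for an uninterrupted chain of `rounds` ARX rounds."""
    top = len(LOG2_MEDCP) - 1
    full, rest = divmod(rounds, top)      # split chains longer than the table
    return full * LOG2_MEDCP[top] + LOG2_MEDCP[rest]


def successors(l0, l1, ra):
    """Truncated transitions through (x0, x1) -> (x1 ^ L(x0), x0).

    l0, l1 are the long trail lengths (in steps) covering each word, 0 if
    inactive. Yields ((new_l0, new_l1), log2 cost of long trails closed here).
    """
    n1 = l0 + 1 if l0 else 0              # x0 is copied: its long trail goes on
    if l0 == 0:                           # L(x0) has zero difference: dynamic LT
        yield (l1 + 1, n1), 0
    elif l1 == 0:                         # a fresh long trail starts from L(x0)
        yield (1, n1), 0
    else:                                 # the long trail of x1 is interrupted
        cost = chain_bound(l1 * ra)
        yield (1, n1), cost               # differences do not cancel
        yield (0, n1), cost               # cancellation, assumed to be free


def feistel_bound(steps, ra):
    """log2 bound over all truncated trails for `steps` steps of `ra` rounds."""
    best = {(1, 0): 0, (0, 1): 0, (1, 1): 0}   # non-zero input patterns
    for _ in range(steps - 1):                 # no linear layer after last step
        nxt = {}
        for (l0, l1), closed in best.items():
            for state, cost in successors(l0, l1, ra):
                nxt[state] = max(nxt.get(state, float("-inf")), closed + cost)
        best = nxt
    return max(closed + sum(chain_bound(l * ra) for l in state if l)
               for state, closed in best.items())
```

For example, `feistel_bound(2, 3)` considers two steps of a 3-round ARX-box; its best truncated trail is a single long trail covering 6 rounds. Bounds for more steps depend on the assumed treatment of chains beyond 10 rounds, so this sketch is not expected to reproduce the figures reported for Sparx.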
Due to page limit constraints, the pseudocode for the algorithm is given in the full version of this paper [20].
This algorithm can be used to bound the probability of linear trails. Propagation of a linear mask through some linear layer can be described by multiplying the mask by the transposed inverse of the linear layer's matrix. In our matrix notation we can easily transpose the matrix but inversion is harder. However, we can build the linear trails bottom-up (i.e. starting from the last round): in this case we need only the transposed initial matrix. Our algorithm does not depend on the direction, so we obtain bounds on linear trail probabilities by running the algorithm on the transposed matrix using the linear bounds for the iterated S-box.
Ensuring Resilience Against Integral Attacks. As illustrated by the structural attack against SASAS and a recent generalization [21] to ciphers with more rounds, an SPN with few rounds may be vulnerable to integral attacks. This attack strategy has been further improved by Todo [19] who proposed the so-called division property as a means to track which bits should be fixed in the input to have a balanced output. He also described an algorithm allowing an attacker to easily find such distinguishers.
We implemented this algorithm to search for division-property-based integral trails covering as many rounds as possible. With it, for each matrix candidate we compute a maximum number of rounds covered by such a distinguisher. This quantity can then be used by the designer of the primitive to see if the level of protection provided against this type of attack is sufficient or not.
Tracking the evolution of the division property through the linear layer requires special care. In order to do this, we first make a copy of each word and apply the required XORs from the copy to the original words. Due to such state expansion, the algorithm requires both a lot of memory and time. In fact, it is even infeasible to apply on some matrices. To overcome this issue, we ran the algorithm with reduced word size. During our experiments, we observed that such an optimization may only result in longer integral characteristics and that this side effect occurs only for very small word sizes (4 or 5 bits). In light of this, we conjecture that the values obtained in these particular cases are upper bounds and are very close to the values which could be obtained without reducing the word size.
4 The Sparx Family of Ciphers
In this section, we describe a family of block ciphers built using the framework laid out in the previous section. The instance with block size n and key size k is denoted Sparx-\(n/k\).
4.1 High Level View
The plaintexts and ciphertexts consist of \(w = n/32\) words of 32 bits each and the key is divided into \(v = k/32\) such words. The encryption consists of \(n_s\) steps, each composed of an ARX-box layer of \(r_a\) rounds and a linear mixing layer. In the ARX-box layer, each word of the internal state undergoes \(r_a\) rounds of Speckey, including key additions. The v words in the key state are updated once \(r_a\) ARX-boxes have been applied to one word of the internal state. The linear layers \(\lambda_{w}\) for \(w=2,4\) provide linear mixing for the w words of the internal state.
The only operations needed to implement any instance of Sparx are:
- addition modulo \(2^{16}\), denoted \(\boxplus\),
- 16-bit exclusive-or (XOR), denoted \(\oplus\), and
- 16-bit rotation to the left or right by i, denoted respectively \(x \lll i\) and \(x \ggg i\).
We claim that no attack using fewer than \(2^k\) operations exists against Sparx-\(n/k\) in either the single-key or the related-key setting. We also faithfully declare that we have not hidden any weakness in these ciphers. Sparx is free for use and its source code is available in the public domain^{5}.
4.2 Specification
The different Sparx instances.
|                                   | Sparx-\(64/128\) | Sparx-\(128/128\) | Sparx-\(128/256\) |
| # State words w                   | 2                | 4                 | 4                 |
| # Key words v                     | 4                | 4                 | 8                 |
| # Rounds/Step \(r_a\)             | 3                | 4                 | 4                 |
| # Steps \(n_s\)                   | 8                | 8                 | 10                |
| Best Attack (# rounds)            | 15/24            | 22/32             | 24/40             |
| \(\min_{\text{secure}}(n_s)\)     | 5                | 5                 | 5                 |
Sparx-64/128. The lightest instance of Sparx is Sparx-\(64/128\). It operates on two words of 32 bits and uses a 128-bit key. There are 8 steps and 3 rounds per step. As it takes 5 steps to achieve provable security against linear and differential attacks, our security margin is at least equal to \(37\,\%\) of the rounds. Furthermore, while our long trail argument proves that 5 steps are sufficient to ensure that there are no single-trail differential and linear distinguishers, we do not expect this bound to be tight.
The Feistel function \(\mathcal{L'}\) can be defined as follows. Let \(a \Vert b \Vert c \Vert d\) be a 64-bit word where each of a, ..., d is 16 bits long. Let \(t = (a \oplus b \oplus c \oplus d) \lll 8\). Then \(\mathcal{L'}(a \Vert b \Vert c \Vert d) = c \oplus t \Vert b \oplus t \Vert a \oplus t \Vert d \oplus t\). This function can also be expressed using 32-bit rotations. Let \(x \Vert y\) be the concatenation of two 32-bit words and \(\mathcal{L'}_b\) denote \(\mathcal{L'}\) without its final branch swap. Let \(t = \big((x \oplus y) \ggg^{32} 8\big) \oplus \big((x \oplus y) \lll^{32} 8\big)\), then \(\mathcal{L'}_b(x \Vert y) = x \oplus t \Vert y \oplus t\). Alternatively, we can use \(\mathcal{L}\) to compute \(\mathcal{L'}_b\) as follows: \(\mathcal{L'}_b(x \Vert y) = y \oplus \mathcal{L}(x \oplus y) \Vert x \oplus \mathcal{L}(x \oplus y)\).
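The agreement between the 16-bit and the 32-bit descriptions of \(\mathcal{L'}\) can be checked numerically. The sketch below assumes that \(x = a \Vert b\) and \(y = c \Vert d\) with a and c as the high halves, and that the final branch swap exchanges the words in positions a and c; the function names are hypothetical.

```python
def rotl(v, r, bits):
    """Rotate a `bits`-bit value left by r bits."""
    mask = (1 << bits) - 1
    return ((v << r) | (v >> (bits - r))) & mask


def l_prime_16(a, b, c, d):
    """L' on four 16-bit words, including the final branch swap."""
    t = rotl(a ^ b ^ c ^ d, 8, 16)
    return c ^ t, b ^ t, a ^ t, d ^ t


def l_prime_b_32(x, y):
    """L' without the branch swap, on two 32-bit words, using 32-bit rotations."""
    z = x ^ y
    t = rotl(z, 24, 32) ^ rotl(z, 8, 32)   # (z >>> 8) xor (z <<< 8)
    return x ^ t, y ^ t


def split32(x, y):
    """Split two 32-bit words into four 16-bit words (high half first)."""
    return x >> 16, x & 0xFFFF, y >> 16, y & 0xFFFF
```

For any input, undoing the swap on the output of `l_prime_16` should give the same four words as splitting the output of `l_prime_b_32`, since the 32-bit value t consists of two copies of the 16-bit one.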
4.3 Design Rationale
Choosing the ARX-box. We chose the round function of Speckey/Speck-32 over Marx-2 because of its superior implementation properties. Indeed, its smaller total number of operations means that a cipher using it needs to do fewer operations when implemented on a 16-bit platform. Ideally, we would have used an ARX-box with 32-bit operations but, at the time of writing, no such function has known differential and linear bounds (cf. Table 1) for sufficiently many rounds.
We chose to evaluate the iterations of the ARX-box over each branch rather than in parallel because such an order decreases the number of times each 32-bit branch must be loaded in CPU registers. This matters when the number of registers is too small to contain both the full key and the full internal state of the cipher and does not change anything if it is not the case.
Mixing Layer, Number of Steps and Rounds per Step. Our main approach for choosing the mixing layer was exhaustive enumeration of all matrices suitable for our long trail bounding algorithm from Sect. 3.4 and selecting the final matrix according to various criteria, which we will discuss later.
For Sparx-\(64/128\), there is only one linear layer fulfilling our design criteria: one corresponding to a Feistel round. For such a structure, we found that the best integral covers 4 steps (without the last linear layer) and that, with 3 rounds per step, the MEDCP and MELCC are bounded by \(2^{-75}\) and \(2^{-38}\). These quantities imply that no single-trail differential or linear distinguisher exists for 5 or more steps of Sparx-\(64/128\).
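The step from the two bounds to the conclusion can be spelled out. The comparison below assumes the usual rule of thumb, which is not stated in the text above, that a single differential trail is useless against an \(n\)-bit block cipher once its probability falls below \(2^{-n}\), and a single linear trail once its absolute correlation falls below \(2^{-n/2}\).

```latex
n = 64:\qquad
\mathrm{MEDCP} \le 2^{-75} < 2^{-64} = 2^{-n},
\qquad
\mathrm{MELCC} \le 2^{-38} < 2^{-32} = 2^{-n/2}.
```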
For Sparx instances with a 128-bit block we implemented an exhaustive search on a large subset of all possible linear layers. After some filtering, we arrived at roughly 3000 matrices. For each matrix we ran our algorithm from Sect. 3.4 to obtain bounds on MEDCP and MELCC for different values of the number of rounds per step (\(r_a\)). We also ran the algorithm for searching integral characteristics described in Sect. 3.4.
Then, we analyzed the best matrices and found that there is a matrix which corresponds to a Feistel-like linear layer with the best differential/linear bound for \(r_a=4\). This choice also offered a good compromise between other parameters, such as diffusion, strength of the ARX-box, simplicity, and ease and efficiency of implementation. It also elegantly generalizes the linear layer of Sparx-\(64/128\). We thus settled for this Feistel-like function.
For more details on the selection procedure and other interesting candidates for the linear layer we refer the reader to the full version of this paper [20].
The Linear Feistel Functions. The linear layer obtained using the steps described above is only specified at a high level; it remains to define the linear Feistel functions \(\mathcal {L}\) and \(\mathcal {L^{\prime }}\). The function \(\mathcal {L}\) that we have chosen has been used in the Lai-Massey round constituting the linear layer of Noekeon [22]. We reuse it here because it is cheap on lightweight processors, as it only requires one rotation by 8 bits and 3 XORs. It also provides some diffusion as it has branching number 3. Its alternative representation using 32-bit rotations allows an optimized implementation on 32-bit processors.
Used for a larger block size, the Feistel function \(\mathcal {L^{\prime }}\) is a generalization of \(\mathcal {L}\): it also relies on a Lai-Massey structure as well as a rotation by 8 bits. The reasons behind these choices are the same as before: efficiency and diffusion. Furthermore, \(\mathcal {L^{\prime }}\) must also provide diffusion between the branches. While this is achieved by the XORs, we further added a branch swap in the bits of highest weight. This ensures that if only one 32-bit branch is active at the input of \(\mathcal {L^{\prime }}\) then two branches are active in its output. Indeed, there are two possibilities. If the output of the rotation is non-zero, it gets added to the other branch and spreads to the whole state through the branch swap. Otherwise, the output is equal to 0, which means that the two 16-bit branches constituting the non-zero 32-bit branch hold the same non-zero value. These will then be spread over the two output 32-bit branches by the branch swap. The permutation \(\mathcal {L^{\prime }}\) also breaks the 32-bit word structure, which can help prevent the spread of integral patterns.
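The two-case argument above can be exercised directly. The following sketch uses the 16-bit-word definition of \(\mathcal {L^{\prime }}\) given earlier; the helper names are hypothetical. It samples inputs with a single active 32-bit branch and additionally walks through every input of the second case, where the rotation output is zero because both 16-bit words of the active branch are equal.

```python
import random

MASK16 = 0xFFFF

def rotl16(v, r):
    """Rotate a 16-bit value left by r bits (0 < r < 16)."""
    return ((v << r) | (v >> (16 - r))) & MASK16

def lprime(a, b, c, d):
    """L' on four 16-bit words, including the branch swap."""
    t = rotl16(a ^ b ^ c ^ d, 8)
    return (c ^ t, b ^ t, a ^ t, d ^ t)

def active_output_branches(a, b, c, d):
    """Count the non-zero 32-bit branches at the output of L'."""
    o = lprime(a, b, c, d)
    return int((o[0] | o[1]) != 0) + int((o[2] | o[3]) != 0)

def check_single_active_branch(trials=2000, seed=7):
    rng = random.Random(seed)
    # Case 1 (mostly): random non-zero branch, rotation output usually non-zero.
    for _ in range(trials):
        a, b = rng.getrandbits(16), rng.getrandbits(16)
        if a == 0 and b == 0:
            continue
        assert active_output_branches(a, b, 0, 0) == 2
        assert active_output_branches(0, 0, a, b) == 2
    # Case 2: both 16-bit words equal, so the rotation output is zero.
    for a in range(1, 1 << 16):
        assert active_output_branches(a, a, 0, 0) == 2
        assert active_output_branches(0, 0, a, a) == 2
    return True
```

The second loop is exhaustive over the zero-rotation case; the first is only a random sample of the other case.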
Key Schedule. The key schedules of the different versions of Sparx have been designed using the following general guidelines.
First, we look at criteria related to the implementation. To limit code size, components from the round function of Sparx are reused in the key schedule itself. To accommodate cases where the memory requirements are particularly stringent, we allow an efficient on-the-fly computation of the key.
We also consider cryptographic criteria. For example, we need to ensure that the keys used within each chain of 3 or 4 ARX-boxes are independent from one another. As we do not have enough entropy from the master key to generate truly independent round keys, we must also ensure that the round keys are as different as possible from one another. This implies a fast mixing of the master key bits in the key schedule. Furthermore, in order to prevent slide attacks [23], we chose to have the round keys depend on the round index. Finally, since the subkeys are XORed in the key state, we want to limit the presence of high-probability differential patterns in the key update. Diffusion in the key state is thus provided by additions modulo \(2^{16}\) rather than exclusive-or. While there may be high-probability patterns for additive differences, these would be of little use because the key is added by an XOR to the state.
As with most engineering tasks, some of these requirements are at odds with each other. For example, it is impossible to provide extremely fast diffusion while also being extremely lightweight. Our designs are the most satisfying compromises we could find.
4.4 Security Analysis
Single Trail Differential/Linear Attack. By design and thanks to the long trail argument, we know that there is no differential or linear trail covering 5 steps (or more) with a useful probability for any instance of Sparx. Therefore, the 8 steps used by Sparx-\(64/128\) and Sparx-\(128/128\) and the 10 used by Sparx-\(128/256\) are sufficient to ensure resilience against such attacks.
Attacks Exploiting a Slow Diffusion. We consider several attacks in this category, namely impossible and truncated differential attacks, meet-in-the-middle attacks as well as integral attacks.
When we chose the linear layers, we ensured that they prevented division-property-based integral attacks, meaning that they provide good diffusion. Furthermore, the Feistel structure of the linear layer makes it easy to analyse and increases our confidence in our designs. In the case of 128-bit block sizes, the Feistel function \(\mathcal {L^{\prime }}\) has branching number 3 in the sense that if only one 32-bit branch is active then the two output branches are active. This prevents attacks trying to exploit patterns at the branch level. Finally, this Feistel function also breaks the 32-bit word structure through a 16-bit branch swap which frustrates the propagation of integral characteristics.
Meet-in-the-middle attacks are further hindered by the large number of key additions. This liberal use of the key material also makes it harder for an attacker to guess parts of it to add rounds at the top or at the bottom of, say, a differential characteristic.
Best Attacks. The best attacks we could find are integral attacks based on Todo’s division property. The attack against Sparx-\(64/128\) covers 15/24 rounds and recovers the key in time \(2^{101}\) using \(2^{37}\) chosen plaintexts and \(2^{64}\) blocks of memory. For 22-round Sparx-\(128/128\), we can recover the key in time \(2^{105}\) using \(2^{102}\) chosen plaintexts and \(2^{72}\) blocks of memory. Finally, we attack 24-round Sparx-\(128/256\) in time \(2^{233}\), using \(2^{104}\) chosen plaintexts and \(2^{202}\) blocks of memory. A description of these attacks as well as the description of some time/data tradeoffs are provided in the full version of this paper [20].
4.5 Software Implementation
Table 4. Performance characteristics of the main components of Sparx. For each platform, the execution time in cycles is followed by the number of registers.
Component  AVR cycles  AVR registers  MSP cycles  MSP registers  ARM cycles  ARM registers
\(A\)  16  4 + 1  9  2  11  1 + 3
\(A^{-1}\)  19  4  9  2  12  1 + 3
\(\lambda _{2}\), 1-step  24  8 + 1  11  4 + 3  5  2 + 1
\(\lambda _{2}\), 2-steps  12  8  7  4 + 1  3  2
\(\lambda _{4}\), 1-step  48  16 + 2  36  8 + 1  16  4 + 5
\(\lambda _{4}\), 2-steps  24  16 + 2  13  8 + 1  12  4 + 4
Implementation Aspects. In order to efficiently implement Sparx on a resource-constrained embedded processor, it is important to have a good understanding of its instruction set architecture (ISA). The number of general-purpose registers determines whether the entire cipher’s state can be fitted into registers or whether a part of it has to be spilled to RAM. Memory operations are generally slower than register operations, consume more energy and increase the vulnerability of an implementation to side-channel attacks [25]. Thus, the number of memory operations should be reduced as much as possible. Ideally the state should only be read from memory at the beginning of the cryptographic operation and written back at the end. The three targets we implemented Sparx for have 32 8-bit, 12 16-bit, and 13 32-bit general-purpose registers, which results in a total capacity of 256 bits, 192 bits, and 416 bits for AVR, MSP, and ARM, respectively.
The Sparx family’s simple structure consists of only three components: the ARX-box \(A\) and its inverse \(A^{-1}\), the linear layer \(\lambda _{2}\) or \(\lambda _{4}\) (depending on the version), and the key addition. The key addition (bitwise XOR) does not require additional registers and its execution time is proportional to the ratio between the operand width and the target device’s register width. The execution time in cycles and the number of registers required to perform \(A\), \(A^{-1}\), \(\lambda _{2}\), and \(\lambda _{4}\) on each target device are given in Table 4.
The costly operation in terms of both execution time and number of required registers is the linear layer. The critical point is reached for the 128-bit linear layer \(\lambda _{4}\) on MSP, which requires 13 registers. Since this requirement is above the number of available registers, a part of the state has to be saved onto the stack. Consequently, the execution time increases by 5 cycles for each push-pop instruction pair.
Table 5. Different trade-offs between the execution time and code size for encryption of a block using Sparx-\(64/128\) and Sparx-\(128/128\).
Implementation  Block size [bits]  AVR time [cyc.]  AVR code [B]  AVR RAM [B]  MSP time [cyc.]  MSP code [B]  MSP RAM [B]  ARM time [cyc.]  ARM code [B]  ARM RAM [B]
1-step rolled  64  1789  248  2  1088  166  14  1370  176  28
1-step unrolled  64  1641  424  1  907  250  12  1100  348  24
2-steps rolled  64  1677  356  2  1034  232  10  1331  304  28
2-steps unrolled  64  1529  712  1  853  404  8  932  644  24
1-step rolled  128  4553  504  11  2809  300  26  3463  348  44
1-step unrolled  128  4165  1052  10  2353  584  24  2784  884  40
2-steps rolled  128  4345  720  11  2593  432  18  3399  620  40
2-steps unrolled  128  3957  1820  10  2157  1004  16  2377  1692  36
The linear transformations \(\mathcal {L}\) and \(\mathcal {L^{\prime }}\) exhibit interesting implementation properties. For each platform there is a different optimal way to perform them. The optimal way to implement the linear layers on MSP is using the representations from Figs. 5c and 7b. On ARM the optimal implementation performs the rotations directly on 32-bit values. The function \(\mathcal {L}\) can be executed on AVR using 12 XOR instructions and no additional registers. On the other hand, the optimal implementation of \(\mathcal {L^{\prime }}\) on AVR requires 2 additional registers and takes 24 cycles.^{7}
The linear layer performed after the last step of Sparx can be dropped without affecting the security of the cipher, but doing so turns out to give poorer overall performance. The only case in which this strategy helps is when execution time is the main and only concern of an implementation. Thus we preferred to keep the symmetry of the step function and the overall balanced performance figures.
The salient implementation-related feature of the Sparx family of ciphers is the simple and flexible structure of the step function depicted in Fig. 4, which can be implemented using different optimization strategies. Depending on specific constraints, such as code size, speed, or energy requirements to name a few, the rounds inside the step function can be rolled or unrolled, and one or two step functions can be computed at once. The main possible trade-offs between the execution time and code size are explored in Table 5.
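To make the trade-off concrete, the following sketch compares the two extreme strategies for the 64-bit block, using only the time and code-size entries of Table 5. The dictionary layout and function name are illustrative choices, not part of FELICS.

```python
# (execution time in cycles, code size in bytes) for the 64-bit block,
# copied from Table 5.
TABLE5_64 = {
    "AVR": {"1-step rolled": (1789, 248), "2-steps unrolled": (1529, 712)},
    "MSP": {"1-step rolled": (1088, 166), "2-steps unrolled": (853, 404)},
    "ARM": {"1-step rolled": (1370, 176), "2-steps unrolled": (932, 644)},
}

def tradeoff(platform):
    """Time saved and code growth when moving from the smallest to the fastest variant."""
    t_small, c_small = TABLE5_64[platform]["1-step rolled"]
    t_fast, c_fast = TABLE5_64[platform]["2-steps unrolled"]
    return {
        "time_saved_pct": 100.0 * (t_small - t_fast) / t_small,
        "code_growth": c_fast / c_small,
    }

if __name__ == "__main__":
    for name in TABLE5_64:
        result = tradeoff(name)
        print(f"{name}: {result['time_saved_pct']:.1f} % faster, "
              f"{result['code_growth']:.2f}x code size")
```

On all three platforms the fastest variant costs more than twice the code size of the smallest one, which is the kind of compromise the table is meant to expose.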
Except for the 1-step implementation of Sparx-\(128/128\) on MSP, which needs RAM to save the cipher’s state, all other RAM requirements are determined only by the process of saving the context onto the stack at the beginning of the measured function. Thus, the RAM consumption of a pure assembly implementation would be zero, except for the 1-step rolled and unrolled implementations of Sparx-\(128/128\) on MSP.
Due to the 16-bit nature of the cipher, performing \(A\) and \(A^{-1}\) on a 32-bit platform requires slightly more execution time and more auxiliary registers than performing the same operations on a 16-bit platform. The process of packing and unpacking a state register to extract and store back the two 16-bit branches of \(A\) or \(A^{-1}\) adds a performance penalty. The cost is amplified by the fact that the flexible second operand cannot be used with a constant to extract the least or most significant 16 bits of a 32-bit register. Thus an additional masking register is required.
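The packing overhead can be seen in a small model of the ARX-box on a 32-bit word. This is a sketch under several assumptions: one Speck32-style round with rotation amounts (7, 2) stands in for \(A\), the key addition is left out, and the placement of the first 16-bit word in the high half of the register is an arbitrary choice made for illustration.

```python
MASK16 = 0xFFFF  # plays the role of the extra masking register

def rotl16(v, r):
    """Rotate a 16-bit value left by r bits (0 < r < 16)."""
    return ((v << r) | (v >> (16 - r))) & MASK16

def rotr16(v, r):
    """Rotate a 16-bit value right by r bits (0 < r < 16)."""
    return rotl16(v, 16 - r)

def arx_box(state):
    """One Speck32-style round on a packed 32-bit word (x in the high half)."""
    x, y = state >> 16, state & MASK16      # unpack: one shift, one mask
    x = (rotr16(x, 7) + y) & MASK16         # rotate, then add modulo 2^16
    y = rotl16(y, 2) ^ x                    # rotate, then XOR
    return (x << 16) | y                    # pack again

def arx_box_inv(state):
    """Inverse of arx_box on a packed 32-bit word."""
    x, y = state >> 16, state & MASK16
    y = rotr16(y ^ x, 2)
    x = rotl16((x - y) & MASK16, 7)
    return (x << 16) | y
```

Every call pays for one unpack and one pack around the three 16-bit operations, which is the penalty described above; on a 16-bit platform the two halves already live in separate registers.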
The simple key schedules of Sparx-\(64/128\) and Sparx-\(128/128\) can be implemented in different ways. The most efficient implementation turns out to be the one using the 1-iteration rolled strategy. Another interesting approach is the 4-iterations unrolled strategy, which has the benefit that the final permutation is achieved for free by changing the order in which the registers are stored in the round keys. This strategy increases the code size by up to a factor of 4, while the execution time is on average 25 % better.
Although we do not provide performance figures for Sparx-\(128/256\), we emphasize that, as far as implementation is concerned, the only differences between Sparx-\(128/256\) and Sparx-\(128/128\) are the key schedules and the number of steps.
Table 6. Top 10 best implementations in Scenario 1 (encryption key schedule + encryption and decryption of 128 bytes of data using CBC mode) ranked by the Figure of Merit (FOM) defined in FELICS. The results for all ciphers are the current ones from the Triathlon Competition at the moment of submission. The smaller the FOM, the better the implementation.
Rank  Cipher  Block size  Key size  FOM (Scenario 1)
1  Speck  64  128  5.0
2  Chaskey-LTS  128  128  5.0
3  Simon  64  128  6.9
4  RECTANGLE  64  128  7.8
5  LEA  128  128  8.0
6  Sparx  64  128  8.6
7  Sparx  128  128  12.9
8  HIGHT  64  128  14.1
9  AES  128  128  15.3
10  Fantomas  128  128  17.2
We then compare the performance of Sparx with the current results available on the Triathlon Competition at the time of submission.^{8} As can be seen in Table 6, the two instances of Sparx perform very well across all platforms and rank very high in the FOM-based ranking. The forerunners are the NSA designs Simon and Speck, Chaskey, RECTANGLE and LEA, but, apart from RECTANGLE, none of them provides provable bounds against differential and linear cryptanalysis.

the execution time of Sparx-\(64/128\) on MSP is in the top 3 of the fastest ciphers in both scenarios thanks to its 16-bit oriented operations;

the code size of the 1-step rolled implementations of Sparx-\(64/128\) and Sparx-\(128/128\) on MSP is in the top 5 in both scenarios as well as in the small code size and RAM table for scenario 2;

the 1-step rolled implementation of Sparx-\(64/128\) breaks the previous minimum RAM consumption record on AVR in scenario 2;

the execution time of the 2-steps implementation of Sparx-\(64/128\) in scenario 2 is in the top 3 on MSP, in the top 5 on AVR, and in the top 7 on ARM; it also breaks the previous minimum RAM consumption records on AVR and MSP.
Given its simple and flexible structure as well as its very good overall ranking in the Triathlon Competition of lightweight block ciphers, the Sparx family of lightweight ciphers is suitable for applications on a wide range of resource constrained devices. The absence of lookup tables reduces the memory requirements and provides, according to [25], some intrinsic resistance against power analysis attacks.
5 Replacing Rotations with Linear Layers: The LAX Construction
In this section we outline an alternative strategy for designing an ARX cipher with provable bounds against differential and linear cryptanalysis. It is completely independent from the Long Trail Strategy outlined in the previous sections and uses the differential properties of modular addition to derive proofs of security.
5.1 Motivation
In his Master's thesis [13], Wallén posed the challenge to design a cipher that uses only addition modulo \(2^{n}\) and \(\mathrm {GF}(2)\)-affine functions, and that is provably resistant against differential and linear cryptanalysis [13, Sect. 5]. In this section we partially solve this challenge by proposing a construction with provable bounds against single-trail differential cryptanalysis (DC).
5.2 Theoretical Background
Definition 6
The XOR linear correlation of addition modulo \(2^{n}\) (\(\mathrm {xlc}^{+}\)) is defined in a similar way. Efficient algorithms for the computation of \(\mathrm {xdp}^{+}\) and \(\mathrm {xlc}^{+}\) have been proposed in [26] and [27, 28, 29], respectively. These results also reveal the following property. The magnitude of both \(\mathrm {xdp}^{+}\) and \(\mathrm {xlc}^{+}\) decreases as the number of bit positions at which the input/output differences (resp. masks) differ grows. For \(\mathrm {xdp}^{+}\), this fact is formally stated in the form of the following proposition.
Proposition 1
A similar proposition also holds for \(\mathrm {xlc}^{+}\) (see e.g. [10]). Proposition 1 provides the basis of the design strategy described in the following section.
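For small word sizes, the relation between \(\mathrm {xdp}^{+}\) and the number of differing bit positions can be checked exhaustively. The sketch below is not taken from the text above: it assumes the usual definition of \(\mathrm {xdp}^{+}(\alpha ,\beta \rightarrow \gamma )\) as the fraction of input pairs \((x, y)\) for which \(((x \oplus \alpha ) + (y \oplus \beta )) \oplus (x + y) = \gamma \) modulo \(2^{n}\), and it uses the closed formula of Lipmaa and Moriai [26] as recalled from that paper. All function names are hypothetical.

```python
from fractions import Fraction

def xdp_add_bruteforce(alpha, beta, gamma, n):
    """xdp+ by exhaustive counting over all pairs of n-bit inputs."""
    mask = (1 << n) - 1
    hits = 0
    for x in range(1 << n):
        for y in range(1 << n):
            out_diff = (((x ^ alpha) + (y ^ beta)) & mask) ^ ((x + y) & mask)
            hits += out_diff == gamma
    return Fraction(hits, 1 << (2 * n))

def differing_positions(alpha, beta, gamma, n):
    """Bit positions, MSB excluded, where the three differences are not all equal."""
    low = (1 << (n - 1)) - 1
    not_eq = (alpha ^ beta) | (alpha ^ gamma)
    return bin(not_eq & low).count("1")

def xdp_add_formula(alpha, beta, gamma, n):
    """Closed formula for xdp+ in the style of Lipmaa and Moriai."""
    mask = (1 << n) - 1
    a1, b1, c1 = (alpha << 1) & mask, (beta << 1) & mask, (gamma << 1) & mask
    eq_shifted = ~((a1 ^ b1) | (a1 ^ c1)) & mask
    if eq_shifted & (alpha ^ beta ^ gamma ^ b1):
        return Fraction(0)          # the differential is impossible
    return Fraction(1, 1 << differing_positions(alpha, beta, gamma, n))

def check_all(n):
    """Compare formula and brute force for every differential on n-bit words."""
    size = 1 << n
    for alpha in range(size):
        for beta in range(size):
            for gamma in range(size):
                exact = xdp_add_bruteforce(alpha, beta, gamma, n)
                assert exact == xdp_add_formula(alpha, beta, gamma, n)
                bound = Fraction(1, 1 << differing_positions(alpha, beta, gamma, n))
                assert exact <= bound
    return True
```

`check_all(3)` runs in well under a second; the cost grows as \(2^{5n}\), so this is only practical for toy sizes.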
5.3 The LAX Construction
Let \(L\) be an \(n \times n\) binary matrix that (a) is invertible and (b) has branch number \(d > 2\). We denote by \(\ell (x)\) the multiplication of the \(n\)-bit vector \(x\) by the matrix \(L\): \(\ell (x) = L x\). Note that due to condition (b) it follows that \(\forall x \ne 0: h(x) + h(\ell (x)) \ge d\), where \(h(x)\) is the Hamming weight of \(x\).
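Conditions (a) and (b) can be tested by brute force for small \(n\). In the sketch below a matrix is stored as a list of row bitmasks, which is an illustrative encoding rather than the one used for LAX, and the example matrices are toy instances chosen only to exercise the functions.

```python
def hamming_weight(x):
    """Number of set bits of a non-negative integer."""
    return bin(x).count("1")

def apply_matrix(rows, x):
    """Multiply the bit vector x by a binary matrix over GF(2).

    rows[i] is the bitmask of row i; output bit i is the parity of rows[i] & x.
    """
    out = 0
    for i, row in enumerate(rows):
        out |= (hamming_weight(row & x) & 1) << i
    return out

def is_invertible(rows):
    """Condition (a): the map x -> L x is a bijection on n-bit vectors."""
    n = len(rows)
    return len({apply_matrix(rows, x) for x in range(1 << n)}) == 1 << n

def branch_number(rows):
    """Condition (b): the minimum of h(x) + h(L x) over all non-zero x."""
    n = len(rows)
    return min(
        hamming_weight(x) + hamming_weight(apply_matrix(rows, x))
        for x in range(1, 1 << n)
    )

# Toy examples: the 4x4 identity and the 4x4 all-ones matrix with zero diagonal.
IDENTITY_4 = [0b0001, 0b0010, 0b0100, 0b1000]
ONES_MINUS_IDENTITY_4 = [0b1110, 0b1101, 0b1011, 0b0111]
```

The identity only reaches branch number 2 and therefore fails condition (b), whereas the second toy matrix satisfies both conditions. The exhaustive loop over \(2^{n}\) inputs is meant for experimentation with small \(n\) only.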
5.4 Bounds on the Differential Probability of LAX
Lemma 1
For all differences \(\alpha \ne 0\), the differential \((\alpha ,\alpha \rightarrow \alpha )\) is impossible.
Proof
Let \(\mathrm {xdp}^{+}(\alpha ,\beta \rightarrow \gamma ) \ne 0\) for some differences \(\alpha \), \(\beta \) and \(\gamma \). The statement of the lemma follows from the following two properties of \(\mathrm {xdp}^{+}\) [26]. First, it must hold that \(\alpha [0]\oplus \beta [0]\oplus \gamma [0]=0\). Second, if \(\alpha [i]=\beta [i]=\gamma [i]\) for some \(0 \le i \le n-2\), then it must hold that \(\alpha [i+1]\oplus \beta [i+1]\oplus \gamma [i+1]=\alpha [i]\). Since we require \(\alpha = \beta = \gamma \), from the first property it follows that \(\alpha [0] = \beta [0] = \gamma [0]=0\). Given that, applying the second property for \(i = 0, 1, \ldots , n-2\) in turn yields \(\alpha [i] = \beta [i] = \gamma [i] = 0\), \(\forall i \ge 1\). Therefore the only value of \(\alpha \) for which \(\mathrm {xdp}^{+}(\alpha ,\beta \rightarrow \gamma ) \ne 0\) and \(\alpha = \beta = \gamma \) is \(\alpha = 0\). \(\square \)
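The lemma and the two properties used in its proof can be confirmed by exhaustive search on small words. The sketch below is an illustration with hypothetical function names; `occurs` tests directly whether some input pair follows the differential through addition modulo \(2^{n}\).

```python
def bit(v, i):
    """Bit i of the integer v."""
    return (v >> i) & 1

def satisfies_properties(alpha, beta, gamma, n):
    """Check the two conditions on (alpha, beta -> gamma) used in the proof."""
    if bit(alpha, 0) ^ bit(beta, 0) ^ bit(gamma, 0):
        return False
    for i in range(n - 1):
        if bit(alpha, i) == bit(beta, i) == bit(gamma, i):
            nxt = bit(alpha, i + 1) ^ bit(beta, i + 1) ^ bit(gamma, i + 1)
            if nxt != bit(alpha, i):
                return False
    return True

def occurs(alpha, beta, gamma, n):
    """True if at least one input pair (x, y) follows the differential."""
    mask = (1 << n) - 1
    return any(
        ((((x ^ alpha) + (y ^ beta)) & mask) ^ ((x + y) & mask)) == gamma
        for x in range(1 << n)
        for y in range(1 << n)
    )

def check_lemma(n):
    """No non-zero alpha admits the differential (alpha, alpha -> alpha)."""
    for alpha in range(1, 1 << n):
        assert not satisfies_properties(alpha, alpha, alpha, n)
        assert not occurs(alpha, alpha, alpha, n)
    return True
```

For \(\alpha = 0\) both functions return `True`, matching the trivial differential that the lemma excludes from its statement.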
Theorem 2
(Differential bound on 3 rounds of LAX-\(2n\)). The maximum differential probability of any trail on 3 rounds of LAX-\(2n\) is \(2^{-(d-2)}\), where \(d\) is the branch number of the matrix \(L\).
Proof
Let \((\alpha _{i-1}, \beta _{i-1}, \gamma _{i-1})\), \((\alpha _{i}, \beta _{i}, \gamma _{i})\) and \((\alpha _{i+1}, \beta _{i+1}, \gamma _{i+1})\) be the input/output differences of the addition operations in three consecutive rounds of LAX-\(2n\) and let \(p_{k} = \mathrm {xdp}^{+}(\alpha _{k}, \beta _{k}\rightarrow \gamma _{k})\) for \(k \in \{i-1, i, i+1\}\) (see Fig. 10 (right)). We have to show that \(p_{i-1} p_{i} p_{i+1} \le 2^{-(d-2)}\) or, equivalently, that \(\log _2 p_{i-1} + \log _2 p_{i} + \log _2 p_{i+1} \le -(d-2)\). Denote with \(h(x)\) the Hamming weight of the word \(x\) and with \(h^{*}(x)\) the Hamming weight of \(x\), excluding the MSB. Note that \(h^{*}(x) \ge h(x) - 1\). We consider two cases:
5.5 Experimental Results
Best differential probabilities and best absolute linear correlations (\(\log _2\) scale) for up to 12 rounds of LAX. The bound \(p_{\mathrm {bound}}\) is given for 3, 6, 9 and 12 rounds.
Cipher  Quantity  1  2  3  4  5  6  7  8  9  10  11  12
LAX-16  \(p_{\mathrm {best}}\)  \(+0\)  \(-2\)  \(-4\)  \(-7\)  \(-8\)  \(-11\)  \(-13\)  \(-16\)  \(-18\)  \(-20\)  \(-23\)  \(-25\)
LAX-16  \(c_{\mathrm {best}}\)  \(+0\)  \(+0\)  \(-1\)  \(-2\)  \(-3\)  \(-5\)  \(-5\)  \(-7\)  \(-8\)  \(-9\)  \(-10\)  \(-11\)
LAX-16  \(p_{\mathrm {bound}}\) (rounds 3, 6, 9, 12)  \(-3\)  \(-6\)  \(-9\)  \(-12\)
LAX-32  \(p_{\mathrm {best}}\)  \(+0\)  \(-2\)  \(-6\)  \(-9\)  \(-11\)  \(-16\)  \(-18\)  \(-20\)  \(-24\)  \(-28\)  \(-29\)  \(-34\)
LAX-32  \(c_{\mathrm {best}}\)  \(+0\)  \(+0\)  \(+0\)  \(-4\)  \(-4\)  \(-8\)  \(-8\)  \(-8\)  \(-8\)  \(-12\)  \(-12\)  \(-16\)
LAX-32  \(p_{\mathrm {bound}}\) (rounds 3, 6, 9, 12)  \(-6\)  \(-12\)  \(-18\)  \(-24\)
Clearly the bound from Theorem 2 does not hold for the linear case. The problem is the “three-forked branch” in the LAX round function that acts as an XOR when the inputs are linear masks rather than differences. Thus, LAX only provides differential bounds and the full solution to the Wallén challenge still remains an open problem.
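The comparison behind this observation can be reproduced from the table of experimental results. The sketch below rests on two readings of that table which are assumptions: every entry other than \(+0\) is a negative \(\log _2\) value, and the four \(p_{\mathrm {bound}}\) entries belong to rounds 3, 6, 9 and 12. The variable and function names are illustrative.

```python
# log2 values for rounds 1..12, read from the table of experimental results.
LAX16 = {
    "p_best": [0, -2, -4, -7, -8, -11, -13, -16, -18, -20, -23, -25],
    "c_best": [0, 0, -1, -2, -3, -5, -5, -7, -8, -9, -10, -11],
    "p_bound": {3: -3, 6: -6, 9: -9, 12: -12},
}
LAX32 = {
    "p_best": [0, -2, -6, -9, -11, -16, -18, -20, -24, -28, -29, -34],
    "c_best": [0, 0, 0, -4, -4, -8, -8, -8, -8, -12, -12, -16],
    "p_bound": {3: -6, 6: -12, 9: -18, 12: -24},
}

def compare(table):
    """For each bounded round: (round, p_best within bound, c_best within bound)."""
    rows = []
    for rounds, bound in sorted(table["p_bound"].items()):
        p = table["p_best"][rounds - 1]
        c = table["c_best"][rounds - 1]
        rows.append((rounds, p <= bound, c <= bound))
    return rows

if __name__ == "__main__":
    for name, table in (("LAX-16", LAX16), ("LAX-32", LAX32)):
        for rounds, diff_ok, lin_ok in compare(table):
            print(name, rounds, "differential ok:", diff_ok, "linear ok:", lin_ok)
```

Under these readings the best differential probabilities stay within the bound at every listed round, while the best linear correlations exceed it at every listed round.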
6 Conclusion
In this paper we presented, for the first time, a general strategy for designing ARX primitives with provable bounds against differential (DC) and linear cryptanalysis (LC), a long-standing open problem in the area of ARX design. The new strategy, called the Long Trail Strategy (LTS), advocates the use of large and computationally expensive S-boxes in combination with very light linear layers (the so-called Long Trail Argument). This makes the LTS the exact opposite of the Wide Trail Strategy (WTS) on which the AES (and many other SPN ciphers) are based. Moreover, the proposed strategy is not limited to ARX designs and can easily be applied also to S-box-based ciphers.
To illustrate the effectiveness of the LTS we have proposed a new family of lightweight block ciphers, called Sparx, designed using the new approach. The family has three instances depending on the block and key sizes: Sparx-\(64/128\), Sparx-\(128/128\) and Sparx-\(128/256\). With the help of the Long Trail Argument we prove resistance against single-trail DC and LC for each of the three instances of Sparx. In addition, we analyze the new constructions against a wide range of attacks such as impossible and truncated differentials, meet-in-the-middle and integral attacks. Our analysis did not find an attack covering 5 or more steps of any of the three instances. This corresponds to a security margin of about 37 % for Sparx.
Besides (provable) security, the members of the Sparx family are also very efficient. We have implemented them in software on three resource-constrained microcontrollers widely used in the Internet of Things (IoT), namely the 8-bit Atmel ATmega128, the 16-bit TI MSP430, and the 32-bit ARM Cortex-M3. According to the FELICS open-source benchmarking framework our implementations of Sparx-\(64/128\) and Sparx-\(128/128\) rank respectively 6 and 7 in the list of top 10 most software-efficient lightweight ciphers. In addition, the execution time of Sparx-\(64/128\) on MSP is in the top 3 of this list. To the best of our knowledge, this paper is the first to propose a practical ARX design that has both arguments for provable security and competitive performance.
A secondary contribution of the paper is the proposal of an alternative strategy for ARX design with provable bounds against differential cryptanalysis. It is independent of the LTS and uses the differential properties of modular addition to derive proofs of security. As an illustration of this approach, the LAX family of constructions is described. The provable security of LAX against linear cryptanalysis is left as an open problem.
Footnotes
 1. Speck and the MAC Chaskey are being considered for standardization by ISO.
 2. For Speck, we can only guess that this is the case, as the designers have not published the rationale behind their algorithm.
 3. Both can be lowered by a factor of 2 if we choose rotations (9, 2), (9, 5), (11, 7) or (7, 11) instead of (7, 2).
 4. This terminology is borrowed from the specification of LED [17] which also groups several calls of the round function into a step.
 5.
 6. We submitted our implementations of Sparx to the FELICS framework. Up-to-date results are available at https://www.cryptolux.org/index.php/FELICS.
 7. For more details please see the implementations submitted to the FELICS framework (https://www.cryptolux.org/index.php/FELICS).
 8. Up-to-date results are available at https://www.cryptolux.org/index.php/FELICS.
Notes
Acknowledgements
The work of Daniel Dinu and Léo Perrin is supported by the CORE project ACRYPT (ID C12154009992) funded by the Fonds National de la Recherche, Luxembourg. The work of Aleksei Udovenko is supported by the Fonds National de la Recherche, Luxembourg (project reference 9037104). Vesselin Velichkov is supported by the Internal Research Project CAESAREA of the University of Luxembourg (reference I2RDIRPUL15CAES). The authors thank Anne Canteaut for useful discussions regarding error correcting codes.
References
 1. Bernstein, D.J.: New Stream Cipher Designs: The eSTREAM Finalists. LNCS, vol. 4986. Springer, Heidelberg (2008)
 2. Bernstein, D.J.: ChaCha, a variant of Salsa20. In: Workshop Record of SASC, vol. 8 (2008)
 3. Ferguson, N., Lucks, S., Schneier, B., Whiting, D., Bellare, M., Kohno, T., Callas, J., Walker, J.: The Skein hash function family. Submission to NIST (round 3) (2010)
 4. Aumasson, J.-P., Henzen, L., Meier, W., Phan, R.C.W.: SHA-3 Proposal BLAKE (2010). https://131002.net/blake/blake.pdf
 5. Needham, R.M., Wheeler, D.J.: Tea extensions. Technical report, Cambridge University, Cambridge, UK, October 1997
 6. Dinu, D.D., Le Corre, Y., Khovratovich, D., Perrin, L., Großschädl, J., Biryukov, A.: Triathlon of lightweight block ciphers for the internet of things. In: NIST Workshop on Lightweight Cryptography 2015, National Institute of Standards and Technology (NIST) (2015)
 7. Mouha, N., Mennink, B., Herrewege, A., Watanabe, D., Preneel, B., Verbauwhede, I.: Chaskey: an efficient MAC algorithm for 32-bit microcontrollers. In: Joux, A., Youssef, A. (eds.) SAC 2014. LNCS, vol. 8781, pp. 306–323. Springer, Heidelberg (2014). doi: 10.1007/9783319130514_19
 8. Beaulieu, R., Shors, D., Smith, J., Treatman-Clark, S., Weeks, B., Wingers, L.: The SIMON and SPECK Families of Lightweight Block Ciphers. IACR Cryptology ePrint Archive 2013, 404 (2013)
 9. Hong, D., Lee, J.K., Kim, D.C., Kwon, D., Ryu, K.H., Lee, D.G.: LEA: a 128-bit block cipher for fast encryption on common processors. In: Kim, Y., Lee, H., Perrig, A. (eds.) WISA 2013. LNCS, vol. 8267, pp. 3–27. Springer, Heidelberg (2014). doi: 10.1007/9783319051499_1
 10. Biryukov, A., Velichkov, V., Le Corre, Y.: Automatic search for the best trails in ARX: application to block cipher Speck. In: Peyrin, T. (ed.) FSE 2016. LNCS, vol. 9783, pp. 289–310. Springer, Heidelberg (2016). doi: 10.1007/9783662529935_15
 11. Daemen, J., Rijmen, V.: The wide trail design strategy. In: Honary, B. (ed.) Cryptography and Coding 2001. LNCS, vol. 2260, pp. 222–238. Springer, Heidelberg (2001). doi: 10.1007/3540453253_20
 12. Daemen, J., Rijmen, V.: The Design of Rijndael: AES, the Advanced Encryption Standard. Springer, Heidelberg (2002)
 13. Wallén, J.: On the Differential and Linear Properties of Addition. Master’s thesis, Helsinki University of Technology (2003)
 14. Keliher, L., Sui, J.: Exact maximum expected differential and linear probability for 2-round advanced encryption standard. IET Inf. Secur. 1(2), 53–57 (2007)
 15. Nikolić, I.: Tiaoxin-346. Submission to the CAESAR competition (2015)
 16. Jean, J., Nikolić, I.: Efficient design strategies based on the AES round function. In: Peyrin, T. (ed.) FSE 2016. LNCS, vol. 9783, pp. 334–353. Springer, Heidelberg (2016). doi: 10.1007/9783662529935_17
 17. Guo, J., Peyrin, T., Poschmann, A., Robshaw, M.: The LED block cipher. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 326–341. Springer, Heidelberg (2011). doi: 10.1007/9783642239519_22
 18. Knudsen, L., Wagner, D.: Integral cryptanalysis. In: Daemen, J., Rijmen, V. (eds.) FSE 2002. LNCS, vol. 2365, pp. 112–127. Springer, Heidelberg (2002). doi: 10.1007/3540456619_9
 19. Todo, Y.: Structural evaluation by generalized integral property. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 287–314. Springer, Heidelberg (2015). doi: 10.1007/9783662468005_12
 20. Dinu, D., Perrin, L., Udovenko, A., Velichkov, V., Großschädl, J., Biryukov, A.: Design Strategies for ARX with Provable Bounds: Sparx and LAX (Full Version). Cryptology ePrint Archive, to appear 2016. http://eprint.iacr.org/
 21. Biryukov, A., Khovratovich, D.: Decomposition attack on SASASASAS. Cryptology ePrint Archive, Report 2015/646 (2015). http://eprint.iacr.org/
 22. Daemen, J., Peeters, M., Van Assche, G., Rijmen, V.: Nessie proposal: NOEKEON. In: First Open NESSIE Workshop, pp. 213–230 (2000)
 23. Biryukov, A., Wagner, D.: Slide attacks. In: Knudsen, L. (ed.) FSE 1999. LNCS, vol. 1636, pp. 245–259. Springer, Heidelberg (1999). doi: 10.1007/3540485198_18
 24. Dinu, D.D., Biryukov, A., Großschädl, J., Khovratovich, D., Le Corre, Y., Perrin, L.A.: FELICS: fair evaluation of lightweight cryptographic systems. In: NIST Workshop on Lightweight Cryptography 2015, National Institute of Standards and Technology (NIST) (2015)
 25. Biryukov, A., Dinu, D., Großschädl, J.: Correlation power analysis of lightweight block ciphers: from theory to practice. In: Manulis, M., Sadeghi, A.-R., Schneider, S. (eds.) ACNS 2016. LNCS, vol. 9696, pp. 537–557. Springer, Heidelberg (2016). doi: 10.1007/9783319395555_29
 26. Lipmaa, H., Moriai, S.: Efficient algorithms for computing differential properties of addition. In: Matsui, M. (ed.) FSE 2001. LNCS, vol. 2355, pp. 336–350. Springer, Heidelberg (2002). doi: 10.1007/354045473X_28
 27. Wallén, J.: Linear approximations of addition modulo \(2^{n}\). In: Johansson, T. (ed.) FSE 2003. LNCS, vol. 2887, pp. 261–273. Springer, Heidelberg (2003). doi: 10.1007/9783540398875_20
 28. Nyberg, K., Wallén, J.: Improved linear distinguishers for SNOW 2.0. In: Robshaw, M. (ed.) FSE 2006. LNCS, vol. 4047, pp. 144–162. Springer, Heidelberg (2006). doi: 10.1007/11799313_10
 29. Dehnavi, S.M., Rishakani, A.M., Shamsabad, M.R.M.: A more explicit formula for linear probabilities of modular addition modulo a power of two. Cryptology ePrint Archive, Report 2015/026 (2015). http://eprint.iacr.org/
 30. Kwon, D., Kim, J., Park, S., Sung, S.H., Sohn, Y., Song, J.H., Yeom, Y., Yoon, E.J., Lee, S., Lee, J., Chee, S., Han, D., Hong, J.: New block cipher: ARIA. In: Lim, J.I., Lee, D.H. (eds.) ICISC 2003. LNCS, vol. 2971, pp. 432–445. Springer, Heidelberg (2004). doi: 10.1007/9783540246916_32