1 Introduction

ARX stands for Addition/Rotation/XOR and denotes a class of cryptographic algorithms based on the simple arithmetic operations: modular addition, bitwise rotation (and bitwise shift) and exclusive-OR. Although the acronym has gained popularity only recently, algorithms using these operations have been designed ever since the 80s.

Some notable historical examples of ARX designs are the block ciphers FEAL (1987), RC5 (1994), and TEA (1994) (with its modified versions XTEA (1997) and XXTEA (1998)). More recent proposals include the stream cipher Salsa20 (2008) and its variant ChaCha (2008); the hash functions BLAKE (2008) (using a modified version of ChaCha) and Skein [12] (2008) (with its underlying block cipher Threefish); the hash function for short messages SipHash (2012) and the block cipher Speck [2] (2013) (both using a variant of Threefish’s MIX operation); the lightweight block cipher LEA (2013) and the MAC algorithm for 32-bit microcontrollers Chaskey (2014) (based on a reduced word-size variant of SipHash’s round function).

All mentioned ARX designs are also called pure, since they are exclusively composed of the three basic ARX operations. In addition, there is also the sub-class of augmented ARX designs that consists of a combination of the ARX operations with other bitwise operations such as Boolean operators, Boolean functions, etc. The most eminent representatives of this group are the hash functions from the MD and SHA families.

As evidenced by the long list of proposals, there is a steady interest in the ARX design philosophy. The reason is the simplicity and efficiency in both software and hardware of these designs. In recent years ARX algorithms have become especially attractive in the area of lightweight cryptography for environments with highly constrained resources. According to new results from the Framework for Fair Evaluation of Lightweight Cryptographic Systems (FELICS) [6], presented at the NIST Lightweight Cryptography Workshop 2015 [25], the most efficient lightweight designs have ARX structure.

The ARX class of primitives is often seen as an alternative to the well-established class of S-box based algorithms, among whose most notable representatives are the block cipher AES [8] and the historically significant block cipher DES [24]. While primitives from this class make use of substitution tables (S-boxes) as a source of non-linearity, the only non-linear component in ARX is the modular addition operation. Due to the latter, these primitives are also less vulnerable to cache-timing and side-channel attacks.

While ARX algorithms provide a level of security comparable to S-box based ones, they suffer from a major drawback – the methods for their analysis and design are far less rigorous and mature. For S-box based ciphers it is possible to compute provable bounds on the security against the two most powerful cryptanalytic attacks – differential cryptanalysis [3] and linear cryptanalysis [19] (see e.g. [7]). In contrast, the state of the art in the design of ARX can be summarized in the following heuristic common-sense rule: mix the basic arithmetic operations in a reasonable way and iterate them over a sufficient number of rounds. While this strategy seems to be largely successful in practice, it is based more on experience and intuition than on sound scientific arguments.

In this paper we address the mentioned problem by proposing for the first time an algorithm that finds the best differential and linear trails of an ARX cipher for a given number of rounds. It is based on a branch-and-bound search strategy similar to Matsui’s search algorithm that was applied to DES [18] and is inspired by the threshold search technique proposed in [5]. While the latter uses heuristics in order to find high-probability trails that are not necessarily optimal, our algorithm does not use any heuristics and finds optimal results.

The trails found with the described method are optimal under the Markov assumption [14, Sect. 3, Theorem 2] (see also [8, Sect. 6.2, p. 84]). The Markov assumption ensures that (a) the analyzed primitive is a Markov cipher in the sense of the definition in [14, Sect. 3] and (b) it can be assumed that its round keys are chosen at random independently (i.e. the Hypothesis of independent round keys [8, Sect. 8.7.2] holds). The Markov assumption allows us to treat the rounds of an iterated cipher independently and thus to compute the differential probability (resp. absolute linear correlation) of an r-round trail as the product of the probabilities (resp. absolute correlations) of its corresponding 1-round trails. For ciphers that do not satisfy the Markov assumption, fixed keys may exist for which the probability (resp. correlation) of the best differential (resp. linear) trail may significantly deviate from the optimal one as computed with our algorithm.
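As a small worked illustration of this product rule (the numerical values are invented for the example and are not taken from the results of this paper), a 3-round trail whose 1-round probabilities are \(p_1, p_2, p_3\) has, under the Markov assumption, probability:

$$\begin{aligned} \Pr [\text {trail}] = \prod ^{r}_{i=1} p_i, \qquad \text {e.g. } (p_1,p_2,p_3) = (2^{-1}, 2^{-3}, 2^{-2}) ~\Rightarrow ~ \Pr [\text {trail}] = 2^{-(1+3+2)} = 2^{-6}. \end{aligned}$$

Equivalently, the \(\log _2\) weights of the rounds simply add up.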

As a demonstration of the effectiveness of the technique we apply it to the block cipher Speck and we report for the first time all provably best (under the Markov assumption) differential trails for a reduced number of rounds. We also demonstrate that in some cases the threshold search algorithm returns sub-optimal results. These new results are summarized in Table 1.

Table 1. Probabilities of the best (under the Markov assumption) differential trails for Speck found with Best Search (BS) (Sect. 5) versus best probabilities found with Threshold Search (TS) [4]. The column # lists the number of trails having the best probability. The column R contains the number of rounds.

As noted, the results shown in Table 1 are to be interpreted under the Markov assumption. In Appendix 8 we show for the first time that Speck is not, in fact, a Markov cipher. We stress, however, that making the Markov assumption even for non-Markov ciphers is the best that a cryptanalyst can do in order to be able to analyze such constructions. Furthermore, we have experimentally checked that the reported differentials hold for most of the keys and therefore the results shown in Table 4 are meaningful from a practitioner’s perspective.

The new technique can also be used to design new ARX primitives with provable security bounds against linear and differential cryptanalysis – a long standing problem in the area of ARX design. Our main contributions can be summarized as follows:

  1. An algorithm for finding the best differential and linear trails in ARX ciphers that satisfy the Markov assumption.

  2. The probabilities of the best differential trails for up to 10, 9, 8, 7, and 7 rounds of Speck32, Speck48, Speck64, Speck96 and Speck128 respectively, together with the exact number of differential trails that have the best probability.

  3. A better choice of rotation constants for Speck w.r.t. single-trail differential cryptanalysis.

  4. Bounds on the security of Speck, under the Markov assumption, against differential cryptanalysis, based on the reported best trails.

  5. Two atomic ARX constructions with provable bounds against single-trail differential and linear cryptanalysis.

The paper is organized as follows. We begin in Sect. 2 with a review of previous work on techniques for searching for differential and linear trails in ARX. Section 3 provides basic definitions and propositions, necessary to follow the exposition in subsequent sections. A general strategy for searching for the best trails in ARX is described in Sect. 4 and the results from its application to Speck are given in Sect. 5. Two new primitives – MARX and Speckey – with provable bounds against single-trail differential and linear cryptanalysis are proposed in Sect. 6, and Sect. 7 concludes the paper. The notation used in the paper is summarized in Table 2.

Table 2. Notation.

2 Previous Work

Finding high probability (resp. high absolute correlation) trails for ARX has traditionally been a difficult task. The lack of S-boxes in this class of primitives makes it impossible to efficiently compute the probabilities (resp. correlations) of all possible differential transitions (resp. linear approximations) by means of the difference distribution table – DDT (resp. linear approximation table – LAT) of the non-linear elements. This makes the construction of trails in ARX a tedious and especially error-prone process as shown in [15]. Furthermore, while most S-box designs are word-based with relatively small word sizes of 4 and 8 bits, all ARX designs are bit-based with typical word sizes of 32 and 64 bits. As a consequence it is not possible to apply elegant design strategies such as the wide-trail strategy [7] to design primitives with provable bounds against differential and linear cryptanalysis. Indeed the design of such an ARX construction is still an open problem.

The described difficulties in the analysis and design of ARX have been addressed by several researchers in the past. Depending on the angle from which they approach the problem, their work can broadly be divided into three categories: bottom-up, top-down and approximation-based techniques. We briefly describe these categories below.

Bottom-up Techniques. This category is by far the largest and encompasses methods for the (automatic) construction of differential and linear trails in ARX. Arguably the first such techniques date back to the collisions on the MD and SHA families of hash functions by Wang et al. [34–36]. While these results were reportedly developed by hand, subsequent methods were proposed for the fully automatic construction of differential paths in ARX, all of which were applied to augmented ARX designs such as SHA1, SHA2, MD4 and MD5. A method for the automatic construction of differential trails in pure ARX designs was proposed in [16] and applied to the hash function Skein. While many of the mentioned techniques are general and potentially applicable to any ARX primitive, all of them were applied exclusively to hash functions. To fill the gap, the threshold search method for searching for differential trails in ARX ciphers such as TEA, XTEA and Speck was proposed in [5]. This method was subsequently extended to the case of differentials in [4]. Most recently, in 2015, two new techniques for automatic search for linear trails have been proposed. One has been applied to Speck [37], while the other is dedicated to authenticated encryption schemes [11].

Top-down Techniques. Rather than constructing a trail one round at a time as in the bottom-up approach, top-down techniques consider the cipher as a whole. More precisely, the cipher is represented either as a system of Boolean equations or as a system of mixed-integer inequalities. Each solution to the system corresponds to a valid trail. In the first case, the Boolean equations are transformed into a conjunctive normal form (CNF) formula, whose satisfying assignments are found with a SAT solver. In the second case, the problem of searching for trails is effectively transformed into a mixed-integer linear problem (MILP) that is usually solved by dedicated MILP solvers using linear-programming based branch-and-bound algorithms. The SAT solver approach has been used to find the best differential trails for several rounds of the stream cipher Salsa20 and for proving security bounds for the authenticated encryption cipher NORX. As to the MILP-based methods, up to now they have been successful mainly in the analysis of S-box designs [23, 28]. The only applications of MILP to ARX that we are aware of are the results on the augmented ARX cipher Simon [28] and a very recent paper [13] on Speck appearing in this volume of FSE’16.

Approximation-based Techniques. In both top-down and bottom-up approaches, complex techniques for analysis of existing algorithms are developed. In contrast, in what we call approximation-based techniques, the problem is turned around: new primitives are developed so that they are easy to analyze by design. The main idea is to replace the non-linear component of ARX – the modular addition – by a simpler non-linear approximation that can efficiently and accurately be analyzed with existing methods. A design based on this strategy is the authenticated encryption scheme NORX [1]. In it the addition operation is replaced by the first-order approximation \(a \oplus b \oplus ((a \wedge b) \ll 1) \approx a \boxplus b\), which effectively limits the carry propagation to a sliding window of 2 bits. The latter significantly facilitates the analysis of the scheme and also makes it hardware efficient.
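The first-order approximation of addition mentioned above can be compared with true modular addition on small words. The following is only an illustrative sketch: the function names `norx_h`, `add_mod` and `agreement_rate` are hypothetical, and the word sizes are toy values chosen so that exhaustive enumeration is cheap.

```python
def norx_h(a: int, b: int, w: int = 32) -> int:
    """First-order approximation of a + b mod 2^w: a ^ b ^ ((a & b) << 1)."""
    mask = (1 << w) - 1
    return (a ^ b ^ ((a & b) << 1)) & mask


def add_mod(a: int, b: int, w: int = 32) -> int:
    """True addition modulo 2^w."""
    return (a + b) & ((1 << w) - 1)


def agreement_rate(w: int) -> float:
    """Fraction of all w-bit input pairs on which the approximation equals a + b."""
    total = 1 << (2 * w)
    hits = sum(norx_h(a, b, w) == add_mod(a, b, w)
               for a in range(1 << w) for b in range(1 << w))
    return hits / total
```

The two functions agree whenever a carry generated at one bit position does not have to propagate further than the next position; for 3-bit words this is the case for 7/8 of all input pairs.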

From the above overview of existing results it is clear that the question of finding optimal trails in pure ARX ciphers has remained largely unexplored so far. The only results in this direction that we are aware of are [21], which applies a SAT solver approach and the MILP-based technique in [13]. While the latter is potentially capable of finding optimal trails, its running time is not well understood. To speed up the search, the authors apply a splicing heuristic and their objective is finding better trails than existing ones rather than finding optimal trails. We address this limitation with the method described in the following sections.

3 Preliminaries

In this section we state basic definitions and propositions, that will be used in later sections. We begin with the definitions of the differential probability \(\mathrm {xdp}^{+}\) and the linear correlation \(\mathrm {xlc}^{+}\).

Definition 1

( \(\mathrm {xdp}^{+}\) ). The XOR differential probability (DP) of addition modulo \(2^{w}\) (\(\mathrm {xdp}^{+}\)) is the probability with which input XOR differences \(\alpha \) and \(\beta \) propagate to output XOR difference \(\gamma \) through the modular addition operation. The probability \(\mathrm {xdp}^{+}\) is computed over all pairs of w-bit inputs \((x, y)\):

$$\begin{aligned} {\mathrm {xdp}^{+}}({\alpha }, {\beta } \rightarrow {\gamma }) = 2^{-2w} \cdot {\#\{(x,y) : ((x \oplus {\alpha }) + (y \oplus {\beta })) \oplus (x + y) = {\gamma }\}}. \end{aligned}$$
(1)
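For small word sizes, Eq. (1) can be evaluated directly by enumerating all input pairs. The sketch below does exactly that; the function name `xdp_add_bruteforce` is hypothetical, and the approach is only practical for toy values of w since it costs \(2^{2w}\) additions.

```python
from fractions import Fraction


def xdp_add_bruteforce(alpha: int, beta: int, gamma: int, w: int) -> Fraction:
    """Evaluate Eq. (1) by counting all (x, y) that follow the differential."""
    mask = (1 << w) - 1
    count = 0
    for x in range(1 << w):
        for y in range(1 << w):
            s1 = (x + y) & mask
            s2 = ((x ^ alpha) + (y ^ beta)) & mask
            if s1 ^ s2 == gamma:
                count += 1
    return Fraction(count, 1 << (2 * w))
```

For example, with w = 4 the differential \((1, 0 \rightarrow 1)\) holds for half of all input pairs, while \((8, 8 \rightarrow 0)\) holds always, because the two MSB differences cancel modulo \(2^{4}\).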

The linear correlation \(\mathrm {xlc}^{+}\) is defined in a similar way:

Definition 2

( \(\mathrm {xlc}^{+}\) ). The XOR linear correlation (LC) of addition modulo \(2^{w}\) (\(\mathrm {xlc}^{+}\)) is the correlation of the linear approximation \((\alpha ^{T} x) \oplus (\beta ^{T} y) = (\gamma ^{T} z)\), where \(x,y,z:~ x + y = z \mod 2^{w}\) are w-bit values and \(\alpha \), \(\beta \) and \(\gamma \) are w-bit linear masks, all represented as binary vectors of dimension \(w \times 1\). The operation \(\varGamma ^{T} a\) denotes the dot product between the transposed vector \(\varGamma \) (the mask) and the vector a. The correlation \(\mathrm {xlc}^{+}\) is computed over all pairs of w-bit inputs \((x, y)\):

$$\begin{aligned} {\mathrm {xlc}^{+}}({\alpha }, {\beta } \rightarrow {\gamma }) = 2^{-2w+1} \cdot {\#\{(x,y) : (\alpha ^{T} x) \oplus (\beta ^{T} y) = (\gamma ^{T} z)\}} - 1. \end{aligned}$$
(2)
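Eq. (2) can likewise be checked by exhaustive enumeration for toy word sizes. In the sketch below the names `dot` and `xlc_add_bruteforce` are hypothetical, and masks are represented as integers whose bit i is the i-th mask coordinate.

```python
from fractions import Fraction


def dot(mask: int, value: int) -> int:
    """Dot product over GF(2): parity of the bits of value selected by mask."""
    return bin(mask & value).count("1") & 1


def xlc_add_bruteforce(alpha: int, beta: int, gamma: int, w: int) -> Fraction:
    """Evaluate Eq. (2): 2^(-2w+1) * #{(x, y) satisfying the approximation} - 1."""
    m = (1 << w) - 1
    hits = 0
    for x in range(1 << w):
        for y in range(1 << w):
            z = (x + y) & m
            if dot(alpha, x) ^ dot(beta, y) == dot(gamma, z):
                hits += 1
    return Fraction(2 * hits, 1 << (2 * w)) - 1
```

The sign of the result matters: for w = 2 the masks \((3, 3 \rightarrow 3)\) give correlation 1/2, whereas \((2, 2 \rightarrow 3)\) give correlation -1/2.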

The absolute value of the linear correlation is denoted by \(|\mathrm {xlc}^{+}|\).

The probability \(\mathrm {xdp}^{+}\) has the following property noted in [5, Sect. 2, Proposition 1]:

Proposition 1

(Monotonicity of \(\mathrm {xdp}^{+}\) ). Let \(\alpha \), \(\beta \) and \(\gamma \) be w-bit XOR differences. Denote with \({\tilde{p}_i}\) (\(w \ge i \ge 1\)) the probability \(\mathrm {xdp}^{+}(\alpha [i-1:0], \beta [i-1:0] \rightarrow \gamma [i-1:0])\) of the partial differential composed of the i LS bits of \(\alpha \), \(\beta \), \(\gamma \). Then the probability \(\mathrm {xdp}^{+}\) is monotonically non-increasing with the word size of the differences in the direction LSB to MSB:

$$\begin{aligned} {\tilde{p}_{1}} \ge {\tilde{p}_{2}} \ldots \ge {\tilde{p}_{w - 1}} \ge {\tilde{p}_{w}} = {\mathrm {xdp}^{+}}({\alpha }, {\beta } \rightarrow {\gamma }). \end{aligned}$$
(3)
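A short justification sketch for inequality (3), offered as a reading aid and not as the proof given in [5]: the i least significant bits of a sum modulo \(2^{w}\) depend only on the i least significant bits of the operands, so \({\tilde{p}_i}\) equals the probability, over all w-bit input pairs, that the output difference agrees with \(\gamma \) on its i LS bits. Agreement on \(i+1\) bits implies agreement on i bits, hence:

$$\begin{aligned} {\tilde{p}_{i+1}} = \Pr [\text {output difference agrees with } \gamma \text { on bits } i,\ldots ,0] \le \Pr [\text {output difference agrees with } \gamma \text { on bits } i-1,\ldots ,0] = {\tilde{p}_{i}}. \end{aligned}$$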

A similar property holds also for \(|\mathrm {xlc}^{+}|\), but in this case the correlation decreases from MSB to LSB of the masks:

Proposition 2

(Monotonicity of \(\mathrm {xlc}^{+}\) ). Let \(\alpha \), \(\beta \) and \(\gamma \) be w-bit linear masks. Denote with \({\tilde{c}_i}\) (\(w-1 \ge i \ge 0\)) the absolute value of the correlation \(\mathrm {xlc}^{+}(\alpha [w-1:i], \beta [w-1:i] \rightarrow \gamma [w-1:i])\) of the partial linear approximation composed of the \(w - i\) MS bits of \(\alpha \), \(\beta \), \(\gamma \). Then the absolute correlation \(|\mathrm {xlc}^{+}|\) is monotonically non-increasing with the word size of the masks in the direction MSB to LSB:

$$\begin{aligned} {\tilde{c}_{w-1}} \ge {\tilde{c}_{w-2}} \ldots \ge {\tilde{c}_{1}} \ge {\tilde{c}_{0}} = |{\mathrm {xlc}^{+}}({\alpha }, {\beta } \rightarrow {\gamma })|. \end{aligned}$$
(4)
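A minimal worked example of inequality (4) for \(w = 2\) and \(\alpha = \beta = \gamma = 11_2\). It assumes that the partial approximation is read as the truncated masks applied to an addition of correspondingly fewer bits. For the MS bit alone the addition is a 1-bit XOR and the approximation \(x \oplus y = z\) always holds. For the full 2-bit words, \(z_0 = x_0 \oplus y_0\) and \(z_1 = x_1 \oplus y_1 \oplus x_0 y_0\), so the approximation reduces to \(x_0 y_0 = 0\), which holds for 12 of the 16 input pairs:

$$\begin{aligned} {\tilde{c}_{1}} = 1 ~\ge ~ {\tilde{c}_{0}} = |2^{-3} \cdot 12 - 1| = 2^{-1}. \end{aligned}$$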

The DP and LC of modular addition have been thoroughly studied in the literature and optimal methods for their computation have been proposed by several authors: [17, 22, 33] (for \(\mathrm {xdp}^{+}\)) and [10, 20, 26, 27, 32, 33] (for \(\mathrm {xlc}^{+}\)). All cited methods are linear in the size of the differences (resp. masks).

In the following sections, for computing \(\mathrm {xdp}^{+}\) we use the method proposed in [17] and for \(\mathrm {xlc}^{+}\) we use the algorithm described in [10].
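To make the efficient computation of \(\mathrm {xdp}^{+}\) concrete, the sketch below implements the well-known bitwise closed form commonly attributed to Lipmaa and Moriai. That this coincides exactly with the procedure of [17] is an assumption on our part, and the function name `xdp_add` is hypothetical.

```python
def xdp_add(alpha: int, beta: int, gamma: int, w: int) -> float:
    """xdp+ of addition mod 2^w via a bitwise closed form (no enumeration)."""
    mask = (1 << w) - 1

    def eq(a: int, b: int, c: int) -> int:
        # Bit i of the result is 1 exactly when a, b and c agree at bit i.
        return (~(a ^ b) & ~(a ^ c)) & mask

    a1, b1, c1 = (alpha << 1) & mask, (beta << 1) & mask, (gamma << 1) & mask
    # Validity check: where the three differences agree at bit i-1,
    # bit i must satisfy alpha ^ beta ^ gamma == beta[i-1].
    if eq(a1, b1, c1) & (alpha ^ beta ^ gamma ^ b1):
        return 0.0
    # Each position below the MSB where the differences disagree costs 1/2.
    weight = bin(~eq(alpha, beta, gamma) & (mask >> 1)).count("1")
    return 2.0 ** -weight
```

The function uses a constant number of word operations plus one population count, which is why it can be called at every bit position of a search without dominating the running time.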

4 Best Trail Search for ARX

In this section we describe for the first time a Matsui-like algorithm for finding the best differential and linear trails in ARX ciphers for which the Markov assumption holds. Our technique belongs to the class of bottom-up approaches. It is based on Matsui’s branch-and-bound algorithm [18], originally designed for the class of S-box ciphers, and is inspired by the threshold search algorithm proposed in [5].

To search for the best trail on n rounds of a cipher, Matsui’s algorithm is initialized with the best probabilities \(B_1,B_2,\ldots ,B_{n-1}\) for the first \(n-1\) rounds and an initial estimate \({\overline{B}_n} \le B_n\) of the best probability \(B_n\) for n rounds (the bound), which must not exceed \(B_n\). The search proceeds recursively over the rounds starting from the first (\(r = 1\)) and gradually builds a trail until the n-th round is reached. At every round \(1 \le r \le n\) the probability \(\prod ^{r}_{i=1} p_i\) of the partially constructed trail up to round r is multiplied by the best probability \(B_{n-r}\) for the remaining \(n-r\) rounds to obtain an estimate for the full trail. If \(B_{n-r}\prod ^{r}_{i=1} p_i < {\overline{B}_n}\) (i.e. the estimate is lower than the bound), the algorithm backtracks to the previous round. In this way, branches of the recursion tree that are not prospective are cut. At the last round the probability of the full trail is compared to the bound and if it is bigger, the bound is set to the new probability: \({\overline{B}_n} \leftarrow \prod ^{n}_{i=1} p_i\). The procedure terminates when the bound \({\overline{B}_n}\) cannot be updated any more. As long as the condition \({\overline{B}_n} \le B_n\) is preserved, the returned result is guaranteed to be optimal. The probabilities (resp. correlations) \(p_i\) are computed by means of the DDT (resp. LAT) of the cipher’s S-box.
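The round-level branch-and-bound described above can be sketched on an abstract toy model. Everything in the following block is hypothetical and only for illustration: `matsui_search`, the callback `transitions` (mapping a state to its possible one-round transitions) and the three-state table `TOY` stand in for a real cipher and its DDT. The last round accepts probabilities equal to the bound, so that a bound initialized exactly to \(B_n\) still yields a trail.

```python
def matsui_search(n, best, bound, start_states, transitions):
    """Round-level branch-and-bound in the style of Matsui's search.

    n            -- number of rounds
    best         -- best[k] is the best k-round probability, best[0] == 1.0
    bound        -- initial bound; must not exceed the true best n-round value
    transitions  -- hypothetical callback: state -> iterable of (prob, next_state)
    Returns (probability, trail); trail is None if nothing reached the bound.
    """
    result = {"prob": bound, "trail": None}

    def recurse(r, state, prob, trail):
        for p, nxt in transitions(state):
            q = prob * p
            if r == n:
                # Last round: update the bound if the full trail is good enough.
                if q >= result["prob"]:
                    result["prob"], result["trail"] = q, trail + [nxt]
            elif q * best[n - r] >= result["prob"]:
                # The estimate for the full trail still reaches the bound.
                recurse(r + 1, nxt, q, trail + [nxt])

    for s in start_states:
        recurse(1, s, 1.0, [s])
    return result["prob"], result["trail"]


# Toy one-round transition table (invented numbers, not a real cipher).
TOY = {1: [(0.5, 2), (0.25, 1), (0.25, 3)],
       2: [(0.5, 3), (0.5, 1)],
       3: [(0.5, 1), (0.25, 2), (0.25, 3)]}


def toy_best_probabilities(rounds):
    """Compute B_1..B_rounds iteratively, reusing earlier results as in the text."""
    best = [1.0]
    for n in range(1, rounds + 1):
        prob, _ = matsui_search(n, best, 2.0 ** -20, TOY, TOY.__getitem__)
        best.append(prob)
    return best
```

Calling `toy_best_probabilities(3)` computes the best 1-, 2- and 3-round probabilities of the toy model in turn, each search being pruned with the results of the shorter ones.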

A variant of Matsui’s algorithm applicable to the class of ARX ciphers, called threshold search, was proposed in [5]. The main idea is to consider addition modulo \(2^{w}\) as a large S-box of size \(2^{2w} \times 2^{w}\). Since computation of the full DDT of this S-box is infeasible for typical word sizes of \(w \ge 16\) bits, the authors propose to use a DDT with reduced size, called partial DDT (or pDDT). The pDDT is composed of (a subset of) all differential transitions that have probability larger than or equal to a predefined probability threshold. The value of the threshold and the maximum allowed size of the pDDT are chosen heuristically depending on the analyzed primitive. Another proposed heuristic is a limit on the Hamming weight of the differences.

If an input difference with no matching output difference in the pDDT is encountered during the search, a second pDDT is computed on-the-fly. The latter is composed of transitions that (a) have probabilities that are likely to improve the probability of the best trail found so far and (b) are guaranteed to result in input differences to the next round that have at least one matching output difference in the initial pDDT (as illustrated by the Highways and Country Roads Analogy [5]). Due to the use of the mentioned heuristics, the trails found by the threshold search algorithm are not necessarily optimal.

Inspired by the threshold search approach, we propose a new variant of Matsui’s algorithm for the class of ARX. In contrast to [5] our technique does not use any heuristics and finds optimal results. The main new idea is to add a second recursion at bit-level over the bits of the differences (resp. linear masks) in addition to the original recursion over the rounds. This modification preserves the optimality of the search due to the monotonicity properties of modular addition stated as Propositions 1 and 2 in Sect. 3. These properties allow us, at every round r, to compute the probability of the partially constructed trail at the bit-level using the partially constructed differences (resp. masks) at round r. Unprospective branches of the search tree are thus effectively cut not only at round-level, but also at bit-level.

In more detail, let \(\alpha _r[0:i]\), \(\beta _r[0:i]\) and \(\gamma _r[0:i]\) be resp. the input and output differences of the modular addition at round r, partially constructed up to bit i (i.e. only the \(i+1\) LS bits of the words are assigned). Let \({\tilde{p}_r}\) be the probability of the corresponding partially constructed differential: \((\alpha _r[0:i],\beta _r[0:i])\rightarrow \gamma _r[0:i]\). Then at round r and bit i, the algorithm checks whether the following condition holds: \(B_{n-r}{\tilde{p}_r}\prod ^{r-1}_{j=1} p_j \ge {\overline{B}_n}\), i.e. whether the product of the probability \(\prod ^{r-1}_{j=1}p_j\) of the partially constructed trail up to round \(r-1\), the probability \({\tilde{p}_r}\) of the partially constructed differential up to bit i at round r, and the best probability \(B_{n-r}\) for the remaining \(n-r\) rounds is still at least as good as the bound \({\overline{B}_n}\). If yes, then the search proceeds recursively to the next bit position \(i+1\) or, if \(i = w-1\) (the MSB), to the next round \(r+1\). Otherwise, it backtracks to the previous bit or, if \(i = 0\), to the previous round.
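The bit-level recursion can be illustrated in isolation for a single modular addition with fixed input differences. The sketch below enumerates every output difference \(\gamma \) whose probability reaches a given threshold, building \(\gamma \) from the LSB upwards and cutting a branch as soon as the partial probability falls below the threshold, which is safe by Proposition 1. The names `xdp_add` and `gammas_above` are hypothetical, the probability is computed with the closed form commonly attributed to Lipmaa and Moriai (assumed here, not confirmed to be the exact method of the paper), and the threshold plays the role of \({\overline{B}_n}/(B_{n-r}\prod p_j)\).

```python
def xdp_add(alpha, beta, gamma, w):
    """xdp+ of addition mod 2^w via a bitwise closed form."""
    mask = (1 << w) - 1

    def eq(a, b, c):
        return (~(a ^ b) & ~(a ^ c)) & mask

    a1, b1, c1 = (alpha << 1) & mask, (beta << 1) & mask, (gamma << 1) & mask
    if eq(a1, b1, c1) & (alpha ^ beta ^ gamma ^ b1):
        return 0.0
    return 2.0 ** -bin(~eq(alpha, beta, gamma) & (mask >> 1)).count("1")


def gammas_above(alpha, beta, w, threshold):
    """List (gamma, prob) for all gamma with xdp+(alpha, beta -> gamma) >= threshold."""
    found = []

    def extend(i, gamma):
        # Invariant: the i LS bits of gamma are assigned and passed the check.
        if i == w:
            found.append((gamma, xdp_add(alpha, beta, gamma, w)))
            return
        low = (1 << (i + 1)) - 1
        for bit in (0, 1):
            g = gamma | (bit << i)
            # Probability of the partial differential on the i+1 LS bits.
            p = xdp_add(alpha & low, beta & low, g, i + 1)
            if p > 0 and p >= threshold:
                extend(i + 1, g)

    extend(0, 0)
    return found
```

Because the partial probability can only decrease as more bits are assigned, no output difference that meets the threshold is ever discarded, so the enumeration agrees with a plain scan over all \(2^{w}\) candidates.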

With the described strategy, we effectively deal with the problem of having to store a huge number of possible transitions through the addition operation. Consequently it is not necessary to maintain a (partial) DDT or to use additional heuristics such as probability and Hamming weight thresholds to limit the search and storage space. Moreover, our algorithm is conceptually closer to Matsui’s original proposal than the threshold search. In his paper [18], Matsui also describes a second level of recursion over the 8 S-boxes of DES (cf. procedure Round-2-j in [18, Sect. 4, p. 371]). With it the probability of a partial trail is computed up to round \(r-1\) and up to S-box i at round r, where \(1 \le i \le 8\). This S-box level recursion is analogous to the proposed bit-level recursion for modular addition.

In the following sections we use the block cipher Speck to illustrate the application of the new technique in practice.

5 Application to Speck

5.1 Description of Speck

Speck is a family of lightweight block ciphers proposed in [2]. It is composed of the five instances Speck32, \(\textsc {Speck}48\), \(\textsc {Speck}64\), \(\textsc {Speck}96\) and \(\textsc {Speck}128\), corresponding resp. to the block sizes 32, 48, 64, 96 and 128 bits. Note that the instance SpeckN has N/2-bit word size. In the following, with \(\textsc {Speck}\) we denote any of the five variants if not otherwise specified.

Speck is a pure ARX cipher with a Feistel-like structure in which both branches are modified at every round. Let \(X_{r-1,\mathrm {L}}\) and \(X_{r-1,\mathrm {R}}\) be respectively the left and right N/2-bit input words to the r-th round of SpeckN (\(r \ge 1\)) and let \(k_r\) be the N/2-bit round key applied at round r (see Fig. 1 (Left)). Then the output words \(X_{r,\mathrm {L}}\), \(X_{r,\mathrm {R}}\) from round r (input words to round \(r+1\)) are computed as follows:

$$\begin{aligned} X_{r,\mathrm {L}}&= ((X_{r-1,\mathrm {L}} \ggg r_1) \boxplus X_{r-1,\mathrm {R}}) \oplus k_r,\end{aligned}$$
(5)
$$\begin{aligned} X_{r,\mathrm {R}}&= (X_{r-1,\mathrm {R}} \lll r_2) \oplus X_{r,\mathrm {L}}. \end{aligned}$$
(6)

The rotation constants \(r_1,r_2\) are specified as: \(r_1 = 7, r_2 = 2\) for Speck32 and \(r_1 = 8, r_2 = 3\) for all other versions. The round function of Speck is depicted in Fig. 1 (Left).
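Eqs. (5) and (6) translate directly into code. The sketch below implements one round and its inverse with the Speck32 parameters as defaults; the helper names `rol`, `ror`, `speck_round` and `speck_round_inv` are hypothetical, and the key schedule is deliberately left out, so the round key is simply passed in.

```python
def rol(x: int, r: int, w: int) -> int:
    """Rotate the w-bit word x left by r positions."""
    r %= w
    return ((x << r) | (x >> (w - r))) & ((1 << w) - 1)


def ror(x: int, r: int, w: int) -> int:
    """Rotate the w-bit word x right by r positions."""
    return rol(x, w - (r % w), w)


def speck_round(xl, xr, k, w=16, r1=7, r2=2):
    """One Speck round: Eq. (5) for the left word, Eq. (6) for the right word."""
    mask = (1 << w) - 1
    xl = ((ror(xl, r1, w) + xr) & mask) ^ k
    xr = rol(xr, r2, w) ^ xl
    return xl, xr


def speck_round_inv(xl, xr, k, w=16, r1=7, r2=2):
    """Inverse of speck_round under the same round key."""
    mask = (1 << w) - 1
    xr = ror(xr ^ xl, r2, w)
    xl = rol(((xl ^ k) - xr) & mask, r1, w)
    return xl, xr
```

For the other instances one would pass the corresponding word size together with `r1=8, r2=3`, e.g. `speck_round(xl, xr, k, w=32, r1=8, r2=3)` for Speck64.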

Fig. 1. Left: The round function of Speck. Middle: Propagation of differences: \(\alpha _r = \gamma _{r-1} \ggg r_1\), \(\beta _r = \gamma _{r-1} \oplus (\beta _{r-1} \lll r_2)\). Right: Propagation of linear masks: \(\alpha _r = \varGamma _{r-1,\mathrm {L}} \ggg r_1\), \(\beta _r = \varGamma _{r-1,\mathrm {R}} \oplus (\varGamma _{r,\mathrm {R}} \ggg r_2)\), \(\gamma _r = \varGamma _{r,\mathrm {L}} \oplus \varGamma _{r,\mathrm {R}}\). The \(\bullet \) sign denotes a “three-forked branch” and acts as a XOR on the linear masks [18]. Differences \(\mathbf {\gamma _r}\) (resp. masks \(\mathbf {\beta _r}\), \(\mathbf {\gamma _r}\)) in bold can be freely chosen.

Every instance of the Speck family supports several key sizes and the total number of rounds depends on the key size. A summary of the parameters (block size, key size, number of rounds) of all instances of the family is presented in Table 3.

Table 3. Speck parameters: block size (bits), key size (bits), number of rounds.

The key schedule of Speck is based on a simple ARX function that is iterated a fixed number of times. We omit its description herein, as it is not relevant to the presented results. For the detailed description of the Speck family we refer the reader to the original proposal [2].

5.2 Best Trail Search for Speck

In this section we apply the technique described in Sect. 4 in order to find the best (under the Markov assumption) linear and differential trails of reduced-round variants of Speck.

Differential Trail Search. The pseudo-code of the algorithm for the best differential trail search applied to Speck is shown in Algorithm 1. It has three parts: first round (lines (4)–(14)), middle rounds (lines (16)–(25)) and last round (lines (27)–(37)). Every part is composed of two blocks corresponding to the two levels of recursion. In the first round the procedure starts by recursing over the bits of the differences (lines (10)–(14)) beginning with the LSB. When the MSB is reached (line (5)) (i.e. the differences \(\alpha _r\), \(\beta _r\), \(\gamma _r\) are fully constructed), the procedure switches back to the first block (lines (5)–(8)), where it recurses into the next round (line (8)). The logic for the middle and last rounds is the same with the exception that the bit level recursion is over the bits of the output difference \(\gamma _r\) only (lines (22)–(25) and (34)–(37) resp.) and not over the bits of all differences as in the first round. The reason is that the input differences \(\alpha _r\) and \(\beta _r\) to the addition in the middle and last rounds are fixed from the previous round by the following relation: \(\alpha _r = \gamma _{r-1} \ggg r_1\), \(\beta _r = \gamma _{r-1} \oplus (\beta _{r-1} \lll r_2)\) (see line (7) and Fig. 1 (middle)). In addition, at the last round there is no further round level recursion, but instead the bound \({\overline{B}_n}\) is updated (line (32)).
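The relation that fixes \(\alpha _r\) and \(\beta _r\) from the previous round can be checked experimentally against the round function. The sketch below is only an illustration with hypothetical names (`next_addition_diffs`, `check_propagation`): it encrypts random pairs through one Speck32-sized round and compares the observed input differences of the next addition with the values predicted from \(\beta _{r-1}\) and \(\gamma _{r-1}\). The relation is deterministic, so the random inputs only serve as sample points.

```python
import random


def rol(x, r, w):
    r %= w
    return ((x << r) | (x >> (w - r))) & ((1 << w) - 1)


def ror(x, r, w):
    return rol(x, w - (r % w), w)


def next_addition_diffs(beta_prev, gamma_prev, w=16, r1=7, r2=2):
    """Input differences (alpha_r, beta_r) of the addition in the next round."""
    alpha = ror(gamma_prev, r1, w)
    beta = gamma_prev ^ rol(beta_prev, r2, w)
    return alpha, beta


def check_propagation(trials=1000, w=16, r1=7, r2=2, seed=1):
    """Compare predicted and observed differences on random pairs of texts."""
    rng = random.Random(seed)
    mask = (1 << w) - 1
    for _ in range(trials):
        xl, xr, k, dl, dr = (rng.getrandbits(w) for _ in range(5))
        # Current round, both texts of the pair (same round key k).
        yl = ((ror(xl, r1, w) + xr) & mask) ^ k
        yl2 = ((ror(xl ^ dl, r1, w) + (xr ^ dr)) & mask) ^ k
        yr = rol(xr, r2, w) ^ yl
        yr2 = rol(xr ^ dr, r2, w) ^ yl2
        beta_prev, gamma_prev = dr, yl ^ yl2   # key cancels in the difference
        # Observed differences at the inputs of the next round's addition.
        observed = (ror(yl, r1, w) ^ ror(yl2, r1, w), yr ^ yr2)
        if observed != next_addition_diffs(beta_prev, gamma_prev, w, r1, r2):
            return False
    return True
```

Since both input differences are determined in this way, only the \(2^{w}\) candidates for \(\gamma _r\) remain to be explored in a middle or last round.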

[Algorithm 1: pseudo-code of the best differential trail search for Speck (figure not reproduced)]

We estimate the complexity of the differential search algorithm as follows. Let \(m_1 \le 2^{3w}\) be the number of differences \(\alpha _1\), \(\beta _1\) and \(\gamma _1\) in the first round, for which the probability of the differential \((\alpha _1,\beta _1\rightarrow \gamma _1)\) is at least \({{\overline{B}_n}}/{B_{n-1}}\): \(m_1 = \#\{(\alpha _1,\beta _1,\gamma _1):\mathrm {xdp}^{+}(\alpha _1,\beta _1\rightarrow \gamma _1) \ge {{\overline{B}_n}}/{B_{n-1}}\}\). Analogously, let \(m_r \le 2^{w}\) be the number of differences \(\gamma _r\) in any middle or last round \(r \ge 2\) for which, for fixed \(\alpha _r\) and \(\beta _r\), the probability of the differential \((\alpha _r,\beta _r\rightarrow \gamma _r)\) is at least \({{\overline{B}_n}}/(B_{n-r}\prod ^{r-1}_{i=1}p_i)\): \(m_r = \#\{\gamma _r:\mathrm {xdp}^{+}(\alpha _r,\beta _r\rightarrow \gamma _r) \ge {{\overline{B}_n}}/(B_{n-r}\prod ^{r-1}_{i=1}p_i)\}\), and let m be the maximum among these values: \(m = \max _{n \ge r \ge 2}~(m_r)\). Then the complexity of Algorithm 1 has the form \(\mathcal {O}(\prod ^{n}_{r=1} m_r) \le \mathcal {O}(m_1 m^{n-1})\), which is significantly lower than the complexity of full search \(2^{3w} 2^{w(n-1)} = 2^{w(n+2)}\) as indicated by our experiments. However, the precise quantification of the values \(m_r\), \(r \ge 1\) is difficult, since they change dynamically during the search. The latter is a separate problem in itself that can be investigated in future research.

Linear Trail Search. The algorithm for linear search for Speck is analogous to the differential case with one significant difference, arising from the way in which linear masks propagate through the round function (see Fig. 1 (right)). Recall that in the differential search, the differences \(\alpha _r\) and \(\beta _r\) in the middle and last rounds are fixed from the previous round. In contrast, in the linear case only the mask \(\alpha _r\) is fixed (with the relation \(\alpha _r = \gamma _{r-1} \lll r_1\)), while \(\beta _r\) depends on the right output masks \(\varGamma _{r-1,\mathrm {R}}\) and \(\varGamma _{r,\mathrm {R}}\) resp. from the previous and current round: \(\beta _r = \varGamma _{r-1,\mathrm {R}} \oplus (\varGamma _{r,\mathrm {R}} \ggg r_2)\). Due to this fact, in the middle and last rounds the linear search algorithm performs a recursion over the bits of one more variable (\(\beta _r\)) in addition to \(\gamma _r\). Furthermore, since the mask \(\varGamma _{r-1,\mathrm {R}}\) can be freely chosen in the first round, an additional iteration over all such masks is performed. The latter is independent of the bound \({\overline{B}_n}\) and therefore represents a fixed cost of \(2^{w}\) additional iterations. All this added complexity makes the linear search algorithm feasible only for the version Speck32.

Due to the mentioned differences, the complexity of the linear search is significantly higher than that of the differential search. Let \(m_1 \le 2^{3w}\) be the number of masks \(\alpha _1\), \(\beta _1\) and \(\gamma _1\) in the first round, for which the absolute correlation of the linear approximation \((\alpha _1,\beta _1\rightarrow \gamma _1)\) is at least \({{\overline{B}_n}}/{B_{n-1}}\): \(m_1 = \#\{(\alpha _1,\beta _1,\gamma _1):|\mathrm {xlc}^{+}(\alpha _1,\beta _1\rightarrow \gamma _1)| \ge {{\overline{B}_n}}/{B_{n-1}}\}\). Let \(m_r \le 2^{2w}\) be the number of masks \(\beta _r\) and \(\gamma _r\) in any middle or last round \(r \ge 2\) for which, for fixed \(\alpha _r\), the absolute correlation of the linear approximation \((\alpha _r,\beta _r\rightarrow \gamma _r)\) is at least \({{\overline{B}_n}}/(B_{n-r}\prod ^{r-1}_{i=1}c_i)\): \(m_r = \#\{(\beta _r,\gamma _r):|\mathrm {xlc}^{+}(\alpha _r,\beta _r\rightarrow \gamma _r)| \ge {{\overline{B}_n}}/(B_{n-r}\prod ^{r-1}_{i=1}c_i)\}\), and let \(m = \max _{n \ge r \ge 2}~(m_r)\). Then the complexity of the linear search algorithm has the form: \(\mathcal {O}(2^{w}\prod ^{n}_{r=1} m_r) \le \mathcal {O}(2^{w} m_1 m^{n-1})\), which is much less than the complexity of full search \(2^{4w} 2^{2w(n-1)} = 2^{2w(n+1)}\). In the former, notice the factor \(2^{w}\) due to the additional iteration over all w-bit masks \(\varGamma _{r-1,\mathrm {R}}\) in the first round. Again, similarly to the differential case, the precise quantification of the values \(m_r\), \(r \ge 1\) in the linear case is difficult.

While the higher complexity of the linear search algorithm makes it infeasible for versions of Speck other than Speck32, Algorithm 1 is quite practical as shown by the results reported in the following section.

Table 4. Probabilities and running times for the best (under the Markov assumption) differential trails for Speck obtained with Algorithm 1 (\(\log _2\) scale). Platforms: Intel® Core™ E5-2637 CPU 3.50GHz or HPC cluster for \(\ge 7\) rounds. The column t provides the time needed to find a single best trail in s/m/h/d = seconds/minutes/hours/days, where 1 day = 24 h. Note: times are rounded up.

5.3 Results

With Algorithm 1 we find the best differential trails for reduced-round variants of all versions of Speck under the Markov assumption. Table 1 compares our results to the ones obtained with the threshold search algorithm with the parameters given in [4, Sect. 6, Table 6]: probability threshold \(p_{\mathrm {thres}} = 2^{-5}\), Hamming weight threshold \(\mathrm {hw}_{\mathrm {thres}} = 7\) and maximum pDDT size \(2^{30}\). From the table it can be seen that for certain numbers of rounds Algorithm 1 significantly improves the probabilities found with threshold search.

The execution times of Algorithm 1 for different numbers of rounds are shown in Table 4. Most of the measurements were done on a PC with an Intel® Core™ E5-2637 CPU 3.50GHz. Exceptions are the results for more than 7 rounds and block sizes larger than 48 bits, which were obtained using a parallel version of Algorithm 1 executed on the HPC cluster of the University of Luxembourg [29]. The memory requirements in all cases are negligible.

A final note on the search strategy used for obtaining the times in Table 4: when searching for the best probability for n rounds, we initialize the bound \({\overline{B}_n}\) to the best probability for \((n-1)\) rounds: \({\overline{B}_n} \leftarrow B_{n-1}\). If no trail with this probability is found, the bound is decreased by a factor of 2: \({\overline{B}_n} \leftarrow {\overline{B}_n}/2\). This process continues until a trail with probability equal to the bound is found. Thus the times shown in Table 4 are measured from the start of the program to the moment when the first trail is found.
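The bound-lowering strategy just described can be sketched as a small driver loop in Python. Here `search` is a hypothetical callback standing in for Algorithm 1: it is assumed to return a trail whose probability equals the given bound, or `None` if no such trail exists. The `log2_floor` safeguard is also an addition for this sketch and is not part of the description above.

```python
def find_best_probability(search, n, log2_best_prev, log2_floor=-128):
    """Lower the bound for n rounds until a trail is found.

    search(n, log2_bound) -> trail or None   (hypothetical stand-in for Algorithm 1)
    log2_best_prev: log2 of the best probability for n-1 rounds (initial bound)
    log2_floor: safeguard so the loop always terminates (assumption of this sketch)
    """
    log2_bound = log2_best_prev
    while log2_bound >= log2_floor:
        trail = search(n, log2_bound)
        if trail is not None:
            return log2_bound, trail
        log2_bound -= 1  # decrease the bound by a factor of 2
    raise RuntimeError("no trail found above the safeguard floor")


if __name__ == "__main__":
    # Toy search: pretends that the best 3-round trail has probability 2^-5.
    def toy_search(n, log2_bound):
        return "toy-trail" if log2_bound <= -5 else None

    print(find_best_probability(toy_search, 3, -3))
```

Because every unsuccessful call is a complete run of the search with a bound that is too high, the reported time includes all of these failed attempts as well as the final successful one.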

5.4 Towards Security Bounds for Speck

The results from Table 4 can be used to trivially obtain bounds (under the Markov assumption) on the security of Speck against single-trail differential cryptanalysis. For example, given the probability \(p_r\) of the best trail on r rounds and the probability \(p_s\) of the best trail on s rounds, the product \(p_r p_s\) gives an upper bound on the probability of any trail on \(r+s\) rounds. In other words, any trail on \(r+s\) rounds has probability at most \(p_r p_s\). We use this approach to compute upper bounds (under the Markov assumption) on the probabilities of the best trails on all versions of Speck. The results are shown in Table 5.
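The composition of bounds can be sketched in a few lines of Python. The function name and the dynamic-programming formulation are illustrative assumptions; the text only states the product rule. Probabilities are kept in \(\log _2\) scale, so products become sums, and a number of rounds for which no best probability is known gets the trivial bound 1 (that is, 0 in \(\log _2\) scale). The only data point used in the example is the 9-round Speck32 value \(2^{-30}\) quoted in Sect. 5.5.

```python
def upper_bounds_log2(known, max_rounds):
    """Upper bounds (log2) on the best trail probability for 0..max_rounds rounds.

    known: dict {rounds: log2 of the best trail probability for that many rounds}
    Applies the rule  p_(r+s) <= p_r * p_s  repeatedly.
    """
    bound = [0] * (max_rounds + 1)  # bound[0] = 0: zero rounds, probability 1
    for n in range(1, max_rounds + 1):
        best = 0  # trivial bound: probability <= 1
        for r, log2_p in known.items():
            if r <= n:
                # split n rounds into r rounds (known) and n - r rounds (bounded)
                best = min(best, log2_p + bound[n - r])
        bound[n] = best
    return bound


if __name__ == "__main__":
    # Best 9-round trail of Speck32 has probability 2^-30 (see Sect. 5.5).
    b = upper_bounds_log2({9: -30}, 18)
    print(b[9], b[10], b[18])
```

With more entries in `known`, the inner loop automatically picks the split that yields the lowest, and therefore most informative, upper bound.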

Table 5. Upper bounds on the best (under the Markov assumption) probabilities of differential trails on Speck computed using the best probabilities from Table 4 (\(\log _2\) scale).

In view of the probabilities of the best found trails on Speck reported in [4, Sect. 6, Table 6], the bounds in Table 5 are not tight.

5.5 On the Choice of Rotation Constants

We investigated the way in which the choice of the rotation constants \(r_1\) and \(r_2\) (see Fig. 1 (Left)) of Speck32, Speck48 and Speck64 influences the DP of the best trails. For that purpose we assume that the exact values of the constants are not as important as their relative difference. Under this assumption, we fixed \(r_2\) to its original value and varied \(r_1\) over the first 16 possibilities. For each choice, we determined the probability of the best differential trail for 9 rounds of Speck32, 7 rounds of Speck48 and 6 rounds of Speck64 using Algorithm 1. The results are presented in Table 6.

Table 6. Best differential probabilities (DP) for 9 rounds of Speck32, 7 rounds of Speck48 and 6 rounds of Speck64 for 16 choices of the rotation constant \(r_1\) with \(r_2\) fixed to its original value (Fig. 1 (Left)) (\(\log _2\) scale).

From Table 6 it can be seen that the original choice of rotation constants, \(r_1 = 7\) and \(r_2 = 2\) for Speck32 and \(r_1 = 8\) and \(r_2 = 3\) for Speck48, is not optimal w.r.t. the probability of the best differential trail. In the former case, it results in a DP of \(2^{-30}\) over 9 rounds, while the optimal choice \(r_1 = 9\) and \(r_2 = 2\) results in probability \(2^{-31}\). In the latter case, the original rotation constants (8, 3) result in a DP of \(2^{-19}\) over 7 rounds, while the choices (4, 3), (5, 3), (7, 3) and (13, 3) result in the lower probability \(2^{-21}\). This may hint that we have found better rotation constants for Speck. To be certain, however, similar experiments for the linear case must also be conducted. In addition, the implementation cost of each pair of constants must be taken into account. Therefore the optimal choice of \(r_1\) and \(r_2\) requires further investigation.
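To make the experimental setup concrete, the following Python sketch shows a Speck round function with the rotation constants exposed as parameters, together with the list of variants obtained by fixing \(r_2\) and varying \(r_1\). The round function follows the public Speck specification; the function names and the range 0 to 15 for \(r_1\) are assumptions of this sketch, and the trail search itself (Algorithm 1) is not reproduced here.

```python
def rotl(x, r, w=16):
    """Rotate the w-bit word x left by r positions."""
    r %= w
    return ((x << r) | (x >> (w - r))) & ((1 << w) - 1)


def rotr(x, r, w=16):
    """Rotate the w-bit word x right by r positions."""
    return rotl(x, w - (r % w), w)


def speck_round(x, y, k, r1=7, r2=2, w=16):
    """One Speck round on the word pair (x, y) with round key k.

    Defaults are the Speck32 parameters; r1 and r2 are left configurable
    so that alternative rotation constants can be tried.
    """
    mask = (1 << w) - 1
    x = ((rotr(x, r1, w) + y) & mask) ^ k   # rotate, add modulo 2^w, XOR key
    y = rotl(y, r2, w) ^ x                  # rotate, XOR with new left word
    return x, y


# Variants for Speck32: r2 fixed to its original value, r1 varied
# (the range 0..15 is an assumption of this sketch).
speck32_variants = [(r1, 2) for r1 in range(16)]

if __name__ == "__main__":
    print(speck_round(0x0080, 0x0001, 0x0000))          # original constants (7, 2)
    print(speck_round(0x0080, 0x0001, 0x0000, r1=9))    # alternative choice (9, 2)
```

Each pair in `speck32_variants` would then be passed to the trail search in place of the original constants.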

6 MARX and Speckey: ARX Primitives with Provable Bounds

A limitation of Algorithm 1 is that its complexity increases significantly with the number of rounds and the word size, as indicated by Table 4. To address this problem, in this section we propose two new primitives, MARX and Speckey, for which it is feasible to compute the probabilities and linear correlations of the best trails for any number of rounds and which satisfy the Markov assumption. Both primitives have a 32-bit state and a 32-bit round key.

MARX (from MIX + ARX) is based on the round function of Threefish-256 [12] (with its basic component – the MIX operation) with 8-bit words. This round function is wrapped within a key addition on the input and on the output and is iterated over a fixed number of rounds. Speckey, as the name suggests, is based on the block cipher Speck. More precisely, it is Speck32 with a modified key addition. The round functions of MARX and Speckey are shown in Fig. 2.

Fig. 2. Left: MARX (from MIX + ARX), based on the round function of Threefish-256 [12] with 32-bit state, 32-bit round key and 8-bit words; Right: Speckey – a variant of Speck32 with modified key addition.

To choose the rotation constants of MARX, we exhaustively searched all possible pairs of constants and applied Algorithm 1 and its linear search version to the resulting variants. Based on the results we selected the constants \(r_1 = 2\) and \(r_2 = 5\), as they provided differential probability (DP) \(\le 2^{-32}\) and absolute linear correlation (LC) \(\le 2^{-17}\) over a minimal number of rounds, namely 12. As to the word permutation, before settling on the one used in Threefish-256, we also considered a Feistel-like variant in which the words are circularly rotated right by one. However, this variant required more rounds to reach full diffusion (best DP \(2^{-32}\) and best absolute LC \(2^{-17}\)) compared to Threefish-256: on average two more rounds were necessary.

The best DP and LC of MARX and Speckey are shown in Table 7.

Table 7. Best differential probabilities (DP) and absolute linear correlations (LC) of MARX and Speckey (\(\log _2\) scale).

The main advantage of MARX and Speckey over Speck32 is that, due to the full-state key addition at the beginning of every round, these two primitives belong to the class of key-alternating ciphers [9, Sect. 5.1, Definition 2], which is a sub-class of Markov ciphers and therefore satisfies the Markov assumption. In addition, due to its 8-bit modular addition, MARX may be a more suitable choice for devices with 8-bit microprocessors. A disadvantage is that MARX needs two more rounds to achieve full diffusion compared to Speck32 (see Table 7) and that both MARX and Speckey use more operations per round than Speck32. Appendix 9 describes a variant of MARX, called MARX2, that achieves full diffusion in the same number of rounds as Speck32 at the expense of two additional rotation operations.

Finally, we stress that the proposed new primitives are intended to serve mainly as an example of how the best trail search algorithms can be used to design new ARX constructions with provable properties. At present, MARX and Speckey have not undergone sufficient analysis against other cryptanalytic techniques for us to have enough confidence in their cryptographic properties.

7 Conclusion

In this paper we proposed for the first time an adaptation of Matsui’s algorithm for finding the best differential and linear trails in ARX ciphers. We showed the practical application of the new method on reduced-round variants of block ciphers from the Speck family and we reported the first provably best differential trails on these variants. The new results were used to compute the first bounds (under the Markov assumption) on the security of Speck against single-trail differential cryptanalysis. In addition, we also reported better choices of the rotation constants for Speck w.r.t. single-trail differential cryptanalysis. Finally, we proposed two new ARX primitives – MARX and Speckey – which satisfy the Markov assumption and have provable bounds against single-trail differential and linear cryptanalysis – a long-standing open problem in the area of ARX design. The source code of the tools for best trail search for Speck, Speckey and MARX is publicly available as part of the YAARX Toolkit [30] and a snapshot of the source tree is uploaded on the CryptoLUX website [31].