1 Introduction

The practical security of a symmetric-key primitive is evaluated by testing its resistance against an almost exhaustive list of known cryptanalytic techniques. It is therefore essential to generalize existing cryptanalytic methods or to develop new ones, and the boundary between the two can be quite blurred. For example, the development in recent years of invariant attacks [1,2,3], polytopic cryptanalysis [4], division properties [5, 6], rotational cryptanalysis [7, 8], etc., follows these two approaches.

Another approach is to combine known techniques to enhance the effectiveness of the individual attacks. Boomerang [9] and differential-linear cryptanalysis are the best examples. In particular, over the past four years we have seen significant advancements in differential-linear cryptanalysis, introduced by Langford and Hellman at CRYPTO 1994 [10], which combines the power of the two most important techniques of symmetric-key cryptanalysis (differential and linear attacks). Our work starts with an attempt to further extend the differential-linear framework by replacing the differential part of this cryptanalytic technique with rotational-XOR differentials.

Rotational and Rotational-XOR Cryptanalysis. Rotational cryptanalysis was first formally introduced by Khovratovich and Nikolic in [8], where the evolution of the so-called rotational pair \((x, x \lll t)\) through a target cipher was analyzed. The rotational properties of the building blocks of ARX primitives were then applied in the rotational rebound attack on the hash function Skein [11], and later refined to consider a chain of modular additions [12]. Recently, cryptanalytic results on the ARX-based permutations Chaskey and ChaCha with respect to rotational cryptanalysis were reported [13, 14]. Apart from ARX constructions, permutations built with logical operations but without modular additions, also known as AND-RX or LRX [15] primitives, are particularly interesting with respect to rotational attacks. In 2010, Morawiecki et al. applied this technique to distinguish the round-reduced Keccak-f[1600] permutation by feeding in rotational pairs and observing the bias of the XOR of the \((i+t)\)-th and i-th bits of the corresponding outputs, where t is the rotation offset and the addition is taken modulo the size of the rotated word [16]. We will come back to Morawiecki et al.’s technique and show that it has an intimate relationship with the rotational differential-linear cryptanalysis we propose in Sect. 3. To thwart rotational attacks, constants that are not rotation-invariant can be injected into the data path. Still, in certain cases, it is possible to overcome this countermeasure with ad-hoc techniques.

Later, Ashur and Liu [7] generalized the concept of a rotational pair by considering the propagation of a data pair \((x, x')\) related by the so-called rotational-XOR (RX) difference \((x \lll t) \oplus x' = \delta \). The cryptanalytic technique based on RX-differences was named rotational-XOR cryptanalysis. Note that when the RX-difference of the pair \((x, x')\) is zero, the pair degenerates to a rotational pair. RX-cryptanalysis integrates the effect of constants into the analysis, and it has been successfully applied to many ARX and AND-RX designs [17, 18]. Hereafter, we refer to both rotational and rotational-XOR cryptanalysis as rotational cryptanalysis; in this general sense, rotational cryptanalysis encompasses all statistical attacks requiring chosen data (e.g., plaintexts) with certain rotational relationships.

Differential-linear Cryptanalysis. Given an encryption function E, we divide it into two consecutive subparts \(E_0\) and \(E_1\). Let \(\delta \rightarrow \varDelta \) be a differential for \(E_0\) with probability p, and \(\varGamma \rightarrow \gamma \) be a linear approximation for \(E_1\) with bias \(\epsilon _{\varGamma , \gamma } = \Pr [ \varGamma \cdot y \oplus \gamma \cdot E_1(y) = 0 ] - \frac{1}{2}\). Then, the overall bias \({\mathcal {E}}_{\delta , \gamma }\) of the differential-linear distinguisher can be estimated with the piling-up lemma [19] as

$$\begin{aligned} {\mathcal {E}}_{\delta , \gamma } = \Pr [ \gamma \cdot ( E(x) \oplus E(x \oplus \delta ) ) = 0 ] - \frac{1}{2} = (-1)^{\varGamma \cdot \varDelta } \cdot 2 p \epsilon _{\varGamma , \gamma }^2, \end{aligned}$$
(1)

since \(\gamma \cdot ( E(x) \oplus E(x \oplus \delta ) )\) can be decomposed into the XOR sum of the following three terms:

$$\begin{aligned} {\left\{ \begin{array}{ll} \varGamma \cdot ( E_0(x) \oplus E_0(x \oplus \delta ) ),\\ \varGamma \cdot E_0(x \oplus \delta ) \oplus \gamma \cdot E(x \oplus \delta ),\\ \varGamma \cdot E_0(x) \oplus \gamma \cdot E(x). \end{array}\right. } \end{aligned}$$

The derivation of Eq. (1) relies not only on the independence of \(E_0\) and \(E_1\), but also on the assumption

$$\begin{aligned} \Pr [ \varGamma \cdot ( E_0(x) \oplus E_0(x \oplus \delta )) = 0 ~|~ E_0(x) \oplus E_0(x \oplus \delta ) \ne \varDelta ] = \frac{1}{2}, \end{aligned}$$
(2)

under which we have \(\Pr [ \varGamma \cdot ( E_0(x) \oplus E_0(x \oplus \delta ) ) = 0 ] = \frac{1}{2} + \frac{(-1)^{\varGamma \cdot \varDelta }}{2}p\).
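The decomposition of \(\gamma \cdot ( E(x) \oplus E(x \oplus \delta ) )\) into the three terms above is an exact bitwise identity that holds for any splitting \(E = E_1 \circ E_0\); only the probabilistic treatment of the terms needs assumptions. A minimal brute-force sketch in Python (the 8-bit maps `E0` and `E1` below are arbitrary toy choices, not any specific cipher):

```python
# Sanity check of the three-term decomposition: for any E0, E1 and masks
# Gamma, gamma, the bit gamma.(E(x) ^ E(x ^ delta)) equals the XOR of the
# three terms. E0 and E1 are arbitrary 8-bit toy maps for illustration.
N = 8
MASK = (1 << N) - 1

def dot(m, v):               # inner product over F_2
    return bin(m & v).count("1") & 1

def E0(x):                   # toy round: rotate, then XOR a constant
    return (((x << 3) | (x >> (N - 3))) & MASK) ^ 0xA5

def E1(x):                   # toy round: modular add, then XOR a shift
    return ((x + 0x3C) & MASK) ^ ((x << 1) & MASK)

def E(x):
    return E1(E0(x))

delta, Gamma, gamma = 0x01, 0x80, 0x11
for x in range(1 << N):
    lhs = dot(gamma, E(x) ^ E(x ^ delta))
    t1 = dot(Gamma, E0(x) ^ E0(x ^ delta))
    t2 = dot(Gamma, E0(x ^ delta)) ^ dot(gamma, E(x ^ delta))
    t3 = dot(Gamma, E0(x)) ^ dot(gamma, E(x))
    assert lhs == t1 ^ t2 ^ t3
```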

However, it has long been observed that Eq. (2) may fail in many cases, and multiple linear approximations have to be taken into account to make the estimates more accurate [10, 20, 21]. In [22], Blondeau, Leander, and Nyberg presented a closed formula for the overall bias \({\mathcal {E}}_{\delta , \gamma }\) based on the link between differential and linear attacks [23], under the sole assumption that \(E_0\) and \(E_1\) are independent. However, this closed formula is generally not applicable in practice even if \(E_0\) and \(E_1\) are independent, since it requires the computation of the exact bias \(\epsilon _{\delta , v} = \Pr [ v \cdot ( E_0(x) \oplus E_0(x \oplus \delta ) ) = 0 ] - \frac{1}{2}\) for all v. Moreover, in some cases, the dependency between \(E_0\) and \(E_1\) can be significant. Inspired by the boomerang-connectivity table (BCT) and its successful applications in the context of boomerang attacks [24], Bar-On, Dunkelman, Keller, and Weizman introduced the differential-linear connectivity table (DLCT) [25], where the target cipher is decomposed as \(E = E_1\circ E_\text {m} \circ E_0\) and the actual differential-linear probability of the middle part \(E_\text {m}\) is determined experimentally, fully addressing the issue of dependency in the switch between \(E_0\) and \(E_1\). (The effect of multiple characteristics and approximations still has to be handled by the framework of Blondeau et al. [22].) Beierle, Leander, and Todo presented several improvements to the framework of differential-linear attacks with a special focus on ARX ciphers at CRYPTO 2020 [26]. At EUROCRYPT 2021, Coutinho and Neto proposed a new technique for finding better linear approximations in ARX ciphers, leading to further improvements in the cryptanalysis of ChaCha [27]. Most recently, Broll et al. proposed several further improvements, obtaining improved attacks on Chaskey and Serpent [28].

Our Contribution. We start from the natural idea of extending the framework of differential-linear attacks by replacing the differential part with rotational-XOR differentials. Specifically, given a pair of data with RX-difference \(\delta = (x \lll t)\oplus x'\) and a linear mask \(\gamma \), a rotational differential-linear distinguisher of a cipher E exploits the bias of \(\gamma \cdot ({\texttt {rot}}(E(x))\oplus E({\texttt {rot}}(x) \oplus \delta ))\), where \({\texttt {rot}}(\cdot )\) is some rotation-like operation.

We then present an informal formula similar to Eq. (1) to estimate the bias of a rotational differential-linear distinguisher by the probability of the rotational-XOR differential covering \(E_0\) and the biases of the linear approximation and its rotated version covering \(E_1\), where \(E = E_1 \circ E_0\). This formula, as in the case of ordinary differential-linear cryptanalysis, requires certain assumptions that may not hold in practice.

Following Blondeau, Leander, and Nyberg’s method [22], we derive the closed formulas for the bias of a rotational differential-linear distinguisher in both the standard and multidimensional cases, completely generalizing the results on ordinary differential-linear distinguishers due to Blondeau, Leander, and Nyberg to the case of rotational differential-linear cryptanalysis. While these formulas are of theoretical interest, they can hardly be applied in practice, since this type of formula involves the computation of correlations of exponentially many trails, which is infeasible in most situations.

Then, we focus our attention on the special case of rotational differential-linear cryptanalysis where the output linear mask \(\gamma \) is a unit vector \(e_i\). In this case, the bias \( \Pr [ e_i \cdot ({\texttt {rot}}(E(x))\oplus E({\texttt {rot}}(x)\oplus \delta )) = 0 ] - \frac{1}{2}\) is

$$\begin{aligned} \Pr \left[ (E(x))_j \oplus (E(x'))_i = 0 \right] - \frac{1}{2}= \frac{1}{2} - \Pr \left[ (E(x))_j \ne (E(x'))_i \right] , \end{aligned}$$
(3)

for some i and j, where \(x' = {\texttt {rot}}(x)\oplus \delta \). With this formulation, we immediately realize that Morawiecki et al.’s approach [16] gives rise to an efficient method for evaluating the biases of rotational differential-linear distinguishers, as well as ordinary differential-linear distinguishers whose output linear masks are unit vectors. We generalize some results from Morawiecki et al.’s work and arrive at formulas that predict \(\Pr [ (f(x))_j \ne (f(x'))_i ]\) based on the information \(\Pr [ x_j \ne x_i]\) for many common operations f appearing in ARX designs. In particular, we give the explicit formula for computing the differential-linear and rotational differential-linear probability of an n-bit modular addition with O(n) operations, while a direct application of Bar-On et al.’s approach [25] based on the fast Fourier transform (FFT), treating the modular addition as a \(2n \times n\) S-box, would require a complexity of \({\mathcal {O}}(2^{2n})\). The probability evaluation can be applied iteratively to an ARX or AND-RX construction. Nevertheless, we note that the accuracy of the probability evaluation is affected by the dependency among neighboring bits.
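When the word size is small, the unit-mask probabilities of Eq. (3) can also be obtained by direct counting, which is a convenient way to cross-check any closed formula. A brute-force sketch for an 8-bit modular addition (the rotation offset, input RX-differences, and bit positions below are arbitrary illustration values; the O(n) evaluation itself is not reproduced here):

```python
# Brute-force evaluation of the unit-mask rotational differential-linear
# probability Pr[(f(x,y))_j != (f(x',y'))_i] from Eq. (3), for f the 8-bit
# modular addition and (x', y') = (rot(x) ^ dx, rot(y) ^ dy).
N = 8
MASK = (1 << N) - 1

def rotl(x, t):
    t %= N
    return ((x << t) | (x >> (N - t))) & MASK

t = 1                    # rotation offset (illustrative)
dx, dy = 0x00, 0x01      # input RX-differences (illustrative)
i = 3                    # output bit of the second member of the pair
j = (i - t) % N          # bit of f(x, y) aligned with bit i after rotation

disagree = 0
for x in range(1 << N):
    for y in range(1 << N):
        z = (x + y) & MASK
        zp = ((rotl(x, t) ^ dx) + (rotl(y, t) ^ dy)) & MASK
        disagree += ((z >> j) & 1) != ((zp >> i) & 1)

prob = disagree / 2 ** (2 * N)   # Pr[(f)_j != (f')_i]
cor = 1 - 2 * prob               # the corresponding correlation
```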

We apply the rotational differential-linear technique to the cryptographic permutations of FRIET, Xoodoo, and Alzette. For FRIET, we find a 6-round rotational differential-linear distinguisher with a correlation of \(2^{-5.81}\), which can be extended to a practical 8-round rotational differential-linear distinguisher with a correlation of \(2^{-17.81}\). As a comparison, the correlation of the best known 8-round linear trail of FRIET is \(2^{-40}\). Moreover, our 6-round distinguisher for FRIET can be further extended to a 13-round one. For Xoodoo, we identify a 4-round rotational differential-linear distinguisher with a correlation of 1, while the previous best result for Xoodoo is a 3-round differential with probability \(2^{-36}\). For Alzette, the 64-bit ARX-box, we find a 4-round differential-linear distinguisher with a correlation of \(2^{-0.27}\) and a 4-round rotational differential-linear distinguisher with a correlation of \(2^{-11.37}\). A summary of the results is shown in Table 1, where all distinguishers with practical complexities are experimentally verified.

Table 1 A summary of the results. R-DL = rotational differential-linear, DL = differential-linear, LC = linear characteristic, DC = differential characteristic

From the above summary, we can see that the rotational differential-linear technique is notably effective against unkeyed cryptographic permutations constructed from modular additions and basic bitwise operations such as AND and XOR. To investigate the applicability of the rotational differential-linear technique to S-box-based designs and keyed primitives, we experimentally apply the method to Midori. Along the way, we give some insight into the difficulties of applying this technique to such primitives. Finally, we propose several open problems deserving further investigation.

Remark 1

This paper is the journal version of [32]. The main differences can be summarized as follows. First, this work solves the open problem proposed in [32] by establishing the theoretical link between rotational-XOR differentials and linear approximations and deriving the closed formula for the bias of rotational differential-linear distinguishers, completely generalizing the results on ordinary differential-linear distinguishers due to Blondeau, Leander, and Nyberg [22] to the case of rotational differential-linear cryptanalysis. Second, this work investigates the possibility of applying the rotational differential-linear technique to S-box-based designs and keyed primitives, discusses the source of the difficulties, and proposes new open problems deserving further investigation.

Outline. Section 2 introduces the notations and preliminaries for rotational-XOR and linear cryptanalysis. We propose the rotational differential-linear cryptanalysis and establish the theoretical link between the rotational-XOR cryptanalysis and linear cryptanalysis in Sect. 3. This is followed by Sect. 4 where we explore the methods for evaluating the biases of rotational differential-linear distinguishers. In Sect. 5 and Sect. 6, we apply the techniques developed in previous sections to AND-RX and ARX primitives. In Sect. 7, we conclude the paper and discuss the possibilities of applying the rotational differential-linear technique to S-box-based designs and keyed primitives.

2 Notations and Preliminaries

Let \({\mathbb {F}}_2 = \{0, 1\}\) be the field with two elements. We denote by \(x_i\) the i-th bit of a bit string \(x \in {\mathbb {F}}_2^n\). For a vectorial Boolean function \(F: {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^m\) with \(y = F(x) \in {\mathbb {F}}_2^m\), its i-th output bit \(y_i\) is denoted by \((F(x))_i\). For an n-bit string x, we use the indexing scheme \(x = (x_{n-1}, \cdots ,x_1, x_0)\). In addition, concrete values in \({\mathbb {F}}_2^n\) are specified in hexadecimal notation. For example, we use 1111 to denote the binary string \((\mathtt {0001~0001~0001~0001})_2\).

The XOR-difference and the rotational-XOR difference with offset t of two bit strings x and \(x'\) in \({\mathbb {F}}_2^n\) are defined as \(x \oplus x'\) and \((x \lll t) \oplus x'\), respectively. For the rotational-XOR difference \(\delta = (x \lll t) \oplus x'\), we may omit the rotation offset and write \(\delta = \overleftarrow{x} \oplus x'\) or \(\delta = {\texttt {rot}}(x) \oplus x'\) to make the notation more compact when the offset is clear from the context. Moreover, by abuse of notation, \(\overleftarrow{x}\) and \({\texttt {rot}}(x)\) may rotate the entire string x or rotate the substrings of x to the left separately with a common offset, depending on the context. For instance, in the analysis of Keccak-f, each lane of the state is rotated by a certain amount [16]. Correspondingly, \(\overrightarrow{x}\) and \({\texttt {rot}}^{-1}(x)\) rotate x or its substrings to the right. Similar to differential cryptanalysis with XOR-differences, we can define the probability of an RX-differential as follows.
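A minimal sketch of this notation in Python (the word size, offset, and concrete values are illustrative parameters):

```python
# Left/right rotation on n-bit words and the RX-difference (x <<< t) ^ x'.
def rotl(x, t, n):
    t %= n
    return ((x << t) | (x >> (n - t))) & ((1 << n) - 1)

def rotr(x, t, n):
    return rotl(x, n - (t % n), n)

def rx_diff(x, xp, t, n):
    return rotl(x, t, n) ^ xp

# A pair with zero RX-difference is exactly a rotational pair:
x, t, n = 0xB4, 1, 8
assert rx_diff(x, rotl(x, t, n), t, n) == 0
```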

Definition 1

(RX-differential probability) Let \(f: {\mathbb {F}}_{2}^n\rightarrow {\mathbb {F}}_{2}^n\) be a vectorial Boolean function. Let \(\alpha \) and \(\beta \) be n-bit words. Then, the RX-differential probability of the RX-differential \(\alpha \rightarrow \beta \) for f is defined as

$$\begin{aligned} \Pr [\alpha \xrightarrow {\textrm{RX}} \beta ] = 2^{-n}\# \left\{ x \in {\mathbb {F}}_2^n : {\texttt {rot}}(f(x))\oplus f({\texttt {rot}}(x)\oplus \alpha ) = \beta \right\} \end{aligned}$$
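Definition 1 can be evaluated by exhaustive enumeration when n is small. A sketch with a toy 4-bit function f and rotation offset t = 1 (both arbitrary choices for illustration):

```python
# Brute-force computation of the RX-differential probability from
# Definition 1 on a toy 4-bit function f with rotation offset T = 1.
N, T = 4, 1
MASK = (1 << N) - 1

def rotl(x):
    return ((x << T) | (x >> (N - T))) & MASK

def f(x):                    # toy map: modular add of a constant, then XOR
    return ((x + 5) & MASK) ^ 0x9

def rx_prob(alpha, beta):
    hits = sum(1 for x in range(1 << N)
               if rotl(f(x)) ^ f(rotl(x) ^ alpha) == beta)
    return hits / (1 << N)

# For a fixed alpha the probabilities over all beta sum to 1.
alpha = 0x3
assert abs(sum(rx_prob(alpha, b) for b in range(1 << N)) - 1.0) < 1e-12
```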

Finally, the definitions of correlation, bias, and some lemmas concerning Boolean functions together with the piling-up lemma are needed.

Definition 2

([33, 34]) The correlation of a Boolean function \(f: {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2\) is defined as \(\textrm{cor}(f) =2^{-n} (\# \{ x \in {\mathbb {F}}_2^n: f(x) = 0 \} - \# \{ x \in {\mathbb {F}}_2^n: f(x) = 1 \})\).

Definition 3

([33, 34]) The bias \(\epsilon (f)\) of a Boolean function \(f: {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2\) is defined as \(2^{-n} \# \{ x \in {\mathbb {F}}_2^n: f(x) = 0 \} - \frac{1}{2}\).

From Definition 2 and Definition 3, we can see that \(\textrm{cor}(f) = 2\epsilon (f)\).
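This relation is immediate from the definitions; a quick numerical check on an arbitrary 4-bit Boolean function given by its truth table:

```python
# Check cor(f) = 2*eps(f) from Definitions 2 and 3, on an example
# truth table (an arbitrary choice for illustration).
N = 4
table = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0]

zeros = table.count(0)
ones = table.count(1)
cor = (zeros - ones) / 2 ** N        # Definition 2
eps = zeros / 2 ** N - 0.5           # Definition 3
assert abs(cor - 2 * eps) < 1e-12
```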

Definition 4

Let \(f : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2\) be a Boolean function. The Walsh-Hadamard transformation takes in f and produces a real-valued function \({\hat{f}} : {\mathbb {F}}_2^n \rightarrow [-2^n, 2^n] \subseteq {\mathbb {R}}\) such that

$$\begin{aligned} \forall w \in {\mathbb {F}}_2^n,~~ {\hat{f}}(w) = \sum _{x \in {\mathbb {F}}_2^n} f(x) (-1)^{x \cdot w}. \end{aligned}$$

Definition 5

Let \(f : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2\) and \(g :{\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2\) be two Boolean functions. The convolutional product of f and g is the integer-valued function defined as

$$\begin{aligned} \forall y \in {\mathbb {F}}_2^n,~~(f \star g)(y) = \sum _{x \in {\mathbb {F}}_2^n} g(x) f(x \oplus y). \end{aligned}$$

Lemma 1

([34], Corollary 2) Let \({\hat{f}}\) be the Walsh-Hadamard transformation of f. Then, the Walsh-Hadamard transformation of \({\hat{f}}\) is \(2^nf\).

Lemma 2

([34], Proposition 6) \({\widehat{(f \star g)}} (z) = {\hat{f}}(z) {\hat{g}}(z)\) and thus \({\widehat{(f \star f)}} = ({\hat{f}})^2\).
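Lemmas 1 and 2 can be checked numerically with plain \(O(4^n)\) transforms at small sizes. A sketch on random 0/1-valued functions of 4 variables (the seed and size are arbitrary):

```python
# Verify Lemma 1 (the double Walsh transform gives 2^n f) and Lemma 2
# (the transform turns the convolutional product into a pointwise product).
import random

N = 4
random.seed(1)
f = [random.randint(0, 1) for _ in range(1 << N)]
g = [random.randint(0, 1) for _ in range(1 << N)]

def dot(a, b):
    return bin(a & b).count("1") & 1

def walsh(h):                       # Definition 4
    return [sum(h[x] * (-1) ** dot(x, w) for x in range(1 << N))
            for w in range(1 << N)]

def conv(h1, h2):                   # Definition 5
    return [sum(h2[x] * h1[x ^ y] for x in range(1 << N))
            for y in range(1 << N)]

fh, gh = walsh(f), walsh(g)
assert walsh(fh) == [2 ** N * v for v in f]                  # Lemma 1
assert walsh(conv(f, g)) == [a * b for a, b in zip(fh, gh)]  # Lemma 2
```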

Lemma 3

(Piling-up Lemma [19]) Let \(Z_0\), \(\cdots \), \(Z_{m-1}\) be m independent binary random variables with \(\Pr [Z_i = 0] = p_i\). Then, we have that

$$\begin{aligned} \Pr \left[ Z_0 \oplus \cdots \oplus Z_{m-1} = 0 \right] = \frac{1}{2} + 2^{m-1} \prod _{i=0}^{m-1}\left( p_i - \frac{1}{2}\right) , \end{aligned}$$

or alternatively, \( 2 \Pr \left[ Z_0 \oplus \cdots \oplus Z_{m-1} = 0 \right] - 1 = \prod _{i=0}^{m-1}\left( 2p_i - 1 \right) . \)
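For independent variables the lemma is exact, which can be confirmed by enumerating all outcomes. A small sketch with m = 3 and arbitrary marginal probabilities:

```python
# Exact check of the piling-up lemma: enumerate all outcomes of three
# independent bits with given marginals and compare with the formula.
import math
from itertools import product

p = [0.7, 0.55, 0.4]                 # Pr[Z_i = 0], example values
m = len(p)

# Pr[Z_0 ^ ... ^ Z_{m-1} = 0]: sum over even-weight outcome tuples.
lhs = sum(
    math.prod(p[i] if z[i] == 0 else 1 - p[i] for i in range(m))
    for z in product((0, 1), repeat=m)
    if sum(z) % 2 == 0
)
rhs = 0.5 + 2 ** (m - 1) * math.prod(pi - 0.5 for pi in p)
assert abs(lhs - rhs) < 1e-12
```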

3 Rotational Differential-Linear Cryptanalysis

A natural extension of differential-linear cryptanalysis is to replace the differential part of the attack with rotational-XOR (RX) differentials. Let \(E = E_1 \circ E_0\) be an encryption function. Assume that we have an RX-differential \(\delta \rightarrow \varDelta \) covering \(E_0\) with \(\Pr [ {\texttt {rot}}(E_0(x)) \oplus E_0 ({\texttt {rot}}(x) \oplus \delta ) = \varDelta ] = p\) and a linear approximation \(\varGamma \rightarrow \gamma \) of \(E_1\) such that

$$\begin{aligned} {\left\{ \begin{array}{ll} \epsilon _{\varGamma , \gamma } = \Pr \left[ \varGamma \cdot y \oplus \gamma \cdot E_1(y) = 0 \right] - \frac{1}{2}\\ \epsilon _{{\texttt {rot}}^{-1}(\varGamma ), {\texttt {rot}}^{-1}(\gamma )} = \Pr \left[ {\texttt {rot}}^{-1}(\varGamma ) \cdot y \oplus {\texttt {rot}}^{-1}(\gamma ) \cdot E_1(y) = 0 \right] - \frac{1}{2} \end{array}\right. }, \end{aligned}$$

where the probabilities are computed over randomly chosen y. This configuration is shown in Fig. 1.

Fig. 1

A high-level illustration of a rotational differential-linear approximation

Let \(x' = {\texttt {rot}}(x) \oplus \delta \). If the assumption

$$\begin{aligned} \Pr \left[ \varGamma \cdot ( {\texttt {rot}}(E_0(x)) \oplus E_0 (x') ) = 0 ~|~ {\texttt {rot}}(E_0(x)) \oplus E_0 (x') \ne \varDelta \right] = \frac{1}{2} \end{aligned}$$
(4)

holds, we have

$$\begin{aligned} \Pr \left[ \varGamma \cdot ( {\texttt {rot}}(E_0(x)) \oplus E_0 (x') ) = 0 \right] = \frac{1}{2} + \frac{(-1)^{\varGamma \cdot \varDelta }}{2}p. \end{aligned}$$

Note that \(\gamma \cdot \big ( {\texttt {rot}}(E(x))\oplus E(x') \big )\) can be written as

$$\begin{aligned} \gamma \cdot {\texttt {rot}}\big (E(x) \big ) \oplus \underline{{{\varGamma \cdot {\texttt {rot}}\big (E_0(x)\big )}} \oplus {{\varGamma \cdot \big ({\texttt {rot}}(E_0(x))\oplus E_0(x')\big )}} \oplus {{\varGamma \cdot E_0(x')}}} \oplus \gamma \cdot E(x'), \end{aligned}$$

where the underlined part sums to zero and can thus be inserted for free. Regrouping the terms gives

$$\begin{aligned} \begin{aligned} \gamma \cdot \bigg ( {\texttt {rot}}(E(x))\oplus E(x') \bigg )&= {\texttt {rot}}\bigg ({\texttt {rot}}^{-1}(\gamma )\cdot E(x) \oplus {\texttt {rot}}^{-1}(\varGamma )\cdot E_0(x) \bigg )\\&\oplus \varGamma \cdot \bigg ({\texttt {rot}}(E_0(x))\oplus E_0(x') \bigg )\\&\oplus \varGamma \cdot E_0(x')\oplus \gamma \cdot E(x'). \end{aligned} \end{aligned}$$

Consequently, the bias of the rotational differential-linear distinguisher can be estimated by the piling-up lemma as

$$\begin{aligned} {\mathcal {E}}_{\delta ,\gamma }^{\text {R-DL}} = \Pr [ \gamma \cdot ( \overleftarrow{E}(x)\oplus E(x') ) = 0 ] - \frac{1}{2} = (-1)^{\varGamma \cdot \varDelta } ~ 2 p ~\epsilon _{\varGamma , \gamma }~ \epsilon _{{\texttt {rot}}^{-1}(\varGamma ), {\texttt {rot}}^{-1}(\gamma )}, \end{aligned}$$
(5)

and the corresponding correlation of the distinguisher is

$$\begin{aligned} {\mathcal {C}}_{\delta ,\gamma }^{\text {R-DL}} = 2{\mathcal {E}}_{\delta ,\gamma }^{\text {R-DL}} = (-1)^{\varGamma \cdot \varDelta } ~ 4 p~ \epsilon _{\varGamma , \gamma } ~\epsilon _{{\texttt {rot}}^{-1}(\varGamma ), {\texttt {rot}}^{-1}(\gamma )}. \end{aligned}$$
(6)

We can distinguish E from a random permutation if the absolute value of \({\mathcal {E}}_{\delta ,\gamma }^{\text {R-DL}}\) or \({\mathcal {C}}_{\delta ,\gamma }^{\text {R-DL}}\) is sufficiently large. Note that if we set the rotation offset to zero, the rotational differential-linear attack is exactly the ordinary differential-linear attack. Therefore, the rotational differential-linear attack is a strict generalization of ordinary differential-linear cryptanalysis.
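The zero-offset degeneration can be seen directly in code: with t = 0, the rotational differential-linear correlation coincides with the ordinary one by definition. A sketch on an arbitrary 8-bit ARX-style toy map E (not any specific cipher):

```python
# Exhaustive measurement of the R-DL correlation
# 2*Pr[gamma.(rot(E(x)) ^ E(rot(x) ^ delta)) = 0] - 1 on a toy map E,
# and a check that offset 0 recovers the ordinary DL correlation.
N = 8
MASK = (1 << N) - 1

def rotl(x, t):
    t %= N
    return ((x << t) | (x >> (N - t))) & MASK

def dot(m, v):
    return bin(m & v).count("1") & 1

def E(x):                        # toy ARX-style rounds (illustrative)
    x = (x + 0x4D) & MASK
    x = rotl(x, 3) ^ 0x2B
    x = (x + rotl(x, 5)) & MASK
    return x

def rdl_cor(delta, gamma, t):
    good = sum(1 for x in range(1 << N)
               if dot(gamma, rotl(E(x), t) ^ E(rotl(x, t) ^ delta)) == 0)
    return 2 * good / (1 << N) - 1

def dl_cor(delta, gamma):        # ordinary DL correlation
    good = sum(1 for x in range(1 << N)
               if dot(gamma, E(x) ^ E(x ^ delta)) == 0)
    return 2 * good / (1 << N) - 1

assert rdl_cor(0x01, 0x80, 0) == dl_cor(0x01, 0x80)
```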

A rotational differential-linear distinguisher can be extended by appending linear approximations at the end. Given a rotational differential-linear distinguisher of a function f with a bias

$$\begin{aligned} \epsilon _{\delta ,\gamma } = \Pr \left[ \gamma \cdot ({\texttt {rot}}(f(x)) \oplus f({\texttt {rot}}(x)\oplus \delta )) = 0 \right] - \frac{1}{2} , \end{aligned}$$

and a linear approximation \((\gamma ,\mu )\) over a function g with

$$\begin{aligned} {\left\{ \begin{array}{ll} \epsilon _{\gamma ,\mu } = \Pr \left[ \gamma \cdot x \oplus \mu \cdot g(x) = 0 \right] - \frac{1}{2},\\ \epsilon _{{\texttt {rot}}^{-1}(\gamma ), {\texttt {rot}}^{-1}(\mu )} = \Pr \left[ {\texttt {rot}}^{-1}(\gamma ) \cdot x \oplus {\texttt {rot}}^{-1}(\mu ) \cdot g(x) = 0 \right] - \frac{1}{2},\\ \end{array}\right. } \end{aligned}$$

we can compute the bias of the rotational differential-linear distinguisher of \(h = g\circ f\) with input RX-difference \(\delta \) and output linear mask \(\mu \) by the piling-up lemma. Since

$$\begin{aligned} \begin{aligned} \mu \cdot ({\texttt {rot}}(h(x)) \oplus h({\texttt {rot}}(x)\oplus \delta ))&= \gamma \cdot ({\texttt {rot}}(f(x)) \oplus f({\texttt {rot}}(x)\oplus \delta )) \\&\oplus \gamma \cdot {\texttt {rot}}(f(x)) \oplus \mu \cdot {\texttt {rot}}(h(x)) \\&\oplus \gamma \cdot f({\texttt {rot}}(x)\oplus \delta ) \oplus \mu \cdot h({\texttt {rot}}(x)\oplus \delta ) \end{aligned}, \end{aligned}$$

the bias of the rotational differential-linear distinguisher can be estimated as

$$\begin{aligned} \Pr \left[ \mu \cdot ({\texttt {rot}}(h(x)) \oplus h({\texttt {rot}}(x)\oplus \delta )) = 0 \right] - \frac{1}{2} = 4\epsilon _{\delta ,\gamma }~\epsilon _{\gamma ,\mu }~\epsilon _{{\texttt {rot}}^{-1}(\gamma ),{\texttt {rot}}^{-1}(\mu )}. \end{aligned}$$
(7)

However, as in ordinary differential-linear attacks, the assumption described by Eq. (4) may not hold in practice, and we prefer a closed formula for the bias \({\mathcal {E}}_{\delta ,\gamma }^{\text {R-DL}}\) without this assumption for much the same reasons leading to Blondeau et al.’s work [22]. Also, we would like to emphasize that if Eqs. (5) and (7) are used to estimate the bias, we should verify the results experimentally whenever possible.

3.1 Link Between RX-Cryptanalysis and Linear Cryptanalysis

In [22], Blondeau et al. proved the following theorem based on the general link between differential and linear cryptanalysis [23].

Theorem 1

([22]) If \(E_0\) and \(E_1\) are independent, the bias of a differential-linear distinguisher with input difference \(\delta \) and output linear mask \(\gamma \) can be computed as

$$\begin{aligned} {\mathcal {E}}_{\delta , \gamma } = \sum _{v \in {\mathbb {F}}_2^n} \epsilon _{\delta , v} c_{v, \gamma }^2, \end{aligned}$$
(8)

for all \(\delta \ne 0\) and \(\gamma \ne 0\), where

$$\begin{aligned} {\left\{ \begin{array}{ll} \epsilon _{\delta , v} = \Pr [ v \cdot ( E_0(x) \oplus E_0(x \oplus \delta ) ) = 0 ] - \frac{1}{2}\\ c_{v, \gamma } = \textrm{cor}( v \cdot y \oplus \gamma \cdot E_1(y) ) \end{array}\right. }. \end{aligned}$$

To replay Blondeau et al.’s technique in an attempt to derive the rotational differential-linear counterpart of Eq. (8), we have to first establish the relationship between rotational differential-linear cryptanalysis and linear cryptanalysis.

Let \(F : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^n\) be a vectorial Boolean function. The cardinality of the set

$$\begin{aligned} \left\{ x \in {\mathbb {F}}_2^n : \overleftarrow{F}(x) \oplus F(\overleftarrow{x} \oplus a) = b \right\} \end{aligned}$$

is denoted by \(\xi _F(a, b)\), and the correlation of \( u \cdot x \oplus v \cdot F(x)\) is denoted by \(\textrm{cor}(u \cdot x \oplus v \cdot F(x))\). Let \(\underrightarrow{\overleftarrow{F}}: {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^n\) be the vectorial Boolean function mapping x to \(\overleftarrow{F}(\overrightarrow{x})\). It is easy to show that

$$\begin{aligned} \textrm{cor} \left( u \cdot x \oplus v \cdot \underrightarrow{\overleftarrow{F}}(x)\right) = \textrm{cor}\left( \overrightarrow{u} \cdot x \oplus \overrightarrow{v} \cdot F(x)\right) . \end{aligned}$$

In what follows, we are going to establish the relationship between

$$\begin{aligned} \xi _F(a, b), ~~\textrm{cor}(u \cdot x \oplus v \cdot F(x)),~~\textrm{and}~~ \textrm{cor}(\overrightarrow{u} \cdot x \oplus \overrightarrow{v} \cdot F(x)). \end{aligned}$$

Definition 6

Given a vectorial Boolean function \(F : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^n\), the Boolean function \(\theta _F: {\mathbb {F}}_2^{2n} \rightarrow {\mathbb {F}}_2\) is defined as

$$\begin{aligned} \theta _F(x, y) = {\left\{ \begin{array}{ll} 1~~\textrm{if}~~y = F(x),\\ 0~~\textrm{otherwise}. \end{array}\right. } \end{aligned}$$
(9)

Lemma 4

Let \(F : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^n\) be a vectorial Boolean function. Then for any \((a, b) \in {\mathbb {F}}_2^{2n}\), we have \(\xi _F(a, b) = ( \theta _{\underrightarrow{\overleftarrow{F}}} \star \theta _F )(a, b) \).

Proof

According to Definition 5, we have

$$\begin{aligned} \left( \theta _{\underrightarrow{\overleftarrow{F}}} \star \theta _F \right) (a, b)= & {} \sum _{x||y \in {\mathbb {F}}_2^{2n}} \theta _{\underrightarrow{\overleftarrow{F}}} (x, y) \theta _F(a \oplus x, b \oplus y)\\= & {} \sum _{x \in {\mathbb {F}}_2^n } \sum _{y \in {\mathbb {F}}_2^n} \theta _{\underrightarrow{\overleftarrow{F}}} (x, y) \theta _F(a \oplus x, b \oplus y)\\= & {} \sum _{x \in {\mathbb {F}}_2^n } \theta _{\underrightarrow{\overleftarrow{F}}} \left( x, \underrightarrow{\overleftarrow{F}}(x)\right) \theta _F \left( a \oplus x, b \oplus \underrightarrow{\overleftarrow{F}}(x)\right) \\= & {} \sum _{x \in {\mathbb {F}}_2^n } \theta _F \left( a \oplus x, b \oplus \underrightarrow{\overleftarrow{F}}(x)\right) \\= & {} \# \left\{ x \in {\mathbb {F}}_2^n: b \oplus \underrightarrow{\overleftarrow{F}}(x) = F( a \oplus x) \right\} \\= & {} \xi _F(a, b). \end{aligned}$$

\(\square \)
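Lemma 4 can be verified exhaustively at small sizes. A sketch with a random 3-bit function F and rotation offset 1 (size and seed are arbitrary):

```python
# Verify Lemma 4: xi_F(a, b) equals the convolutional product
# (theta_{F~} * theta_F)(a, b), where F~(x) = rotl(F(rotr(x))).
import random

N, T = 3, 1
MASK = (1 << N) - 1
random.seed(7)
F = [random.randrange(1 << N) for _ in range(1 << N)]

def rotl(x):
    return ((x << T) | (x >> (N - T))) & MASK

def rotr(x):
    return ((x >> T) | (x << (N - T))) & MASK

def Ftilde(x):
    return rotl(F[rotr(x)])

def xi(a, b):        # RX-differential count from the definition
    return sum(1 for x in range(1 << N) if rotl(F[x]) ^ F[rotl(x) ^ a] == b)

def theta(Fn, x, y):                 # Definition 6
    return 1 if y == Fn(x) else 0

def conv(a, b):                      # (theta_{F~} * theta_F)(a, b)
    return sum(theta(Ftilde, x, y) * theta(lambda z: F[z], a ^ x, b ^ y)
               for x in range(1 << N) for y in range(1 << N))

for a in range(1 << N):
    for b in range(1 << N):
        assert xi(a, b) == conv(a, b)
```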

Lemma 5

Let \(F : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^n\) be a vectorial Boolean function. Then for any \((a, b) \in {\mathbb {F}}_2^{2n}\), we have \( \textrm{cor}(a \cdot x \oplus b \cdot F(x)) = {2^{-n}} {\hat{\theta }}_{F}(a, b) \), and thus

$$\begin{aligned} \textrm{cor}\left( a \cdot x \oplus b \cdot \underrightarrow{\overleftarrow{F}}(x)\right) = \frac{1}{2^n} {\hat{\theta }}_{\underrightarrow{\overleftarrow{F}}}(a, b). \end{aligned}$$

Proof

According to Definition 4, we have

$$\begin{aligned} \begin{aligned} {\hat{\theta }}_F(a, b)&= \sum _{x||y \in {\mathbb {F}}_2^{2n}} \theta _F(x, y) (-1)^{(x||y) \cdot (a||b)}\\&= \sum _{x \in {\mathbb {F}}_2^n}\sum _{y \in {\mathbb {F}}_2^n} \theta _F(x, y) (-1)^{ a \cdot x \oplus b \cdot y } \\&= \sum _{x \in {\mathbb {F}}_2^n} (-1)^{ a \cdot x \oplus b \cdot F(x) } \\&= 2^n \textrm{cor}(a \cdot x \oplus b \cdot F(x)). \end{aligned} \end{aligned}$$

\(\square \)

For a vectorial Boolean function \(F: {\mathbb {F}}_2^n\rightarrow {\mathbb {F}}_2^n\), we will denote

$$\begin{aligned} \textrm{cor}( \overrightarrow{u} \cdot x \oplus \overrightarrow{v} \cdot F(x) ) \cdot \textrm{cor}( u \cdot x \oplus v \cdot F(x) ) \end{aligned}$$

by \(\lambda _F(u, v)\). When F is clear from the context, we may omit F and use \(\lambda (u, v)\) for the sake of simplicity.

Theorem 2

The link between RX-differentials and linear approximations can be summarized as

$$\begin{aligned} \xi _F(a, b) = \sum _{u,v \in {\mathbb {F}}_2^n} (-1)^{u \cdot a \oplus v \cdot b} \lambda _F(u, v) = {\hat{\lambda }}_F(a, b). \end{aligned}$$
(10)

Also, we have

$$\begin{aligned} 2^{2n}\lambda _{F}(u, v) = \sum _{a, b \in {\mathbb {F}}_2^n} (-1)^{u\cdot a \oplus v \cdot b} \xi _F(a, b) = {\hat{\xi }}_F(u,v). \end{aligned}$$
(11)

Proof

According to Lemma 4,

$$\begin{aligned} \xi _F(a, b) = ( \theta _{\underrightarrow{\overleftarrow{F}}} \star \theta _F )(a, b). \end{aligned}$$
(12)

Applying Lemma 1 to Eq. (12) gives

$$\begin{aligned} 2^{2n} \xi _F(a, b) = \widehat{\widehat{( \theta _{\underrightarrow{\overleftarrow{F}}} \star \theta _F )}}(a, b). \end{aligned}$$

Then, according to Lemma 2, we have

$$\begin{aligned} 2^{2n} \xi _F(a, b) = \widehat{\widehat{( \theta _{\underrightarrow{\overleftarrow{F}}} \star \theta _F )}}(a, b) = \widehat{ {\hat{\theta }}_{\underrightarrow{\overleftarrow{F}}} {\hat{\theta }}_F }(a, b). \end{aligned}$$

Since \({\hat{\theta }}_{\underrightarrow{\overleftarrow{F}}} {\hat{\theta }}_F = {2^{2n}} \textrm{cor}(u \cdot x \oplus v \cdot \underrightarrow{\overleftarrow{F}}(x)) \textrm{cor}(u \cdot x \oplus v \cdot F(x))\) due to Lemma 5,

$$\begin{aligned} \begin{aligned} 2^{2n} \xi _F(a, b)&= \widehat{ {\hat{\theta }}_{\underrightarrow{\overleftarrow{F}}} {\hat{\theta }}_F }(a, b)\\&= 2^{2n}\sum _{u||v \in {\mathbb {F}}_2^{2n}} (-1)^{(u||v) \cdot (a || b)} \textrm{cor}(u \cdot x \oplus v \cdot \underrightarrow{\overleftarrow{F}}(x)) \textrm{cor}(u \cdot x \oplus v \cdot F(x))\\&= 2^{2n}\sum _{u, v\in {\mathbb {F}}_2^{n}} (-1)^{u \cdot a \oplus v \cdot b } \textrm{cor}(u \cdot x \oplus v \cdot \underrightarrow{\overleftarrow{F}}(x)) \textrm{cor}(u \cdot x \oplus v \cdot F(x))\\&= 2^{2n}\sum _{u, v\in {\mathbb {F}}_2^{n}} (-1)^{u \cdot a \oplus v \cdot b } \textrm{cor}(\overrightarrow{u} \cdot x \oplus \overrightarrow{v} \cdot F(x)) \textrm{cor}(u \cdot x \oplus v \cdot F(x))\\&= 2^{2n}\sum _{u, v\in {\mathbb {F}}_2^{n}} (-1)^{u \cdot a \oplus v \cdot b } \lambda _F(u, v)\\&= 2^{2n} {\hat{\lambda }}_F(a, b), \end{aligned} \end{aligned}$$

that is,

$$\begin{aligned} \xi _F(a, b) ={\hat{\lambda }}_F(a, b). \end{aligned}$$
(13)

Applying Lemma 1 to Eq. (13) gives \({\hat{\xi }}_F(u,v) = 2^{2n} \lambda _F(u, v)\). \(\square \)
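Theorem 2 can likewise be confirmed by exhaustive computation at small sizes. A sketch with a random 3-bit function F and rotation offset 1 (size and seed are arbitrary):

```python
# Verify Theorem 2: xi_F(a, b) equals the Walsh transform of
# lambda_F(u, v) = cor(rotr(u).x ^ rotr(v).F(x)) * cor(u.x ^ v.F(x)).
import random

N, T = 3, 1
MASK = (1 << N) - 1
random.seed(11)
F = [random.randrange(1 << N) for _ in range(1 << N)]

def rotl(x):
    return ((x << T) | (x >> (N - T))) & MASK

def rotr(x):
    return ((x >> T) | ((x << (N - T)) & MASK))

def dot(m, v):
    return bin(m & v).count("1") & 1

def cor(u, v):       # correlation of u.x ^ v.F(x)
    return sum((-1) ** (dot(u, x) ^ dot(v, F[x]))
               for x in range(1 << N)) / (1 << N)

def lam(u, v):
    return cor(rotr(u), rotr(v)) * cor(u, v)

def xi(a, b):
    return sum(1 for x in range(1 << N) if rotl(F[x]) ^ F[rotl(x) ^ a] == b)

for a in range(1 << N):
    for b in range(1 << N):
        rhs = sum((-1) ** (dot(u, a) ^ dot(v, b)) * lam(u, v)
                  for u in range(1 << N) for v in range(1 << N))
        assert abs(xi(a, b) - rhs) < 1e-9      # Eq. (10)
```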

If the function F is rotation invariant, i.e., \(\overleftarrow{F(x)} = F(\overleftarrow{x})\), then we have \(\textrm{cor}( \overrightarrow{u} \cdot x \oplus \overrightarrow{v} \cdot F(x) ) =\textrm{cor}( u \cdot x \oplus v \cdot F(x) )\). As a result, the theoretical link between rotational-XOR and linear cryptanalysis degenerates to the link between ordinary differential cryptanalysis and linear cryptanalysis. Based on the link between differential and linear cryptanalysis, Blondeau et al. derive a closed formula for the bias of an ordinary differential-linear distinguisher as shown in Eq. (8). In addition, Theorem 2 implies the following corollary.

Corollary 1

\(\Pr \left[ a \xrightarrow [F]{\text {RX}} b \right] = 2^{-n} \sum _{u,v \in {\mathbb {F}}_2^n} (-1)^{u \cdot a \oplus v \cdot b} \lambda _F(u, v) \).

Proof

Let \(\xi _F(a, b)\) denote the cardinality of \( \left\{ x \in {\mathbb {F}}_2^n : \overleftarrow{F}(x) \oplus F(\overleftarrow{x} \oplus a) = b \right\} \). Then, \(\Pr \left[ a \xrightarrow [F]{\text {RX}} b \right] = 2^{-n} \xi _F(a,b)\), where \(\xi _F(a,b) = \sum _{u,v \in {\mathbb {F}}_2^n} (-1)^{u \cdot a \oplus v \cdot b} \lambda _F(u, v)\) according to Theorem 2. \(\square \)

Corollary 2

\(\lambda _{F}(u, v) = \frac{1}{2^n} \sum _{a, b \in {\mathbb {F}}_2^n} (-1)^{u\cdot a \oplus v \cdot b} \Pr \left[ a \xrightarrow [F]{\text {RX}} b \right] \).

Proof

This follows directly from Eq. (11) of Theorem 2. \(\square \)
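These two corollaries can be checked exhaustively on a toy example. The following sketch (with an illustrative random 4-bit function F and rotation offset \(t = 1\); the helper names rotl, lam, and rx_prob are ours) verifies Corollary 1 with exact rational arithmetic, reading \(\overleftarrow{x}\) as a left rotation by t and \(\overrightarrow{u}\) as the corresponding right rotation:

```python
from fractions import Fraction
import random

n, t = 4, 1
N = 1 << n

def rotl(x, r):
    """Rotate an n-bit word left by r positions (negative r rotates right)."""
    r %= n
    return ((x << r) | (x >> (n - r))) & (N - 1)

def dot(u, x):
    """GF(2) inner product u . x."""
    return bin(u & x).count("1") & 1

random.seed(2021)
F = [random.randrange(N) for _ in range(N)]  # an arbitrary 4-bit function

def cor(u, v):
    """cor(u.x ^ v.F(x)), exact as a Fraction."""
    return Fraction(sum((-1) ** (dot(u, x) ^ dot(v, F[x])) for x in range(N)), N)

def lam(u, v):
    """lambda_F(u, v) = cor(rotr(u).x ^ rotr(v).F(x)) * cor(u.x ^ v.F(x))."""
    return cor(rotl(u, -t), rotl(v, -t)) * cor(u, v)

def rx_prob(a, b):
    """Pr[a -RX-> b] = 2^-n #{x : rotl(F(x)) ^ F(rotl(x) ^ a) = b}."""
    cnt = sum(1 for x in range(N) if rotl(F[x], t) ^ F[rotl(x, t) ^ a] == b)
    return Fraction(cnt, N)

# Corollary 1: Pr[a -RX-> b] = 2^-n sum_{u,v} (-1)^{u.a ^ v.b} lambda_F(u, v)
a, b = 0b0011, 0b0101
rhs = sum((-1) ** (dot(u, a) ^ dot(v, b)) * lam(u, v)
          for u in range(N) for v in range(N)) / N
assert rx_prob(a, b) == rhs
```

Since Corollary 2 is the inverse Fourier transform of Corollary 1, the same script can check both directions.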

3.2 The Bias of a Rotational Differential-Linear Distinguisher

We now mimic Blondeau et al.’s approach to obtain a closed formula for the bias of a rotational differential-linear distinguisher. Note that this attempt failed in [32], where the failure was attributed to a fundamental difference between rotational-XOR differentials and ordinary differentials: the output RX-difference is not necessarily zero even when the input RX-difference \({\texttt {rot}}(x) \oplus x'\) is zero. In the following, we show that the difficulty caused by this difference is only technical and does not prevent us from deriving the closed formula.

Definition 7

Let \(V \subseteq {\mathbb {F}}_2^n\) be a linear space and \(\delta \in {\mathbb {F}}_2^n\) be a given vector. The probability of an RX-differential from \(\delta \) to V is defined as

$$\begin{aligned} \Pr \left[ \delta \xrightarrow [F]{\text {RX}} V \right] = \sum _{b \in V} \Pr \left[ \delta \xrightarrow [F]{\text {RX}} b \right] . \end{aligned}$$

Definition 8

Let \(F: {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^n\) be a vectorial Boolean function. The probability of the RX-differential from a linear space \(U \subseteq {\mathbb {F}}_2^n\) to a linear space \(V \subseteq {\mathbb {F}}_2^n\) for F is defined as

$$\begin{aligned} \Pr \left[ U \xrightarrow [F]{\text {RX}} V \right]&= \frac{1}{2^n \cdot |U|} \# \left\{ (x, a) \in {\mathbb {F}}_2^n \times U: \overleftarrow{F}(x) \oplus F(\overleftarrow{x} \oplus a) \in V \right\} \\&= \frac{1}{2^n \cdot |U|} \# \left\{ (x, a, b) \in {\mathbb {F}}_2^n \times U \times V: \overleftarrow{F}(x) \oplus F(\overleftarrow{x} \oplus a) = b \right\} \\&= \frac{1}{|U|}\sum _{a \in U} \sum _{b \in V} \Pr \left[ a \xrightarrow [F]{\text {RX}} b \right] = \frac{1}{|U|}\sum _{a \in U} \Pr \left[ a \xrightarrow [F]{\text {RX}} V \right] . \end{aligned}$$

Denote by \(\textrm{sp}(\delta )\) the linear space spanned by \(\delta \). According to Definition 8 and Definition 7, we have

$$\begin{aligned} \Pr \left[ \textrm{sp}(\delta ) \xrightarrow [F]{\text {RX}} V \right] = \frac{1}{2} \Pr \left[ \delta \xrightarrow [F]{\text {RX}} V \right] + \frac{1}{2} \Pr \left[ 0 \xrightarrow [F]{\text {RX}} V \right] , \end{aligned}$$

which implies that

$$\begin{aligned} \Pr \left[ \delta \xrightarrow [F]{\text {RX}} V \right] = 2 \Pr \left[ \textrm{sp}(\delta ) \xrightarrow [F]{\text {RX}} V \right] - \Pr \left[ 0 \xrightarrow [F]{\text {RX}} V \right] . \end{aligned}$$
(14)

Note that an additive subgroup \({\mathcal {H}}\) of \({\mathbb {F}}_2^n\) is also a linear subspace of \({\mathbb {F}}_2^n\). Thus, we can define the orthogonal space of \({\mathcal {H}}\) as \( {\mathcal {H}}^{\perp } = \{x \in {\mathbb {F}}_2^n: \forall y \in {\mathcal {H}}, x \cdot y = 0\} \).

Lemma 6

([35]) Let \({\mathcal {H}}\) be an additive subgroup of \({\mathbb {F}}_2^n\). Then, the function \(f: {\mathbb {F}}_2^n \rightarrow {\mathbb {R}}\) defined by the following character sum satisfies

$$\begin{aligned} f(x) = \sum _{ h \in {\mathcal {H}}} (-1)^{x \cdot h} = {\left\{ \begin{array}{ll} |{\mathcal {H}}|, &{}x \in {\mathcal {H}}^{\perp }\\ 0, &{}x \notin {\mathcal {H}}^{\perp } \end{array}\right. }. \end{aligned}$$

Proof

Let \(\{h_1, \cdots , h_c\}\) be a basis of \({\mathcal {H}}\), so that \( {\mathcal {H}} = \{ \tau _1 h_1 + \cdots + \tau _c h_c: (\tau _1, \cdots , \tau _c) \in {\mathbb {F}}_2^c \} \) contains exactly \(2^c\) elements. Consequently, we have

$$\begin{aligned} \sum _{h \in {\mathcal {H}}}(-1)^{x \cdot h}&= \sum _{(\tau _1, \cdots , \tau _c) \in {\mathbb {F}}_2^c} (-1)^{x \cdot (\tau _1 h_1 + \cdots + \tau _c h_c)}\\&= \sum _{(\tau _1, \cdots , \tau _c) \in {\mathbb {F}}_2^c} (-1)^{x \cdot {\tau _1 h_1}} \cdots (-1)^{x \cdot {\tau _c h_c}}\\&= \sum _{\tau _1 \in {\mathbb {F}}_2} (-1)^{x \cdot {\tau _1 h_1}} \cdots \sum _{\tau _c \in {\mathbb {F}}_2} (-1)^{x \cdot {\tau _c h_c}}\\&= (1 + (-1)^{x\cdot h_1}) \cdots (1 + (-1)^{x \cdot h_c}), \end{aligned}$$

which equals \(|{\mathcal {H}}| = 2^c\) if and only if \(x \cdot h_1 = \cdots = x \cdot h_c = 0\), i.e., \(x \in {\mathcal {H}}^{\perp }\), and vanishes otherwise. \(\square \)
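Lemma 6 is easy to confirm numerically. The following minimal sketch (with an illustrative subgroup of \({\mathbb {F}}_2^5\) spanned by three basis vectors of our choosing) checks the character sum for every \(x\):

```python
n = 5

def dot(u, x):
    # GF(2) inner product u . x
    return bin(u & x).count("1") & 1

# span of an illustrative basis {h1, h2, h3} of a subgroup H of F_2^5
H = {0}
for h in (0b00101, 0b11010, 0b00110):
    H |= {y ^ h for y in H}

H_perp = {x for x in range(1 << n) if all(dot(x, h) == 0 for h in H)}

# Lemma 6: the character sum is |H| on H^perp and 0 elsewhere
for x in range(1 << n):
    s = sum((-1) ** dot(x, h) for h in H)
    assert s == (len(H) if x in H_perp else 0)
```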

By setting \({\mathcal {H}} = {\mathbb {F}}_2^n\), Lemma 6 implies the following corollary.

Corollary 3

\( 2^{-n}\sum _{ u \in {\mathbb {F}}_2^n} (-1)^{x \cdot u} = \delta (x)\), where

$$\begin{aligned} \delta (x) = {\left\{ \begin{array}{ll} 1,~~x=0\\ 0,~~x \ne 0 \end{array}\right. }. \end{aligned}$$

Theorem 3

Let U and V be linear spaces in \({\mathbb {F}}_2^n\), then we have

$$\begin{aligned} \Pr \left[ U^{\perp } \xrightarrow [F]{\text {RX}} V^{\perp } \right] = \frac{1}{|V|} \sum _{\begin{array}{c} u \in U \\ v\in V \end{array}} \lambda (u, v), \end{aligned}$$

where \(\lambda (u, v) = \textrm{cor}( \overrightarrow{u} \cdot x \oplus \overrightarrow{v} \cdot F(x) ) \textrm{cor}( u \cdot x \oplus v \cdot F(x) )\).

Proof

According to Definition 8 and Corollary 1, we have

$$\begin{aligned} \Pr \left[ U^{\perp } \xrightarrow [F]{\text {RX}} V^{\perp } \right]&= \frac{1}{|U^\perp |} \sum _{\begin{array}{c} a \in U^\perp \\ b\in V^\perp \end{array}} \Pr \left[ a \xrightarrow [F]{\text {RX}} b \right] \\&= \frac{1}{|U^\perp |} \sum _{\begin{array}{c} a \in U^\perp \\ b\in V^\perp \end{array}} \frac{1}{2^n} \sum _{\begin{array}{c} u \in {\mathbb {F}}_2^n \\ v\in {\mathbb {F}}_2^n \end{array}} (-1)^{u \cdot a \oplus v \cdot b} \lambda (u, v)\\&= \frac{1}{2^n} \cdot \frac{1}{|U^\perp |} \sum _{\begin{array}{c} u \in {\mathbb {F}}_2^n \\ v\in {\mathbb {F}}_2^n \end{array}} \lambda (u, v) \sum _{a \in U^\perp } (-1)^{u \cdot a} \sum _{b \in V^\perp } (-1)^{v \cdot b}. \end{aligned}$$

Applying Lemma 6 gives

$$\begin{aligned} \Pr \left[ U^{\perp } \xrightarrow [F]{\text {RX}} V^{\perp } \right]&= \frac{1}{2^n} \cdot \frac{1}{|U^\perp |} \cdot |U^\perp | \cdot |V^\perp | \sum _{\begin{array}{c} u \in U \\ v\in V \end{array} } \lambda (u, v) \\&= \frac{1}{|V|} \sum _{\begin{array}{c} u \in U \\ v\in V \end{array} } \lambda (u, v). \end{aligned}$$

\(\square \)
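Since Theorem 3 holds exactly for any function F, it can be validated on a small instance. The sketch below (an illustrative random 4-bit F, \(t = 1\), and small subspaces U and V of our choosing; all helper names are ours) compares both sides with exact rational arithmetic:

```python
from fractions import Fraction
import random

n, t = 4, 1
N = 1 << n

def rotl(x, r):
    # rotate an n-bit word left by r (negative r rotates right)
    r %= n
    return ((x << r) | (x >> (n - r))) & (N - 1)

def dot(u, x):
    return bin(u & x).count("1") & 1

random.seed(5)
F = [random.randrange(N) for _ in range(N)]  # an arbitrary 4-bit function

def cor(u, v):
    return Fraction(sum((-1) ** (dot(u, x) ^ dot(v, F[x])) for x in range(N)), N)

def lam(u, v):
    # lambda(u, v) = cor(rotr(u).x ^ rotr(v).F(x)) * cor(u.x ^ v.F(x))
    return cor(rotl(u, -t), rotl(v, -t)) * cor(u, v)

def span(basis):
    s = {0}
    for g in basis:
        s |= {y ^ g for y in s}
    return s

U, V = span([0b0011]), span([0b0101, 0b0010])
U_perp = {x for x in range(N) if all(dot(x, u) == 0 for u in U)}
V_perp = {x for x in range(N) if all(dot(x, v) == 0 for v in V)}

# left-hand side: Definition 8, counted exhaustively
hits = sum(1 for x in range(N) for a in U_perp
           if rotl(F[x], t) ^ F[rotl(x, t) ^ a] in V_perp)
lhs = Fraction(hits, N * len(U_perp))

# right-hand side of Theorem 3
rhs = sum(lam(u, v) for u in U for v in V) / len(V)
assert lhs == rhs
```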

Lemma 7

Let \(\lambda (u, v)\) denote \(\textrm{cor}( \overrightarrow{u} \cdot x \oplus \overrightarrow{v} \cdot F(x) ) \textrm{cor}( u \cdot x \oplus v \cdot F(x) )\). Then, for \(u \ne 0\), \(\lambda (u, 0) = 0\), and \(\lambda (0, 0) = 1\).

Proof

\( \lambda (u, 0) = \left( \frac{1}{2^n} \sum _{a \in {\mathbb {F}}_2^n} (-1)^{u \cdot a} \right) \left( \frac{1}{2^n} \sum _{b \in {\mathbb {F}}_2^n} (-1)^{\overrightarrow{u} \cdot b} \right) \). According to Corollary 3, \(\lambda (u, 0) = \delta (u) \delta (\overrightarrow{u}) = \delta (u)\). \(\square \)

Lemma 8

For \(\varDelta \), \(w \in {\mathbb {F}}_2^n\), we have

$$\begin{aligned} \Pr \left[ \varDelta \xrightarrow [F]{\text {RX}} \textrm{sp}(w)^\perp \right] = \frac{1}{2} \sum _{u \in \textrm{sp}(\varDelta )^\perp } \lambda _F(u, w) - \frac{1}{2} \sum _{u \in {\mathbb {F}}_2^n\backslash \textrm{sp}(\varDelta )^\perp } \lambda _F(u, w) + \frac{1}{2}. \end{aligned}$$
(15)

Proof

According to Eq. (14), we have

$$\begin{aligned}\tiny \Pr \left[ \varDelta \xrightarrow [F]{\text {RX}} \textrm{sp}(w)^\perp \right]&= 2 \Pr \left[ \textrm{sp}(\varDelta ) \xrightarrow [F]{\text {RX}} \textrm{sp}(w)^\perp \right] - \Pr \left[ 0 \xrightarrow [F]{\text {RX}} \textrm{sp}(w)^\perp \right] \\&= 2 \cdot \frac{1}{2} \sum _{\begin{array}{c} u \in \textrm{sp}(\varDelta )^\perp \\ v\in \textrm{sp}(w) \end{array}} \lambda (u, v) - \frac{1}{2} \sum _{\begin{array}{c} u \in {\mathbb {F}}_2^n\\ v\in \textrm{sp}(w) \end{array}} \lambda (u, v) ~~~~~~~~~\mathrm{(Theorem}~3\text {)}\\&= \frac{1}{2} \sum _{\begin{array}{c} u \in \textrm{sp}(\varDelta )^\perp \\ v\in \textrm{sp}(w) \end{array}} \lambda (u, v) - \frac{1}{2} \left( \sum _{\begin{array}{c} u \in {\mathbb {F}}_2^n\\ v\in \textrm{sp}(w) \end{array}} \lambda (u, v) - \sum _{\begin{array}{c} u \in \textrm{sp}(\varDelta )^\perp \\ v\in \textrm{sp}(w) \end{array}} \lambda (u, v) \right) \\&= \frac{1}{2} \sum _{\begin{array}{c} u \in \textrm{sp}(\varDelta )^\perp \\ v\in \textrm{sp}(w) \end{array}} \lambda (u, v) - \frac{1}{2} \sum _{\begin{array}{c} u \in {\mathbb {F}}_2^n\backslash \textrm{sp}(\varDelta )^\perp \\ v\in \textrm{sp}(w) \end{array}} \lambda (u, v) \end{aligned}$$

Since \(\textrm{sp}(w) = \{ 0, w\}\), \(\Pr \left[ \varDelta \xrightarrow [F]{\text {RX}} \textrm{sp}(w)^\perp \right] \) is equal to

$$\begin{aligned} \frac{1}{2} \sum _{\begin{array}{c} u \in \textrm{sp}(\varDelta )^\perp \end{array}} \lambda (u, w) - \frac{1}{2} \sum _{\begin{array}{c} u \in {\mathbb {F}}_2^n\backslash \textrm{sp}(\varDelta )^\perp \end{array}} \lambda (u, w) + \frac{1}{2} \sum _{\begin{array}{c} u \in \textrm{sp}(\varDelta )^\perp \end{array}} \lambda (u, 0) - \frac{1}{2} \sum _{\begin{array}{c} u \in {\mathbb {F}}_2^n\backslash \textrm{sp}(\varDelta )^\perp \end{array}} \lambda (u, 0). \end{aligned}$$

Then, applying Lemma 7 gives

$$\begin{aligned} \Pr \left[ \varDelta \xrightarrow [F]{\text {RX}} \textrm{sp}(w)^\perp \right]&= \frac{1}{2} \sum _{\begin{array}{c} u \in \textrm{sp}(\varDelta )^\perp \end{array}} \lambda (u, w) - \frac{1}{2} \sum _{\begin{array}{c} u \in {\mathbb {F}}_2^n\backslash \textrm{sp}(\varDelta )^\perp \end{array}} \lambda (u, w) + \frac{1}{2}. \end{aligned}$$

\(\square \)

Theorem 4

If two parts \(E_0\) and \(E_1\) of an n-bit block cipher \(E = E_1 \circ E_0\) are RX-differentially independent, that is, for all \((a, b) \in {\mathbb {F}}_2^n\times {\mathbb {F}}_2^n\),

$$\begin{aligned} \Pr \left[ a \xrightarrow [E]{\text {RX}} b \right] = \sum _{\varDelta \in {\mathbb {F}}_2^n} \Pr \left[ a \xrightarrow [E_0]{\text {RX}} \varDelta \right] \cdot \Pr \left[ \varDelta \xrightarrow [E_1]{\text {RX}} b \right] , \end{aligned}$$

then we have

$$\begin{aligned} \Pr \left[ \delta \xrightarrow [E]{\text {RX}} \textrm{sp}(w)^\perp \right] - \frac{1}{2} = \sum _{u \in {\mathbb {F}}_2^n} \left( \Pr \left[ \delta \xrightarrow [E_0]{\text {RX}} \textrm{sp}(u)^\perp \right] - \frac{1}{2} \right) \cdot \lambda _{E_1}(u, w). \end{aligned}$$

Proof

Substituting Eq. (15) into the right-hand side of

$$\begin{aligned} \Pr \left[ \delta \xrightarrow [E]{\text {RX}} \textrm{sp}(w)^\perp \right] - \frac{1}{2} = \sum _{\varDelta \in {\mathbb {F}}_2^n} \Pr \left[ \delta \xrightarrow [E_0]{\text {RX}} \varDelta \right] \Pr \left[ \varDelta \xrightarrow [E_1]{\text {RX}} \textrm{sp}(w)^\perp \right] - \frac{1}{2} \end{aligned}$$

gives

$$\begin{aligned} \frac{1}{2} \left( \sum _{ \begin{array}{c} \varDelta \in {\mathbb {F}}_2^n\\ u \in \textrm{sp}(\varDelta )^\perp \end{array} } \Pr \left[ \delta \xrightarrow [E_0]{\text {RX}} \varDelta \right] \lambda (u, w) - \sum _{\begin{array}{c} \varDelta \in {\mathbb {F}}_2^n\\ u \in {\mathbb {F}}_2^n\backslash \textrm{sp}(\varDelta )^\perp \end{array} } \Pr \left[ \delta \xrightarrow [E_0]{\text {RX}} \varDelta \right] \lambda (u, w) \right) . \end{aligned}$$
(16)

Since \({\mathbb {S}} = \{ (u, \varDelta ) : \varDelta \in {\mathbb {F}}_2^n, u \in \textrm{sp}(\varDelta )^\perp \} = \{ (u, \varDelta ) : u \in {\mathbb {F}}_2^n, \varDelta \in \textrm{sp}(u)^\perp \}\) and thus \(({\mathbb {F}}_2^n, {\mathbb {F}}_2^n) \backslash {\mathbb {S}} = \{ (u, \varDelta ) : \varDelta \in {\mathbb {F}}_2^n, u \in {\mathbb {F}}_2^n\backslash \textrm{sp}(\varDelta )^\perp \} = \{ (u, \varDelta ) : u \in {\mathbb {F}}_2^n, \varDelta \in {\mathbb {F}}_2^n\backslash \textrm{sp}(u)^\perp \} \), Eq. (16) can be written as

$$\begin{aligned}&\frac{1}{2} \left( \sum _{ \begin{array}{c} u \in {\mathbb {F}}_2^n\\ \varDelta \in \textrm{sp}(u)^\perp \end{array} } \Pr \left[ \delta \xrightarrow [E_0]{\text {RX}} \varDelta \right] \lambda (u, w) - \sum _{\begin{array}{c} u \in {\mathbb {F}}_2^n\\ \varDelta \in {\mathbb {F}}_2^n\backslash \textrm{sp}(u)^\perp \end{array} } \Pr \left[ \delta \xrightarrow [E_0]{\text {RX}} \varDelta \right] \lambda (u, w) \right) \\&= \frac{1}{2} \left( \sum _{ \begin{array}{c} u \in {\mathbb {F}}_2^n \end{array} } \Pr \left[ \delta \xrightarrow [E_0]{\text {RX}} \textrm{sp}(u)^\perp \right] \lambda (u, w) - \sum _{\begin{array}{c} u \in {\mathbb {F}}_2^n \end{array} } \Pr \left[ \delta \xrightarrow [E_0]{\text {RX}} {\mathbb {F}}_2^n\backslash \textrm{sp}(u)^\perp \right] \lambda (u, w) \right) \\&= \sum _{ \begin{array}{c} u \in {\mathbb {F}}_2^n \end{array} } \left( \Pr \left[ \delta \xrightarrow [E_0]{\text {RX}} \textrm{sp}(u)^\perp \right] - \frac{1}{2}\right) \lambda (u, w). \end{aligned}$$

\(\square \)
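The proof of Theorem 4 uses only the composition hypothesis together with Corollary 1, Corollary 2, and Lemma 8, all of which remain valid when the RX-differential tables are arbitrary row-stochastic matrices. The sketch below exploits this: it draws random stochastic matrices playing the role of the RX-DDTs of \(E_0\) and \(E_1\), enforces RX-differential independence by defining the table of E as their product, and checks the closed formula numerically (all helper names are our own):

```python
import random

random.seed(7)
n = 3
N = 1 << n

def dot(u, x):
    # GF(2) inner product u . x
    return bin(u & x).count("1") & 1

def rand_stochastic():
    # a random row-stochastic matrix playing the role of an RX-DDT
    M = []
    for _ in range(N):
        row = [random.random() for _ in range(N)]
        s = sum(row)
        M.append([p / s for p in row])
    return M

D0, D1 = rand_stochastic(), rand_stochastic()
# RX-differential independence is enforced by construction: D_E = D0 * D1
DE = [[sum(D0[a][c] * D1[c][b] for c in range(N)) for b in range(N)]
      for a in range(N)]

def lam(D, u, v):
    # Corollary 2: lambda(u, v) = 2^-n sum_{a,b} (-1)^{u.a ^ v.b} Pr[a -> b]
    return sum((-1) ** (dot(u, a) ^ dot(v, b)) * D[a][b]
               for a in range(N) for b in range(N)) / N

def to_hyperplane(D, d, w):
    # Pr[d -RX-> sp(w)^perp]: sum over outputs b orthogonal to w
    return sum(D[d][b] for b in range(N) if dot(w, b) == 0)

delta, w = 0b011, 0b101
lhs = to_hyperplane(DE, delta, w) - 0.5
rhs = sum((to_hyperplane(D0, delta, u) - 0.5) * lam(D1, u, w) for u in range(N))
assert abs(lhs - rhs) < 1e-12
```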

3.2.1 The Multidimensional Case

Let U and W be subspaces of \({\mathbb {F}}_2^n\). We define the bias of the rotational differential-linear distinguisher in the multidimensional case by

$$\begin{aligned} {\mathcal {E}}_{U,W}^{\text {R-DL}} = \Pr \left[ U^\perp \backslash \{ 0\} \xrightarrow [E]{\text {RX}} W^\perp \right] - \frac{1}{|W|}. \end{aligned}$$

Lemma 9

If two parts \(E_0\) and \(E_1\) of an n-bit block cipher \(E = E_1 \circ E_0\) are RX-differentially independent, that is, for all \((a, b) \in {\mathbb {F}}_2^n\times {\mathbb {F}}_2^n\),

$$\begin{aligned} \Pr \left[ a \xrightarrow [E]{\text {RX}} b \right] = \sum _{\varDelta \in {\mathbb {F}}_2^n} \Pr \left[ a \xrightarrow [E_0]{\text {RX}} \varDelta \right] \cdot \Pr \left[ \varDelta \xrightarrow [E_1]{\text {RX}} b \right] , \end{aligned}$$

then for all u, \(w \in {\mathbb {F}}_2^n\), we have \(\lambda _{E}(u, w) = \sum _{v \in {\mathbb {F}}_2^n} \lambda _{E_0}(u, v) \lambda _{E_1}(v, w)\).

Proof

According to Corollary 2, we have

$$\begin{aligned} \lambda _{E}(u, w) = \frac{1}{2^n} \sum _{a, b \in {\mathbb {F}}_2^n} (-1)^{u\cdot a \oplus w \cdot b} \Pr \left[ a \xrightarrow [E]{\text {RX}} b \right] . \end{aligned}$$

Since \(E_0\) and \(E_1\) are RX-differentially independent, this gives

$$\begin{aligned} \lambda _{E}(u, w) = \frac{1}{2^n} \sum _{a, b \in {\mathbb {F}}_2^n} (-1)^{u\cdot a \oplus w \cdot b} \sum _{c \in {\mathbb {F}}_2^n} \Pr \left[ a \xrightarrow [E_0]{\text {RX}} c \right] \cdot \Pr \left[ c \xrightarrow [E_1]{\text {RX}} b \right] . \end{aligned}$$

Applying Corollary 1, \(\lambda _{E}(u, w)\) can be computed as

$$\begin{aligned}&\frac{1}{2^{2n}}\sum _{c \in {\mathbb {F}}_2^n} \sum _{m, v \in {\mathbb {F}}_2^n} \sum _{a \in {\mathbb {F}}_2^n}(-1)^{(u \oplus m)\cdot a \oplus c \cdot v} \lambda _{E_0}(m, v) \sum _{b \in {\mathbb {F}}_2^n} (-1)^{w \cdot b} \Pr \left[ c \xrightarrow [E_1]{\text {RX}} b \right] \\&= \frac{1}{2^{3n}} \sum _{m, v, s, p \in {\mathbb {F}}_2^n} \lambda _{E_0}(m, v) \lambda _{E_1}(p, s) \sum _{a \in {\mathbb {F}}_2^n}(-1)^{(u \oplus m)\cdot a} \sum _{b \in {\mathbb {F}}_2^n}(-1)^{(w \oplus s)\cdot b} \sum _{c \in {\mathbb {F}}_2^n} (-1)^{(v\oplus p)\cdot c}\\&= \sum _{m, v, s, p \in {\mathbb {F}}_2^n} \lambda _{E_0}(m, v) \lambda _{E_1}(p, s) \delta (u \oplus m) \delta (w \oplus s) \delta (v\oplus p) ~~~~~~~~~\mathrm{(Corollary}~3\text {)} \\&= \sum _{v \in {\mathbb {F}}_2^n} \lambda _{E_0}(u, v) \lambda _{E_1}(v, w) \end{aligned}$$

\(\square \)
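In matrix form, Lemma 9 states that the \(\lambda \) table of E is the product of the \(\lambda \) tables of \(E_0\) and \(E_1\); this follows because the \(\pm 1\) Hadamard matrix H with \(H[u][a] = (-1)^{u\cdot a}\) satisfies \(HH = 2^n I\). The following sketch checks the lemma with random row-stochastic matrices, for which the composition hypothesis holds by definition (the helper lam_matrix is ours):

```python
import random

random.seed(11)
n = 3
N = 1 << n

def dot(u, x):
    return bin(u & x).count("1") & 1

def rand_stochastic():
    # a random row-stochastic matrix playing the role of an RX-DDT
    M = []
    for _ in range(N):
        row = [random.random() for _ in range(N)]
        s = sum(row)
        M.append([p / s for p in row])
    return M

D0, D1 = rand_stochastic(), rand_stochastic()
# composition enforces RX-differential independence: D_E = D0 * D1
DE = [[sum(D0[a][c] * D1[c][b] for c in range(N)) for b in range(N)]
      for a in range(N)]

def lam_matrix(D):
    # Corollary 2 as a matrix identity: Lambda = 2^-n * H D H
    return [[sum((-1) ** (dot(u, a) ^ dot(v, b)) * D[a][b]
                 for a in range(N) for b in range(N)) / N
             for v in range(N)] for u in range(N)]

L0, L1, LE = lam_matrix(D0), lam_matrix(D1), lam_matrix(DE)

# Lemma 9: lambda_E(u, w) = sum_v lambda_E0(u, v) * lambda_E1(v, w)
for u in range(N):
    for w in range(N):
        prod = sum(L0[u][v] * L1[v][w] for v in range(N))
        assert abs(LE[u][w] - prod) < 1e-12
```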

Theorem 5

If two parts \(E_0\) and \(E_1\) of an n-bit block cipher \(E = E_1 \circ E_0\) are RX-differentially independent, that is, for all \((a, b) \in {\mathbb {F}}_2^n\times {\mathbb {F}}_2^n\),

$$\begin{aligned} \Pr \left[ a \xrightarrow [E]{\text {RX}} b \right] = \sum _{\varDelta \in {\mathbb {F}}_2^n} \Pr \left[ a \xrightarrow [E_0]{\text {RX}} \varDelta \right] \cdot \Pr \left[ \varDelta \xrightarrow [E_1]{\text {RX}} b \right] , \end{aligned}$$

then we have

$$\begin{aligned} {\mathcal {E}}_{U,W}^{\text {R-DL}} = \frac{2}{|W|} \sum _{v \in {\mathbb {F}}_2^n} \epsilon _{U, v}^{\text {R-DL}} C_{v,W}^{\text {R-DL}} \end{aligned}$$

where \(\epsilon _{U, v}^{\text {R-DL}} = \Pr \left[ U^\perp \backslash \{ 0\} \xrightarrow [E_0]{\text {RX}} \textrm{sp}(v)^\perp \right] - \frac{1}{2} \) and \( C_{v,W}^{\text {R-DL}} = \sum _{ w \in W \backslash \{0\} } \lambda _{E_1}(v, w) \).

Proof

According to Theorem 3, we have

$$\begin{aligned} \Pr \left[ U^\perp \xrightarrow [E_0]{\text {RX}} \textrm{sp}(w)^\perp \right]&= \frac{1}{2}\sum _{ \begin{array}{c} u \in U \\ v\in \textrm{sp}(w) \end{array} }\lambda _{E_0}(u, v)\\&= \frac{1}{2}\sum _{u \in U}\lambda _{E_0}(u, w)+\frac{1}{2}\sum _{u \in U}\lambda _{E_0}(u, 0)\\&= \frac{1}{2}\sum _{u \in U}\lambda _{E_0}(u, w)+\frac{1}{2}. \end{aligned}$$

Thus,

$$\begin{aligned} 2\Pr \left[ U^\perp \xrightarrow [E_0]{\text {RX}} \textrm{sp}(w)^\perp \right] -1=\sum _{u \in U}\lambda _{E_0}(u, w). \end{aligned}$$
(17)

For any subspaces U, W of \({\mathbb {F}}_2^n\), we have

$$\begin{aligned}&\Pr \left[ U^\perp \xrightarrow [E]{\text {RX}} W^\perp \right] \\&= \frac{1}{|W|} \sum _{ \begin{array}{c} u \in U \\ w\in W \end{array} }\lambda _{E}(u, w)~~~~~~~~~\mathrm{(Theorem}~3\text {)}\\&= \frac{1}{|W|}\sum _{ \begin{array}{c} u \in U \\ w\in W \\ v \in {\mathbb {F}}_2^n \end{array} }\lambda _{E_0}(u, v)\lambda _{E_1}(v, w)~~~~~~~~~\mathrm{(Lemma}~9\text {)}\\&= \frac{1}{|W|}\sum _{v \in {\mathbb {F}}_2^n} \sum _{u \in U} \lambda _{E_0}(u, v) \sum _{w \in W }\lambda _{E_1}(v, w)\\&= \sum _{v \in {\mathbb {F}}_2^n} \frac{1}{|W|} \left( 2\Pr \left[ U^\perp \xrightarrow [E_0]{\text {RX}} \textrm{sp}(v)^\perp \right] -1\right) \sum _{w \in W }\lambda _{E_1}(v, w)~~~~~~~~~\mathrm{(Eq.}~17\text {)} \end{aligned}$$

Thus, when \(U^\perp = \{0\} = ({\mathbb {F}}_2^n)^\perp \),

$$\begin{aligned} \Pr \left[ U^\perp \xrightarrow [E]{\text {RX}} W^\perp \right] =\sum _{v \in {\mathbb {F}}_2^n} \frac{1}{|W|} \left( 2\Pr \left[ 0 \xrightarrow [E_0]{\text {RX}} \textrm{sp}(v)^\perp \right] -1 \right) \sum _{w \in W }\lambda _{E_1}(v, w) \end{aligned}$$

According to Definition 8, for any F, the following relation holds:

$$\begin{aligned} (|U^\perp | -1)\Pr \left[ U^\perp \backslash \{ 0\} \xrightarrow [F]{\text {RX}} W^\perp \right]&= (|U^\perp | -1) \left( \frac{1}{|U^\perp | -1} \sum _{a \in U^\perp \backslash \{ 0\} } \Pr \left[ a \xrightarrow [F]{\text {RX}} W^\perp \right] \right) \\&= \sum _{a \in U^\perp } \Pr \left[ a \xrightarrow [F]{\text {RX}} W^\perp \right] -\Pr \left[ 0 \xrightarrow [F]{\text {RX}} W^\perp \right] \\&= |U^\perp |\Pr \left[ U^\perp \xrightarrow [F]{\text {RX}} W^\perp \right] -\Pr \left[ 0 \xrightarrow [F]{\text {RX}} W^\perp \right] . \end{aligned}$$

Then,

$$\begin{aligned}&(|U^\perp | -1)\Pr \left[ U^\perp \backslash \{ 0\} \xrightarrow [E]{\text {RX}} W^\perp \right] \\&\quad = |U^\perp | \sum _{v \in {\mathbb {F}}_2^n} \frac{1}{|W|}(2\Pr \left[ U^\perp \xrightarrow [E_0]{\text {RX}} \textrm{sp}(v)^\perp \right] -1)\sum _{w \in W }\lambda _{E_1}(v, w)\\&\qquad - \sum _{v \in {\mathbb {F}}_2^n} \frac{1}{|W|}\left( 2\Pr \left[ 0 \xrightarrow [E_0]{\text {RX}} \textrm{sp}(v)^\perp \right] -1 \right) \sum _{w \in W }\lambda _{E_1}(v, w)\\&\quad = \frac{1}{|W|}\sum _{v \in {\mathbb {F}}_2^n} \left( 2 \left( |U^\perp |\Pr \left[ U^\perp \xrightarrow [E_0]{\text {RX}} \textrm{sp}(v)^\perp \right] -\Pr \left[ 0 \xrightarrow [E_0]{\text {RX}} \textrm{sp}(v)^\perp \right] \right) -(|U^\perp | -1) \right) \\&\sum _{w \in W }\lambda _{E_1}(v, w)\\&\quad = \frac{1}{|W|}\sum _{v \in {\mathbb {F}}_2^n} \left( 2(|U^\perp | -1)\Pr \left[ U^\perp \backslash \{ 0\} \xrightarrow [E_0]{\text {RX}} \textrm{sp}(v)^\perp \right] -\left( |U^\perp | -1 \right) \right) \sum _{w \in W }\lambda _{E_1}(v, w). \end{aligned}$$

Dividing both sides by \(|U^\perp | -1\) gives

$$\begin{aligned} \Pr \left[ U^\perp \backslash \{ 0\} \xrightarrow [E]{\text {RX}} W^\perp \right] = \frac{2}{|W|}\sum _{v \in {\mathbb {F}}_2^n} \left( \Pr \left[ U^\perp \backslash \{ 0\} \xrightarrow [E_0]{\text {RX}} \textrm{sp}(v)^\perp \right] -\frac{1}{2}\right) \sum _{w \in W }\lambda _{E_1}(v, w). \end{aligned}$$

Denote \(\Pr \left[ U^\perp \backslash \{ 0\} \xrightarrow [E_0]{\text {RX}} \textrm{sp}(v)^\perp \right] -\frac{1}{2}\) by g(v). Then, \(\Pr \left[ U^\perp \backslash \{ 0\} \xrightarrow [E]{\text {RX}} W^\perp \right] \) is

$$\begin{aligned} \frac{2}{|W|}\sum _{v \in {\mathbb {F}}_2^n} g(v) \sum _{w \in W, w \ne 0 }\lambda _{E_1}(v, w) + \frac{2}{|W|}\sum _{v \in {\mathbb {F}}_2^n} g(v) \lambda _{E_1}(v, 0). \end{aligned}$$
(18)

According to Lemma 7,

$$\begin{aligned} \frac{2}{|W|}\sum _{v \in {\mathbb {F}}_2^n} g(v) \lambda _{E_1}(v, 0) =\frac{2}{|W|} g(0), \end{aligned}$$
(19)

where \( g(0) = \Pr \left[ U^\perp \backslash \{ 0\} \xrightarrow [E_0]{\text {RX}} \textrm{sp}(0)^\perp \right] -\frac{1}{2} = 1 - \frac{1}{2} = \frac{1}{2}. \) Consequently, substituting Eq. (19) into Eq. (18) gives

$$\begin{aligned}&\Pr \left[ U^\perp \backslash \{ 0\} \xrightarrow [E]{\text {RX}} W^\perp \right] \\ =&\frac{2}{|W|}\sum _{v \in {\mathbb {F}}_2^n} \left( \Pr \left[ U^\perp \backslash \{ 0\} \xrightarrow [E_0]{\text {RX}} \textrm{sp}(v)^\perp \right] -\frac{1}{2}\right) \sum _{w \in W, w \ne 0 }\lambda _{E_1}(v, w)+\frac{1}{|W|}. \end{aligned}$$

\(\square \)

We would like to remark that while these closed formulas are of theoretical interest, typically it is impossible to apply them in practice since they require the computation of the correlations of an exponentially large number of trails. Next, we consider a special case where the estimation of the overall bias can be computed efficiently.

3.3 Morawiecki et al.’s Technique Revisited

In [16], Morawiecki et al. performed a rotational cryptanalysis on the Keccak-f permutation E. In this attack, the probability

$$\begin{aligned} \Pr \left[ (E(x))_{i-t} \ne (E(x\lll t))_{i} \right] \end{aligned}$$

was exploited to distinguish the target. In what follows, we show that Morawiecki et al.’s technique can be regarded as a special case of the rotational differential-linear framework.

Ultimately, what we exploit in a rotational differential-linear attack associated with an input RX-difference \(\delta \in {\mathbb {F}}_2^n\) and an output linear mask \(\gamma \in {\mathbb {F}}_2^n\) is the abnormally high absolute bias or correlation of the Boolean function

$$\begin{aligned} \gamma \cdot ( {\texttt {rot}}(E(x)) \oplus E( {\texttt {rot}}(x) \oplus \delta ) ). \end{aligned}$$

Following the notation of [22], let \(\textrm{sp}(\gamma ) \subseteq {\mathbb {F}}_2^n \) be the linear space spanned by \(\gamma \), and \( \textrm{sp}(\gamma )^{\perp } = \{ u \in {\mathbb {F}}_2^n: \forall v \in \textrm{sp}(\gamma ), u \cdot v = 0 \} \) be the orthogonal space of \(\textrm{sp}(\gamma )\).

We then define two sets \({\mathbb {D}}_0\) and \({\mathbb {D}}_1\) which form a partition of \({\mathbb {F}}_2^n\):

$$\begin{aligned} \left\{ \begin{array}{l} {\mathbb {D}}_0 = \left\{ x \in {\mathbb {F}}_2^n : {\texttt {rot}}(E(x)) \oplus E( {\texttt {rot}}(x) \oplus \delta ) \in \textrm{sp}(\gamma )^{\perp } \right\} \\ {\mathbb {D}}_1 = \left\{ x \in {\mathbb {F}}_2^n : {\texttt {rot}}(E(x)) \oplus E( {\texttt {rot}}(x) \oplus \delta ) \in {\mathbb {F}}_2^n - \textrm{sp}(\gamma )^{\perp } \right\} \end{array}\right. . \end{aligned}$$

Under the above notations, for any \(x \in {\mathbb {D}}_0\), \(\gamma \cdot ( {\texttt {rot}}(E(x)) \oplus E( {\texttt {rot}}(x) \oplus \delta ) ) = 0\) and for any \(x \in {\mathbb {D}}_1\), \(\gamma \cdot ( {\texttt {rot}}(E(x)) \oplus E( {\texttt {rot}}(x) \oplus \delta ) ) = 1\).

Thus, the higher the absolute value of

$$\begin{aligned} | {\mathbb {D}}_0 |- |{\mathbb {D}}_1| = 2^n \textrm{cor}( \gamma \cdot ( {\texttt {rot}}(E(x)) \oplus E( {\texttt {rot}}(x) \oplus \delta ) ) ), \end{aligned}$$

the more effective the attack is.

If \(\gamma = e_i\) is the i-th unit vector, we have \(\textrm{sp}(\gamma ) = \{ 0, e_i \}\) and \(\textrm{sp}(\gamma )^{\perp }\) contains all vectors whose i-th bit is 0. In this case,

$$\begin{aligned} | {\mathbb {D}}_0| - | {\mathbb {D}}_1|&= 2^n - 2 | {\mathbb {D}}_1| \\&= 2^n-2^{n+1} \left( \textrm{Pr}[ e_i \cdot ( {\texttt {rot}}(E(x)) \oplus E( {\texttt {rot}}(x) \oplus \delta )) = 1 ] \right) \\&= 2^n-2^{n+1} \left( \textrm{Pr}[ (E(x))_{i-t} \ne (E({\texttt {rot}}(x) \oplus \delta ))_i ] \right) \\&= 2^n-2^{n+1} \left( \textrm{Pr}[ (E(x))_{i-t} \ne (E(x'))_i ] \right) . \end{aligned}$$

Therefore, the effectiveness of the rotational differential-linear attack can be completely characterized by \(\textrm{Pr}[ (E(x))_{i-t} \ne (E(x'))_i].\) In the next section, we show how to compute probabilities of this type for the target cipher.
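The chain of equalities above can be confirmed on a toy instance. The sketch below (an illustrative random 4-bit function E with \(t = 1\); the helpers rotl and bit are ours, and bit indices are taken modulo n) checks that \(|{\mathbb {D}}_0| - |{\mathbb {D}}_1| = 2^n \textrm{cor}(\cdot ) = 2^n - 2^{n+1}\Pr [(E(x))_{i-t} \ne (E(x'))_i]\) holds exhaustively:

```python
import random

n, t = 4, 1
N = 1 << n
random.seed(3)
E = [random.randrange(N) for _ in range(N)]  # a toy 4-bit "cipher"

def rotl(x, r):
    # rotate an n-bit word left by r positions
    r %= n
    return ((x << r) | (x >> (n - r))) & (N - 1)

def bit(x, i):
    # the i-th bit of x, with the index reduced modulo n
    return (x >> (i % n)) & 1

delta, i = 0b1010, 2
# Pr[(E(x))_{i-t} != (E(x'))_i] with x' = rotl(x, t) ^ delta, counted exhaustively
mismatch = sum(1 for x in range(N)
               if bit(E[x], i - t) != bit(E[rotl(x, t) ^ delta], i))
# 2^n * correlation of e_i . (rot(E(x)) ^ E(rot(x) ^ delta))
corr = sum((-1) ** bit(rotl(E[x], t) ^ E[rotl(x, t) ^ delta], i) for x in range(N))
# |D_0| - |D_1| = 2^n * cor = 2^n - 2^{n+1} * Pr[mismatch]
assert corr == N - 2 * mismatch
```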

4 Evaluating the Bias of Rotational Differential-Linear Distinguishers with Output Masks Being Unit Vectors

According to the previous section, for a rotational differential-linear distinguisher with an input RX-difference \(\delta \) and output linear mask \(e_i\), the bias of the distinguisher can be completely determined by

$$\begin{aligned} \textrm{Pr}\left[ (E(x))_{i-t} \ne (E(x'))_i \right] ,~\textrm{where}~x' = x\lll t \oplus \delta , \end{aligned}$$

and we call it the rotational differential-linear probability or R-DL probability. Note that for a random pair \((x, x' = x\lll t \oplus \delta )\) with rotational-XOR difference \(\delta \in {\mathbb {F}}_2^n\), we have

$$\begin{aligned} \Pr \left[ x_{i-t} \ne x'_i \right] = \frac{1 + (-1)^{1 - \delta _i}}{2}, \end{aligned}$$

for \(0 \le i < n\). Therefore, what we need is a method to evaluate the probability

$$\begin{aligned} \Pr \left[ (F(x))_{i-t} \ne (F(x'))_i \right] \end{aligned}$$

for \(0 \le i < m\), where \(F : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^m\) is a vectorial Boolean function that represents a component of E. Then, with certain independence assumptions, we can iteratively determine the probability \(\textrm{Pr}[ (E(x))_{i-t} \ne (E(x'))_i]\).

Observation 1

Let \(F : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^m\) be a vectorial Boolean function. Assume that the input pair \((x,x')\) satisfies \(\Pr \left[ x_{i-t} \ne x'_i \right] = p_i\) for \(0 \le i < n\), where \(x, x' \in {\mathbb {F}}_2^n\). For \(u \in {\mathbb {F}}_2^n\), we define the set \({\mathcal {S}}_{u} = \{(x,x') \in {\mathbb {F}}_2^n \times {\mathbb {F}}_2^n : (x\lll t)\oplus x' = u\} \) with \(\# {\mathcal {S}}_u = 2^n\). Let \(y_i\) and \(y'_i\) be the i-th bit of F(x) and \(F(x')\) respectively for \(0 \le i < m\). Then, we have

$$\begin{aligned} \Pr \left[ y_{i-t} \ne y_i' \right]&= \sum _{u\in {\mathbb {F}}_2^n}\Pr \left[ y_{i-t} \ne y_i' |(x,x')\in {\mathcal {S}}_u \right] \Pr \left[ (x,x')\in {\mathcal {S}}_u \right] \\&=\sum _{u\in {\mathbb {F}}_2^n}\Pr \left[ y_{i-t} \ne y_i' |(x,x')\in {\mathcal {S}}_u \right] \prod _{j=0}^{n-1}((1-u_j)-(-1)^{u_j}p_j)\\&= \frac{1}{2^n} \sum _{u\in {\mathbb {F}}_2^n} \# \{(x,x')\in {\mathcal {S}}_u : y_{i-t} \ne y_i' \} \prod _{j=0}^{n-1}((1-u_j)-(-1)^{u_j}p_j). \end{aligned}$$

The observation is inspired by Morawiecki et al.’s work on rotational cryptanalysis [16] where, given a rotational pair, the bias of the output pair being unequal at a certain bit is calculated for the one-bit AND, NOT, and XOR operations. In the following, we reformulate and generalize their propagation rules in terms of the rotational differential-linear probability. Note that all these rules can be derived from Observation 1.

Proposition 1

(AND-rule) Let a, b, \(a'\), and \(b'\) be n-bit strings with \(\Pr \left[ a_{i-t} \ne a_i' \right] = p_i\) and \(\Pr \left[ b_{i-t} \ne b_i' \right] = q_i\). Then,

$$\begin{aligned} \Pr \left[ (a \wedge b)_{i-t} \ne (a' \wedge b')_i \right] = \frac{1}{2}(p_i + q_i - p_iq_i). \end{aligned}$$

Proposition 2

(XOR-rule) Let a, b, \(a'\), and \(b'\) be n-bit strings with \(\Pr \left[ a_{i-t} \ne a_i' \right] = p_i\) and \(\Pr \left[ b_{i-t} \ne b_i' \right] = q_i\). Then,

$$\begin{aligned} \Pr \left[ (a \oplus b)_{i-t} \ne (a' \oplus b')_i \right] = p_i + q_i - 2p_iq_i. \end{aligned}$$

Proposition 3

(NOT-rule) Let a and \(a'\) be n-bit strings with \(\Pr \left[ a_{i-t} \ne a'_i \right] = p_i\). Then, \(\Pr \left[ {\bar{a}}_{i-t} \ne {\bar{a}}'_i \right] = p_i\).
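All three rules can be verified exactly by a weighted enumeration over a single bit position, assuming (as in Observation 1) that the underlying bits are uniform and the flips occur independently. The flip probabilities below are illustrative, and the helper flip_prob is our own:

```python
from fractions import Fraction
from itertools import product

# illustrative per-bit "flip" probabilities at one position
p, q = Fraction(1, 3), Fraction(3, 4)

def flip_prob(out):
    """Exact Pr[out(A, B) != out(A', B')] with A, B uniform,
    A' = A ^ alpha, B' = B ^ beta, and independent flips alpha ~ p, beta ~ q."""
    total = Fraction(0)
    for A, B, al, be in product((0, 1), repeat=4):
        w = Fraction(1, 4) * (p if al else 1 - p) * (q if be else 1 - q)
        if out(A, B) != out(A ^ al, B ^ be):
            total += w
    return total

assert flip_prob(lambda a, b: a & b) == (p + q - p * q) / 2   # AND-rule
assert flip_prob(lambda a, b: a ^ b) == p + q - 2 * p * q     # XOR-rule
assert flip_prob(lambda a, b: 1 - a) == p                     # NOT-rule
```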

Next, we consider constant additions. Let \((x,x') \in {\mathbb {F}}_2^{2n}\) be a data pair with \(\Pr \left[ x_{i-t}\ne x'_i \right] = p_i\) for some integer t and \(c \in {\mathbb {F}}_2^n\) be a constant. Then, \(\Pr \left[ (x\oplus c)_{i-t}\ne (x'\oplus c)_i \right] = \Pr \left[ x_{i-t} \oplus x'_i \ne c_{i-t} \oplus c_i \right] \). In [16], only the cases where \(c_{i-t} \oplus c_i = 1\) or \(c_{i-t} = c_i = 0\) are considered. We generalize the rule for constant addition from [16] to the following proposition with all possibilities taken into account.

Proposition 4

(Adjusted C-rule) Let a and \(a'\) be n-bit strings with \(\Pr \left[ a_{i-t} \ne a'_i \right] = p_i\) and \(c \in {\mathbb {F}}_2^n\) be a constant. Then, we have

$$\begin{aligned} \Pr \left[ (a \oplus c)_{i-t} \ne (a' \oplus c)_i \right] = {\left\{ \begin{array}{ll} 1-p_i, ~~~~c_{i-t} \oplus c_i = 1\\ p_i, ~~~~~~~~~c_{i-t} \oplus c_i = 0 \end{array}\right. } \end{aligned}$$

4.1 Propagation of R-DL Probabilities in Arithmetic Operations

For functions with an AND-RX or LRX construction, such as the permutation Keccak-f, the propagation of the R-DL probability can be evaluated with the propositions shown above, under independence assumptions on the neighboring bits. However, when dependencies take over, even if a function can be expressed as a Boolean circuit, a direct application of the AND-, XOR-, NOT-, and adjusted C-rules may lead to errors that accumulate during the iterated evaluation. One such example is the modular addition. In the following, we derive the propagation rules of the differential-linear (DL) probability and the R-DL probability for an n-bit modular addition.

Lemma 10

(carry-rule) Let \(\varsigma : {\mathbb {F}}_{2}^3\rightarrow {\mathbb {F}}_2\) be the carry function

$$\begin{aligned} \varsigma (x_0,x_1,x_2) = x_0x_1\oplus x_1x_2\oplus x_0x_2. \end{aligned}$$

Let a, b, c, \(a'\), \(b'\), and \(c'\) be binary random variables with

$$\begin{aligned} p_0 = \Pr \left[ a\ne a' \right] , p_1 = \Pr \left[ b\ne b' \right] , p_2 = \Pr \left[ c\ne c' \right] . \end{aligned}$$

Then, we have that

$$\begin{aligned} \Pr \left[ \varsigma (a,b,c)\ne \varsigma (a',b',c')\right] = p_0p_1p_2 - \frac{p_0p_1+p_0p_2+p_1p_2}{2}+ \frac{p_0+p_1+p_2}{2}. \end{aligned}$$

Proof

We prove the carry-rule with Observation 1 by enumerating \(u \in {\mathbb {F}}_2^3\). For \(u = (0,0,0)\), \(\Pr \left[ \varsigma (a,b,c)\ne \varsigma (a',b',c')|a=a',b=b',c=c' \right] = 0\). For \(u = (0,0,1)\), \(\Pr \left[ \varsigma (a,b,c)\ne \varsigma (a',b',c')|a=a',b=b',c \ne c' \right] = \Pr \left[ a \oplus b = 1 \right] = 1/2\) and \(\prod _{i=0}^{2}((1-u_i)+(-1)^{1-u_i}p_i) = (1-p_0)(1-p_1)p_2\).

Similarly, one can derive the expression for all \(u\in {\mathbb {F}}_{2}^3\); we omit the details. The overall probability of the event \(ab\oplus ac\oplus bc \ne a'b'\oplus a'c'\oplus b'c'\) is \(p_0p_1p_2 - (p_0p_1+p_0p_2+p_1p_2)/2+(p_0+p_1+p_2)/2\).\(\square \)
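The omitted enumeration can be delegated to a short script. The following sketch reproduces Lemma 10 by weighting all \(2^6\) cases exactly, with a, b, c uniform and independent flips (the flip probabilities chosen here are illustrative):

```python
from fractions import Fraction
from itertools import product

def carry(x0, x1, x2):
    # the carry (majority) function x0x1 ^ x1x2 ^ x0x2
    return (x0 & x1) ^ (x1 & x2) ^ (x0 & x2)

# illustrative flip probabilities p0, p1, p2
p = (Fraction(1, 3), Fraction(1, 5), Fraction(2, 7))

total = Fraction(0)
for a, b, c, f0, f1, f2 in product((0, 1), repeat=6):
    w = Fraction(1, 8)  # a, b, c uniform
    for f, pi in zip((f0, f1, f2), p):
        w *= pi if f else 1 - pi
    if carry(a, b, c) != carry(a ^ f0, b ^ f1, c ^ f2):
        total += w

p0, p1, p2 = p
assert total == p0*p1*p2 - (p0*p1 + p0*p2 + p1*p2)/2 + (p0 + p1 + p2)/2
```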

Based on the carry-rule, we can immediately prove the following two theorems on the DL and R-DL probabilities for n-bit modular additions.

Theorem 6

(\(\boxplus \)-rule for DL) Let x, y, \(x'\), and \(y'\) be n-bit strings such that \(\Pr \left[ x_i\ne x'_i \right] = p_i\) and \(\Pr \left[ y_i\ne y'_i \right] = q_i\). Then, the differential-linear probability for modular addition can be computed as

$$\begin{aligned} \Pr \left[ (x\boxplus y)_i \ne (x'\boxplus y')_i \right] = p_i + q_i + s_i - 2p_iq_i - 2p_is_i - 2q_is_i + 4p_iq_is_i \end{aligned}$$

where \(s_0 = 0\) and

$$\begin{aligned} \begin{aligned} s_{i+1} = p_{i}q_{i}s_{i} - \frac{p_{i}q_{i}+p_{i}s_{i}+q_{i}s_{i}}{2} + \frac{p_{i}+q_{i}+s_{i}}{2}, \quad 0 \le i \le n-2. \\ \end{aligned} \end{aligned}$$

Proof

For inputs x and y, denote the carry by

$$\begin{aligned} c = (x\boxplus y)\oplus x\oplus y= (c_{n-1},\cdots ,c_1,c_0), \end{aligned}$$

where \(c_0=0, c_{i+1} = x_{i}y_{i}\oplus x_{i}c_{i}\oplus y_{i}c_{i}\). Similarly, for \(x'\) and \(y'\), denote the carry by \(c'= (c'_{n-1},\cdots ,c'_1,c'_0)\). Let \(s_i\) denote the probability \(\Pr \left[ c_i\ne c'_i \right] \). Then, \(s_0 = 0\) and for \(i \ge 1\), the event \(c_i\ne c'_i\) is equivalent to

$$\begin{aligned} x_{i-1}y_{i-1}\oplus x_{i-1}c_{i-1}\oplus y_{i-1}c_{i-1} \ne x'_{i-1}y'_{i-1}\oplus x'_{i-1}c'_{i-1}\oplus y'_{i-1}c'_{i-1}. \end{aligned}$$

Therefore, \(s_i\) can be computed as

$$\begin{aligned} p_{i-1}q_{i-1}s_{i-1} - \left( p_{i-1}q_{i-1}+p_{i-1}s_{i-1}+q_{i-1}s_{i-1}\right) /2+\left( p_{i-1}+q_{i-1}+s_{i-1}\right) /2 \end{aligned}$$

according to Lemma 10. Since \(x\boxplus y = x \oplus y \oplus c\), and \(x'\boxplus y' = x' \oplus y' \oplus c'\), with the XOR-rule, we have

$$\begin{aligned} \Pr \left[ (x\boxplus y)_i \ne (x'\boxplus y')_i \right] = p_i + q_i + s_i - 2p_iq_i - 2p_is_i - 2q_is_i + 4p_iq_is_i. \end{aligned}$$

\(\square \)

Example 1

Consider an 8-bit modular addition with input difference being \(a = \texttt {7}\) and \(b = \texttt {7}\). Then, for \(0\le i \le 7\), we have

$$\begin{aligned} p_0 = p_1 = p_2 = 1, p_3 = p_4 = p_5 = p_6 = p_7 = 0,\\ q_0 = q_1 = q_2 = 1, q_3 = q_4 = q_5 = q_6 = q_7 = 0. \end{aligned}$$

The output DL-probabilities computed with the \(\boxplus \)-rule are given in Table 2. We verified the \(\boxplus \)-rule in DL for all possible input differences \((a, b) \in {\mathbb {F}}_2^8 \times {\mathbb {F}}_2^8\) of an 8-bit modular addition by running through the 16-bit input space; under the precision level given in Table 2, the experimental results perfectly match the theoretical predictions. The source code for the experiments can be obtained at https://github.com/YunwenL/Rotational-cryptanalysis-from-a-differential-linear-perspective.

Table 2 DL-probabilities of an 8-bit modular addition with input differences \(a=b = 7\) by theoretical evaluation, which are confirmed by experiments
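The carry-rule and the \(\boxplus \)-rule for DL translate directly into a short procedure. The following sketch (the function names are ours, not from the paper) evaluates the DL-probabilities of all unit-vector output masks in \({\mathcal {O}}(n)\) and cross-checks them against an exhaustive enumeration of the 16-bit input space, as in Example 1.

```python
def carry_rule(p0, p1, p2):
    # Lemma 10: probability that the two majority (carry) values differ
    return p0*p1*p2 - (p0*p1 + p0*p2 + p1*p2)/2 + (p0 + p1 + p2)/2

def dl_add(p, q):
    # ⊞-rule for DL (Theorem 6): p[i], q[i] are the per-bit flip
    # probabilities of the addends; returns Pr[(x ⊞ y)_i != (x' ⊞ y')_i]
    s, out = 0.0, []  # s_0 = 0: both additions start with equal carries
    for pi, qi in zip(p, q):
        # XOR-rule applied to x_i ⊕ y_i ⊕ c_i
        out.append(pi + qi + s - 2*(pi*qi + pi*s + qi*s) + 4*pi*qi*s)
        s = carry_rule(pi, qi, s)  # carry-rule gives s_{i+1}
    return out

def dl_add_exhaustive(n, a, b):
    # exact DL-probabilities by enumerating all 2^(2n) input pairs
    mask, cnt = (1 << n) - 1, [0] * n
    for x in range(1 << n):
        for y in range(1 << n):
            d = ((x + y) & mask) ^ (((x ^ a) + (y ^ b)) & mask)
            for i in range(n):
                cnt[i] += (d >> i) & 1
    return [c / 4 ** n for c in cnt]
```

For \(a = b = \texttt {7}\) on 8 bits the two routines agree bit for bit, consistent with the perfect match reported for Table 2.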

As for the rotational differential-linear cryptanalysis of an n-bit modular addition, a left rotation by t bits is applied to the operands. Firstly, we present the \(\boxplus \)-rule for RX-difference with a rotation offset \(t= 1\).

Theorem 7

(\(\boxplus \)-rule for RL, \(t=1\)) Let x, y and \(x',y'\) be random n-bit strings such that \(x' = (x\lll 1) \oplus a\) and \(y' = (y\lll 1) \oplus b\), where \(\Pr \left[ x_{i-1}\ne x'_i \right] = p_i\) and \(\Pr \left[ y_{i-1}\ne y'_i \right] = q_i\). Then, with the assumption

$$\begin{aligned} s_0 = \Pr \left[ c_{n-1}\ne c'_0 \right] = \Pr \left[ x_{n-2}y_{n-2}\oplus x_{n-2}c_{n-2}\oplus y_{n-2}c_{n-2}\ne 0 \right] \approx 1/2, \end{aligned}$$

the rotational differential-linear probability of the modular addition can be computed as

$$\begin{aligned} \Pr \left[ (x\boxplus y)_{i-1} \ne (x'\boxplus y')_i \right] = p_i + q_i+ s_i - 2p_iq_i - 2p_is_i - 2q_is_i + 4p_iq_is_i, \end{aligned}$$

where \(s_1= \Pr \left[ c_0\ne c'_1 \right] = \Pr \left[ x'_0y'_0\ne 0 \right] = 1/4\),

$$\begin{aligned} s_{i+1} = p_{i}q_{i}s_{i} - \frac{p_{i}q_{i}+p_{i}s_{i}+q_{i}s_{i}}{2} + \frac{p_{i}+q_{i}+s_{i}}{2}, \quad 1 \le i \le n-2. \end{aligned}$$

Proof

Denote \(x = (x_{n-1},\cdots ,x_1,x_0)\), \(y = (y_{n-1},\cdots ,y_1,y_0)\), and

$$\begin{aligned} x' = (x'_{n-1},\cdots ,x'_1,x'_0) = (x_{n-2}\oplus a_{n-1},\cdots ,x_0\oplus a_1,x_{n-1}\oplus a_0),\\y' = (y'_{n-1},\cdots ,y'_1,y'_0) = (y_{n-2}\oplus b_{n-1},\cdots ,y_0\oplus b_1,y_{n-1}\oplus b_0). \end{aligned}$$

Let \(c = (c_{n-1},\cdots ,c_0) = (x\boxplus y)\oplus x\oplus y\) and \(c' = (c'_{n-1},\cdots ,c'_0) = (x'\boxplus y')\oplus x'\oplus y'\) be the two carries, where \(c_0 = 0\), \(c_{i+1} = \varsigma (x_i, y_i, c_i) = x_iy_i \oplus y_ic_i \oplus x_ic_i\), \(c_0' = 0\), and \(c_{i+1}'= \varsigma (x_i', y_i', c_i') = x_i'y_i' \oplus y_i'c_i' \oplus x_i'c_i'\). Since

$$\begin{aligned} \Pr \left[ (x\boxplus y)_{i-1} \ne (x'\boxplus y')_i \right] = \Pr \left[ x_{i-1} \oplus y_{i-1} \oplus c_{i-1} \ne x_{i}' \oplus y_{i}' \oplus c_{i}' \right] , \end{aligned}$$

applying the XOR-rule given by Proposition 2 gives

$$\begin{aligned} \Pr \left[ (x\boxplus y)_{i-1} \ne (x'\boxplus y')_i \right] = p_i + q_i + s_i - 2p_iq_i - 2p_is_i - 2q_is_i + 4p_iq_is_i. \end{aligned}$$

Let \(s_i\) denote the probability \(\Pr \left[ c_{i-1}\ne c'_i \right] \). Then, \(s_1= \Pr \left[ c_0\ne c'_1 \right] = \Pr \left[ c_1' \ne 0 \right] = \Pr \left[ x'_0y'_0\ne 0 \right] = 1/4\). For \(i>1\), \(s_{i}\) is equal to

$$\begin{aligned} \begin{aligned} \Pr \left[ c_{i-1}\ne c'_i \right]&= \Pr \left[ x_{i-2}y_{i-2}\oplus x_{i-2}c_{i-2}\oplus y_{i-2}c_{i-2} \ne x'_{i-1}y'_{i-1}\oplus x'_{i-1}c'_{i-1}\oplus y'_{i-1}c'_{i-1}\right] \\&= p_{i-1}q_{i-1}s_{i-1} - \frac{p_{i-1}q_{i-1}+p_{i-1}s_{i-1}+q_{i-1}s_{i-1}}{2} + \frac{p_{i-1}+q_{i-1}+s_{i-1}}{2} \end{aligned} \end{aligned}$$

according to the carry-rule given by Lemma 10. \(\square \)

Example 2

Consider an 8-bit modular addition with input RX-difference (left rotate by 1-bit) being \(a = \texttt {7}\) and \(b = \texttt {7}\), which implies that

$$\begin{aligned} p_0 = p_1 = p_2 = 1, p_3 = p_4 = p_5 = p_6 = p_7 = 0,\\ q_0 = q_1 = q_2 = 1, q_3 = q_4 = q_5 = q_6 = q_7 = 0. \end{aligned}$$

The R-DL probability of the i-th output bit, \(0\le i < 8\), is given in Table 3. The probabilities predicted for \(i\ge 2\) are verified by running through the 16-bit input space, and the probability for \(i=0\) is \(2^{-1.01132}\) by experiment.

Table 3 RL-probabilities of an 8-bit modular addition with input differences \(a = b = \texttt {7}\)

The experiments on an 8-bit modular addition show that the theoretical estimates of the DL and R-DL probabilities match the experiments well, except that the approximation of the R-DL probability for the least significant bit carries a marginal error in precision.

With a similar deduction, we give the following theorem for computing the R-DL probability through a modular addition under the condition that \({\texttt {rot}}(x) = x\lll t\), for an integer \(2\le t \le n-1\).

Theorem 8

(\(\boxplus \)-rule for RL for arbitrary \(t>1\)) Let x, y and \(x',y'\) be random n-bit strings such that \(x' = (x\lll t) \oplus a\) and \(y' = (y\lll t) \oplus b\), where \(\Pr \left[ x_{i-t}\ne x'_i \right] = p_i\) and \(\Pr \left[ y_{i-t}\ne y'_i \right] = q_i\). Then, with the assumption

$$\begin{aligned} {\left\{ \begin{array}{ll} s_0 = \Pr \left[ c_{n-t}\ne c'_0 \right] = \Pr \left[ x_{n-t-1}y_{n-t-1}\oplus x_{n-t-1}c_{n-t-1}\oplus y_{n-t-1}c_{n-t-1} \ne 0 \right] \approx 1/2\\ s_t = \Pr \left[ c_0 \ne c'_t \right] = \Pr \left[ x'_{t-1}y'_{t-1}\oplus x'_{t-1}c'_{t-1}\oplus y'_{t-1}c'_{t-1} \ne 0 \right] \approx 1/2 \end{array}\right. } \end{aligned}$$

the rotational differential-linear probability of the modular addition for \(i\ge 0\) can be computed as

$$\begin{aligned} \Pr \left[ (x\boxplus y)_{i-t} \ne (x'\boxplus y')_i \right] = p_i + q_i + s_i - 2p_iq_i - 2p_is_i - 2q_is_i + 4p_iq_is_i, \end{aligned}$$

where for \(0 \le i \le n-2\) with \(i+1\ne t\),

$$\begin{aligned} s_{i+1} = p_{i}q_{i}s_{i} - \frac{p_{i}q_{i}+p_{i}s_{i}+q_{i}s_{i}}{2} + \frac{p_{i}+q_{i}+s_{i}}{2}. \end{aligned}$$

Proof

Denote \(x = (x_{n-1},\cdots ,x_1,x_0)\), \(y = (y_{n-1},\cdots ,y_1,y_0)\), then

$$\begin{aligned} x' = (x'_{n-1},\cdots ,x'_1,x'_0) = (x_{n-1-t}\oplus a_{n-1},\cdots ,x_{n-t+1}\oplus a_1,x_{n-t}\oplus a_0)\\y' = (y'_{n-1},\cdots ,y'_1,y'_0) = (y_{n-1-t}\oplus b_{n-1},\cdots ,y_{n-t+1}\oplus b_1,y_{n-t}\oplus b_0). \end{aligned}$$

Let \(c = (c_{n-1},\cdots , c_1,c_0)\) and \(c'=(c'_{n-1},\cdots , c'_1,c'_0)\) be the carries. Let \(s_i\) denote the probability \(\Pr \left[ c_{i-t}\ne c'_i \right] \). According to the assumptions, \(s_0 = s_t = 1/2\). For all \(i \notin \{0,t \}\),

$$\begin{aligned} \begin{aligned} s_i&= \Pr \left[ c_{i-t}\ne c'_i \right] \\&= \Pr \left[ x'_{i-1}y'_{i-1}\oplus x'_{i-1}c'_{i-1}\oplus y'_{i-1}c'_{i-1} \right. \\&\left. \ne x_{n-t+i-1}y_{n-t+i-1}\oplus x_{n-t+i-1}c_{n-t+i-1} \oplus c_{n-t+i-1}y_{n-t+i-1}\right] \\&= p_{i-1}q_{i-1}s_{i-1} - \frac{p_{i-1}q_{i-1}+p_{i-1}s_{i-1}+q_{i-1}s_{i-1}}{2} + \frac{p_{i-1}+q_{i-1}+s_{i-1}}{2}. \end{aligned} \end{aligned}$$

Then, we have

$$\begin{aligned} \Pr \left[ (x\boxplus y)_{i-t} \ne (x'\boxplus y')_i \right] = p_i + q_i + s_i - 2p_iq_i - 2p_is_i - 2q_is_i + 4p_iq_is_i. \end{aligned}$$

\(\square \)

The \(\boxplus \)-rules for DL and R-DL allow us to compute the partial DLCT of an n-bit modular addition accurately and efficiently. A naive application of Bar-On et al.'s approach [25] based on the fast Fourier transform (FFT), treating the modular addition as a \(2n \times n\) S-box, would require \({\mathcal {O}}(2^{2n})\) operations per output mask, i.e., \({\mathcal {O}}(n 2^{2n})\) operations to obtain the n rows of the DLCT whose output masks are the unit vectors. In contrast, with the \(\boxplus \)-rule for DL, given the input difference, the DL-probability for all unit-vector output masks can be evaluated in \({\mathcal {O}}(n)\) operations, an exponential speed-up.

Remark 2

Theorems 7 and 8 have several restrictions. Firstly, they hold with some assumptions on certain carry bits and the independence of the neighboring bits, which may introduce inaccuracies into the evaluations of the correlations. Nevertheless, the theorems match the experimental results quite well. Secondly, these theorems only consider the cases where the output masks are unit vectors. We leave the problem of evaluating the correlations with arbitrary output masks and weakening or getting rid of the assumptions as future work.

4.2 Finding Input Differences for Local Optimization

According to Propositions 1 and 2, for x and y in \({\mathbb {F}}_2\), if \(\Pr \left[ x\ne x' \right] = p_1, \Pr \left[ y\ne y' \right] = p_2\), we have

$$\begin{aligned} \Pr \left[ xy\ne x'y' \right] = \frac{1}{2}(p_1+p_2-p_1p_2),\quad \Pr \left[ x\oplus y \ne x'\oplus y' \right] = p_1+p_2-2p_1p_2. \end{aligned}$$

Obviously, \(\Pr \left[ xy\ne x'y' \right] \) lies in the interval [0, 0.5] and \(\Pr \left[ x\oplus y \ne x'\oplus y' \right] \) lies in the interval [0, 1]. Moreover, \(\Pr \left[ x\oplus y \ne x'\oplus y' \right] \) collapses to \(\frac{1}{2}\) (i.e., correlation zero) whenever one of \(p_1\) and \(p_2\) equals \(\frac{1}{2}\). This observation suggests that the input probabilities should be biased away from \(\frac{1}{2}\) as much as possible. Otherwise, the probabilities for all one-bit output masks rapidly collapse to \(\frac{1}{2}\) after a few iterative evaluations of the round function.

In order to find distinguishers that cover as many rounds of a function F as possible, our strategy is to look for an input RX-difference \(\delta \), such that the DL or R-DL probability after one or a few propagations still has a relatively large imbalance for all the output masks whose Hamming weights are one. Therefore, we can define the objective function to maximize the summation of the absolute biases:

$$\begin{aligned} \sum _{i}\left( |\Pr \left[ e_i\cdot ({\texttt {rot}}(f(x))\oplus f({\texttt {rot}}(x)\oplus \delta )) = 0 \right] - 1/2|\right) . \end{aligned}$$
(20)

For 8-bit modular additions, we observed that the absolute DL and R-DL biases are relatively large when the input RX-differences have either a large or a small Hamming weight. For instance, with RX-difference \((x\lll 1)\oplus x'\), when the input differences are \(a = \texttt {0}\) and \(b = \texttt {1}\), the RL-probabilities for \(e_i\) with \(i=0,1,\dots ,7\) are as follows.

$$\begin{aligned} 2^{-1}, 2^{-2}, 2^{-3}, 2^{-4}, 2^{-5}, 2^{-6}, 2^{-7}, 2^{-8}. \end{aligned}$$

In contrast, for \(a = \texttt {ff}\) and \(b = \texttt {ff}\), the RL-probabilities for \(e_i\), \(i=0,1,\dots ,7\), are as follows.

$$\begin{aligned} 2^{-1}, 2^{-2}, 2^{-0.678072}, 2^{-0.29956}, 2^{-0.142019}, 2^{-0.0692627}, 2^{-0.0342157}, 2^{-0.0170064}. \end{aligned}$$
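Both probability sequences can be reproduced with the recurrence of Theorem 7. The sketch below (function names are ours) hard-codes the theorem's assumptions \(s_0 \approx 1/2\) and \(s_1 = 1/4\):

```python
def rdl_add_t1(p, q):
    # ⊞-rule for R-DL with rotation offset t = 1 (Theorem 7 sketch);
    # p[i], q[i] are the per-bit probabilities of the RX-differences
    n = len(p)
    s = [0.0] * n
    s[0] = 0.5    # assumption on the wrap-around carry bit
    s[1] = 0.25   # Pr[c'_1 != 0] = Pr[x'_0 y'_0 = 1]
    for i in range(1, n - 1):  # carry-rule gives s_2, ..., s_{n-1}
        s[i + 1] = (p[i]*q[i]*s[i]
                    - (p[i]*q[i] + p[i]*s[i] + q[i]*s[i]) / 2
                    + (p[i] + q[i] + s[i]) / 2)
    return [p[i] + q[i] + s[i] - 2*(p[i]*q[i] + p[i]*s[i] + q[i]*s[i])
            + 4*p[i]*q[i]*s[i] for i in range(n)]

def bits(v, n):
    return [float((v >> i) & 1) for i in range(n)]

low = rdl_add_t1(bits(0x0, 8), bits(0x1, 8))    # 2^-1, 2^-2, ..., 2^-8
high = rdl_add_t1(bits(0xFF, 8), bits(0xFF, 8))  # 2^-1, 2^-2, 2^-0.678..., ...
```

Note that \(2^{-0.678072} = 5/8\) and \(2^{-0.29956} = 13/16\), so the second sequence is exactly \(1/2, 1/4, 5/8, 13/16, 29/32, \dots \)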

When the size of the operands is large (e.g., \(n = 32\)), it is difficult to find the optimal input difference manually. Next, we show the optimal input RX-difference with respect to the objective function given by Eq. (20) for a 32-bit modular addition. See Appendix A for the search of such differences.

Example 3

Consider the R-DL probability for a 32-bit modular addition with \({\texttt {rot}}(x) = x\lll 1\). With input RX-differences

$$\begin{aligned} a = \texttt {7ffffffc}, b = \texttt {7ffffffe}, \end{aligned}$$

the objective function in Eq. 20 is maximized, and the R-DL probabilities \(\Pr \big [e_i\cdot ({\texttt {rot}}(x\boxplus y) \oplus (({\texttt {rot}}(x)\oplus a)\boxplus ({\texttt {rot}}(y)\oplus b)))=1 \big ]\) for \(0\le i \le 31\) are shown as follows.

i      0         1         2         3         4         5         6         7
p_i    0.5       0.75      0.5       0.75      0.875     0.9375    0.96875   0.984375

i      8         9         10        11        12        13        14        15
p_i    0.992188  0.996094  0.998047  0.999023  0.999512  0.999756  0.999878  0.999939

i      16        17        18        19        20 – 31
p_i    0.999969  0.999985  0.999992  0.999996  1
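These values can be reproduced by instantiating the recurrence of Theorem 7 with the bits of the two 32-bit RX-differences (a re-implementation sketch; the helper names are ours):

```python
def rdl_add_t1(p, q):
    # R-DL ⊞-rule for rotation offset t = 1, as in Theorem 7
    n = len(p)
    s = [0.0] * n
    s[0], s[1] = 0.5, 0.25  # assumed wrap-around carry; exact s_1 = 1/4
    for i in range(1, n - 1):
        s[i + 1] = (p[i]*q[i]*s[i]
                    - (p[i]*q[i] + p[i]*s[i] + q[i]*s[i]) / 2
                    + (p[i] + q[i] + s[i]) / 2)
    return [p[i] + q[i] + s[i] - 2*(p[i]*q[i] + p[i]*s[i] + q[i]*s[i])
            + 4*p[i]*q[i]*s[i] for i in range(n)]

a, b, n = 0x7FFFFFFC, 0x7FFFFFFE, 32
p = [float((a >> i) & 1) for i in range(n)]
q = [float((b >> i) & 1) for i in range(n)]
probs = rdl_add_t1(p, q)
# probs[:8] -> [0.5, 0.75, 0.5, 0.75, 0.875, 0.9375, 0.96875, 0.984375]
```

For \(i\ge 4\) the sequence is \(1-2^{-(i-1)}\), which converges to 1 and explains the tail of the table.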

5 Applications to AND-RX Primitives

In this section, we apply the rotational differential-linear technique to the AND-RX permutations involved in FRIET and Xoodoo, and significant improvements are obtained. To confirm the validity of the results, all distinguishers with practical complexities are experimentally verified, and the source code is available (see Footnote 2).

5.1 Distinguishers for Round-Reduced FRIET

FRIET is an authenticated encryption scheme with built-in fault detection mechanisms proposed by Simon et al. at EUROCRYPT 2020 [29]. FRIET is a permutation-based design, and in this work, we only analyze its underlying permutation FRIET-P, which is an iterative design with 24 rounds.

The core permutation FRIET-P employed in FRIET operates on a \(4 \times 128 = 512\)-bit state arranged into a rectangle with 4 rows (called limbs) and 128 columns (called slices) as shown in Fig. 2. The permutation FRIET-P is an iterative design with its round function \(g_{rc_i}\) visualized in Fig. 3, where a, b, and \(c \in {\mathbb {F}}_2^{128}\) are three of the four limbs (see Fig. 2) of the input state and \(rc_i\) is the round constant for the i-th round.

Fig. 2

View of the state

By design, the round function \(g_{rc_i}\) is slice-wise code-abiding for the parity code \([4, 3, 2]_{{\mathbb {F}}_2}\), meaning that every slice of the output state is a code word if every slice of the input state is a code word. Mathematically, it means that \(a + b + c = d\) implies \(a'+b'+c' = d'\). This slice-wise code-abiding property is inherited by the permutation FRIET-P \(= g_{rc_{t-1}} \circ \cdots \circ g_{rc_1} \circ g_{rc_0}\). Consequently, faults will be detected if some output slice is not a code word when all of the slices of the input state are code words. Note that the behavior of the permutation FRIET-PC is identical to FRIET-P by design if we ignore the limb d.

Fig. 3

Round functions of Friet-PC and Friet-P

5.1.1 Practical Distinguishers for FRIET-PC

Since a distinguisher for the permutation FRIET-PC directly translates to a distinguisher for FRIET-P, we focus on the permutation FRIET-PC. Let (abc) and \((a',b',c')\) in \({\mathbb {F}}_2^{128 \times 3}\) be the input pair of the permutation with RX-differences

$$\begin{aligned} \varDelta _a= (a\lll t) \oplus a', ~~\varDelta _b = (b\lll t) \oplus b' ,~~\varDelta _c = (c\lll t) \oplus c' . \end{aligned}$$

In our analysis, we only consider input RX-differences such that \(wt(\varDelta _a)+wt(\varDelta _b)+wt(\varDelta _c) \le 1\).

According to the adjusted C-rule (see Proposition 4), the constant addition injects an RX-difference \(c\oplus (c\lll t)\) into the state, and alters the R-DL-probabilities wherever the corresponding bits of \(c\oplus (c\lll t)\) are nonzero. A rule of thumb for choosing the rotation amount is to minimize the weight of the RX-difference introduced by the round constants, so that the constants' destructive effect on the rotational propagation is presumably reduced. The first six round constants of FRIET-PC are (in hexadecimal)

$$\begin{aligned} \texttt {1111}, \texttt {11100000},\texttt {1101},\texttt {10100000},\texttt {101},\texttt {10110000}. \end{aligned}$$

To minimize the Hamming weight of the RX-differences from the round constants, one of the best choices is a left rotation by 4 bits, so that the consecutive nonzero nibbles cancel each other as much as possible. The injected RX-differences due to the round constants are then

$$\begin{aligned} \texttt {10001}, \texttt {100100000},\texttt {10111},\texttt {111100000},\texttt {1111},\texttt {111010000}. \end{aligned}$$
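The injected RX-differences are easy to check directly (a sketch; the helper `rotl128` is ours):

```python
def rotl128(x, t):
    # left rotation of a 128-bit limb by t bits
    return ((x << t) | (x >> (128 - t))) & ((1 << 128) - 1)

consts = [0x1111, 0x11100000, 0x1101, 0x10100000, 0x101, 0x10110000]
rx_diffs = [c ^ rotl128(c, 4) for c in consts]
# -> 0x10001, 0x100100000, 0x10111, 0x111100000, 0x1111, 0x111010000
```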

With the AND-rule, XOR-rule, and adjusted C-rule, the R-DL probability can be evaluated given the input RX-differences with \(wt(\varDelta _a)+wt(\varDelta _b)+wt(\varDelta _c) \le 1\) and the output linear mask \(e_i\). Table 4 shows the rotational differential-linear distinguishers with the largest absolute correlation we found in reduced-round FRIET-PC, where \(\varDelta _a,\varDelta _b,\varDelta _c\) are the input RX-differences, and \(\gamma _a,\gamma _b,\gamma _c\) are the output masks for the limbs a, b, c, respectively.

Table 4 Distinguishers for reduced-round FRIET-PC with rotation offset \(t = 4\)

For FRIET-PC reduced to 4 rounds, an R-DL distinguisher with correlation 1 is detected, with input RX-differences (0, 0, 0) and output masks (0, 1, 0). For 5- and 6-round FRIET-PC, we found practical rotational differential-linear distinguishers with correlations \(2^{-0.96}\) and \(2^{-5.81}\), respectively. All the distinguishers shown in Table 4 are verified experimentally with \(2^{24}\) random plaintexts.

5.1.2 Extending the Practical Distinguishers

According to the discussion of Sect. 3, we can extend a rotational differential-linear distinguisher by appending a linear approximation \(\gamma \rightarrow \mu \), and the bias of the extended distinguisher can be computed with Eq. (7). Consequently, this extension is optimal when \(\epsilon _{\gamma ,\mu }\) and \(\epsilon _{{\texttt {rot}}^{-1}(\gamma ),{\texttt {rot}}^{-1}(\mu )}\) reach their largest possible absolute values simultaneously. For FRIET-PC, we always have \(\epsilon _{\gamma ,\mu } = \epsilon _{{\texttt {rot}}^{-1}(\gamma ),{\texttt {rot}}^{-1}(\mu )}\), and thus we can focus on finding an optimal linear approximation \(\gamma \rightarrow \mu \).

Here, we take the 6-round R-DL distinguisher shown in Table 4 and append optimal linear approximations to extend it. The output linear mask of the 6-round distinguisher is \((\texttt{0},\texttt{0},\texttt{40000})\). In Table 5, we list the correlations of the optimal linear approximations for round-reduced FRIET-PC whose input masks are \((\texttt{0},\texttt{0},\texttt{40000})\), which are found with the SMT-based approach [36].

Table 5 Correlation of optimal linear trails found in round-reduced FRIET-PC with the input masks \((\texttt{0},\texttt{0},\texttt{40000})\)

The optimal 1-round linear trail we found has output masks

$$\begin{aligned}\begin{aligned}&\mu _a = \texttt{00000000000000020000000000040000}\\&\mu _b = \texttt{00004000000000020000000000040000}\\&\mu _c = \texttt{00000000000080020000000000060000}. \end{aligned}\end{aligned}$$

Thus, a 7-round distinguisher can be built by concatenating the 6-round distinguisher with a 1-round linear approximation, and the estimated correlation is \(2^{-5.81} \times 2^{-2\times 2} = 2^{-9.81}\). In our experiments, with \(2^{24}\) randomly chosen pairs of inputs satisfying the input RX-difference, the output difference under the specified mask is biased with a correlation of approximately \(2^{-9.12}\). Similarly, by appending a 2-round linear trail with output masks

$$\begin{aligned}\begin{aligned}&\mu _a = \texttt{00000000000000030000000000060000}\\&\mu _b = \texttt{00006000000000010000000030020000}\\&\mu _c = \texttt{600000000000c0010000000000030000}. \end{aligned}\end{aligned}$$

at the end of the 6-round rotational differential-linear distinguisher, we get an 8-round RL-distinguisher with a correlation of \(2^{-17.81}\). With \(2^{40}\) pairs of inputs satisfying the input RX-difference, we find that the experimental correlation of the 8-round distinguisher is \(2^{-17.2}\). As a comparison, the 7- and 8-round linear trails presented in the specification of FRIET-PC have correlations \(2^{-29}\) and \(2^{-40}\), respectively. With the linear trails shown in Table 5, the concatenated distinguisher can reach up to 13 rounds, with an estimated correlation of \(2^{-117.81}\).

5.2 Distinguishers for Round-Reduced Xoodoo

Xoodoo  [30] is a 384-bit lightweight cryptographic permutation whose primary target application is in the Farfalle construction [37]. The state of Xoodoo is arranged into a \(4\times 3\times 32\) cuboid and the bit at a specific position is accessed as a[x][y][z]. One round of Xoodoo consists of the following operations.

$$\begin{aligned} a[x][y][z]&= a[x][y][z] \oplus \sum _y a[x-1][y][z-5] \oplus \sum _y a[x-1][y][z-14]\\ a[x][1][z]&= a[x-1][1][z], \quad a[x][2][z] = a[x][2][z-11]\\ a[0][0]&= a[0][0] \oplus RC_i\\ a[x][y][z]&= a[x][y][z] \oplus ((a[x][y+1][z]+1)*(a[x][y+2][z]))\\ a[x][1][z]&= a[x][1][z-1], \quad a[x][2][z] = a[x-1][2][z-8] \end{aligned}$$

The total number of rounds in Xoodoo is 12, and in some modes (Farfalle [37] for instance), the core permutation calls a 6-round Xoodoo permutation. The round constants of Xoodoo are shown in the following, and for Xoodoo reduced to r rounds, the round constants are \(c_{-(r-1)}, \cdots , c_0\).

$$\begin{aligned} \begin{aligned} c_{-11}&= \texttt {00000058} ,\quad c_{-8} = \texttt {000000D0} ,\quad c_{-5} = \texttt {00000060} ,\quad c_{-2} =\texttt {000000F0} \\ c_{-10}&=\texttt {00000038} ,\quad c_{-7} =\texttt {00000120} ,\quad c_{-4} =\texttt {0000002C} ,\quad c_{-1} =\texttt {000001A0}\\ c_{-9}&=\texttt {000003C0} ,\quad c_{-6} =\texttt {00000014} ,\quad c_{-3} =\texttt {00000380} ,~~\quad c_{0} =\texttt {00000012} \end{aligned} \end{aligned}$$

Given an all-zero input difference, i.e., the input pair is exactly a rotational pair with the rotation amount set to a left rotation by 1 bit, we find that after 3 rounds of Xoodoo there are still many output bits that are highly biased, with the largest correlation being 1 for the one-bit mask at position (1, 0, 16), i.e., a nonzero mask \(\texttt {10000}\) at lane (1, 0). However, when extending one extra round, we no longer see any significant correlation.

Noticing that the round constant is XORed into the state right after the first two linear operations, one can choose the input RX-difference such that it is cancelled by the injection of the first-round constant. As a result, one round at the beginning comes for free, and we are able to construct 4-round distinguishers for Xoodoo. When the left-rotation amount is set to 1 bit, the RX-difference of the first constant \(c_{-3}\) is \(\texttt{00000480}\). This suggests that if we take input RX-differences

$$\begin{aligned}\begin{aligned} a[0][0] = \texttt{484ccc80}; a[0][1] = \texttt{484cc800}; a[0][2] = \texttt{484cc800};\\ a[1][0] = \texttt{3ab9821a}; a[1][1] = \texttt{3ab9821a}; a[1][2] = \texttt{3ab9821a};\\ a[2][0] = \texttt{37b6cde9}; a[2][1] = \texttt{37b6cde9}; a[2][2] = \texttt{37b6cde9};\\ a[3][0] = \texttt{45a3f0cb}; a[3][1] = \texttt{45a3f0cb}; a[3][2] = \texttt{45a3f0cb}. \end{aligned}\end{aligned}$$

The RX-difference after the first round of Xoodoo will then be all zero. Hence, we are able to find 4-round distinguishers with significant correlations. We find a rotational differential-linear distinguisher with correlation 1, with the output mask being \(\texttt {10000}\) at lane (1, 0) and zero for the remaining lanes. Another two distinguishers with the same correlation are found with output masks \(\texttt {20000}\) at lane (1, 1) and \(\texttt {1000000}\) at lane (3, 2).
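The cancellation hinges on the RX-difference of the first injected constant \(c_{-3} = \texttt {00000380}\), which can be checked directly (a sketch over a single 32-bit lane):

```python
def rotl32(x, t):
    # left rotation of a 32-bit lane by t bits
    return ((x << t) | (x >> (32 - t))) & 0xFFFFFFFF

c = 0x00000380  # first round constant injected in 4-round Xoodoo
rx = c ^ rotl32(c, 1)
# -> 0x480: the RX-difference that the chosen input difference cancels
```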

6 Applications to ARX Primitives

In this section, we apply the rotational differential-linear technique to the ARX primitives Alzette and SipHash, and the source code for the experimental verification is available (see Footnote 3).

6.1 Application in the 64-Bit ARX-Box Alzette

At CRYPTO 2020, Beierle et al. presented Alzette [31], a 64-bit ARX-box that is efficient in software. The design follows the same research line as the earlier SPARX [38], a design with a 32-bit ARX-box for which a long-trail argument was proposed for deriving security bounds in ARX ciphers. Figure 4 shows an instance of Alzette with an input \((x,y)\in {\mathbb {F}}_2^{32}\times {\mathbb {F}}_2^{32}\).

Fig. 4

A 4-round Alzette instance

The differential and linear properties of the 4-round Alzette instance shown in Fig. 4 are comparable to those of the 8-bit S-box of AES. The optimal differential characteristic of Alzette has a probability of \(2^{-6}\). In addition, because of the modular additions and the diffusion in Alzette, the designers showed by the division property that Alzette may reach full degree in all its coordinates.

In the following, we present the rotational differential-linear and differential-linear distinguishers of Alzette found with the techniques in Sect. 4. The constant \(c = \texttt {B7E15162}\) (the first constant in SPARX-based design Sparkle-128) is considered for illustration.

Rotational differential-linear distinguisher. In Sect. 4.2, \((\texttt {7ffffffc},\texttt {7ffffffe})\) was found to be optimal for the 32-bit modular addition under the objective function considered in Example 3. This difference can be used as the input difference of the first modular addition in Alzette. Because of the right rotation by 31 bits before the modular addition, the input RX-difference to Alzette is \((\texttt {7ffffffc},\texttt {3fffffff})\) (see Footnote 4). With an iterative evaluation of the steps in Alzette, we found that the second least significant bit is biased. Specifically, with an output mask \((\texttt {2},\texttt {0})\), the RL-probability is 0.500189, i.e., a correlation of \(2^{-11.37}\). By taking \(2^{28}\) pairs of random plaintexts, the experimental correlation of the distinguisher is \(2^{-7.35}\). In addition, we checked all input RX-differences (a, b) with Hamming weight \(wt(a) + wt(b) = 1\), but no rotational differential-linear distinguisher was found.

Differential-linear distinguisher. For all input differences with Hamming weight 1, we compute the differential-linear probability of Alzette with the technique in Sect. 4. The best distinguisher found has input difference \((\texttt {80000000},\texttt {0})\) and output mask \((\texttt {80000000},\texttt {0})\), with a probability of 0.086, equivalently, a correlation of \(2^{-0.27}\). By experimental verification with \(2^{28}\) pairs of random plaintexts, the correlation is \(2^{-0.1}\).
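The strong DL-distinguisher is easy to reproduce empirically. The sketch below assumes the Alzette round structure from the Sparkle specification (rotation pairs (31, 24), (17, 17), (0, 31), (24, 16)) with the constant \(\texttt {B7E15162}\); even a modest sample already exhibits a correlation far above noise.

```python
import random

MASK = 0xFFFFFFFF

def rotr(x, r):
    return ((x >> r) | (x << (32 - r))) & MASK

def alzette(x, y, c=0xB7E15162):
    # 4-round Alzette instance; rotations as in the Sparkle specification
    for r1, r2 in ((31, 24), (17, 17), (0, 31), (24, 16)):
        x = (x + rotr(y, r1)) & MASK
        y ^= rotr(x, r2)
        x ^= c
    return x, y

def dl_correlation(din=(0x80000000, 0), mout=(0x80000000, 0),
                   n=1 << 14, seed=2024):
    # empirical DL correlation: 1 - 2*Pr[masked output difference is odd]
    rng = random.Random(seed)
    acc = 0
    for _ in range(n):
        x, y = rng.getrandbits(32), rng.getrandbits(32)
        u = alzette(x, y)
        v = alzette(x ^ din[0], y ^ din[1])
        d = ((u[0] ^ v[0]) & mout[0]) ^ ((u[1] ^ v[1]) & mout[1])
        acc += bin(d).count("1") & 1  # parity of the masked difference
    return 1 - 2 * acc / n
```

With the difference and mask \((\texttt {80000000},\texttt {0})\), the measured correlation is well above \(2^{-1}\), consistent with the reported \(2^{-0.1}\).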

Figure 5 shows a comparison of the probabilities for input difference \((\texttt {80000000},\texttt {0})\) and output masks \((\texttt {1}\lll t,\texttt {0})\) for all integers \(t\in [0,31]\), obtained by our evaluation technique and by experiments with \(2^{24}\) pairs of random plaintexts. The theoretical evaluation matches the experiments within a tolerable fluctuation.

Fig. 5

A comparison between the differential-linear probability in Alzette by theoretical computation and by experiment. The index shows the index of the nonzero bit in the unit-vector output mask. For instance, when the index is 0, the output mask is (0,1), and when the index is 63, it is (80000000,0)

Comparing the RL-distinguishers and DL-distinguishers found in Alzette, the latter are significantly stronger. It is also interesting to notice that input differences with low Hamming weight often lead to good differential-linear distinguishers in Alzette, whereas we did not find any rotational differential-linear distinguisher with low-weight RX-differences when the rotation offset is greater than zero. The influence of the constants on RL-distinguishers may be the main cause.

6.2 Experimental Distinguishers for SipHash Explained

SipHash [39], designed by Aumasson and Bernstein, is a family of ARX-based pseudorandom functions optimized for short inputs. Instances of SipHash are widely deployed in practice. For example, SipHash-2-4 is used in the dnscache instances of all OpenDNS resolvers and employed as hash() in Python for all major platforms (https://131002.net/siphash/#us).

In [40], from a perspective of differential cryptanalysis, a bias of the difference distribution of one particular output bit for 3-round SipHash is observed when the Hamming weight of the input difference is one. For instance, with input difference \(a = 1\), He and Yu showed that the output difference is biased at the 27-th bit with a correlation \(2^{-6}\) by experiments. This observation was obtained through extensive experiments, and the theoretical reason behind these distinguishers is unclear as stated by He and Yu:

... we are not concerned about why it shows a rotation property or why it reaches such a bias level. However, a great number of experiments can support those observations. (see [40, Section 4.2, Page 11])

According to the discussion of Sect. 3.3, the bias of \( E(x) \oplus E(x \oplus \delta ) \) observed in [40] is equivalent to the bias of

$$\begin{aligned} e_i\cdot (E(x) \oplus E(x \oplus \delta )). \end{aligned}$$

It can be interpreted in the differential-linear framework and analyzed with the theoretical approach presented in Sect. 4. Here, we apply the rules for modular addition and XOR, and compute the DL-probability of the 3-round distinguisher found in SipHash. With our technique, we confirm that for the 3-round differential-linear distinguisher with the aforementioned difference and mask, the predicted correlation is \(2^{-6.6}\), which is close to He and Yu's experiments.

In addition, we can explain the observation on the rotation property with the \(\boxplus \)-rule in differential-linear. We adopt the notation used in Theorem 6.

Because the input difference in their experiment has only one nonzero bit, we consider the DL-probability of an n-bit modular addition where the input difference is \((e_k, 0)\), for an integer k.

Then, for a pair of inputs (x, y) and \((x',y')\), the probability \(p_k = \Pr \left[ x_k\ne x'_k \right] = 1\). For the remaining bits, \(p_i = \Pr \left[ x_i\ne x'_i \right] = 0\) for \(i\ne k\), and \(q_i = \Pr \left[ y_i\ne y'_i \right] = 0\) for all i.

Let \(s_i = \Pr \left[ c_i \ne c'_i \right] \) denote the probability that the carry bits differ. We have \(s_0 = \cdots = s_{k} = 0\) and \(s_{k+t} = 2^{-t}\) for \(1\le t \le n-1-k\). As a result, the DL-probability of the modular addition at the i-th bit is \(P_i = \Pr \left[ (x\boxplus y)_i \ne (x'\boxplus y')_i \right] \), \(0\le i \le n-1\), where

$$\begin{aligned} \Pr \left[ (x\boxplus y)_i \ne (x'\boxplus y')_i \right] = {\left\{ \begin{array}{ll} 0, \quad \quad i< k\\ 2^{-i+k}, \quad \quad i\ge k \end{array}\right. } \end{aligned}$$
(21)
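For a small word size, Eq. (21) — with the boundary bit \(i = k\), which always flips, handled explicitly — can be checked by exhaustive enumeration (a sketch; the helper names are ours):

```python
def predicted_dl(n, k):
    # Eq. (21): zero below bit k, a certain flip at bit k,
    # then a halving of the probability per bit above k
    return [0.0 if i < k else 2.0 ** (k - i) for i in range(n)]

def empirical_dl(n, k):
    # exact probabilities over all 2^(2n) pairs for input difference (e_k, 0)
    mask, cnt = (1 << n) - 1, [0] * n
    for x in range(1 << n):
        for y in range(1 << n):
            d = ((x + y) & mask) ^ (((x ^ (1 << k)) + y) & mask)
            for i in range(n):
                cnt[i] += (d >> i) & 1
    return [c / 4 ** n for c in cnt]
```

On an 8-bit addition the two agree exactly for every position k of the single-bit difference we tried.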

By rotating the input difference \((\texttt {1}\lll k, 0)\) to the left by one bit, the differential-linear probability of the i-th output bit, denoted \(\overleftarrow{P_i}\), is equal to \(2^{-i+k+1}\) for \(k+1 \le i \le n-1\), and to zero for \(i \le k\).

It is obvious that by rotating the differential-linear probabilities in Eq. (21), we obtain the probabilities \(\overleftarrow{P_i}\) for all but the least significant bit, where \(\overleftarrow{P_0} = 0\) while the rotated value \(P_{n-1} = 2^{-n+1+k}\). Nevertheless, the error is negligible when \(n-k\) is large, which holds for large modular additions such as the 64-bit one adopted in SipHash.

For input differences of Hamming weight greater than 1, a similar rotational property can be observed for the \(\boxplus \)-rule in the differential-linear framework. This gives a straightforward intuition for the rotational property observed in the differential-linear distinguishers of SipHash.

7 Conclusion and Future Work

We extend the differential-linear framework by using rotational-XOR differentials in the differential part of the framework, and we name the resulting cryptanalytic technique rotational differential-linear cryptanalysis. We derive a closed formula for the bias of a rotational differential-linear distinguisher under the sole assumption that the rotational-XOR differential part and the linear part are independent. Moreover, we show that Morawiecki et al.'s technique can be generalized to estimate the bias of a rotational differential-linear distinguisher whose output linear mask is a unit vector. We apply our method to the permutations involved in FRIET, Xoodoo, Alzette, and SipHash, which leads to significant improvements over existing cryptanalytic results or to explanations for previous experimental distinguishers that lacked a theoretical foundation.

Finally, we make an initial attempt to apply the rotational differential-linear technique to keyed primitives and S-box-based designs, and discuss the difficulties we encountered along the way. This section serves to motivate further research in this direction.

Keyed Primitives. The most fundamental discrepancy between ordinary differential and rotational differential cryptanalysis lies in the effect of secret key additions on the intermediate differences of the (rotational) differential trails.

In ordinary differential cryptanalysis, let \((x, x') \in {\mathbb {F}}_2^n\) be a pair of data and \(k \in {\mathbb {F}}_2^n\) be a secret key. The key addition operation has no effect on the difference of x and \(x'\) since \((x \oplus k) \oplus (x' \oplus k) = x \oplus x'\). In rotational differential cryptanalysis, the situation is completely different. Let \((x, x')\) be a pair of data with \(x' = (x \lll t) \oplus \delta \). Then, the rotational difference of the pair obtained by performing the key addition operation is

$$\begin{aligned} ((x \oplus k) \lll t) \oplus (x' \oplus k) = \delta \oplus ( k \oplus (k \lll t) ), \end{aligned}$$

which is key-dependent (this fact is also reflected in Proposition 4). This complicates the search for good rotational differential trails. One way to overcome this issue is to impose constraints on the key values such that \(k \oplus (k \lll t)\) is somewhat predictable. For example, we may require \(k \oplus (k \lll t)\) to be a fixed constant. This approach leads to weak-key attacks. Therefore, we conclude that, in general, it is difficult to apply rotational differential-linear cryptanalysis to keyed primitives.
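The key-dependence above can be verified directly. The following minimal Python sketch (the helper names are ours) checks, for a toy 8-bit word size, that the RX-difference of the pair after key addition equals \(\delta \oplus k \oplus (k \lll t)\) for every key \(k\):

```python
N = 8                                     # toy word size

def rotl(x, t, n=N):
    """Rotate an n-bit word x to the left by t positions."""
    return ((x << t) | (x >> (n - t))) & ((1 << n) - 1)

def rx_diff(x, xp, t=1, n=N):
    """RX-difference of the pair (x, x'): (x <<< t) XOR x'."""
    return rotl(x, t, n) ^ xp

# Build a pair with RX-difference delta, then XOR the same key into both members.
x, delta, t = 0b10110010, 0b00000101, 1
xp = rotl(x, t) ^ delta
for k in range(1 << N):
    # The new RX-difference picks up the key-dependent term k ^ (k <<< t).
    assert rx_diff(x ^ k, xp ^ k, t) == delta ^ k ^ rotl(k, t)
```

The loop passing for all 256 keys illustrates that the injected term \(k \oplus (k \lll t)\) is unavoidable unless the key is constrained.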

S-box-based Permutations. Due to the rotational properties of modular addition and bitwise operations, rotational cryptanalysis finds successful applications in ARX and AND-RX ciphers, whereas S-box-based designs are less studied; intuitively, techniques employing RX-differences should be less effective against S-box-based designs due to the lack of strong rotational properties of general S-boxes. Nevertheless, we still try to apply this technique to certain S-box-based permutations to gain some concrete understanding. For S-box-based designs, one could consider different types of rotations. Assuming that the S-box layer consists of t parallel applications of an \(s \times s\) S-box, we can have the different rotations illustrated in Fig. 6.

Fig. 6: Main types of rotations in S-box-based permutations

Previous studies focus on the rotation of S-boxes and rotational invariance through the rounds (e.g., [41]). More generally, if one uses a linear function instead of a rotation, attacks that exploit self-similarities in LS-designs have been proposed, see for example the invariant permutation attacks [2]. The rotation shown in Fig. 6b on LS-designs can be regarded as a special type of permutation on the state, and it is of interest when such a rotation commutes with the linear layer.

In this section, we aim at finding rotational properties within the S-boxes; namely, given an S-box \(S:{\mathbb {F}}_{2}^s\rightarrow {\mathbb {F}}_{2}^s\), find differences \(a,b\in {\mathbb {F}}_{2}^s\) such that

$$\begin{aligned} S(\overleftarrow{x}\oplus a) \oplus \overleftarrow{S(x)} = b \end{aligned}$$

holds with a high probability. Analogous to the DDT of an S-box, we can define the rotational difference distribution table.

Definition 9

(rDDT table) Given an s-bit S-box \(S:{\mathbb {F}}_{2}^s\rightarrow {\mathbb {F}}_{2}^s\), we define the rotational difference distribution table (rDDT) as a \(2^s\times 2^s\) table \({\mathbb {T}}\), such that

$$\begin{aligned}{\mathbb {T}}[\alpha ][\beta ] = \# \{x\in {\mathbb {F}}_{2}^s|S(\overleftarrow{x}\oplus \alpha ) \oplus \overleftarrow{S(x)} = \beta \}.\end{aligned}$$
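Definition 9 translates directly into code. The Python sketch below (the helper names `rotl4` and `rddt` are ours) builds the rDDT of a 4-bit S-box for a rotation offset t:

```python
def rotl4(x, t):
    """Rotate a 4-bit value x to the left by t positions."""
    return ((x << t) | (x >> (4 - t))) & 0xF

def rddt(sbox, t=1):
    """rDDT per Definition 9: T[a][b] = #{x : S((x <<< t) ^ a) ^ (S(x) <<< t) = b}."""
    T = [[0] * 16 for _ in range(16)]
    for x in range(16):
        for a in range(16):
            b = sbox[rotl4(x, t) ^ a] ^ rotl4(sbox[x], t)
            T[a][b] += 1
    return T
```

Each row of the table sums to \(2^s = 16\), and the maximal entry over all \((\alpha ,\beta )\) is the quantity reported in Table 6.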

In [42], Leander and Poschmann show that, up to affine equivalence, there are only 16 different optimal S-boxes with respect to linear and differential cryptanalysis. Table 6 shows the maximal entries in the rDDTs of the 16 optimal 4-bit S-boxes presented in [42] (see Appendix C for details), where t is the rotation offset of the underlying rotational difference.

It is interesting to observe that the differentially 4-uniform S-boxes permit RX-difference transitions with a higher probability than ordinary differentials.

Table 6 Maximum entries of the rDDTs for 16 optimal 4-bit S-boxes

Some S-boxes have a rotational property by design. For example, the four 8-bit S-boxes of Midori-128 [43] are constructed with the 4-bit S-box

$$\begin{aligned} \textsf {Sb}_1 = [ \texttt {0x1}, \texttt {0x0}, \texttt {0x5}, \texttt {0x3}, \texttt {0xe}, \texttt {0x2}, \texttt {0xf}, \texttt {0x7}, \texttt {0xd}, \texttt {0xa}, \texttt {0x9}, \texttt {0xb}, \texttt {0xc}, \texttt {0x8}, \texttt {0x4}, \texttt {0x6}], \end{aligned}$$

whose internal structure is depicted in Fig. 7.

Fig. 7: Four S-boxes of Midori-128 [43]

Note that the swap of the two 4-bit inputs to \(\textsf {SSb}_i\) leads to the swap of the output nibbles. Namely, \(\textsf {SSb}_i(a_L||a_R) = (b_L||b_R)\) implies \(\textsf {SSb}_i(a_R||a_L) = (b_R||b_L)\).

In a different notation: \(\textsf {SSb}_i(x\lll 4) = (\textsf {SSb}_i(x))\lll 4\). The reason behind this property is that the two layers of bit permutation in \(\textsf {SSb}_i\) are the inverse of each other.

Consider the difference propagation. Assume the input pair of values to the S-box is \((x_1||x_0)\) and \(((x_0 \oplus \delta _L) || (x_1 \oplus \delta _R))\), i.e., the pair has RX-difference \((\delta _L||\delta _R)\) with respect to the rotation by 4 bits. Then, the probability that the output pair is \((y_1 || y_0)\) and \(((y_0\oplus d_L)||(y_1 \oplus d_R))\), i.e., that the output RX-difference is \((d_L||d_R)\), is the same as that of the normal difference propagation from \((\delta _R||\delta _L)\) to \((d_R||d_L)\) through the S-box. In other words, both the input and output differences are swapped when the RX-transition is translated into an ordinary one.

$$\begin{aligned} \Pr ((\delta _L||\delta _R) \xrightarrow {RX} (d_L||d_R)) = \Pr ((\delta _R||\delta _L) \rightarrow (d_R||d_L)). \end{aligned}$$
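This can be illustrated numerically. The sketch below uses a hypothetical swap-commuting 8-bit map built from \(\textsf {Sb}_1\) (it is not the actual \(\textsf {SSb}_i\), whose internal bit permutations are not reproduced here) and checks that, for any map \(F\) with \(F(x\lll 4) = F(x)\lll 4\), the rDDT with rotation offset 4 coincides entry-wise with the ordinary DDT, whose entries are in turn invariant under a simultaneous swap of the input and output differences:

```python
SB1 = [0x1, 0x0, 0x5, 0x3, 0xe, 0x2, 0xf, 0x7,
       0xd, 0xa, 0x9, 0xb, 0xc, 0x8, 0x4, 0x6]

def swap(x):
    """Rotation by 4 bits on an 8-bit value, i.e., a nibble swap."""
    return ((x << 4) | (x >> 4)) & 0xFF

def F(x):
    """Hypothetical swap-commuting map: F(swap(x)) = swap(F(x))."""
    a, b = x >> 4, x & 0xF
    return ((SB1[a] ^ b) << 4) | (SB1[b] ^ a)

def ddt(f):
    """Ordinary DDT of an 8-bit map."""
    T = [[0] * 256 for _ in range(256)]
    for x in range(256):
        for a in range(256):
            T[a][f(x ^ a) ^ f(x)] += 1
    return T

def rddt4(f):
    """rDDT of an 8-bit map with rotation offset t = 4."""
    T = [[0] * 256 for _ in range(256)]
    for x in range(256):
        for a in range(256):
            T[a][f(swap(x) ^ a) ^ swap(f(x))] += 1
    return T
```

For such a map, `rddt4(F)` equals `ddt(F)` entry-wise, and `ddt(F)[a][b] == ddt(F)[swap(a)][swap(b)]` for all differences, which is the symmetry exploited above.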

As we have already shown, \(\textsf {SSb}_i(x\lll 4) = (\textsf {SSb}_i(x))\lll 4\). The ShuffleCell operation merely moves the cells around, so the rotation on each cell is preserved. For the MixColumn operation,

$$\begin{aligned} \begin{pmatrix} 0 &{} 1 &{} 1 &{} 1\\ 1 &{} 0 &{} 1 &{} 1\\ 1 &{} 1 &{} 0 &{} 1\\ 1 &{} 1 &{} 1 &{} 0 \end{pmatrix}\begin{pmatrix} x_3\\ x_2\\ x_1\\ x_0 \end{pmatrix} = \begin{pmatrix} y_3\\ y_2\\ y_1\\ y_0 \end{pmatrix},~~\textrm{and}~~ \begin{pmatrix} 0 &{} 1 &{} 1 &{} 1\\ 1 &{} 0 &{} 1 &{} 1\\ 1 &{} 1 &{} 0 &{} 1\\ 1 &{} 1 &{} 1 &{} 0 \end{pmatrix}\begin{pmatrix} x_3\lll 4\\ x_2\lll 4\\ x_1\lll 4\\ x_0\lll 4 \end{pmatrix} = \begin{pmatrix} y_3\lll 4\\ y_2\lll 4\\ y_1\lll 4\\ y_0\lll 4 \end{pmatrix} \end{aligned}$$
(22)
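Equation (22) can be checked mechanically: the MixColumn matrix is binary and acts on whole cells by XOR, and the rotation by 4 is XOR-linear on each cell, so the two operations commute. A small Python sketch (the helper names are ours):

```python
import random

def swap4(x):
    """Rotation by 4 bits on an 8-bit cell (nibble swap)."""
    return ((x << 4) | (x >> 4)) & 0xFF

def mixcolumn(col):
    """Midori MixColumn: each output cell is the XOR of the other three input cells."""
    s = col[0] ^ col[1] ^ col[2] ^ col[3]
    return [s ^ c for c in col]

# Rotating each cell before MixColumn gives the same result as rotating after.
col = [random.randrange(256) for _ in range(4)]
assert mixcolumn([swap4(c) for c in col]) == [swap4(y) for y in mixcolumn(col)]
```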

The round key of Midori-128 is the same for every round, so if the round key has a rotational property on each cell, the rotational property passes through the key addition as well. This leaves us with the constant addition.

Constant addition. For Midori-128, the round constants are 0 or 1 for each cell. When the constant is 0, it has no effect on the propagation of the rotational property. When the constant is 1,

$$\begin{aligned} (d_L||d_R) = (a_L||a_R)\oplus (0000||0001), (d'_L||d'_R) = (a_R||a_L) \oplus (0000||0001) \end{aligned}$$

therefore, \((d'_L||d'_R) = ((d_L||d_R)\lll 4) \oplus (0001||0001)\). The first round constant of Midori-128 is

$$\begin{aligned} \begin{pmatrix} 0&{} 0&{} 1&{} 0\\ 0&{} 1&{} 0&{} 0\\ 0&{} 0&{} 1&{} 1\\ 1&{} 1&{} 1&{} 1 \end{pmatrix}. \end{aligned}$$

And it injects an RX-difference

$$\begin{aligned} \begin{pmatrix} 0&{} 0&{} 11&{} 0\\ 0&{} 11&{} 0&{} 0\\ 0&{} 0&{} 11&{} 11\\ 11&{} 11&{} 11&{} 11 \end{pmatrix} \end{aligned}$$

into the state.
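The nibble-swap identity used above can be checked exhaustively. The sketch below (helper name ours) verifies \((d'_L||d'_R) = ((d_L||d_R)\lll 4) \oplus (\texttt {0001}||\texttt {0001})\) for every 8-bit cell value:

```python
def swap4(x):
    """Rotation by 4 bits on an 8-bit cell (nibble swap)."""
    return ((x << 4) | (x >> 4)) & 0xFF

for a in range(256):          # a = (a_L || a_R)
    d  = a ^ 0x01             # constant 1 XORed into one member of the pair
    dp = swap4(a) ^ 0x01      # and into the swapped member
    # The constant therefore injects the RX-difference 0x11 into the cell.
    assert dp == swap4(d) ^ 0x11
```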

In the following, we give an example of a 2-round rotational differential in Midori-128 (without the second SR and MC).

$$\begin{aligned} \begin{aligned}&\begin{pmatrix} \texttt {00}&{} \texttt {02}&{} \texttt {5a}&{} \texttt {00}\\ \texttt {20}&{} \texttt {00}&{} \texttt {00}&{} \texttt {00}\\ \texttt {78}&{} \texttt {00}&{} \texttt {00}&{} \texttt {00}\\ \texttt {99}&{} \texttt {01}&{} \texttt {00}&{} \texttt {00} \end{pmatrix}\overrightarrow{SB} \begin{pmatrix} \texttt {00}&{} \texttt {01}&{} \texttt {05}&{} \texttt {00}\\ \texttt {01}&{} \texttt {00}&{} \texttt {00}&{} \texttt {00}\\ \texttt {05}&{} \texttt {00}&{} \texttt {00}&{} \texttt {00}\\ \texttt {11}&{} \texttt {14}&{} \texttt {00}&{} \texttt {00} \end{pmatrix}\overrightarrow{SR} \begin{pmatrix} \texttt {00}&{} \texttt {00}&{} \texttt {00}&{} \texttt {14}\\ \texttt {00}&{} \texttt {01}&{} \texttt {11}&{} \texttt {00}\\ \texttt {00}&{} \texttt {00}&{} \texttt {00}&{} \texttt {05}\\ \texttt {00}&{} \texttt {01}&{} \texttt {00}&{} \texttt {05} \end{pmatrix}\\ \overrightarrow{MC}&\begin{pmatrix} \texttt {00}&{} \texttt {00}&{} \texttt {11}&{} \texttt {00}\\ \texttt {00}&{} \texttt {01}&{} \texttt {00}&{} \texttt {14}\\ \texttt {00}&{} \texttt {00}&{} \texttt {11}&{} \texttt {11}\\ \texttt {00}&{} \texttt {01}&{} \texttt {11}&{} \texttt {11} \end{pmatrix} \overrightarrow{ARC} \begin{pmatrix} \texttt {00}&{} \texttt {00}&{} \texttt {00}&{} \texttt {00}\\ \texttt {00}&{} \texttt {10}&{} \texttt {00}&{} \texttt {14}\\ \texttt {00}&{} \texttt {00}&{} \texttt {00}&{} \texttt {00}\\ \texttt {11}&{} \texttt {10}&{} \texttt {00}&{} \texttt {00} \end{pmatrix}\overrightarrow{SB} \begin{pmatrix} \texttt {00}&{} \texttt {00}&{} \texttt {00}&{} \texttt {00}\\ \texttt {00}&{} \texttt {80}&{} \texttt {00}&{} \texttt {40}\\ \texttt {00}&{} \texttt {00}&{} \texttt {00}&{} \texttt {00}\\ \texttt {99}&{} \texttt {09}&{} \texttt {00}&{} \texttt {00} \end{pmatrix} \end{aligned} \end{aligned}$$

The probability of the RX-differential is \(2^{-24}\). Note that it is possible to choose the round key difference such that it cancels out the RX-difference injected by the first round constant, which can give an RX-differential characteristic with probability up to one in this case, indeed better than an optimal 2-round differential characteristic. However, as discussed above, such a gain in the early rounds may not be preserved over more rounds, because of the heavy RX-differences injected by the round constants in each round, which the trivial key schedule makes difficult to cancel with a fixed RX-difference in the round keys. Therefore, compared with the optimal differential characteristics, the RX-characteristic is generally weaker when the number of rounds covered by the trail is large. For S-box-based primitives with a nontrivial key schedule, we expect it to be even more challenging to find a good rotational distinguisher. Finally, we would like to propose an open problem concerning the distinguishers employed in [26, 44, 45].

Definition 10

Let \(f: {\mathbb {F}}_{2}^n\rightarrow {\mathbb {F}}_{2}^n\) be a vectorial Boolean function, and \({\mathbb {A}}\) and \({\mathbb {B}}\) be two subsets of \({\mathbb {F}}_2^n\). For \((\delta , \lambda , \gamma ) \in {\mathbb {F}}_2^n \times {\mathbb {F}}_2^n \times {\mathbb {F}}_2^n\), the correlation of the generalized rotational differential-linear distinguisher of f is defined as

$$\begin{aligned} \frac{2^n}{ |{\mathbb {A}}| |{\mathbb {B}}| } \sum _{x \in {\mathbb {D}}} (-1)^{\lambda \cdot f(x) \oplus \gamma \cdot f(\overleftarrow{x}\oplus \delta ) }, \end{aligned}$$

where \({\mathbb {D}} = \{ x \in {\mathbb {F}}_2^n : f(x) \in {\mathbb {A}}~\textrm{and}~ f(\overleftarrow{x}\oplus \delta ) \in {\mathbb {B}} \}\).
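For small n, the correlation in Definition 10 can be computed exhaustively. A Python sketch (the function and variable names are ours) for a generic vectorial Boolean function f:

```python
def parity(x):
    """Parity of an integer's bits, i.e., the GF(2) dot product after masking."""
    return bin(x).count("1") & 1

def gen_rdl_corr(f, n, delta, lam, gam, A, B, t=1):
    """Exhaustive correlation of the generalized RX-DL distinguisher (Definition 10)."""
    mask = (1 << n) - 1
    total = 0
    for x in range(1 << n):
        xr = ((x << t) | (x >> (n - t))) & mask   # x rotated left by t
        y, yp = f(x), f(xr ^ delta)
        if y in A and yp in B:                    # x lies in the restricted set D
            total += (-1) ** (parity(lam & y) ^ parity(gam & yp))
    return (1 << n) * total / (len(A) * len(B))
```

With \({\mathbb {A}} = {\mathbb {B}} = {\mathbb {F}}_2^n\), the set \({\mathbb {D}}\) is all of \({\mathbb {F}}_2^n\) and the formula reduces to the ordinary rotational differential-linear correlation.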

Then, can we derive a closed formula for the correlation of the generalized rotational differential-linear distinguisher of \(E = E_1 \circ E_0\) under the assumption that \(E_0\) and \(E_1\) are independent?