1 Introduction

Low-end devices such as RFID tags, sensor networks, and the Internet of Things (IoT) are becoming ubiquitous. In 2018, Gartner, Inc. forecasted that there would be more than 25 billion connected devices forming the IoT by 2021 [24], and following the COVID-19 lockdowns Gartner also revealed that the unprecedented event led IoT implementers to increase IoT investments to reduce costs [25]. Traditional cryptographic algorithms are not suitable for these resource-constrained devices, and several lightweight cryptographic algorithms have been recently proposed to meet this growing demand. In this regard, the National Institute of Standards and Technology (NIST) has started a process to evaluate and standardize lightweight cryptographic algorithms [59].

ARX primitives, composed exclusively of modular Additions, cyclic Rotations, and XORs, are a promising class of lightweight cryptographic algorithms with the most efficient software implementations on low-end microcontrollers [17]. There are many noteworthy ARX algorithms, such as the hash function BLAKE [1], the stream cipher Salsa20 [7], the MAC algorithm Chaskey [58] and notable block ciphers like HIGHT [31], LEA [32], SPECK [6], SPARX [16] or CHAM [41]. Usually, ciphers that are exclusively composed of ARX operations and other common bit-vector operations (e.g., modular multiplication or logical shifts) are also considered in the class of ARX ciphers, such as IDEA [43], TEA [73], or XTEA [60].

The security of ARX ciphers is evaluated by analysing their robustness against various attacks. Some of the most successful attacks applied to ARX algorithms are differential cryptanalysis and its variants, such as boomerang or related-key differential attacks [32, 41]. These attacks exploit differences in the inputs that propagate through the cipher with high probability. Another powerful attack based on non-random propagation of differences is impossible-differential cryptanalysis [8, 37], which exploits input differences propagating to differences in the outputs with zero probability.

The standard approach to show an ARX cipher is secure against differential and impossible-differential attacks is by finding the optimal characteristics (i.e., trails of differences with the highest probabilities) and the longest impossible differentials and checking that no high-probability characteristic and no impossible differential cover most rounds of the cipher [31, 32]. When the best attack in the design stage is a differential or an impossible-differential attack, the number of rounds of the cipher is determined by the longest observed high-probability characteristic or impossible differential. Thus, searching for optimal characteristics and impossible differentials is a crucial step in the design and security analysis of a cipher.

Two main techniques have been applied to search for optimal characteristics of ARX algorithms: branch-and-bound algorithms [11, 13] based on Matsui’s algorithm [54], and the recent automated methods based on Constraint Satisfaction Problems (CSP), such as SMT (Satisfiability Modulo Theories) or MILP (Mixed Integer Linear Programming) problems [21, 57]. CSP-based methods have also been recently applied to find impossible differentials [14, 65, 66]. These automated methods formulate the characteristic or impossible-differential search problem as a CSP and delegate the solving task to one of the powerful off-the-shelf CSP solvers available nowadays [5, 49]. While some CSP-based open-source tools automate the search of ARX characteristics (e.g., CryptoSMT [38]), no CSP-based open-source tool has been published to search for impossible differentials of ARX ciphers.

The main difficulty in formulating a CSP-based search problem lies in the differential models of the non-linear operations, that is, the constraints describing the differential probability of the non-linear operations of the cipher. ARX ciphers can be efficiently described using the bit-vector theory of SMT, and several bit-vector differential models have been proposed so far [39, 47, 48]. For the modular addition with two n-bit operands, the foremost non-linear operation in ARX primitives, Lipmaa and Moriai found a bit-vector algorithm for computing the differential probability with complexity \(O(\log _2 n)\) [47]. This algorithm can be straightforwardly translated to a bit-vector differential model, and it has been used in several SMT-based methods to search for characteristics [48, 57, 68] and impossible differentials [65] of ARX ciphers.

However, no CSP-based differential model has been proposed for the modular addition with a constant input, which prevents searching for characteristics or impossible differentials of ARX ciphers that contain constant additions. The algorithm of Lipmaa and Moriai is restricted to the modular addition with two operands, and it cannot be applied when one of the inputs is fixed to a constant, as we will discuss later. Machado proposed an algorithm to compute the differential probability of the constant addition [53], but it cannot be translated to an efficient bit-vector differential model due to its recursive nature and the use of floating-point arithmetic.

1.1 Contributions

We propose an efficient bit-vector differential model for the modular addition by an n-bit constant. Our model contains \(O(\log _2 n)\) basic bit-vector constraints and it is composed of a bit-vector formula that determines whether a differential over the constant addition has non-zero probability, and a bit-vector function that computes the binary logarithm of the differential probability. Our bit-vector model exploits the properties of the carry chain of the modular addition and relies on efficient well-known bit-vector functions, such as the Hamming weight or the bit-reversal, and new bit-vector functions that we have developed for the binary logarithm.

Furthermore, we describe an SMT-based automated method to search for characteristics of ARX ciphers, including those with constant additions. Our method is composed of an iterated search of optimal characteristics of round-reduced versions of the cipher and an automated encoding technique that formulates the SMT problems from the cipher’s Static Single Assignment (SSA) form. Moreover, we describe a new automated method to search for impossible differentials of ARX ciphers which does not depend on any pre-defined sets of input and output differences.

We have implemented our methods in an SMT-based open-source tool, ArxPy (Footnote 1), which fully automates the search for ARX characteristics and impossible differentials. ArxPy is the first open-source tool that can search for the characteristics of ARX ciphers with constant additions, and it is also the first CSP-based open-source tool that automates the search of ARX impossible differentials. ArxPy offers a simple interface to represent any ARX cipher, several types of characteristics and impossible differentials to search for, and complete documentation.

We have applied our characteristic and impossible-differential search methods to several ARX ciphers containing constant additions to provide some examples. In particular, we have searched for different types of related-key characteristics alongside related-key impossible differentials of TEA, XTEA, HIGHT, LEA, SHACAL-1, and SHACAL-2. With our automated approach, we have revisited results previously found with manual and ad-hoc techniques. We have obtained better characteristics in terms of probability and number of rounds, and longer impossible differentials.

With our bit-vector model for the constant addition, the SMT-based automated methods, and our open-source tool ArxPy, we provide cipher designers with the resources to design ARX ciphers, including those with constant additions, that are secure against differential and impossible-differential attacks. Thus, cipher designers can choose the best constants for the modular additions and optimize the number of rounds to balance security and efficiency.

1.2 Differences to the conference version

This paper is an extended full version of the conference paper [3]. Thus, the content of the conference paper is included in this paper, namely the bit-vector differential model for the constant addition, the SMT-based method and tool to search for ARX differential characteristics and the related-key differential characteristics found for TEA, XTEA, HIGHT, and LEA. Apart from this content, the rest of this paper is new material. This new content includes the SMT-based method to search for impossible differentials of ARX ciphers, the tool to search for ARX impossible differentials, the related-key differential characteristics found for SHACAL-1 and SHACAL-2 and the related-key impossible differentials found for TEA, XTEA, HIGHT, LEA, SHACAL-1 and SHACAL-2. Furthermore, this paper enhances the description of the bit-vector differential model of the constant addition with improved proofs and new examples.

1.3 Outline

The notations and preliminaries are introduced in Sect. 2, and the bit-vector model for the modular addition by a constant is described in Sect. 3. Section 4 illustrates the formulation of the search of characteristics and impossible differentials as sequences of SMT problems. Section 5 presents the characteristics and impossible differentials found for TEA, XTEA, HIGHT, LEA, SHACAL-1, and SHACAL-2 using our automated approaches. Finally, Sect. 6 concludes the paper and addresses future work.

2 Preliminaries

2.1 Notations

Let x be an integer with \(0 \le x < 2^n\). Its n-bit vector representation is \(x = (x[n-1], \dots , x[0])\), where x[0] and \(x[n-1]\) denote the least and the most significant bit, respectively. For ease of notation, we define \(x[i] = 0\) when \(i < 0\), and the symbol \(*\) stands for an undetermined bit. The usual integer operations are denoted by \((+, -, \times , /)\) and the basic bit-vector operations are gathered in Table 1.

A mathematical expression only involving bit-vector variables and basic bit-vector operations is called a bit-vector expression. A bit-vector formula is a bit-vector expression returning \(\texttt {True}\) or \(\texttt {False}\), such as \({\textsf {Equals}}\), whereas an n-bit vector function is a bit-vector expression returning an n-bit vector. In order to measure the complexity of the bit-vector differential model that we propose in this paper, we define the bit-vector complexity of a bit-vector expression as the number of basic bit-vector operations that the expression is composed of.

Table 1 Basic bit-vector operations for n-bit vectors

In the literature of the bit-vector theory, the set of basic bit-vector operations usually includes the operations gathered in Table 1 and a few additional operations, such as modular multiplication or modular division [42]. However, modular multiplication and modular division are much more costly than the other operations in practice, and we have excluded them from our set of basic bit-vector operations, which resembles the unit-cost RAM model used in [47].

Apart from the basic bit-vector operations listed in Table 1, we will also employ the following well-known bit-vector functions: \({\textsf {Carry}}, {\textsf {Rev}}, {\textsf {RevCarry}}, {\textsf {HW}}\) and \({\textsf {LZ}}\). The carry function \(c = {\textsf {Carry}}(x,y)\) returns the n-bit carry chain of the n-bit modular addition \(x \boxplus y\). It is defined iteratively as \(c[0] = 0\) and \(c[i+1] = (x[i] \wedge y[i]) \oplus (x[i] \wedge c[i]) \oplus (y[i] \wedge c[i])\) for \(0 \le i < n - 1\). Note that the carry has bit-vector complexity O(1), since \({\textsf {Carry}}(x,y) = x \oplus y \oplus (x \boxplus y)\). The carry function is an efficient function that propagates information from the least significant bits to the most significant bits, a property that we will exploit for our bit-vector differential model.
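The equivalence between the iterative definition of the carry and its O(1) closed form can be checked on small word sizes. The following Python sketch is only an illustration; the function names are hypothetical and n-bit vectors are modelled as non-negative Python integers below \(2^n\).

```python
def carry_iterative(x: int, y: int, n: int) -> int:
    """Carry chain of the n-bit addition x + y, computed bit by bit."""
    c = 0  # c[0] = 0
    for i in range(n - 1):
        xi, yi, ci = (x >> i) & 1, (y >> i) & 1, (c >> i) & 1
        # c[i+1] = (x[i] & y[i]) ^ (x[i] & c[i]) ^ (y[i] & c[i])
        c |= ((xi & yi) ^ (xi & ci) ^ (yi & ci)) << (i + 1)
    return c


def carry_closed_form(x: int, y: int, n: int) -> int:
    """Same carry chain via Carry(x, y) = x XOR y XOR (x + y mod 2^n)."""
    mask = (1 << n) - 1
    return x ^ y ^ ((x + y) & mask)


# Both definitions agree, e.g. for the 4-bit addition 0111 + 0001.
print(format(carry_iterative(0b0111, 0b0001, 4), "04b"))
print(format(carry_closed_form(0b0111, 0b0001, 4), "04b"))
```

The closed form follows from the identity \((x \boxplus y)[i] = x[i] \oplus y[i] \oplus c[i]\), which holds for every bit position.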

The bit-reversal function \({\textsf {Rev}}(x)\) reverses the order of bits of x, i.e., \({\textsf {Rev}}(x) = (x[0], x[1], \dots , x[n-1])\). This function can be computed using a divide and conquer method with bit-vector complexity \(O(\log _2 n)\) [29, Fig. 7-1]. We will use this function to define the reverse carry, \({\textsf {RevCarry}}(x,y) = {\textsf {Rev}}({\textsf {Carry}}({\textsf {Rev}}(x), {\textsf {Rev}}(y)))\), which propagates information from the most significant bits to the least significant bits (i.e., in the opposite direction to \({\textsf {Carry}}\)) and also has bit-vector complexity \(O(\log _2 n)\).

The Hamming weight \({\textsf {HW}}(x)\) returns an n-bit vector denoting the number of non-zero bits of the n-bit input x. Similar to the bit-reversal, the Hamming weight can be computed using a divide and conquer approach with bit-vector complexity \(O(\log _2 n)\) [29, Fig. 5-2]. The Hamming weight will be one of the main building blocks to obtain an efficient bit-vector representation of the binary logarithm.

The last bit-vector function we will consider is the leading zeros function \({\textsf {LZ}}(x)\). This function marks the leading zeros of an n-bit input x, that is, for \(0 \le i < n\), \({\textsf {LZ}}(x)[i] = 1 \iff x[n-1, i] = 0\). This function is used as a subroutine for the well-known function to compute the number of leading zeros. Similar to the previous bit-vector functions, \({\textsf {LZ}}\) can be computed with bit-vector complexity \(O(\log _2 n)\) [29, Fig. 5-16].
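To make the semantics of these auxiliary functions concrete, the sketch below gives straightforward reference implementations in Python. It is an assumption-laden illustration: the function names are hypothetical, the implementations use plain bit loops and string operations rather than the \(O(\log _2 n)\) divide and conquer constructions cited above, and the Hamming weight is returned as a Python integer instead of an n-bit vector.

```python
def rev(x: int, n: int) -> int:
    """Rev(x): reverse the order of the n bits of x (requires 0 <= x < 2^n)."""
    return int(format(x, f"0{n}b")[::-1], 2)


def carry(x: int, y: int, n: int) -> int:
    """Carry(x, y) = x XOR y XOR (x + y mod 2^n)."""
    return x ^ y ^ ((x + y) & ((1 << n) - 1))


def rev_carry(x: int, y: int, n: int) -> int:
    """RevCarry(x, y) = Rev(Carry(Rev(x), Rev(y)))."""
    return rev(carry(rev(x, n), rev(y, n), n), n)


def hw(x: int) -> int:
    """HW(x): number of non-zero bits of x."""
    return bin(x).count("1")


def lz(x: int, n: int) -> int:
    """LZ(x): bit i is set iff bits n-1, ..., i of x are all zero."""
    out = 0
    for i in range(n - 1, -1, -1):
        if (x >> i) & 1:
            break
        out |= 1 << i
    return out


# A carry generated at the most significant bit travels towards bit 0.
print(format(rev_carry(0b1110, 0b1000, 4), "04b"))
print(format(lz(0b00010110, 8), "08b"))
```

These loop-based versions are only meant as a specification to test faster implementations against; they do not reflect the bit-vector complexity discussed in the text.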

2.2 Differential and impossible-differential cryptanalysis

A block cipher is a family of permutations parametrized by a \(\kappa \)-bit key k, mapping n-bit plaintexts p to n-bit ciphertexts c. Most block ciphers consist of a key scheduling algorithm \({\textsf {KS}}\), which derives round keys \(k_1, \dots , k_r\) from the master key k, and an encryption algorithm \(E_k\), which processes the plaintext by iterating a round function f and injecting a round key at each round, i.e., \( E_k = f_{k_r} \circ \dots \circ f_{k_1}\).

Block ciphers are shown to be secure by analysing their resistance against all known attacks. One of the most potent attacks, especially to ARX primitives, is differential cryptanalysis [10]. It exploits the non-random propagation of differences in the input to recover the secret key.

Let F be an n-bit to n-bit function and \((\Delta _p, \Delta _c)\) be the XOR of a pair of inputs \((p, p')\) and their corresponding outputs \((c, c')\), i.e., \(\Delta _p = p \oplus p'\) and \(\Delta _c = c \oplus c'\). The pair \((\Delta _p, \Delta _c)\) is called a differential and its probability is defined as

$$\begin{aligned} \Pr \left[ \Delta _p \xrightarrow {F} \Delta _c\right] = \frac{\# \{ p : \ F(p) \oplus F(p \oplus \Delta _p) = \Delta _c \}}{2^n}\,. \end{aligned}$$

A differential is valid if it has non-zero probability. In this case, its weight is defined as

$$\begin{aligned} {\textsf {weight}}_F(\Delta _p, \Delta _c) = - \log _2\left( \Pr \left[ \Delta _p \xrightarrow {F} \Delta _c\right] \right) \,. \end{aligned}$$

The differential \(0 \xrightarrow {F} 0\) has probability 1 for any function F, and a differential with non-zero input difference over a random n-bit permutation has probability \(2^{-n}\). Differential cryptanalysis [10] exploits a differential over the n-bit block cipher with probability \(p > 2^{-n}\) to recover the secret key with roughly \(O(p^{-1})\) encryption calls.
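For small block sizes, the probability and weight of a differential can be computed exhaustively, directly from the definitions above. The following Python sketch illustrates this; the helper names and the 4-bit toy function `add5` are hypothetical choices made only for the example.

```python
from fractions import Fraction
from math import log2


def differential_probability(F, n, dp, dc):
    """Pr[dp -> dc] over the n-bit function F, by counting all 2^n inputs."""
    count = sum(1 for p in range(1 << n) if F(p) ^ F(p ^ dp) == dc)
    return Fraction(count, 1 << n)


def differential_weight(F, n, dp, dc):
    """Weight -log2(Pr[dp -> dc]) of a valid differential."""
    pr = differential_probability(F, n, dp, dc)
    if pr == 0:
        raise ValueError("the differential is not valid (probability zero)")
    return -log2(float(pr))


def add5(x):
    """Toy 4-bit function: modular addition by the constant 5."""
    return (x + 5) & 0xF


# The trivial differential 0 -> 0 holds with probability 1.
print(differential_probability(add5, 4, 0, 0))
# Distribution of output differences for the input difference 0011.
print({dc: differential_probability(add5, 4, 0b0011, dc) for dc in range(16)})
```

Exhaustive counting costs \(2^n\) evaluations per differential, so it is only practical for toy sizes; this is the reason differential models are needed for realistic word sizes.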

Related-key differential cryptanalysis [34] extends differential cryptanalysis by considering key differences. A related-key differential is given by a pair of differentials over the key schedule and the encryption function respectively,

$$\begin{aligned} \left( \Delta _k \xrightarrow {{\textsf {KS}}} \left( \Delta _{k_1}, \dots , \Delta _{k_r}\right) \right) , \quad \left( \Delta _p \xrightarrow {E} \Delta _c\right) , \end{aligned}$$

where the ciphertext difference is computed using the related round-key pairs,

$$\begin{aligned} \Delta _c = \left( f_{k_r} \circ \dots \circ f_{k_1}\right) (p) \oplus \left( f_{k_r\oplus \Delta _{k_r}} \circ \dots \circ f_{k_1\oplus \Delta _{k_1}}\right) \left( p \oplus \Delta _p\right) . \end{aligned}$$

The probability of a related-key differential is the product of the probability of key schedule differential \(p_{{\textsf {KS}}}\) and the probability of encryption differential \(p_{E}\).

A related-key attack exploits a related-key differential with \(p_{{\textsf {KS}}} > 2^{-\kappa }\) and \(p_{E} > 2^{-n}\) to recover the secret key with complexity \(O((p_{{\textsf {KS}}} \times p_{E})^{-1})\). The attacker needs about \(p_{{\textsf {KS}}}^{-1}\) key pairs to find, on average, one key pair that satisfies the key schedule differential. Then, for each key pair, the attacker runs a differential attack over the encryption using \(O(p_{E}^{-1})\) encryption calls.

Related-key differential cryptanalysis requires a very powerful attacker that can query the encryption function \(E_{k \oplus \Delta _k}\) for many keys \(k \oplus \Delta _k\). In fact, if an adversary can query \(E_{k \oplus \Delta _k}\) for \(2^{m}\) key differences \(\Delta _k\), any block cipher is vulnerable to a related-key attack with complexity \(O(2^{m}+2^{n-m})\) [74]. Thus, we distinguish between weak related-key differentials (i.e., \(p_{{\textsf {KS}}} < 1\)) and strong related-key differentials (i.e., \(p_{{\textsf {KS}}} = 1\)), which can be exploited in practice with a single related-key pair. Furthermore, we define equivalent keys as pairs of related keys \((k, k \oplus \Delta _k)\) such that \(\forall p, \ E_{k}(p) = E_{k \oplus \Delta _k}(p \oplus \Delta _p) \oplus \Delta _c\), for some \((\Delta _p, \Delta _c)\). Note that a related-key differential with \(p_{E} = 1\) leads to \(2^{\kappa } p_{{\textsf {KS}}}\) pairs of equivalent keys.

Lastly, we consider (related-key) impossible differentials. A differential \((\Delta _p, \Delta _c)\) over a function F is called impossible if its probability is zero, and a related-key differential \((\Delta _k, \Delta _p \rightarrow \Delta _c)\) over a block cipher is called impossible if its probability is zero for all keys. Impossible-differential cryptanalysis [8] is an attack on block ciphers that exploits an impossible differential over the block cipher holding for every key. Related-key impossible-differential cryptanalysis is a combination of impossible-differential cryptanalysis and related-key cryptanalysis. Using the known difference of the key pairs and the input and the output of the impossible differential, the attacker discards the wrong keys to obtain the correct key.

2.2.1 Searching for characteristics and impossible differentials

The most challenging step to launch a differential attack is finding a differential with high probability. The main approach is to analyse how differences traverse through the round function and search for a characteristic, that is, a trail of differences

$$\begin{aligned} \Omega = \left( \Delta _p = \Delta _{x_0} \xrightarrow {f_{k_1}} \Delta _{x_ 1} \rightarrow \dots \rightarrow \Delta _{x_{r-1}} \xrightarrow {f_{k_r}} \Delta _{x_r} = \Delta _c\right) \,. \end{aligned}$$

Similar to differentials, a characteristic \(\Omega \) is valid if it has non-zero probability and its weight is defined as \(- \log _2(\Pr [\Omega ])\). Furthermore, we denote a related-key characteristic by a pair of characteristics \((\Omega _{{\textsf {KS}}}, \Omega _{E})\), where \(\Omega _{{\textsf {KS}}}\) is the key schedule characteristic containing the trail of differences from the master key to the round keys and \(\Omega _E\) is the encryption characteristic containing the trail of differences through the encryption.

Obtaining the exact probability of a characteristic is computationally infeasible. Thus, two assumptions are commonly made. First, it is assumed that the differential probabilities over each round are independent, which allows computing the weight of a characteristic by summing the round weights, i.e.,

$$\begin{aligned} {\textsf {weight}}(\Omega ) = \sum _{i=0}^{r-1} {\textsf {weight}}(\Delta _{x_i} \rightarrow \Delta _{x_ {i+1}})\,. \end{aligned}$$

Second, it is assumed that the probability of a characteristic does not strongly depend on the choice of the secret key, also known as the hypothesis of stochastic equivalence [44], which allows computing the weight of a characteristic by averaging over all keys.

On top of that, designers also assume that the probability of a differential \((\Delta _p, \Delta _c)\) is close to the probability of the best characteristic \((\Delta _p \rightarrow \dots \rightarrow \Delta _c)\), and they prove a cipher is secure against differential cryptanalysis by showing that characteristics with high probability cannot cover most rounds of the cipher. While these assumptions do not always hold, currently this is the best systematic approach to argue security against differential cryptanalysis, and this heuristic approach is widely used for ARX ciphers in practice [21, 39, 57, 68, 69, 70].

Searching for characteristics is usually dependent on some assumptions, as mentioned earlier. In contrast, the process of obtaining an impossible differential typically results in a sound proof, guaranteeing that the probability of the achieved differential is equal to zero. Therefore, most of the impossible-differential search methods are sound but not complete. In other words, any differential found by these methods is assuredly impossible, yet there may be many impossible differentials that the search methods cannot detect.

2.2.2 SMT solvers

A recent approach to search for characteristics and impossible differentials of ARX ciphers is to formulate the search problem as an SMT problem in the bit-vector theory [2, 39, 48, 57, 65, 68]. Satisfiability Modulo Theories (SMT) refers to the problem of determining whether a first-order formula is satisfiable with respect to some logical theory. SMT problems are a generalization of SAT problems; while the latter problems are expressed in propositional logic, SMT formulas can be expressed in richer logics, such as the theory of bit-vectors or the theory of integers.

SMT has grown in recent years into a very active research field, and several off-the-shelf SMT solvers are available nowadays [5]. Most SMT solvers can determine the satisfiability of a problem and obtain an assignment of the variables that satisfies the problem. This feature allows SMT solvers to be applied in search problems.

An SMT problem in the bit-vector theory is given by a set of bit-vector variables and a set of bit-vector formulas or constraints. The constraints can be defined with the usual logical operations (e.g., \({\textsf {Equals}}, {\textsf {NotEquals}}, {\textsf {Implies}}\), etc.) and the usual bit-vector operations (e.g., \(\oplus , \boxplus , \lll \), etc.).

For example, a bit-vector SMT problem to find an 8-bit preimage of \(y = f(x) = x \oplus ((x \boxplus x) \vee 1)\), given the 8-bit image \(y = 3 = \texttt {00000011}\), is the following:

$$\begin{aligned} \exists x \in \{0,1\}^8: \ {\textsf {Equals}}(\texttt {00000011}, x \oplus ((x \boxplus x) \vee 1)) \,. \end{aligned}$$

This problem is satisfiable and the only assignment that satisfies the problem is \(x = \texttt {11111110}\).
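Because the search space of this toy problem has only \(2^8\) candidates, the claim can be confirmed by exhaustive enumeration. The Python sketch below does so; note that it is a brute-force check and not an SMT solver, and the name `f` simply mirrors the function in the example.

```python
def f(x: int) -> int:
    """8-bit function y = x XOR ((x + x mod 2^8) OR 1)."""
    return x ^ (((x + x) & 0xFF) | 1)


# Enumerate every 8-bit input and keep the preimages of y = 00000011.
solutions = [x for x in range(256) if f(x) == 0b00000011]
print([format(x, "08b") for x in solutions])  # -> ['11111110']
```

An SMT solver reaches the same assignment without enumerating all inputs, which is what makes the approach viable for the much larger search spaces of cipher characteristics.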

2.3 Differential models

To represent a characteristic in a constraint satisfaction problem, it is necessary to find a differential model of the round function f. For an SMT problem in the bit-vector theory, a differential model of a function \(y = f(x)\) is given by a bit-vector formula \({\textsf {valid}}_{f}(\Delta _x, \Delta _y)\) and a bit-vector function \({\textsf {weight}}_{f}(\Delta _x, \Delta _y)\). The formula \({\textsf {valid}}_{f}(\Delta _x, \Delta _y)\) is True if and only if the differential \((\Delta _x \rightarrow \Delta _y)\) over f is valid, and the function \({\textsf {weight}}_{f}(\Delta _x, \Delta _y)\) returns the weight of a valid differential \((\Delta _x \rightarrow \Delta _y)\).

Characteristics over ARX ciphers are usually defined by considering the difference after each ARX operation. The differential models of the XOR and the cyclic rotations are very simple since these operations propagate differences deterministically, that is,

$$\begin{aligned} \begin{aligned} \Delta _{x_1}, \Delta _{x_2}&\xrightarrow {f(x_1, x_2) = x_1 \oplus x_2}\Delta _{x_1} \oplus \Delta _{x_2},\\ \Delta _x&\xrightarrow {f_a(x) = x \lll a \ \ \ \ } \Delta _x \lll a, \end{aligned} \qquad \begin{aligned} \Delta _x&\xrightarrow {f_a(x) = x \oplus a \ } \Delta _x,\\ \Delta _x&\xrightarrow {f_a(x) = x \ggg a} \Delta _x \ggg a. \end{aligned} \end{aligned}$$
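The deterministic propagation rules above can be confirmed exhaustively for a small word size. The following Python sketch is an illustrative check under stated assumptions: the helper names are hypothetical, and the constants and second operands are sampled from a few values rather than enumerated.

```python
def rotl(x: int, r: int, n: int) -> int:
    """Cyclic left rotation of the n-bit value x by r positions."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def rotr(x: int, r: int, n: int) -> int:
    """Cyclic right rotation, expressed through rotl."""
    return rotl(x, n - (r % n), n)


def check_deterministic_propagation(n: int = 6) -> bool:
    """Check the XOR and rotation propagation rules for all n-bit x and dx."""
    words = range(1 << n)
    samples = (0, 1, (1 << n) - 1)  # sampled constants / second operands
    for x in words:
        for dx in words:
            for a in range(n):
                assert rotl(x, a, n) ^ rotl(x ^ dx, a, n) == rotl(dx, a, n)
                assert rotr(x, a, n) ^ rotr(x ^ dx, a, n) == rotr(dx, a, n)
            for a in samples:
                # XOR with a constant leaves the difference unchanged.
                assert (x ^ a) ^ ((x ^ dx) ^ a) == dx
                for da in samples:
                    # XOR of two inputs XORs their differences.
                    assert (x ^ a) ^ ((x ^ dx) ^ (a ^ da)) == dx ^ da
    return True


print(check_deterministic_propagation(6))
```

All four operations are linear over \({\mathbb {F}}_2\), which is why their differentials hold with probability 1 and contribute weight 0 to a characteristic.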

For the modular addition with two n-bit inputs, \(y = f(x_1, x_2) = x_1 \boxplus x_2\), the algorithm by Lipmaa and Moriai [47] can be translated into the following differential model with bit-vector complexity \(O(\log _2 n)\).

Theorem 1

Let \(((\Delta _{x_1}, \Delta _{x_2}), \Delta _{y})\) be a differential over the modular addition \(y = x_1 \boxplus x_2\) and denote \(\overleftarrow{x} = x \ll 1\) and \(\mathrm {eq}(a, b, c) = (\lnot a \oplus b) \wedge (\lnot a \oplus c)\). Then, the differential is valid if and only if the bit-vector formula

$$\begin{aligned} {\textsf {valid}}_{\boxplus }((\Delta _{x_1}, \Delta _{x_2}), \Delta _y) = {\textsf {Equals}}(0, \mathrm {eq}(\overleftarrow{\Delta _{x_1}}, \overleftarrow{\Delta _{x_2}}, \overleftarrow{\Delta _y}) \wedge (\Delta _{x_1} \oplus \Delta _{x_2} \oplus \Delta _y \oplus \overleftarrow{\Delta _{x_2}})) \end{aligned}$$

is True. In this case, the differential weight is given by the bit-vector function

$$\begin{aligned} {\textsf {weight}}_{\boxplus }((\Delta _{x_1}, \Delta _{x_2}), \Delta _y) = {\textsf {HW}}(\lnot \mathrm {eq}(\Delta _{x_1}, \Delta _{x_2}, \Delta _y) \ll 1)\,. \end{aligned}$$
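As an illustration, the two expressions of Theorem 1 can be transcribed almost literally into integer arithmetic. The Python sketch below does this under the assumption that n-bit vectors are modelled as Python integers masked to n bits; the function names are hypothetical, and the code is not taken from ArxPy.

```python
def eq(a: int, b: int, c: int, mask: int) -> int:
    """eq(a, b, c) = (NOT a XOR b) AND (NOT a XOR c), restricted to n bits."""
    return (~a ^ b) & (~a ^ c) & mask


def valid_add(dx1: int, dx2: int, dy: int, n: int) -> bool:
    """Validity formula of Theorem 1 for the differential ((dx1, dx2), dy)."""
    mask = (1 << n) - 1
    s1, s2, sy = (dx1 << 1) & mask, (dx2 << 1) & mask, (dy << 1) & mask
    return (eq(s1, s2, sy, mask) & (dx1 ^ dx2 ^ dy ^ s2)) == 0


def weight_add(dx1: int, dx2: int, dy: int, n: int) -> int:
    """Weight function of Theorem 1; meaningful only for valid differentials."""
    mask = (1 << n) - 1
    return bin((~eq(dx1, dx2, dy, mask) << 1) & mask).count("1")


# Example: 4-bit differential ((0001, 0000), 0011).
print(valid_add(0b0001, 0b0000, 0b0011, 4), weight_add(0b0001, 0b0000, 0b0011, 4))
```

For small n, the output of both functions can be compared against an exhaustive count over all input pairs \((x_1, x_2)\), which gives a simple sanity check of the transcription.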

For the modular addition with a constant input \(\boxplus _a(x) = x \boxplus a\), Machado obtained the following algorithm to compute the differential probability [53].

Theorem 2

Let \((u, v)\) be a differential over the n-bit constant addition \(\boxplus _a\). Then, the differential probability is given by

$$\begin{aligned} \Pr [u \xrightarrow {\boxplus _a} v] = \varphi _{0} \times \dots \times \varphi _{n-1}\,, \end{aligned}$$

where \(\varphi _i\) depends on \(\delta _{i-1}\) and \(S_i\), which are defined for \(0 \le i < n\) by

$$\begin{aligned} S_i&= (u[i-1], v[i-1], u[i] \oplus v[i])\,, \\ \delta _i&= {\left\{ \begin{array}{ll} (a[i-1] + \delta _{i-1}) / 2, &{} S_i = \texttt {000}\\ 0, &{} S_i = \texttt {001} \\ a[i-1], &{} S_i \in \{\texttt {010}, \texttt {100}, \texttt {110}\} \\ \delta _{i-1}, &{} S_i \in \{\texttt {011}, \texttt {101} \} \\ 1/2, &{} S_i = \texttt {111} \end{array}\right. }\,, \\ \varphi _i&= {\left\{ \begin{array}{ll} 1, &{} S_i = \texttt {000} \\ 0, &{} S_i = \texttt {001} \\ 1/2, &{} S_i \in \{\texttt {010}, \texttt {011}, \texttt {100}, \texttt {101}\} \\ 1 - (a[i-1] + \delta _{i-1} - 2 a[i-1] \delta _{i-1}), &{} S_i = \texttt {110} \\ (a[i-1] + \delta _{i-1} - 2 a[i-1] \delta _{i-1}), &{} S_i = \texttt {111} \end{array}\right. }\,. \end{aligned}$$

For \(i = -1\), \(S_i\) and \(\delta _i\) are defined by \(S_{-1}=\bot \) and \(\delta _{-1}=0\).

Unfortunately, the algorithm illustrated in Theorem 2 is not suitable for constraint satisfaction problems due to its recursive nature and the use of floating-point arithmetic.
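Outside a constraint solver, the recursion of Theorem 2 is easy to evaluate directly. The Python sketch below is one possible transcription, assuming exact rational arithmetic (`fractions.Fraction`) in place of floating-point values; the function names are hypothetical and the code is only an illustration of the recursion.

```python
from fractions import Fraction


def bit(x: int, i: int) -> int:
    """x[i], with the convention x[i] = 0 for i < 0."""
    return (x >> i) & 1 if i >= 0 else 0


def constant_addition_probability(u: int, v: int, a: int, n: int) -> Fraction:
    """Probability of the differential (u, v) over x -> x + a mod 2^n."""
    prob = Fraction(1)
    delta = Fraction(0)  # delta_{-1} = 0
    for i in range(n):
        s = (bit(u, i - 1), bit(v, i - 1), bit(u, i) ^ bit(v, i))  # state S_i
        a_prev = bit(a, i - 1)
        mix = a_prev + delta - 2 * a_prev * delta
        if s == (0, 0, 0):
            phi, new_delta = Fraction(1), (a_prev + delta) / 2
        elif s == (0, 0, 1):
            phi, new_delta = Fraction(0), Fraction(0)
        elif s in ((0, 1, 0), (1, 0, 0)):
            phi, new_delta = Fraction(1, 2), Fraction(a_prev)
        elif s in ((0, 1, 1), (1, 0, 1)):
            phi, new_delta = Fraction(1, 2), delta
        elif s == (1, 1, 0):
            phi, new_delta = 1 - mix, Fraction(a_prev)
        else:  # s == (1, 1, 1)
            phi, new_delta = mix, Fraction(1, 2)
        prob *= phi
        delta = new_delta
    return prob


# 4-bit example: differential (0110, 0010) over the addition by a = 0011.
print(constant_addition_probability(0b0110, 0b0010, 0b0011, 4))
```

The loop makes the obstacle for bit-vector SMT encodings visible: each step depends on the rational value \(\delta _{i-1}\) carried over from the previous step, so a direct encoding would need n sequential steps and non-integer arithmetic.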

Some authors [46, Corollary 2] [4] have adapted the differential model of the 2-input addition (i.e., the modular addition with two independent inputs) for the constant addition by setting the difference of the second operand to zero, that is,

$$\begin{aligned} \begin{aligned} {\textsf {valid}}_{\boxplus _a}(\Delta _x, \Delta _y)&\leftarrow {\textsf {valid}}_{\boxplus }((\Delta _{x}, 0), \Delta _y)\,, \\ {\textsf {weight}}_{\boxplus _a}(\Delta _x, \Delta _y)&\leftarrow {\textsf {weight}}_{\boxplus }((\Delta _{x}, 0), \Delta _y)\,. \end{aligned} \end{aligned} \tag{1}$$

The approximation given by Eq. (1) models the differential \((\Delta _x \xrightarrow {\boxplus _a} \Delta _y)\) by averaging over all a. While this approach can be used to model the constant addition by a round key, since the characteristic probability is also computed by averaging over all keys, for a fixed constant this approach is rather inaccurate.

Surprisingly, the differential properties of the 2-input addition and the constant addition are very different. The 2-input addition was shown to be CCZ-equivalent to a quadratic function [67], that is, the differential properties of the 2-input addition are the same as those of some quadratic functions. In particular, the set of inputs \((x_1, x_2)\) satisfying a differential \(((\Delta _{x_1}, \Delta _{x_2}) \rightarrow \Delta _y)\) over the 2-input addition forms a subspace of \({\mathbb {F}}_2^n\), which allows its differential model to be described using only a few basic operations.

On the other hand, the constant addition is not CCZ-equivalent to a quadratic function, since the set of inputs x satisfying a differential \((\Delta _{x}, \Delta _y)\) over \(\boxplus _a\) does not form a subspace for many a. In other words, the probability of a differential over the constant addition is not necessarily of the form \(2^{-\alpha }\) for a positive integer \(\alpha \), and finding a differential model for the constant input addition is a much harder problem.

We experimentally checked the accuracy of the approximation given by Eq. (1) for 8-bit constants a. For most values of a, validity formulas differ roughly in \(2^{13}\) out of all \(2^{16}\) differentials. For those differentials where they did not differ, the difference between their weights was significantly high on average.
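A comparison of this kind can be reproduced for the validity formulas with a short exhaustive script. The Python sketch below is an assumed reconstruction of such an experiment, not the original one: it computes the exact set of valid differentials of \(\boxplus _a\) by enumerating all inputs, evaluates the validity formula of Eq. (1) with the second input difference set to zero, and counts the differentials on which they disagree. All function names are hypothetical.

```python
def approx_valid(du: int, dv: int, n: int) -> bool:
    """Eq. (1): validity of ((du, 0), dv) over the 2-input addition."""
    mask = (1 << n) - 1
    su, sv = (du << 1) & mask, (dv << 1) & mask
    eq = (~su) & (~su ^ sv) & mask  # eq(su, 0, sv)
    return (eq & (du ^ dv)) == 0


def exact_valid_pairs(a: int, n: int) -> set:
    """All (u, v) with non-zero probability over x -> x + a mod 2^n."""
    mask = (1 << n) - 1
    words = range(1 << n)
    return {(u, ((x + a) ^ ((x ^ u) + a)) & mask) for u in words for x in words}


def compare_validity(a: int, n: int = 8):
    """Return (#exact valid, #valid under Eq. (1), #disagreements)."""
    words = range(1 << n)
    exact = exact_valid_pairs(a, n)
    approx = {(u, v) for u in words for v in words if approx_valid(u, v, n)}
    return len(exact), len(approx), len(exact ^ approx)


for a in (0x01, 0x5A, 0xFF):
    print(hex(a), compare_validity(a, 8))
```

Every differential that is valid for a fixed constant is also valid for the 2-input addition, so all disagreements are differentials that Eq. (1) accepts but that are impossible for the given constant.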

Consequently, no differential model of the constant addition suitable for constraint satisfaction problems has been proposed so far. In the next section, we present the first differential model of the constant addition for SMT problems in the bit-vector theory.

3 Bit-vector differential model of the constant addition

We present a bit-vector differential model of the constant addition, composed of a bit-vector formula to determine whether a given differential is valid and a bit-vector function that computes the weight of the valid differential. Our model builds on Theorem 2 [53]; however, we avoid bit iterations, floating-point arithmetic, multiplications and look-up tables, in order to obtain efficient bit-vector constraints to be used in bit-vector SMT problems.

Before we illustrate our model, we remark on an essential property of Theorem 2. When the state \(S_i\) is not \(\texttt {110}\) or \(\texttt {111}\), the probability of the step i, \(\varphi _i\), depends exclusively on \(S_i\); otherwise, \(\varphi _i\) depends on \(S_i\) and \(\delta _{i-1}\). When \(S_i=\texttt {11*}\), \(S_{i-1}\in \{\texttt {010}, \texttt {100}, \texttt {110}, \texttt {000} \}\) and for the first three cases, \(\delta _{i-1}\) is equal to \(a[i-2]\). However, in the fourth case, i.e., \(S_{i-1}=\texttt {000}\), \(\delta _{i-1}\) depends on \(\delta _{i-2}\), and this dependency continues until we reach a state \(S_{i-\ell _i} \ne \texttt {000}\) for some positive integer \(\ell _i\). Thus, \(\delta _{i-1}\) has the following expression when \(S_i=\texttt {11*}\),

$$\begin{aligned} \delta _{i-1} = \frac{a[i-\ell _i-1]}{2^{\ell _i - 1}} + \sum _{j=2}^{\ell _i} \frac{a[i-j]}{2^{j-1}}. \end{aligned} \tag{2}$$

Therefore, when \(S_{i} = \texttt {11*}\), \(\varphi _i\) also depends on the previous states \(S_{i-1}, \ldots , S_{i-\ell _i}\), which motivates the following definition.

Definition 1

Let \(S_i=\texttt {11*}\). The chain \(\Gamma _i\) is defined as the smallest set of previous states \(\{S_{i-1},S_{i-2},\ldots ,S_{i-\ell _i} \}\) that completely determine \(\varphi _i\), and the positive integer \(\ell _i\) is called the length of \(\Gamma _i\).

Given a chain \(\Gamma _i = \{S_{i-1},S_{i-2},\ldots ,S_{i-\ell _i} \}\), note that \(S_{i-\ell _i} \ne \texttt {000}\) and the remaining states in the chain (if any) are all equal to \(\texttt {000}\).

In the next example, we illustrate how to calculate the differential probability using the iterative method of Theorem 2 and we learn more about the intermediate variables used for obtaining the probability.

Example 1

Consider the differential \((u, v) = (\texttt {1010001110}, \texttt {1010001010})\) over the modular addition by the 10-bit constant \(a = \texttt {1000101110}\). According to Theorem 2, the differential probability of \((u, v)\) is given by

$$\begin{aligned} \Pr [u \xrightarrow {\boxplus _a} v] = \frac{\# \{ x : \ (x \boxplus _a) \oplus ((x \oplus u) \boxplus _a) = v \}}{2^{10}} = \prod _{i=0}^{9}\varphi _i \,. \end{aligned}$$

Table 2 displays the variables we need to compute to obtain the differential probability. As we mentioned earlier, if \(S_i = (u[i-1],v[i-1], u[i] \oplus v[i]) \ne \texttt {11*}\), each \(\varphi _i\) can be computed directly from \(S_i\), without any dependency on previous states.

For the remaining states equal to \(\texttt {110}\) or \(\texttt {111}\), we first obtain their associated chains as

$$\begin{aligned} S_2&= \texttt {111},\quad \ \Gamma _2 = \{S_1=\texttt {000},S_0=\texttt {000},S_{-1}=\bot \},\quad \ell _2 = 3\,, \\ S_4&= \texttt {110},\quad \ \Gamma _4 = \{S_3=\texttt {100}\},\quad \ell _4 = 1\,, \\ S_8&= \texttt {110},\quad \ \Gamma _8 = \{S_7=\texttt {000},S_6=\texttt {000},S_5=\texttt {000},S_4=\texttt {110}\},\quad \ell _8 = 4\,. \end{aligned}$$

Then, we compute the associated \(\delta _{i-1}\) using Eq. (2), and finally we obtain \(\varphi _i\) from the values of \(a[i-1]\) and the computed \(\delta _{i-1}\).

Table 2 The intermediate variables for finding the differential probability of Example 1

Multiplying each \(\varphi _i\) listed in Table 2 leads to the differential probability,

$$\begin{aligned} \Pr \left[ u \xrightarrow {\boxplus _a} v\right] = \prod _{i=0}^{9} \varphi _i = \frac{5}{16} \,. \end{aligned}$$
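For small word sizes, the value above can be cross-checked by exhaustive search over all inputs x, directly from the definition of the differential probability. The sketch below is a brute-force check, not the iterative method of Theorem 2; the function name `diff_prob` is hypothetical.

```python
from fractions import Fraction

def diff_prob(u, v, a, n):
    """Exact probability of the differential (u, v) over x -> x + a mod 2^n."""
    mask = (1 << n) - 1
    hits = sum(
        1
        for x in range(1 << n)
        if (((x + a) & mask) ^ (((x ^ u) + a) & mask)) == v
    )
    return Fraction(hits, 1 << n)

# Differential and constant of Example 1.
prob = diff_prob(0b1010001110, 0b1010001010, 0b1000101110, 10)
print(prob)  # 5/16
```

This exhaustive approach costs \(2^n\) additions and is only practical for small n, which is why a closed bit-vector description is needed for SMT-based searches.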

3.1 Validity

Let \((u, v)\) be a differential over \(\boxplus _a\), the modular addition by the n-bit constant a. According to Theorem 2, the differential probability of \((u, v)\) can be expressed as \(\varphi _0 \times \dots \times \varphi _{n-1}\). Thus, \((u, v)\) is a valid differential, i.e., with non-zero probability, if and only if all \(\varphi _i\) are non-zero. If \(\varphi _i = 0\), note that \(S_i\) must be \(\texttt {001}, \texttt {110}\) or \(\texttt {111}\). While \(S_i = \texttt {001}\) always implies \(\varphi _i = 0\), the other two cases require an extra condition to result in \(\varphi _i=0\), as shown in the next lemma.

Lemma 1

Let the state \(S_i\) be \(\texttt {11b}\), for \(\texttt {b} \in \{\texttt {0}, \texttt {1}\}\). Then, \(\varphi _i\) is equal to 0 if and only if \( \lnot \texttt {b} \oplus a[i-1] = a[i-2] = \dots = a[i-\ell _i-1]. \)

Proof

Given \(S_i = \texttt {11b}\), \(\varphi _i = 0\) if and only if \(\lnot \texttt {b} = \delta _{i-1} \oplus a[i-1]\). Let \(\ell _i\) be the chain length of \(S_i\). The case \(\ell _i = 1\) is trivial, since \(\delta _{i-1} = a[i-2]\). To achieve \(\delta _{i-1} = a[i-1] \oplus \lnot \texttt {b}\) when \(\ell _i > 1\), the non-negative rational number \(\delta _{i-1}\) must be equal to 0 or 1. Since, by Eq. (2), \(\delta _{i-1}\) is a monotonically increasing function of \((a[i-2],\dots ,a[i-\ell _i -1])\), it reaches its extrema at \((0, \dots , 0)\) and \((1, \dots , 1)\), that is,

$$\begin{aligned} \delta _{i-1} = c \iff a[i-2] = a[i-3] = \cdots = a[i-\ell _i-1] = c, \quad \forall c \in \{\texttt {0}, \texttt {1}\}. \end{aligned}$$

Thus, \(\delta _{i-1} = a[i-1] \oplus \lnot \texttt {b} \iff a[i-1] \oplus \lnot \texttt {b} = a[i - 2] = \dots = a[i - \ell _i - 1]\). \(\square \)

The next lemma provides a bit-vector expression to check Lemma 1 by exploiting the fact that the carry chain allows a bit to affect the bits to its left.

Lemma 2

Consider the following n-bit values,

$$\begin{aligned} s_{\texttt {00*}}&= \lnot (u \ll 1) \wedge \lnot (v \ll 1), \quad s_{\texttt {**1}} = u \oplus v, \quad a' = (a \oplus (a \ll 1)) \ll 1, \\ c&= {\textsf {Carry}}\big ( s_{\texttt {00*}} \wedge \lnot a', \lnot (s_{\texttt {00*}} \ll 1) \big ), \quad g = (s_{\texttt {**1}} \oplus a') \wedge (c \vee \lnot (s_{\texttt {00*}} \ll 1) )\,. \end{aligned}$$

Then, for all states \(S_i = \texttt {11*}\), we have \(\varphi _i = 0\) if and only if \(g[i] = 1\).

Proof

Let \(S_i = \texttt {11b}\) with chain length \(\ell _i\). Note that \(a'[i] = a[i-1] \oplus a[i-2]\) and that \(s_{\texttt {00*}}[i]=1\) (resp. \(s_{\texttt {**1}}[i]=1\)) if and only if \(S_{i} = \texttt {00*}\) (resp. \(S_i = {\texttt {**1}}\)).

The first operand of g[i], i.e., \((s_{\texttt {**1}}\oplus a')[i]\), is equal to one if and only if \(\texttt {b} = \lnot (a[i-1] \oplus a[i-2] )\). For \(\ell _i = 1\) it is easy to see that \(S_{i-1} \ne \texttt {00*}\); therefore, the second operand of g[i] is 1, and by Lemma 1, \(g[i] = 1\) if and only if \(\varphi _i = 0\).

When \(\ell _i > 1\), \(S_{i-1} = \texttt {000}\) and the second major operand of g[i] reduces to c. In particular, the two major operands of the \({\textsf {Carry}}\) function of c are given by

$$\begin{aligned} (s_{\texttt {00*}} \wedge \lnot a')[i, i-\ell _i]&= (\lnot ( a[i-1] \oplus a[i-2]), \dots , \lnot (a[i-\ell _i] \oplus a[i-\ell _i-1]), 0)\,, \\ \lnot (s_{\texttt {00*}} \ll 1)[i, i-\ell _i]&= (0, \dots , 0, 1, *)\,. \end{aligned}$$

Thus, \(c[i] = c[i-1] \wedge \lnot a'[i-1]\) and \(c[i-\ell _i+1] = c[i-\ell _i] \wedge \lnot s_{\texttt {00*}}[i-\ell _i-1] = 0\); otherwise, for \(0 \le j \le i-\ell _i-1\) we will obtain \(s_{\texttt {00*}}[j]=0\) which does not conform to \(S_0 =\texttt {00*}\). By unrolling the recursive definition of c[i], we see that \(c[i] = \lnot a'[i-1] \wedge \cdots \wedge \lnot a'[i-\ell _i+1]\). In other words, \(c[i]=1\) if and only if \(a[i-2] = \dots = a[i - \ell _i-1]\). Together with the condition for \((s_{\texttt {**1}}\oplus a')[i]=1\), we have that \(g[i] = 1\) exactly when \(\varphi _i = 0\), regarding Lemma 1. \(\square \)

Lemma 2 provides a bit-vector variable g that detects the states \(S_i = \texttt {11*}\) leading to invalidity. The next theorem presents the final bit-vector formula for the validity by taking into account the states \(S_i = \texttt {001}\) as well.

Theorem 3

Let \((u, v)\) be a differential over the n-bit constant addition \(\boxplus _a\). Consider the n-bit value g defined in Lemma 2 and the following n-bit values

$$\begin{aligned} s_{\texttt {001}} = \lnot (u \ll 1) \wedge \lnot (v \ll 1) \wedge (u \oplus v), \quad s_{\texttt {11*}} = (u \ll 1) \wedge (v \ll 1)\,. \end{aligned}$$

Then, the bit-vector formula \({\textsf {valid}}_{\boxplus _a}(u, v) = {\textsf {Equals}}( s_{\texttt {001}} \vee (s_{\texttt {11*}} \wedge g), 0)\) is True if and only if the differential \((u, v)\) is valid.

Proof

By the definition of \(s_{\texttt {001}}\) and \(s_{\texttt {11*}}\), \(s_{\texttt {001}}[i]=1\) (respectively \(s_{\texttt {11*}}[i]=1\)) if and only if \(S_i=\texttt {001}\) (respectively \(S_i=\texttt {11*}\)). Moreover, \(\varphi _i = 0\) exactly when \(S_i = \texttt {001}\), or when \(S_i = \texttt {11*}\) and \(g[i]=1\) (Lemma 2). Thus, \(\varphi _i = 0\) if and only if \((s_{\texttt {001}} \vee (s_{\texttt {11*}} \wedge g))[i] = 1\). \(\square \)

Since the number of basic bit-vector operations of our bit-vector validity formula is independent of the bit-size of the inputs, the bit-vector complexity of \({\textsf {valid}}_{\boxplus _a}\) is O(1).
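The validity formula of Theorem 3 translates almost line by line into integer arithmetic. The following Python sketch is a hedged illustration rather than a reference implementation: it assumes that \({\textsf {Carry}}(x, y) = x \oplus y \oplus (x \boxplus y)\), i.e., the vector of carries of the n-bit addition (this definition is not restated in this section), and the names `carry`, `is_valid` and `is_valid_bruteforce` are hypothetical.

```python
def carry(x, y, n):
    """Carry vector of the n-bit addition x + y (assumed definition of Carry)."""
    mask = (1 << n) - 1
    return (x ^ y ^ (x + y)) & mask

def is_valid(u, v, a, n):
    """Bit-vector validity formula of Theorem 3 on n-bit words."""
    mask = (1 << n) - 1
    u1, v1 = (u << 1) & mask, (v << 1) & mask
    s00 = ~u1 & ~v1 & mask                     # s_{00*}
    s001 = s00 & (u ^ v)                       # s_{001}
    s11 = u1 & v1                              # s_{11*}
    a_prime = ((a ^ (a << 1)) << 1) & mask     # a'
    not_s00_shifted = ~(s00 << 1) & mask       # not (s_{00*} << 1)
    c = carry(s00 & ~a_prime & mask, not_s00_shifted, n)
    g = ((u ^ v) ^ a_prime) & (c | not_s00_shifted)
    return (s001 | (s11 & g)) == 0

def is_valid_bruteforce(u, v, a, n):
    """Validity by exhaustive search, used only as a cross-check."""
    mask = (1 << n) - 1
    return any(
        (((x + a) & mask) ^ (((x ^ u) + a) & mask)) == v
        for x in range(1 << n)
    )

print(is_valid(0b1010001110, 0b1010001010, 0b1000101110, 10))  # True
```

Under the stated assumption on \({\textsf {Carry}}\), the formula can be compared against the exhaustive check for every 4-bit triple (u, v, a); the number of word-level operations in `is_valid` does not depend on n, in line with the O(1) bit-vector complexity.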

Example 2

Consider the valid differential of Example 1, i.e. \(a=\texttt {1000101110}\), \(u=\texttt {1010001110}\), and \(v=\texttt {1010001010}\). Previously, we showed that its differential probability is non-zero and equal to 5/16. In this example, we will illustrate our bit-vector validity formula step by step.

Table 3 provides some of the essential bit-vector values used in Theorem 3. Since there is no state equal to \(\texttt {001}\), \(s_{\texttt {001}}\) is the all-zero bit-vector. As we have shown in Example 1, there are three states equal to \(\texttt {11*}\), and the associated bit of \(s_{\texttt {11*}}\) is equal to one in the corresponding bits. In this example, no state \(\texttt {11*}\) leads to invalidity, and g is equal to the all-zero bit-vector. Thus, \(s_{001}[i] \vee (s_{11*}[i] \wedge g[i]) = 0\) for all i, and our validity formula

$$\begin{aligned} {\textsf {valid}}_{\boxplus _a}(u, v) = {\textsf {Equals}}( s_{001} \vee (s_{11*} \wedge g), 0) \end{aligned}$$

evaluates to True.

Table 3 The intermediate variables for evaluating the bit-vector validity formula of Example 2

3.2 Weight of a valid differential

In this section, we propose a bit-vector function that computes the weight of a valid differential over the constant addition. Working with differential weights has the advantage that multiple differential weights can be combined by adding them up, while probabilities need to be multiplied, a very costly operation in a bit-vector SMT problem.

The weight of a valid differential over the constant addition is an irrational value in general, and it cannot be represented as a fixed-sized bit-vector. Thus, our bit-vector function computes a close approximation of the weight, and we provide almost tight bounds for the approximation error.

Throughout the rest of the section, let \((u, v)\) be a valid differential over the n-bit constant addition \(\boxplus _a\). According to Theorem 2, the weight can be obtained by

$$\begin{aligned} {\textsf {weight}}_{\boxplus _a}(u,v) = - \log _2\left( \prod _{i=0}^{n-1} \varphi _i\right) = - \sum _{i=0}^{n-1} \log _2(\varphi _i)\,. \end{aligned}$$
(3)

Let \({\mathcal {I}}\) denote the set of indices corresponding to the states \(\texttt {11*}\) with chain length greater than one, i.e., \({\mathcal {I}} = \{ 1 \le i \le n-1 \ | \ S_i=\texttt {11*}, \ \ell _i >1 \} \). For \(i \notin {\mathcal {I}}\), the probability \(\varphi _i\) only depends on the current state \(S_i\), and \(\varphi _i\) is either 1 or 1/2. Using this fact, we show how to compute the sum of all \(\log _2(\varphi _i)\) with \(i \notin {\mathcal {I}}\) using bit-vector expressions.

Lemma 3

Let \({\mathcal {I}} = \{ 1 \le i \le n-1 \ | \ S_i=\texttt {11*}, \ \ell _i >1 \} \). Then,

$$\begin{aligned} - \sum _{i \notin {\mathcal {I}}} \log _2(\varphi _i) = {\textsf {HW}}( (u \oplus v) \ll 1 )\,. \end{aligned}$$

Proof

To prove the lemma, we divide the set \(\{ i \ | \ i \notin {\mathcal {I}} \}\) into two parts as \(\{ i \ | \ S_i \ne \texttt {11*}\}\) and \(\{ i \ | \ S_i = \texttt {11*}, \ell _i=1 \}\). For each state \(S_i \ne 11*\), there are two possible cases. If \(S_i\) is equal to \(\texttt {000}\), the corresponding step probability is \(\varphi _i=1\). Otherwise, \(S_i \in \{010, 011, 100, 101\}\) and we obtain \(\varphi _i = 1/2\). Considering these two cases leads to

$$\begin{aligned} \sum _{\begin{array}{c} i \\ S_i \ne 11* \end{array}} \log _2(\varphi _i)&= \sum _{\begin{array}{c} i \\ S_i = 000 \end{array}} \log _2(1) \quad + \!\!\! \sum _{\begin{array}{c} i \\ S_i \in \{010, 011, 100, 101\} \end{array}} \!\!\!\!\!\!\!\!\!\!\!\!\!\! \log _2(1/2) \,, \\&= - \ \#\{S_i \in \{010, 011, 100, 101\} : 0 \le i < n\} \,. \end{aligned}$$

Since the second case \(S_i \in \{010, 011, 100, 101\}\) occurs when \(u[i-1] \oplus v[i-1] = 1\), we can use \({\textsf {HW}}( (u \oplus v) \ll 1 )\) to compute the number of times this case happens.

Now for \(S_i=\texttt {11*}\) when \(\ell _i=1\), we know that \(\delta _{i-1} = a[i-2] \in \{0,1\}\). Since the probability is not equal to zero, we obtain \(\varphi _i=1\). Thus we get

$$\begin{aligned} \sum _{\begin{array}{c} i \\ S_i = 11* \\ \ell _i=1 \end{array} } \log _2(\varphi _i) = 0. \end{aligned}$$

Altogether, the sum of \(\log _2(\varphi _i)\) when \(i \notin {\mathcal {I}}\) is

$$\begin{aligned} \sum _{i \notin {\mathcal {I}}} \log _2(\varphi _i) = \sum _{\begin{array}{c} i \\ S_i \ne 11* \end{array}} \log _2(\varphi _i) \ + \sum _{\begin{array}{c} i \\ S_i = 11* \\ \ell _i=1 \end{array} } \log _2(\varphi _i) = - {\textsf {HW}}( (u \oplus v) \ll 1). \end{aligned}$$

\(\square \)
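The counting argument in the proof of Lemma 3 can be checked mechanically: the number of states in \(\{\texttt {010}, \texttt {011}, \texttt {100}, \texttt {101}\}\) equals the Hamming weight of \((u \oplus v) \ll 1\). The sketch below compares both counts; the function names are hypothetical and the popcount is computed with Python's string formatting rather than with an \(O(\log _2 n)\) bit-vector circuit.

```python
def count_half_states(u, v, n):
    """Number of states S_i with u[i-1] != v[i-1], for 1 <= i <= n-1."""
    return sum(((u >> (i - 1)) ^ (v >> (i - 1))) & 1 for i in range(1, n))

def hw_shifted_xor(u, v, n):
    """HW((u xor v) << 1) on n-bit words, as in Lemma 3."""
    mask = (1 << n) - 1
    return bin(((u ^ v) << 1) & mask).count("1")

# Differential of Example 1: only S_3 = 100 contributes a factor 1/2.
u, v = 0b1010001110, 0b1010001010
print(count_half_states(u, v, 10), hw_shifted_xor(u, v, 10))  # 1 1
```

Note that the shift discards the most significant bit of \(u \oplus v\), which matches the fact that the bits \(u[n-1]\) and \(v[n-1]\) are not the first two components of any state \(S_i\) with \(i \le n-1\).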

Lemma 3 describes the sum of \(\log _2(\varphi _i)\) when \(i \not \in {\mathcal {I}}\) as a bit-vector expression with complexity \(O(\log _2 n)\). To describe the logarithmic summation when \(i \in {\mathcal {I}}\) as a bit-vector, we will first show how to split \(\varphi _i\) as the quotient of two integers.

Lemma 4

Let \(i \in {\mathcal {I}}\) and let \(p_i\) be the positive integer defined by

$$\begin{aligned} p_i = {\left\{ \begin{array}{ll} a[i-2,i-\ell _i] + a[i-\ell _i-1],\quad u[i] \oplus v[i] \oplus a[i-1] = 1 \\ 2^{\ell _i - 1} - (a[i-2,i-\ell _i] + a[i-\ell _i-1]),\quad u[i] \oplus v[i] \oplus a[i-1] = 0 \end{array}\right. } \end{aligned}$$

where \(\ell _i > 1\) is the chain length of the state \(S_i=\texttt {11*}\). Then, \(\varphi _i = \dfrac{p_i}{2^{\ell _i-1}}\).

Proof

Considering the definition of \(\varphi _i\) when \(S_i=\texttt {11*}\),

$$\begin{aligned} \varphi _i = {\left\{ \begin{array}{ll} \delta _{i-1}, &{} u[i] \oplus v[i] \oplus a[i-1] = 1 \\ 1 - \delta _{i-1}, &{} u[i] \oplus v[i] \oplus a[i-1] = 0 \end{array}\right. } \end{aligned}$$

and following the definition of \(\delta _{i-1}\) given by Eq. (2),

$$\begin{aligned} 2^{\ell _i-1}\delta _{i-1} = \sum _{j=0}^{\ell _i-2}2^{j}a[i-\ell _i+j] \ + a[i-\ell _i-1] = a[i-2,i-\ell _i] + a[i-\ell _i-1]\,, \end{aligned}$$

we obtain that \(\varphi _i = p_i / 2^{\ell _i-1}\). Moreover, having \(0 < \varphi _i \le 1\) and \(\ell _i > 1\) results in \(0 < p_i \le 2^{\ell _i-1}\). Thus, \(p_i\) is always a positive integer. \(\square \)
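As a worked instance of Lemma 4, the integers \(p_i\) of Example 1 can be computed directly from the constant a. The sketch below is illustrative only: `p_value` is a hypothetical helper, the chain length is passed explicitly, and bits of a at negative positions are assumed to be zero.

```python
from fractions import Fraction

def bit(x, j):
    """Bit j of x; positions below 0 read as 0 (assumed convention)."""
    return (x >> j) & 1 if j >= 0 else 0

def p_value(u, v, a, i, l):
    """p_i of Lemma 4 for a state S_i = 11* with chain length l > 1."""
    # a[i-2, i-l] read as an integer, plus the extra bit a[i-l-1].
    base = sum(bit(a, i - l + j) << j for j in range(l - 1)) + bit(a, i - l - 1)
    if bit(u, i) ^ bit(v, i) ^ bit(a, i - 1):
        return base
    return (1 << (l - 1)) - base

u, v, a = 0b1010001110, 0b1010001010, 0b1000101110
p2 = p_value(u, v, a, 2, 3)   # chain length l_2 = 3
p8 = p_value(u, v, a, 8, 4)   # chain length l_8 = 4
phi2, phi8 = Fraction(p2, 2 ** 2), Fraction(p8, 2 ** 3)
print(p2, p8, phi2, phi8)  # 4 5 1 5/8
```

Together with \(\varphi _3 = 1/2\) for the state \(S_3 = \texttt {100}\), these values reproduce the probability 5/16 of Example 1.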

Due to Lemma 4, we can decompose the logarithmic summation over \({\mathcal {I}}\) as

$$\begin{aligned} \sum _{i \in {\mathcal {I}} } \log _2(\varphi _i) = \sum _{i \in {\mathcal {I}} } \log _2(p_i) \ - \sum _{i \in {\mathcal {I}}} (\ell _i - 1)\,. \end{aligned}$$

The next lemma shows how to describe the summation involving the chain lengths with basic bit-vector operations.

Lemma 5

Consider the n-bit vector \(s_{000} = \lnot (u \ll 1) \wedge \lnot (v \ll 1)\). Then,

$$\begin{aligned} \sum _{i \in {\mathcal {I}} } (\ell _i - 1) = {\textsf {HW}}\big ( s_{000} \wedge \lnot {\textsf {LZ}}(\lnot s_{000}) \big )\,. \end{aligned}$$

Proof

Recall that there are exactly \((\ell _i-1)\) states in each chain \(\Gamma _i\) such that

$$\begin{aligned} S_{i-1}=S_{i-2}=\cdots = S_{i-(\ell _i-1)} = \texttt {000}. \end{aligned}$$

Therefore, we have \( \sum _{i \in {\mathcal {I}} } (\ell _i - 1) = \#\{S_j | S_j=\texttt {000} \text { and } \exists i \in {\mathcal {I}} \ s.t. \ S_j \in \Gamma _i \}\,. \) When \(S_j=\texttt {000}\), the next state \(S_{j+1}\) will be a member of the set \(\{\texttt {000},\texttt {11*}\}\). As a result, it is easy to see that for an arbitrary j, if \(S_j\) is equal to \(\texttt {000}\), then either \(S_j\) is included in some chain \(\Gamma _i, i \in {\mathcal {I}}\), or \(S_j\) belongs to the set \(\Gamma '\) defined by

$$\begin{aligned} \Gamma ' = \{ S_{n-1}=\texttt {000}, \ldots , S_{n-k} = \texttt {000} \}\,, \end{aligned}$$

for some \(k > 0\), where \(S_{n-k-1} \ne 000\). Concerning Definition 1, one can observe that \(\Gamma '\) is not a chain. Therefore, \( \sum _{i \in {\mathcal {I}} } (\ell _i - 1) = \#\{S_j | S_j = \texttt {000} \text { and } S_j \not \in \Gamma ' \} \).

Since we are assuming that the differential is valid, there are no states \(S_j = \texttt {001}\), and \(s_{000}[j] = 1\) if and only if \(S_j = \texttt {000}\). On the other hand, the function \({\textsf {LZ}}\) can be used to detect the states from the set \(\Gamma '\). In particular, \({\textsf {LZ}}(\lnot s_{000})[i]\) is equal to \(\texttt {1}\) if and only if \(S_i \in \Gamma '\). Therefore, we obtain

$$\begin{aligned} \sum _{i \in {\mathcal {I}} } (\ell _i - 1) = {\textsf {HW}}\big ( s_{000} \wedge (\lnot {\textsf {LZ}}(\lnot s_{000}) ) \big )\,. \end{aligned}$$

\(\square \)
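Lemma 5 can be illustrated on the differential of Example 1, where \({\mathcal {I}} = \{2, 8\}\) with \(\ell _2 = 3\) and \(\ell _8 = 4\). The sketch below assumes that \({\textsf {LZ}}(x)\) marks the leading zeros of x, i.e., \({\textsf {LZ}}(x)[i] = 1\) if and only if \(x[n-1, i]\) is all-zero, which is consistent with how \({\textsf {LZ}}\) is used in the proof above; it is implemented with a plain loop instead of a logarithmic-depth circuit, and the function names are hypothetical.

```python
def lz(x, n):
    """Mark the leading zeros of the n-bit word x (assumed definition of LZ)."""
    out = 0
    for i in range(n - 1, -1, -1):
        if (x >> i) & 1:
            break
        out |= 1 << i
    return out

def sum_chain_lengths_minus_one(u, v, n):
    """HW(s_000 and not LZ(not s_000)), the right-hand side of Lemma 5."""
    mask = (1 << n) - 1
    s000 = ~(u << 1) & ~(v << 1) & mask
    in_chains = s000 & ~lz(~s000 & mask, n) & mask
    return bin(in_chains).count("1")

u, v = 0b1010001110, 0b1010001010
print(sum_chain_lengths_minus_one(u, v, 10))  # 5 = (3 - 1) + (4 - 1)
```

Here the only state removed by the \({\textsf {LZ}}\) mask is \(S_9 = \texttt {000}\), which lies above the last state \(\texttt {11*}\) and therefore belongs to no chain.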

Representing the sum of \(\log _2(p_i)\) by a bit-vector expression is the most complex and challenging part of our differential model. Thus, we will proceed in several steps. First, we will show how to obtain a bit-vector w that contains all the \(p_i\) as some sub-vectors.

Lemma 6

Consider the following n-bit values,

$$\begin{aligned}&s_{\texttt {000}} = \lnot (u \ll 1) \wedge \lnot (v \ll 1), \quad&s_{\texttt {000}}' = s_{\texttt {000}} \wedge \lnot {\textsf {LZ}}(\lnot s_{\texttt {000}}), \\&t = \lnot s_{\texttt {000}}' \wedge (s_{\texttt {000}}' \gg 1), \quad&t' = s_{\texttt {000}}' \wedge (\lnot (s_{\texttt {000}}'\gg 1)), \\&s = ((a \ll 1) \wedge t) \boxplus (a \wedge (s_{\texttt {000}}' \gg 1)), \quad&q = \big ( ( \lnot ( (a \ll 1) \oplus u \oplus v)) \gg 1 \big ) \wedge t', \\&d = {\textsf {RevCarry}}(s_{\texttt {000}}', q) \vee q, \quad&w = (q \boxminus (s \wedge d)) \vee (s \wedge \lnot d). \end{aligned}$$

Then, for all states \(S_i = \texttt {11*}\) with \(i \in {\mathcal {I}}\), \(w[i-1,i-\ell _i] = p_i\).

Proof

For each \(i \in {\mathcal {I}}\) and \(0 \le j < n\), note that \(s_{\texttt {000}}'[j] = 1\) exactly when \(S_j=\texttt {000}\) and \(S_j \in \Gamma _i\), and \(t[j] = 1\) (resp. \(t'[j] = 1\)) if and only if \(S_j = S_{i - \ell _i}\) (resp. \(S_j = S_{i-1}\)). Denoting \(s = s_1 \boxplus s_2\), where \(s_1 = (a \ll 1) \wedge t\) and \(s_2=a \wedge (s_{000}' \gg 1)\), when \(i \in {\mathcal {I}}\) the sub-vectors

$$\begin{aligned} \begin{array}{rlllllll} s_1[i-1,i-\ell _i-1] &{} = &{} (0, &{} 0,&{} \dots , &{} 0, &{} a[i-\ell _i - 1], &{} 0 )\,, \\ s_2[i-1,i-\ell _i-1] &{} = &{} ( 0, &{} a[i-2], &{} \dots , &{} a[i-\ell _i+1], &{} a[i - \ell _i], &{} 0 ) \,, \end{array} \end{aligned}$$

result in \(s[i-1,i-\ell _i] = a[i-2,i-\ell _i] + a[i-\ell _i-1]\). In particular, \(s[i-1,i-\ell _i] \le 2^{\ell _i-1}\) and the equality holds when \(s[i-1,i-\ell _i] = \texttt {10\dots 0}\).

It is easy to see that \(q[i-1] = \lnot (a[i-1] \oplus u[i] \oplus v[i])\) when \(i \in {\mathcal {I}}\) and q is zero elsewhere. Then, the sub-vectors \(d[i - 1, i - \ell _i]\) are composed of repeated copies of \(q[i-1]\) when \(i \in {\mathcal {I}}\), as shown by the following sub-vectors

$$\begin{aligned} \begin{array}{rllllllll} s_{000}'[i, i-\ell _i-1]&{} = &{} (0, &{} 1, &{} 1, &{} \dots , &{} 1, &{} 0, &{} *)\,, \\ q [i, i-\ell _i-1]&{} = &{} (0, &{} q[i-1], &{} 0, &{} \dots , &{} 0, &{} 0, &{} *)\,, \\ {\textsf {RevCarry}}(s_{000}', q)[i, i-\ell _i-1] &{} = &{} (*, &{} 0, &{} q[i-1], &{} \dots , &{} q[i-1], &{} q[i-1], &{} 0)\,, \\ d[i, i-\ell _i-1]&{} = &{} (*, &{} q[i-1], &{} q[i-1], &{} \dots , &{} q[i-1], &{} q[i-1], &{} *)\,. \end{array} \end{aligned}$$

The only exception for the above equations is when \(i-\ell _i = -1\), where the two least significant bits of the above sub-vectors will be equal to zero.

Let \(w = w_1 \vee w_2\), where \(w_1 = q \boxminus (s \wedge d)\) and \(w_2 = s \wedge \lnot d\). Using the patterns obtained for q and d, we prove the following inequalities for \(i \in {\mathcal {I}}\)

$$\begin{aligned} (s \wedge d)[i-1,i-\ell _i]&\le q[i-1,i-\ell _i]\,, \\ (s \wedge d)[i-\ell _i-1,0]&\le q[i-\ell _i-1,0]\,, \end{aligned}$$

which imply the identity \(w_1[i-1,i-\ell _i] = q[i-1,i-\ell _i] \boxminus (s \wedge d)[i-1,i-\ell _i]\).

The first inequality can be derived from the fact that \(s[i-1,i-\ell _i] \le \texttt {10\dots 0}\). For the second inequality, consider the index set \({\mathcal {J}} = \{j \ | \ \forall i \in {\mathcal {I}}, S_j \notin \Gamma _i\}\). Then, the second inequality holds since for \(j \in {\mathcal {J}}\) and \(c \in \{0, 1\}\) we can see that

$$\begin{aligned} s'_{\texttt {000}}[j+1-c] = 0 \implies s_1[j-c] = s_2[j-c] = 0\,. \end{aligned}$$

We are now ready to evaluate \(w[i-1,i-\ell _i]\) when \(i \in {\mathcal {I}}\). If \(q[i-1]=0\), then \(d[i-1,i-\ell _i] = (0, \dots , 0)\), \(w_1[i-1,i-\ell _i]\) reduces to 0, and

$$\begin{aligned} w[i-1,i-\ell _i] = w_2[i-1,i-\ell _i] = a[i-2,i-\ell _i] + a[i-\ell _i-1]\,. \end{aligned}$$

If \(q[i-1]=1\), then \(d[i-1,i-\ell _i] = (1, \dots , 1)\), \(w_2[i-1,i-\ell _i]\) reduces to 0, and

$$\begin{aligned} w[i-1,i-\ell _i]&= w_1[i-1,i-\ell _i] = (1, 0, \dots , 0) \boxminus s[i-1, i-\ell _i] \\&= 2^{\ell _i - 1} - (a[i-2,i-\ell _i] + a[i-\ell _i-1]) \,. \end{aligned}$$

Hence, for \(q[i-1] = \lnot (a[i-1] \oplus u[i] \oplus v[i])\) and regarding Lemma 4, we obtain that \(w[i-1,i-\ell _i] = p_i\). \(\square \)

Recall that both \({\textsf {LZ}}\) and \({\textsf {RevCarry}}\) have bit-vector complexity \(O(\log _2 n)\). Therefore, w can be described with \(O(\log _2 n)\) basic bit-vector operations.

Since \(p_i\) is not always a power of two, \(\log _2(p_i)\) cannot be represented by a fixed-sized bit-vector. Thus, we will use the following approximation for the binary logarithm of a positive integer x,

$$\begin{aligned} {\textsf {apxlog}}_2(x) \triangleq m + \frac{{\textsf {Truncate}}(x[m-1, 0])}{2^4}\,, \end{aligned}$$
(4)

where \(m = {\lfloor {\log _2(x)} \rfloor }\) and \({\textsf {Truncate}}(z)\) for an m-bit vector z is defined by

$$\begin{aligned} {\textsf {Truncate}}(z) = {\left\{ \begin{array}{ll} z[m - 1, m - 4], &{} m \ge 4 \\ z[m - 1, 0] \parallel (\overbrace{0, \dots , 0}^{4 - m}), &{} m < 4 \end{array}\right. }\, \end{aligned}$$

In other words, \({\textsf {apxlog}}_2\) includes the integer part of the logarithm and takes the four bits right after the most significant one as the “fraction” bits. While \({\textsf {Truncate}}\) can be generalized to consider more fraction bits, we will show later that four fraction bits are enough to minimize the bounds of our approximation error.
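The approximation of Eq. (4) is easy to evaluate on ordinary integers. The following sketch returns \({\textsf {apxlog}}_2(x)\) as an exact fraction with denominator \(2^4\); the function name `apxlog2` mirrors the notation above, while the use of Python's `int.bit_length` to obtain m is an implementation choice of this example.

```python
import math
from fractions import Fraction

def apxlog2(x):
    """Approximate binary logarithm of Eq. (4) for a positive integer x."""
    m = x.bit_length() - 1            # m = floor(log2(x))
    z = x & ((1 << m) - 1)            # the m bits below the leading one
    if m >= 4:
        frac = z >> (m - 4)           # keep the four most significant bits
    else:
        frac = z << (4 - m)           # pad with zeros on the right
    return Fraction(16 * m + frac, 16)

for x in (1, 4, 5, 53):
    print(x, apxlog2(x), round(math.log2(x), 4))
# 5 -> 9/4 (exact value 2.3219...), 53 -> 45/8 (exact value 5.7279...)
```

Since \(f \le \log _2(1 + f)\) for \(0 \le f \le 1\) and truncation can only decrease the fraction, this approximation never exceeds the exact logarithm.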

To describe \(\sum _{i \in {\mathcal {I}}} {\textsf {apxlog}}_2(p_i)\) with basic bit-vector operations, we will introduce in the next proposition two new bit-vector functions \({\textsf {ParallelLog}}\) and \({\textsf {ParallelTrunc}}\). Given a bit-vector x with sub-vectors delimited by a bit-vector y, \({\textsf {ParallelLog}}(x, y)\) computes the sum of the integer part of the logarithm of the delimited sub-vectors, whereas \({\textsf {ParallelTrunc}}(x, y)\) calculates the sum of the four most significant bits of the delimited sub-vectors.

Proposition 1

Let x and y be n-bit vectors such that y has r sub-vectors

$$\begin{aligned} y[i_t, j_t] = (1, 1, \dots , 1, 0), \quad t = 1, \dots , r \end{aligned}$$

where \(i_1> j_1> i_2> j_2> \dots> i_r > j_r \ge 0\), and y is equal to zero elsewhere.

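We define the bit-vector functions \({\textsf {ParallelLog}}\) and \({\textsf {ParallelTrunc}}\) by

$$\begin{aligned} {\textsf {ParallelLog}}(x, y)&= {\textsf {HW}}({\textsf {RevCarry}}(x \wedge y, y))\,, \\ {\textsf {ParallelTrunc}}(x, y)&= ({\textsf {HW}}(z_0) \ll 3) \boxplus ({\textsf {HW}}(z_1) \ll 2) \boxplus ({\textsf {HW}}(z_2) \ll 1) \boxplus {\textsf {HW}}(z_3)\,, \end{aligned}$$

where \(z_{\lambda } = x \wedge (y \gg 0) \wedge \dots \wedge (y \gg \lambda ) \wedge \lnot (y \gg (\lambda +1))\).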

(a) If \(x[i_t,j_t] > 0\) for \(t = 1, \dots , r\), then

$$\begin{aligned} \sum _{t=1}^{r} {\lfloor {\log _2(x[i_t,j_t])} \rfloor } = {\textsf {ParallelLog}}(x, y)\,. \end{aligned}$$

(b) If at least \({\lfloor {\log _2(n)} \rfloor } + 4\) bits are dedicated to \({\textsf {ParallelTrunc}}(x, y)\), then

$$\begin{aligned} \sum _{t=1}^{r} {\textsf {Truncate}}(x[i_t,j_t+1]) = {\textsf {ParallelTrunc}}(x,y) \,. \end{aligned}$$

Proof

Case (a) Let \(m = {\lfloor {\log _2(x[i_1,j_1])} \rfloor }\) and \(c = {\textsf {RevCarry}}(x \wedge y, y)\). Note that \(c[n-1,i_1]=0\), since \(y[n-1,i_1+1] = 0\). For \(m \ge 1\), we obtain the sub-vectors

$$\begin{aligned} \begin{array}{rrccccccccc} \ &{} &{} i_1, &{} \dots , &{} j_1\!+m+1, &{} j_1\!+m, &{} j_1\!+m-1, &{} \dots , &{} j_1\!+1, &{} j_1, &{} j_1\!-1 \\ y[i_1,j_1\!-1] &{} = &{} (1, &{} \dots , &{} 1, &{} 1, &{} 1, &{} \dots , &{} 1, &{} 0, &{} * )\,, \\ (x \wedge y)[i_1,j_1\!-1] &{} = &{} (0, &{} \dots , &{} 0, &{} 1, &{} *, &{} \dots , &{} *, &{} 0, &{} * )\,, \\ c[i_1,j_1\!-1] &{} = &{} (0, &{} \dots , &{} 0, &{} 0, &{} 1, &{} \dots , &{} 1, &{} 1, &{} 0 )\,. \end{array} \end{aligned}$$

In particular, \(c[i_1,j_1]\) has m bits set to one. If \(m = 0\), \(x[i_1,j_1+1] = 0\) and \(y[j_1] = 0\), which implies that there is no carry chain, i.e., \(c[i_1, j_1] = 0\). Therefore, in both cases \({\textsf {HW}}(c[i_1,j_1]) = m = {\lfloor {\log _2(x[i_1,j_1])} \rfloor }\).

Note that the reversed carry chain stops at \(j_1\), and \(c[j_1-1, i_2] = \texttt {0}\cdots \texttt {0}\). Therefore, the same argument can be applied for \(t = 2, \dots , r\), obtaining

$$\begin{aligned} {\textsf {HW}}(c[i_t, j_t]) = {\lfloor {\log _2(x[i_t,j_t])} \rfloor }\,, \quad c[j_{t} - 1, i_{t+1}] = 0\,. \end{aligned}$$

Finally, it is easy to see that \(c[j_r - 1, 0] = 0\), concluding the proof for this case.

Case (b) First note that for \(\lambda = 0, \dots , 3\) and \(t = 1, \dots , r\), the variable \( z_\lambda \) is

$$\begin{aligned} z_\lambda [i] = {\left\{ \begin{array}{ll} x[i], &{} \text {if } i = i_t-\lambda > j_t \\ 0, &{} \text {otherwise } \\ \end{array}\right. } \end{aligned}$$

Therefore, the Hamming weight of \(z_\lambda \) computes the following summation:

$$\begin{aligned} {\textsf {HW}}(z_\lambda ) = \sum _{\begin{array}{c} t \\ i_t - \lambda > j_t \end{array}} x[i_t - \lambda ]\,. \end{aligned}$$

While we define \({\textsf {HW}}\) as a bit-vector function returning an n-bit output given an n-bit input, \({\lfloor {\log _2(n)} \rfloor }+1\) bits are sufficient to represent the output of \({\textsf {HW}}\). Therefore, by representing each \({\textsf {HW}}(z_{\lambda }) \ll (3 -\lambda )\) in a \(({\lfloor {\log _2(n)} \rfloor } + 4)\)-bit variable \(h_\lambda \), the bit-vector expression \(h_{0} \boxplus h_{1} \boxplus h_{2} \boxplus h_{3}\) does not overflow, and we obtain

$$\begin{aligned} \sum _{t=1}^{r} {\textsf {Truncate}}(x[i_t,j_t+1]) = \sum _{t=1}^{r} \sum _{\begin{array}{c} \lambda =0 \\ i_t - \lambda > j_t \end{array}}^{3} x[i_t - \lambda ] \times 2^{3 - \lambda } = h_{0} \boxplus h_{1} \boxplus h_{2} \boxplus h_{3}, \end{aligned}$$

which concludes the proof. \(\square \)

Since both \({\textsf {HW}}\) and \({\textsf {Rev}}\) have \(O(\log _2 n)\) bit-vector complexities, so do the functions \({\textsf {ParallelLog}}\) and \({\textsf {ParallelTrunc}}\). The next lemma applies \({\textsf {ParallelLog}}\) and \({\textsf {ParallelTrunc}}\) to provide a bit-vector expression of the sum of \({\textsf {apxlog}}_2(p_i)\).
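The two functions can be sketched in Python by following the proof of Proposition 1: Case (a) takes the Hamming weight of \(c = {\textsf {RevCarry}}(x \wedge y, y)\), and Case (b) adds the values \({\textsf {HW}}(z_{\lambda }) \ll (3 - \lambda )\) for \(\lambda = 0, \dots , 3\). The helper definitions are assumptions of this example, since they are not restated in this section: \({\textsf {Carry}}(x, y) = x \oplus y \oplus (x \boxplus y)\) and \({\textsf {RevCarry}}(x, y) = {\textsf {Rev}}({\textsf {Carry}}({\textsf {Rev}}(x), {\textsf {Rev}}(y)))\). The results are kept as unbounded Python integers, so the bit-width conditions of the proposition do not arise.

```python
def rev(x, n):
    """Bit reversal of the n-bit word x."""
    return int(format(x, f"0{n}b")[::-1], 2)

def hw(x):
    """Hamming weight."""
    return bin(x).count("1")

def carry(x, y, n):
    """Carry vector of the n-bit addition x + y (assumed definition)."""
    return (x ^ y ^ (x + y)) & ((1 << n) - 1)

def rev_carry(x, y, n):
    """Carries propagating from the most to the least significant bit (assumed)."""
    return rev(carry(rev(x, n), rev(y, n), n), n)

def parallel_log(x, y, n):
    """Sum of floor(log2(.)) over the sub-vectors of x delimited by y."""
    return hw(rev_carry(x & y, y, n))

def parallel_trunc(x, y, n):
    """Sum of Truncate(.) over the sub-vectors of x delimited by y."""
    total, run = 0, x
    for lam in range(4):
        run &= y >> lam                  # x and (y >> 0) and ... and (y >> lam)
        z = run & ~(y >> (lam + 1))      # z_lambda
        total += hw(z) << (3 - lam)
    return total

# Arguments taken from the differential of Example 1 (w = 0001010010).
print(parallel_log(0b0010000100, 0b0111000110, 10))    # 4
print(parallel_trunc(0b0010100100, 0b0001100011, 10))  # 4
```

In the first call the delimited sub-vectors hold the values 4 and 1 shifted into place, so the sum of integer logarithms is 2 + 2 = 4; in the second call the fraction bits `01` of \(p_8 = 5\) contribute \(\texttt {0100} = 4\) and \(p_2 = 4\) contributes 0.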

Lemma 7

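Let r and f be the bit-vectors given by

$$\begin{aligned} r&= {\textsf {ParallelLog}}((w \wedge s'_{\texttt {000}}) \ll 1, s'_{\texttt {000}} \ll 1)\,, \\ f&= {\textsf {ParallelTrunc}}(w \ll 1, {\textsf {RevCarry}}((w \wedge s'_{\texttt {000}}) \ll 1, s'_{\texttt {000}} \ll 1))\,, \end{aligned}$$

where w and \(s'_{\texttt {000}}\) are the n-bit values defined in Lemma 6. If at least \({\lfloor {\log _2(n)} \rfloor } + 5\) bits are dedicated to r and f, then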

$$\begin{aligned} 2^4 \sum _{i \in {\mathcal {I}}} {\textsf {apxlog}}_2(p_i) = (r \ll 4) \boxplus f\,. \end{aligned}$$

Proof

Regarding Lemma 6, \(w[i-1,i-\ell _i]\) represents the \(\ell _i\)-bit vector of \(p_i\) and \(s'_{\texttt {000}}[i-1,i-\ell _i]\) conforms to the pattern \((1,\ldots ,1,0)\) for any \(i \in {\mathcal {I}}\). Therefore,

$$\begin{aligned} \sum _{i \in {\mathcal {I}}} {\lfloor {\log _2(p_i)} \rfloor } = {\textsf {HW}}\,\, \big ( \! {\textsf {RevCarry}}((w \wedge s'_{\texttt {000}}) \ll 1,s'_{\texttt {000}} \ll 1 ) \big )\,, \end{aligned}$$

following Proposition 1. For the second case, let c be the n-bit vector given by \(c = {\textsf {RevCarry}}((w \wedge s'_{\texttt {000}}) \ll 1,s'_{\texttt {000}} \ll 1 )\). Denoting \(j = i- \ell _i\) and \(m = {\lfloor {\log _2(p_i)} \rfloor }\) for a given \(i \in {\mathcal {I}}\), note that \(p_i[m]\) is the most significant active bit of \(p_i\) and

$$\begin{aligned} \begin{array}{rrccccccccccl} \ &{} &{} i\!+\!1, &{} \dots , &{} j\!+\!m\!+\!2, &{} j\!+\!m\!+\!1, &{} j\!+\!m, &{} \dots , &{} j\!+\!2, &{} j\!+\!1, &{} j \ \ \\ (w \ll 1)[i\!+\!1,j] &{} = &{} ( 0, &{} \dots , &{} 0 &{} p_i[m], &{} p_i[m\!-\!1], &{} \dots , &{} p_i[1], &{} p_i[0] &{} 0)\,, \\ c[i\!+\!1,j] &{} = &{} ( 0, &{} \dots , &{} 0 &{} 0, &{} 1, &{} \dots , &{} 1, &{} 1 &{} 0)\,. \end{array} \end{aligned}$$

Thus \(c[j+m,j]\) conforms to the pattern \((1,\ldots ,1,0)\) and Proposition 1 leads to

$$\begin{aligned} \sum _{\begin{array}{c} i \in {\mathcal {I}} \\ m = {\lfloor {\log _2(p_i)} \rfloor } \end{array}} {\textsf {Truncate}}( p_i[m-1 , 0]) = {\textsf {ParallelTrunc}}(w \ll 1, c)\,. \end{aligned}$$

For any n-bit variables x and y, it is easy to see that \({\textsf {ParallelLog}}(x, y) < n\). Thus, \({\lfloor {\log _2(n)} \rfloor }+4\) bits are sufficient to represent \((r \ll 4)\), and f can also be represented with the same number of bits following Proposition 1. Therefore, by representing \((r \ll 4)\) and f in \(({\lfloor {\log _2(n)} \rfloor }+5)\)-bit variables, the bit-vector expression \((r \ll 4) \boxplus f\) does not overflow. \(\square \)

Recall that the differential weight of constant addition can be decomposed as

$$\begin{aligned} {\textsf {weight}}_{\boxplus _a}(u,v) = - \sum _{i \notin {\mathcal {I}}} \log _2(\varphi _i) - \sum _{i \in {\mathcal {I}}}\log _2\left( \dfrac{1}{2^{\ell _i-1}}\right) - \sum _{i \in {\mathcal {I}}}\log _2(p_i)\,. \end{aligned}$$

If the binary logarithm of \(p_i\) is replaced by our approximation of the binary logarithm \({\textsf {apxlog}}_2(p_i)\), we obtain the following approximation of the weight,

$$\begin{aligned} {\textsf {apxweight}}_{\boxplus _a}(u, v) \triangleq - \sum _{i \notin {\mathcal {I}}} \log _2(\varphi _i) - \sum _{i \in {\mathcal {I}}}\log _2\left( \dfrac{1}{2^{\ell _i-1}}\right) - \sum _{i \in {\mathcal {I}}}{\textsf {apxlog}}_2(p_i)\,. \end{aligned}$$
(5)

Our weight approximation can be computed with the bit-vector function \({\textsf {BvWeight}}\) described in Algorithm 1, as shown in the next lemma.

[Algorithm 1: the bit-vector function \({\textsf {BvWeight}}(u, v, a)\)]

Lemma 8

If at least \({\lfloor {\log _2(n)} \rfloor }+5\) bits are dedicated to \({\textsf {BvWeight}}(u,v,a)\), then

$$\begin{aligned} 2^4 {\textsf {apxweight}}_{\boxplus _a}(u,v) = {\textsf {BvWeight}}(u,v,a). \end{aligned}$$

Proof

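By Lemmas 3, 5 and 7 we respectively obtain

$$\begin{aligned} {\textsf {HW}}((u \oplus v) \ll 1)&= - \sum _{i \notin {\mathcal {I}}} \log _2(\varphi _i)\,, \\ {\textsf {HW}}(s'_{\texttt {000}})&= \sum _{i \in {\mathcal {I}}} (\ell _i - 1) = - \sum _{i \in {\mathcal {I}}}\log _2\left( \dfrac{1}{2^{\ell _i-1}}\right) \,, \\ (r \ll 4) \boxplus f&= 2^4 \sum _{i \in {\mathcal {I}}} {\textsf {apxlog}}_2(p_i)\,. \end{aligned}$$

All in all, we get the following identities,

$$\begin{aligned} 2^4 {\textsf {apxweight}}_{\boxplus _a}(u, v) = (int \ll 4) \boxminus frac = {\textsf {BvWeight}}(u, v, a) \,. \end{aligned}$$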

\(\square \)

Note that the four least significant bits of \({\textsf {BvWeight}}(u, v, a)\) correspond to the fraction bits of the approximate weight. In other words, the output of \({\textsf {BvWeight}}(u, v, a)\) represents the rational value

$$\begin{aligned} \sum _{i=0}^{{\lfloor {\log _2(n)} \rfloor }+4} 2^{i-4} {\textsf {BvWeight}}(u, v, a)[i]\,. \end{aligned}$$

The bit-vector complexity of \({\textsf {BvWeight}}\) is dominated by the complexity of \({\textsf {LZ}}\), \({\textsf {Rev}}\), \({\textsf {HW}}\), \({\textsf {ParallelLog}}\), and \({\textsf {ParallelTrunc}}\). Since these operations can be computed with \(O(\log _2 n)\) basic bit-vector operations, so can \({\textsf {BvWeight}}\).
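To make the whole pipeline concrete, the sketch below assembles a Python counterpart of \({\textsf {BvWeight}}\) from Lemmas 3, 5, 6 and 7. It is a reconstruction under stated assumptions and not a transcription of Algorithm 1: it assumes \({\textsf {Carry}}(x, y) = x \oplus y \oplus (x \boxplus y)\), \({\textsf {RevCarry}}(x, y) = {\textsf {Rev}}({\textsf {Carry}}({\textsf {Rev}}(x), {\textsf {Rev}}(y)))\), that \({\textsf {LZ}}\) marks leading zeros, and that \({\textsf {ParallelLog}}\) and \({\textsf {ParallelTrunc}}\) are computed as in the proof of Proposition 1. All helper names are hypothetical, and the output is an unbounded integer whose four least significant bits are the fraction bits.

```python
import math

def bv_weight(u, v, a, n):
    """Sketch of BvWeight: returns 2^4 * apxweight for a valid differential."""
    mask = (1 << n) - 1
    rev = lambda x: int(format(x, f"0{n}b")[::-1], 2)
    hw = lambda x: bin(x).count("1")
    carry = lambda x, y: (x ^ y ^ (x + y)) & mask
    rev_carry = lambda x, y: rev(carry(rev(x), rev(y)))

    def lz(x):  # mark leading zeros (assumed definition of LZ)
        out = 0
        for i in range(n - 1, -1, -1):
            if (x >> i) & 1:
                break
            out |= 1 << i
        return out

    def parallel_trunc(x, y):  # as in Case (b) of Proposition 1
        total, run = 0, x
        for lam in range(4):
            run &= y >> lam
            total += hw(run & ~(y >> (lam + 1))) << (3 - lam)
        return total

    # Intermediate values of Lemma 6.
    s000 = ~(u << 1) & ~(v << 1) & mask
    s000p = s000 & ~lz(~s000 & mask) & mask
    t = ~s000p & (s000p >> 1) & mask
    tp = s000p & ~(s000p >> 1) & mask
    s = ((((a << 1) & mask) & t) + (a & (s000p >> 1))) & mask
    q = ((~((a << 1) ^ u ^ v) & mask) >> 1) & tp
    d = rev_carry(s000p, q) | q
    w = (((q - (s & d)) & mask) | (s & ~d)) & mask

    # Integer and fraction parts (Lemmas 3, 5 and 7).
    x, y = ((w & s000p) << 1) & mask, (s000p << 1) & mask
    c = rev_carry(x & y, y)              # hw(c) is ParallelLog(x, y)
    int_part = hw(((u ^ v) << 1) & mask) + hw(s000p) - hw(c)
    frac = parallel_trunc((w << 1) & mask, c)
    return (int_part << 4) - frac

u, v, a = 0b1010001110, 0b1010001010, 0b1000101110
approx = bv_weight(u, v, a, 10) / 16
print(approx, -math.log2(5 / 16))  # 1.75 versus 1.678...
```

For the differential of Example 1 this sketch returns 28, i.e., an approximate weight of 1.75, against the exact weight \(-\log _2(5/16) \approx 1.678\). The function assumes its input is a valid differential; combining it with a validity check is left to the caller.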

Theorem 4 shows that \({\textsf {BvWeight}}\) leads to a close approximation of the differential weight and provides explicit bounds for the approximation error.

Theorem 4

Let \((u, v)\) be a valid differential over the n-bit constant addition \(\boxplus _a\), let \({\textsf {weight}}_{\boxplus _a}(u,v)\) be the differential weight of \((u, v)\), and let \({\textsf {BvWeight}}\) be the bit-vector function defined by Algorithm 1. Then, the approximation error,

$$\begin{aligned} E = {\textsf {weight}}_{\boxplus _a}(u,v) - {\textsf {apxweight}}_{\boxplus _a}(u,v) = {\textsf {weight}}_{\boxplus _a}(u,v) - 2^{-4} {\textsf {BvWeight}}(u,v,a) \end{aligned}$$

is bounded by \(- 0.029(n-1) \le E \le 0 \,.\)

Section 3.3 is devoted to the proof of Theorem 4, where we will also analyse the error caused by our approximated binary logarithm. Before proving Theorem 4, we will describe an example to understand this theorem and Algorithm 1.

Example 3

Consider the same conditions as defined in Example 1, i.e., \((u,v) = (\texttt {1010001110},\texttt {1010001010})\) and the constant input \(a=\texttt {1000101110}\). The weight for our differential is

$$\begin{aligned} {\textsf {weight}}_{\boxplus _a}(u,v)&= - \log _2(\Pr [u \xrightarrow {\boxplus _a} v]) = - \log _2\left( \frac{5}{16}\right) \approx 1.678. \end{aligned}$$

We now compute the approximate weight \({\textsf {apxweight}}_{\boxplus _a}\) based on Algorithm 1. Table 4 presents some of the variables we obtain to compute the aforementioned approximate weight.

Table 4 Intermediate variables for computing \( {\textsf {apxweight}}_{\boxplus _a}(u,v) \) of Example 3

The set \({\mathcal {I}} = \{ 1 \le i \le n-1 \ | \ S_i=\texttt {11*}, \ \ell _i >1 \} \) in this example is equal to \({\mathcal {I}} = \{2,8\}\). The variable int in Algorithm 1 consists of three parts. The first part of the variable int calculated in Algorithm 1 is

$$\begin{aligned} {\textsf {HW}}( (u \oplus v) \ll 1) = \texttt {0000000001} = 1\,, \end{aligned}$$

which is equal to \(- \sum _{i \notin {\mathcal {I}}} \log _2(\varphi _i) = - \sum _{i \in \{0,1,3,4,5,6,7,9\}} \log _2(\varphi _i) = - \log _2(\varphi _3)\).

The second part uses the variable \(s'_{\texttt {000}}\) which is the same as \(s_{\texttt {000}}\) except that the leading one bits of \(s_{\texttt {000}}\) are replaced in \(s'_{\texttt {000}}\) by zeros. In other words, \(s'_{\texttt {000}}[9]=0\) but the remaining bits of these two bit-vectors are exactly equal. Computing the second part results in

$$\begin{aligned} {\textsf {HW}}(s'_{\texttt {000}}) = \texttt {0000000101} = 5\,, \end{aligned}$$

and it is equal to \( \sum _{i \in {\mathcal {I}} } (\ell _i - 1) = (\ell _2-1) + (\ell _8 - 1)\).

For the third and last part of int, we compute w, obtaining \(w = \texttt {0001010010}\). We remark that the bit-vector w for \(i \in {\mathcal {I}}\) in fact includes \(p_i\) as some subvectors, i.e.

$$\begin{aligned} i= & {} 2: \ w[2-1,2-\ell _2] = w[1,-1] = \texttt {100} = 4 = p_2\,,\\ i= & {} 8: \ w[8-1,8-\ell _8] = w[7,4] = \texttt {0101} = 5 = p_8\,. \end{aligned}$$

Hence, the third part of int is computed as

$$\begin{aligned} {\textsf {ParallelLog}}((w \wedge s'_{\texttt {000}}) \ll 1,s'_{\texttt {000}} \ll 1) = \texttt {0000000100} = 4\,, \end{aligned}$$

which is equal to \( \sum _{i \in {\mathcal {I}}} {\lfloor {\log _2(p_i)} \rfloor } = {\lfloor {\log _2(p_2)} \rfloor } + {\lfloor {\log _2(p_8)} \rfloor }\). Considering all three parts, the bit-vector int is obtained by

$$\begin{aligned} int = \texttt {0000000001} \boxplus \texttt {0000000101} \boxminus \texttt {0000000100} = \texttt {0000000010} = 2 \,. \end{aligned}$$

Moreover, the frac bit-vector in Algorithm 1 evaluates to \(frac = \texttt {0000000100} = 4\) in this example.

Considering the third major operand of int and the bit-vector frac, we can obtain the summation of all approximate logarithms

$$\begin{aligned} \sum _{i \in {\mathcal {I}}} {\textsf {apxlog}}_2(p_i) = {\textsf {apxlog}}_2(p_2) + {\textsf {apxlog}}_2(p_8) = 4 + \frac{4}{2^4} = 4.25\,. \end{aligned}$$

To summarize, the output of Algorithm 1 is

$$\begin{aligned} {\textsf {BvWeight}}(u,v,a)&= (int \ll 4) \boxminus frac \,, \\&= \texttt {0000100000} \boxminus \texttt {0000000100} = \texttt {0000011100} = 28\,. \end{aligned}$$

Therefore, the approximate weight will be equal to

$$\begin{aligned} {\textsf {apxweight}}_{\boxplus _a}(u,v) = 2^{-4}{\textsf {BvWeight}}(u,v,a) = \dfrac{28}{2^4} = 1.75\,. \end{aligned}$$

The total error of our approximation is

$$\begin{aligned} E = {\textsf {weight}}_{\boxplus _a}(u,v) - {\textsf {apxweight}}_{\boxplus _a}(u,v) \approx 1.678 - 1.75 = -0.072 \,, \end{aligned}$$

which is a negative value and lower bounded by \(-0.029(n-1) = -0.261\) as suggested by Theorem 4.
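The numbers in Example 3 can be cross-checked by exhaustive enumeration. The following Python sketch is not Algorithm 1: it brute-forces the differential probability over all \(2^{10}\) inputs and then repeats the fixed-point arithmetic using the values of int and frac quoted in the example. The helper name `xor_diff_probability` is our own and is only meant for illustration.

```python
from fractions import Fraction
from math import log2

def xor_diff_probability(u, v, a, n):
    """Exact probability of the XOR differential (u, v) over x -> x + a mod 2^n."""
    mask = (1 << n) - 1
    hits = sum(
        1 for x in range(1 << n)
        if (((x + a) & mask) ^ (((x ^ u) + a) & mask)) == v
    )
    return Fraction(hits, 1 << n)

n = 10
u, v, a = 0b1010001110, 0b1010001010, 0b1000101110

prob = xor_diff_probability(u, v, a, n)   # exhaustive, feasible only for small n
weight = -log2(prob)                      # exact differential weight

# int and frac are copied from the example, not recomputed by Algorithm 1.
int_part, frac = 2, 4
bv_weight = (int_part << 4) - frac        # the four low bits are fraction bits
apx_weight = bv_weight / 2**4

error = weight - apx_weight
lower_bound = -0.029 * (n - 1)            # bound of Theorem 4
```

Running the sketch gives `prob == Fraction(5, 16)`, `bv_weight == 28`, and an error that lies between `lower_bound` and 0.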

3.3 Error analysis: proof of Theorem 4

In this subsection, we will prove Theorem 4 by gradually analysing the error produced by our approximation of the binary logarithm. As we can see from Eqs. (3) and (5), the gap between \({\textsf {weight}}_{\boxplus _a}(u,v)\) and \({\textsf {apxweight}}_{\boxplus _a}(u,v)\) is

$$\begin{aligned} {\textsf {weight}}&_{\boxplus _a}(u,v) - {\textsf {apxweight}}_{\boxplus _a}(u,v) = - \sum _{i \in {\mathcal {I}}} \big (\!\log _2(p_i) - {\textsf {apxlog}}_2(p_i) \big )\,. \end{aligned}$$

Note that the integer part of \({\textsf {apxlog}}_2\) is equal to the integer part of \(\log _2\) and the error is caused by the fraction part of the logarithm.

Given a positive integer x and the corresponding \(m = {\lfloor {\log _2(x)} \rfloor }\), we define \({\textsf {apxlog}}_2^{\kappa }\) as

$$\begin{aligned}{\textsf {apxlog}}_2^{\kappa }(x) = {\left\{ \begin{array}{ll} m + x[m-1,0]/2^m, &{} m \le \kappa \\ m + x[m-1,m-\kappa ]/2^{\kappa }, &{} m > \kappa \,. \end{array}\right. } \end{aligned}$$

The non-negative integer \(\kappa \) is called the precision of the fraction part. Note that \({\textsf {apxlog}}_2^{\kappa }\) is the generalization of \({\textsf {apxlog}}_2\), which considers \(\kappa = 4\) bits for the fraction part. While Theorem 4 only focuses on \(\kappa = 4\), we will use \({\textsf {apxlog}}_2^{\kappa }\) in this section to additionally prove that our error bound also applies to \(\kappa \ge 4\).

The following lemma bounds the approximation error of \({\textsf {apxlog}}_2\) when \(\kappa \ge {\lfloor {\log _2(x)} \rfloor }\). For the sake of completeness, we include a proof, which is similar to that of [56]. The main idea is that, after extracting the integer part of the base-2 logarithm, one can estimate \(\log _2(1+\gamma )\) by \(\gamma \) when \(0 \le \gamma < 1\).

Lemma 9

Consider a positive integer x and the binary logarithm approximation \(\log _2(x) \approx m + x[m-1,0]/2^m\,,\) where \(m = {\lfloor {\log _2(x)} \rfloor }\). Then, the approximation error \(e = \log _2(x) - ( m + x[m-1,0]/2^m )\) is bounded by \(0 \le e \le B\), where B is given by

$$\begin{aligned} B = 1 - \big ( 1 + \ln (\ln (2)) \big ) / \ln (2) \approx 0.086\,. \end{aligned}$$

Proof

Let \(x = 2^m + b\), where b is a non-negative integer such that \(0 \le b < 2^m\). Therefore, \(x[m-1,0] = x - 2^m = b\) and the error is given by

$$\begin{aligned} e= & {} \log _2(x) - \left( m + \frac{x[m-1,0]}{2^m}\right) = \log _2(2^m + b) - \left( m + \frac{b}{2^m}\right) \\= & {} \log _2\left( 1+\frac{b}{2^m}\right) - \dfrac{b}{2^m}\,. \end{aligned}$$

For \(\gamma = b/2^m\), we obtain \(0 \le \gamma < 1\) and \(e = \log _2(1+\gamma )-\gamma \). Note that e is a concave function of \(\gamma \) where \(e \ge 0\) if and only if \(0 \le \gamma \le 1\). By differentiating \(e = e(\gamma )\), one can see that \(\max (e) = B = 1 - \big ( 1 + \ln (\ln (2)) \big ) / \ln (2) \approx 0.086\) is reached when \(\gamma = 1/\ln (2)-1 \approx 0.44\). \(\square \)

The bound B is an almost tight bound, e.g., when \(x=3\), the obtained error is \(\log _2(3) - (1 + \frac{1}{2}) \approxeq 0.085\). The following example sheds more light on our binary logarithm approximation.

Example 4

Consider the positive integer \(x=\texttt {11101}\). Note that \(\log _2(x) \approxeq 4.85798\) and \(m = {\lfloor {\log _2(x)} \rfloor } = 4\). In order to obtain the approximation defined in Lemma 9, we first find and omit the most significant one in the binary representation of x, and we get \(x[m-1,0]=\texttt {1101}\). By interpreting the remaining bits as a binary fraction, we have \(x[m-1,0]\big / 2^m = \texttt {0.1101}\). Therefore, the approximated binary logarithm of \(x = \texttt {11101}\) in binary representation is

$$\begin{aligned} m + \dfrac{x[m-1,0]}{2^m} = \texttt {100.1101}\,, \end{aligned}$$

which is equal to 4.8125. In addition, the corresponding error of such approximation is \(e \approx 0.04548\), which is a positive value and upper bounded by \(B \approx 0.086\).
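As a numerical sanity check of Lemma 9 and Example 4, the following Python sketch implements the approximation on machine integers and measures its error for all \(x < 2^{12}\). The function name `apxlog2` and the optional `kappa` argument (which keeps only the \(\kappa \) most significant fraction bits, as in \({\textsf {apxlog}}_2^{\kappa }\)) are illustrative choices, and floating-point arithmetic is used instead of bit-vectors.

```python
from math import log, log2

def apxlog2(x, kappa=None):
    """Approximate log2(x) for a positive integer x.

    With kappa=None all m fraction bits are kept (the setting of Lemma 9);
    otherwise only the kappa most significant fraction bits are kept.
    """
    m = x.bit_length() - 1            # m = floor(log2(x))
    b = x - (1 << m)                  # x[m-1, 0], the bits below the leading one
    if kappa is None or m <= kappa:
        return m + b / (1 << m)
    return m + (b >> (m - kappa)) / (1 << kappa)

B = 1 - (1 + log(log(2))) / log(2)    # bound of Lemma 9, about 0.086

# Errors without truncation and with kappa = 4 fraction bits.
errors = [log2(x) - apxlog2(x) for x in range(1, 1 << 12)]
errors_k4 = [log2(x) - apxlog2(x, kappa=4) for x in range(1, 1 << 12)]
```

In this range every entry of `errors` lies in \([0, B]\) and every entry of `errors_k4` lies in \([0, B + 2^{-4}]\), up to floating-point rounding; `apxlog2(0b11101)` returns 4.8125 as in the example.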

Finally, we can now prove Theorem 4, which basically states that if we dedicate 4 bits to the fraction precision \(\kappa \), the approximation error E is bounded by \( - 0.029 \cdot (n-1) \le E \le 0\). While Theorem 4 focuses on \(\kappa = 4\), we will show in the proof that we can also bound the error for \(\kappa \ge 4\). To this end, we generalize the approximated weight \({\textsf {apxweight}}_{\boxplus _a}\) and the approximated weight error \(E_{\kappa }\) as follows

$$\begin{aligned} {\textsf {apxweight}}_{\boxplus _a}^{\kappa }(u,v)&= - \left( \sum _{i \in {\mathcal {I}}}{\textsf {apxlog}}_2^{\kappa }(p_i) + \sum _{i \in {\mathcal {I}}}\log _2\left( \dfrac{1}{2^{\ell _i-1}}\right) + \sum _{i \notin {\mathcal {I}}} \log _2(\varphi _i) \right) \\ E_{\kappa }&= {\textsf {weight}}_{\boxplus _a}(u,v) - {\textsf {apxweight}}_{\boxplus _a}^{\kappa }(u,v) \,, \end{aligned}$$

where \({\textsf {apxweight}}_{\boxplus _a}^4(u,v) = {\textsf {apxweight}}_{\boxplus _a}(u,v)\) is defined by Eq. (5) and \(E_{4} = E\) is defined in Theorem 4.

Fig. 1 The error \(e = \log _2(1 + \gamma ) - \gamma \), over \(0 \le \gamma < 1\)

Proof (Theorem 4)

First, note that \(\log _2(\varphi _i)\) is an integer when \(S_i \ne \texttt {11*}\), or when \(S_i=\texttt {11*}\) and \(\ell _i < 3\). In these cases, \(\log _2(\varphi _i) = {\lfloor {\log _2(\varphi _i)} \rfloor }\) and the approximation error is equal to zero.

Next, for each \(i \in {\mathcal {I}}\) when \(\ell _i \ge 3\), let \(p_i = 2^{m_i} +b_i\) such that \(m_i\) and \(b_i\) are two non-negative integers, \(m_i \le \ell _i - 2\) and \(0 \le b_i < 2^{m_i}\). If \(\ell _i \le \kappa + 2\), we obtain \(m_i \le \kappa \) and \({\textsf {apxlog}}_2^{\kappa }(p_i) = m_i + b_i \cdot 2^{-m_i}\). Thus, the resulting error

$$\begin{aligned} e_i = \log _2(p_i) - {\textsf {apxlog}}_2^{\kappa }(p_i) = \log _2(p_i) - (m_i + b_i \cdot 2^{-m_i}) \end{aligned}$$

is exactly the same as the error defined in Lemma 9, and \(0 \le e_i \le B \approx 0.086\).

On the other hand, for \(m_i > \kappa \), i.e., \( \ell _i \ge \kappa +3 \), let \(p_i = 2^{m_i} + t_i \cdot 2^{m_i - \kappa }+\zeta _i\), where \(t_i\) and \(\zeta _i\) are two non-negative integers such that \(0 \le t_i < 2^{\kappa }\) as well as \(0 \le \zeta _i < 2^{m_i - \kappa }\). In this case, the approximated binary logarithm is \({\textsf {apxlog}}_2^{\kappa }(p_i) = m_i + t_i \cdot 2^{-\kappa }\). We now define a new error \(e^{\prime }_i\) as

$$\begin{aligned} e^{\prime }_i&= \log _2(p_i) - {\textsf {apxlog}}_2^{\kappa }(p_i) = \log _2(1 + t_i \cdot 2^{-\kappa } + \zeta _i \cdot 2^{-m_i}) - t_i \cdot 2^{-\kappa }\,. \end{aligned}$$

Due to the fact that \(\zeta _i \ge 0 \), we can see that

$$\begin{aligned} e_i^{\prime } = \log _2(p_i) - (m_i + t_i \cdot 2^{-\kappa }) \ge \log _2(p_i) - (m_i + t_i \cdot 2^{-\kappa } + \zeta _i \cdot 2^{-m_i} ) = e_i \ge 0\,. \end{aligned}$$

Since \(\zeta _i < 2^{m_i - \kappa }\) and by reforming the error, we obtain the upper bound of \(e^{\prime }_i\)

$$\begin{aligned} e^{\prime }_i \le \log _2(1 + t_i \cdot 2^{-\kappa }+2^{-\kappa }) - t_i \cdot 2^{-\kappa } = (\log _2(1+\gamma _i^{\prime }) - \gamma _i^{\prime }) +2^{-\kappa }\,, \end{aligned}$$

where \(\gamma _i^{\prime } = (t_i+1)\cdot 2^{-\kappa }\) and \(2^{-\kappa } \le \gamma _i^{\prime } < 1\). By Lemma 9, the new error \(e_i^{\prime }\) is bounded by \( 0 \le e^{\prime }_i \le B+2^{-\kappa }\).

Note that for a valid differential \((u, v)\) over the constant addition \(\boxplus _a\), we have \(S_0=\texttt {000}\), which for some \(i^*\) belongs to the chain

$$\begin{aligned} \Gamma _{i^*} = \{S_{i^*-1}, \ldots , S_0, S_{-1}=\bot \}, \quad \ell _{i^*} = i^* + 1. \end{aligned}$$

Hence, we obtain

$$\begin{aligned} \delta _{i^*-1}&= \frac{a[i^*-\ell _{i^*}-1]}{2^{\ell _{i^*} - 1}} + \sum _{j=2}^{\ell _{i^*}} \frac{a[i^*-j]}{2^{j-1}} = \sum _{j=2}^{\ell _{i^*}-1} \frac{a[i^*-j]}{2^{j-1}}. \end{aligned}$$

Similar to the proof of Lemma 4 we have

$$\begin{aligned} \varphi _{i^*} = \frac{p_{i^*}}{2^{\ell _{i^*}\!-1}} = \frac{p^*_{i^*}}{2^{\ell _{i^*}\!-2}}, \end{aligned}$$

where \(p^*_{i^*} = p_{i^*}/2\) is an integer. Therefore, by replacing \(\ell _{i^*}\) with \(\ell ^*_{i^*} = \ell _{i^*}-1\), the previous statements considering the error bounds of our approximation for the state \(S_{i^*}=\texttt {11*}\) and its new \(\ell ^*_{i^*}\) are still correct. We now define two bits \(\texttt {b}\) and \(\texttt {b}'\) as

$$\begin{aligned} \texttt {b} = {\left\{ \begin{array}{ll} 1, &{} 3 \le \ell ^*_{i^*} \le \kappa + 2 \\ 0, &{} \text {otherwise} \end{array}\right. }, \quad \texttt {b}' = {\left\{ \begin{array}{ll} 1, &{} \ell ^*_{i^*} > \kappa + 2 \\ 0, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

Finally, by defining the conditional index set \({\mathcal {I}}_\alpha ^\beta = \{ i \in {\mathcal {I}}-\{i^*\} \ | \ \alpha \le \ell _i \le \beta \}\) we obtain

$$\begin{aligned} E_{\kappa }&= {\textsf {weight}}_{\boxplus _a}(u,v) - {\textsf {apxweight}}_{\boxplus _a}^{\kappa }(u,v) \\&= - \sum _{i \in {\mathcal {I}}} ( \log _2(p_i) - {\textsf {apxlog}}_2^{\kappa }(p_i) ) \\&= - \left( \sum _{i \in {\mathcal {I}}_3^{\kappa + 2} } \!\!\! e_i + \!\!\! \sum _{i \in {\mathcal {I}}_{\kappa + 3}^{n} } \!\!\! e_i^{\prime } + \texttt {b} e_{i^*} + \texttt {b}' e'_{i^*} \ \right) \\&\ge - \left( B \!\!\! \sum _{i \in {\mathcal {I}}_3^{\kappa + 2} } \!\!\! 1 \ + \ (B + 2^{-\kappa }) \!\!\! \sum _{i \in {\mathcal {I}}_{\kappa + 3}^{n} } \!\!\! 1 + \texttt {b}B + \texttt {b}'(B+2^{-\kappa }) \ \right) \\&\ge - \left( \dfrac{B}{3} \!\!\! \sum _{i \in {\mathcal {I}}_3^{\kappa + 2} } \!\!\!\ell _i \ + \ \left( \dfrac{B + 2^{-\kappa }}{\kappa +3}\right) \!\!\! \sum _{i \in {\mathcal {I}}_{\kappa + 3}^n } \!\!\! \ell _i + \texttt {b}\frac{B}{3}\ell ^*_{i^*} + \texttt {b}'\left( \frac{B+2^{-\kappa }}{\kappa +3}\right) \ell ^*_{i^*} \ \right) \,. \end{aligned}$$

For \(\kappa \ge 4\), we can see that \(\dfrac{B + 2^{-\kappa }}{\kappa +3} \le \dfrac{B}{3}\), resulting in

$$\begin{aligned} 0 \ge E_{\kappa }&\ge - \left( \dfrac{B}{3} \sum _{i \in {\mathcal {I}}_3^n } \ell _i + \dfrac{B}{3}\ell ^*_{i^*} \right) = - \left( \dfrac{B}{3} \sum _{i \in {\mathcal {I}}_3^n } \ell _i + \dfrac{B}{3}(\ell _{i^*}-1) \right) \\&\ge - \dfrac{B}{3} (n-1) \approx - 0.029 (n-1). \end{aligned}$$

Since for \(\kappa =4\), we have \(E_{4} = E = {\textsf {weight}}_{\boxplus _a}(u,v) - {\textsf {apxweight}}_{\boxplus _a}(u,v)\), the above inequalities hold for the approximation error E as well. \(\square \)

While dedicating \(\kappa = 4\) bits as the fraction precision is enough to obtain the same error bounds as \(\kappa > 4\), considering \(\kappa < 4\) creates a trade-off between the lower bound of the error and the complexity of Algorithm 1. As an example, choosing \(\kappa = 3\) removes one \({\textsf {HW}}\) call in Algorithm 1. However, by following the proof of Theorem 4 for \(\kappa =3\), the error will be lower bounded by \(-0.035 (n-1)\), which potentially is an acceptable trade-off.
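The trade-off can be made concrete by evaluating the two per-bit coefficients that appear in the last step of the proof. The Python sketch below takes the larger of \(B/3\) and \((B + 2^{-\kappa })/(\kappa +3)\) as the coefficient of \((n-1)\). The function name `per_bit_error_bound` is ours, and the sketch assumes the proof carries over unchanged to the chosen \(\kappa \).

```python
from math import log

B = 1 - (1 + log(log(2))) / log(2)    # bound of Lemma 9, about 0.086

def per_bit_error_bound(kappa):
    """Coefficient c such that E_kappa >= -c * (n - 1), following the proof of Theorem 4."""
    short_chains = B / 3                           # chains with 3 <= l_i <= kappa + 2
    long_chains = (B + 2**-kappa) / (kappa + 3)    # chains with l_i >= kappa + 3
    return max(short_chains, long_chains)

bounds = {kappa: round(per_bit_error_bound(kappa), 3) for kappa in (3, 4, 5, 8)}
```

For \(\kappa \ge 4\) the first term dominates and the coefficient rounds to 0.029, whereas for \(\kappa = 3\) the second term dominates and it rounds to 0.035.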

The differential model of the constant addition as well as the approximation error will be used in the automated method that we will present in the next section to search for characteristics and impossible differentials of ARX ciphers.

4 SMT-based search of characteristics and impossible differentials

In this section, we describe how to formulate the search of optimal characteristics and impossible differentials as a sequence of SMT problems, which can be solved by an off-the-shelf SMT solver such as Boolector [62] or STP [22]. Our methods are inspired by the approach of Mouha and Preneel [57] to search for single-key characteristics of Salsa20 and the approach of Sasaki and Todo [66] to search for impossible differentials using Mixed Integer Linear Programming (MILP).

4.1 Searching for characteristics

To search for characteristics up to probability \(2^{-n}\), the probability space is decomposed into n intervals \(I_w = \left( 2^{-w-1}, 2^{-w}\right] \), where \(w = 0, 1, \dots , n-1\), and for each interval, the decision problem of whether there exists a characteristic with probability \(p \in I_w\) is encoded as an SMT problem. Note that a characteristic \(\Omega \) has probability \(p \in I_w\) if and only if its integer weight \({\lfloor {{\textsf {weight}}(\Omega )} \rfloor }\) is equal to w. Section 4.2 describes the encoding process for an ARX cipher.

The SMT problems are provided to the SMT solver, which checks their satisfiability in increasing weight order. When the SMT solver finds the first satisfiable problem, an assignment of the variables that makes the problem satisfiable is obtained, and the search finishes. The assignment contains a characteristic with integer weight \({\hat{w}}\), and it is optimal in the sense that there are no characteristics with integer weight strictly smaller than \({\hat{w}}\). If the n SMT problems are found to be unsatisfiable, then it is proved there are no characteristics with probability higher than \(2^{-n}\).

To speed up the search, we perform the search iteratively on round-reduced versions of the cipher. First, we search for an optimal characteristic for a small number of rounds r; let \({\hat{w}}\) denote its integer weight. Then, we search for an optimal \((r+1)\)-round characteristic, but skipping the SMT problems with weight strictly less than \({\hat{w}}\). Since these SMT problems were found to be unsatisfiable for r rounds, they will also be unsatisfiable for \(r + 1\) rounds. This procedure is repeated until the total number of rounds is reached. Algorithm 2 describes in pseudo-code this search strategy. Our strategy prioritises SMT problems with low weight and small number of rounds, which are faster to solve. In addition, our search also finds optimal characteristics of round-reduced versions, which can be used in other differential-based attacks, such as rectangle or boomerang attacks [9, 71].

Algorithm 2 (pseudo-code listing)
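The control flow of this strategy can be sketched in a few lines of Python. This is a simplified illustration rather than the pseudo-code of Algorithm 2: the callback `is_satisfiable(rounds, weight)` is a hypothetical stand-in for encoding the SMT problem of a given integer weight and querying the solver, and the toy oracle below merely pretends that the optimal r-round weight is \(3r - 1\).

```python
def search_characteristics(is_satisfiable, min_rounds, max_rounds, max_weight):
    """Return {rounds: minimum integer weight} using the round-by-round strategy."""
    best = {}
    weight = 0
    for rounds in range(min_rounds, max_rounds + 1):
        # Weights below the optimum found for the previous round count are
        # skipped, since they were already shown to be unsatisfiable.
        while weight < max_weight and not is_satisfiable(rounds, weight):
            weight += 1
        if weight == max_weight:
            break          # no characteristic with weight below max_weight
        best[rounds] = weight
    return best

# Toy oracle (assumption for illustration only): optimal r-round weight is 3r - 1.
queries = []

def toy_oracle(rounds, weight):
    queries.append((rounds, weight))
    return weight >= 3 * rounds - 1

result = search_characteristics(toy_oracle, min_rounds=1, max_rounds=4, max_weight=16)
```

With the toy oracle, the 2-round search starts at weight 2 instead of 0, so the pair `(2, 0)` is never queried.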

This automated method can be used to search for either single-key or related-key characteristics. Furthermore, additional SMT constraints can be added to the SMT problems in order to search for different types of characteristics. For related-key characteristics and by default, this method searches characteristics minimizing the total weight \({\textsf {weight}}(\Omega ) = {\textsf {weight}}(\Omega _{KS}) + {\textsf {weight}}(\Omega _E)\). Strong related-key characteristics can be searched by adding the constraint \({\textsf {weight}}(\Omega _{KS}) = 0\) in the SMT problems. Similarly, equivalent keys can be found by adding the constraint \({\textsf {weight}}(\Omega _{E}) = 0\).

Algorithm 2 returns the characteristic with the minimum SMT integer weight, obtained from the bit-vector differential models used within the SMT problems. When some of these models compute approximations of the intermediate weights, the SMT integer weight and the actual integer weight of the characteristic might differ, and the returned characteristic might not be optimal. However, if we have alternative models that compute the exact intermediate weights (that cannot be represented in the SMT problems) and a bound for the error of the SMT integer weight, Algorithm 2 can be adapted to obtain the optimal characteristic as follows. First, Algorithm 2 is used to obtain the characteristic \(\Omega \) with the minimum SMT integer weight \({\hat{w}}\). Then, one finds all characteristics with SMT integer weights \(\{{\hat{w}},{\hat{w}} + 1, \dots , {\hat{w}} + {\lfloor {\epsilon } \rfloor }\}\), where \(\epsilon \) is the absolute bound for the error of the SMT integer weight. Finally, the weights of the found characteristics are recomputed with the alternative models, and the characteristic with the minimum integer weight is returned. This adaptation can be used for ARX ciphers with constant additions, as the error bound can be computed from Theorem 4 and Machado’s algorithm [53] can be used to compute the exact weights of the constant additions.

This method only ensures optimality if the differential probabilities over each round are independent and the characteristic probability does not strongly depend on the choice of the secret key. When these assumptions do not hold for a cipher, we empirically compute the weight of each characteristic found by sampling many input pairs satisfying the input difference and counting those satisfying the difference trail. In this case, this method provides a practical heuristic to find characteristics with high probability, and it is one of the best systematic approaches for some families of ciphers, such as ARX.

4.2 Encoding the SMT problems

In this section, we explain how to formulate the decision problem of determining whether an ARX cipher has a characteristic \(\Omega \) with integer weight W as an SMT problem in the bit-vector theory.

First, the ARX cipher is represented in Static Single Assignment (SSA) form, that is, as an ordered list of instructions \(y \leftarrow f(x)\) such that each variable is assigned exactly once and each instruction is a modular addition, a rotation, or an XOR.

For each variable x in the SSA representation, a bit-vector variable \(\Delta _x\) denoting the difference of x is defined in the SMT problem. Then, for every instruction \(y \leftarrow f(x)\), the weight and the differential model of f are added to the SMT problem as a bit-vector variable w and bit-vector constraints \({\textsf {valid}}_{f}(\Delta _x, \Delta _y)\) and \({\textsf {Equals}}(w, {\textsf {weight}}_{f}(\Delta _x, \Delta _y))\), following Table 5.

Table 5 Bit-vector differential models of ARX operations

Finally, the following bit-vector constraints are added to the SMT problem,

$$\begin{aligned} {\textsf {NotEquals}}(\Delta _p, 0) \,, \ {\textsf {Equals}}(W, w_1 \boxplus \dots \boxplus w_r) \,, \end{aligned}$$

where \(\Delta _p\) denotes the input difference and \((w_1, \dots , w_r)\) denote the weight of each operation. The first constraint excludes the trivial characteristic with zero input difference, while the second constraint fixes the weight of the characteristic to the target weight. Note that the bit-size of the weights might need to be increased to prevent an overflow in the modular addition of the last constraint.

Example 5

Consider the keyed function \(f_k\) with key k and input \(p=(p_1, p_2)\),

$$\begin{aligned} f_k(p_1, p_2) = (((p_2 \boxplus 1) \oplus k) \boxplus p_1, p_1 \lll 1)\,. \end{aligned}$$

This function can be written as a list of simple instructions (SSA form) as

$$\begin{aligned}&x_1 \leftarrow p_2 \boxplus 1 \,, \\&x_2 \leftarrow x_1 \oplus k \,, \\&x_3 \leftarrow x_2 \boxplus p_1 \,, \\&x_4 \leftarrow p_1 \lll 1 \,, \end{aligned}$$

where the output is the pair \((x_3, x_4)\). Figure 2 depicts the function \(f_k\) together with its intermediate variables.

Fig. 2 The function \(f_k\)

An SMT problem in the bit-vector theory denoting whether \(f_k\) has a characteristic with integer weight W is as follows:

$$\begin{aligned} \exists \Delta _{p_1},&\Delta _{p_2}, \Delta _{x_1}, \Delta _{x_2}, \Delta _{x_3}, \Delta _{x_4}, w_1, w_2, w_3, w_4: \ \\&{\textsf {valid}}_{\boxplus _1}(\Delta _{p_2}, \Delta _{x_1}) \,, \\&{\textsf {Equals}}(w_1, {\textsf {BvWeight}}(\Delta _{p_2}, \Delta _{x_1}, 1)) \,, \\&{\textsf {Equals}}(\Delta _{x_2}, \Delta _{x_1}) \,, \\&{\textsf {Equals}}(w_2, 0) \,, \\&{\textsf {valid}}_{\boxplus }((\Delta _{x_2}, \Delta _{p_1}), \Delta _{x_3}) \,, \\&{\textsf {Equals}}(w_3, {\textsf {weight}}_{\boxplus }((\Delta _{x_2}, \Delta _{p_1}), \Delta _{x_3})) \,, \\&{\textsf {Equals}}(\Delta _{x_4}, \Delta _{p_1} \lll 1) \,, \\&{\textsf {Equals}}(w_4, 0) \,, \\&{\textsf {NotEquals}}( (\Delta _{p_1}, \Delta _{p_2}), 0) \,, \\&{\textsf {Equals}}(W, (w_1 \boxplus ((w_2 \boxplus w_3 \boxplus w_4) \ll 4)) \gg 4) \,. \end{aligned}$$

The shifts in the last constraint are due to the fact that the last four bits of \(w_1\) denote fraction bits. Furthermore, depending on the bit-size of \(f_k\), it might be necessary to extend the bit-size of the weights in order to prevent an overflow in the last modular additions.
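The role of the shifts and of the bit-size can be illustrated on plain integers. In the Python sketch below, `w1` carries four fraction bits (as the output of \({\textsf {BvWeight}}\) does), the other weights are integers, and `width` models the bit-size of the weight variables. The function name and the sample values are hypothetical and do not come from an actual SMT query.

```python
FRAC_BITS = 4

def total_integer_weight(w1, w2, w3, w4, width=16):
    """Integer part of w1 + w2 + w3 + w4, where only w1 has four fraction bits.

    All additions are reduced modulo 2^width to mimic bit-vector arithmetic.
    """
    mask = (1 << width) - 1
    integer_sum = (w2 + w3 + w4) & mask              # weights without fraction bits
    aligned = (integer_sum << FRAC_BITS) & mask      # align with the fixed-point w1
    return ((w1 + aligned) & mask) >> FRAC_BITS      # drop the fraction bits

# w1 = 28 encodes 1.75; together with w3 = 3 the total is 4.75, so W = 4.
W = total_integer_weight(w1=28, w2=0, w3=3, w4=0)

# With only 8 bits, 1.75 + 15 no longer fits and the result wraps around.
W_overflow = total_integer_weight(w1=28, w2=0, w3=15, w4=0, width=8)
W_wide = total_integer_weight(w1=28, w2=0, w3=15, w4=0, width=16)
```

The 8-bit case returns 0 instead of 16, which is the kind of overflow that extending the bit-size of the weights is meant to prevent.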

4.3 Searching for impossible differentials

Sasaki and Todo [66] propose a MILP-based method to search for impossible differentials that employs the MILP problems used to search for characteristics. Since this method can also be adapted to SMT problems, we will explain the method within the SMT context.

The method’s main idea is that one can check whether a particular differential \((\Delta _p, \Delta _c)\) is impossible by querying a simple SMT problem. While it is infeasible to check all differentials, one can check those with a low number of active bits since most of the known impossible differentials have this property.

The subroutine to check whether a particular differential \((\Delta _{p_0}, \Delta _{c_0})\) is impossible can be done as follows. First, the SMT problem of whether there exists a characteristic over the cipher is encoded as in Sect. 4.2. However, only the validity constraints are added; the weight constraints and the target weight W are ignored. Second, the constraints that fix the input and output differences \((\Delta _p, \Delta _c)\) to \((\Delta _{p_0}, \Delta _{c_0})\) are added to the SMT problem, that is,

$$\begin{aligned} {\textsf {Equals}}(\Delta _{p}, \Delta _{p_0}), \quad {\textsf {Equals}}(\Delta _{c}, \Delta _{c_0})\,. \end{aligned}$$

Then, the SMT solver checks the satisfiability of the SMT problem. If the problem is found to be unsatisfiable, the differential is impossible.

This method can be used to search for single-key and related-key impossible differentials. For the former case, the validity constraints of the key schedule are ignored, while for the latter case they are included in the SMT problems.

As opposed to the previous SMT-based characteristic search method, the impossible check subroutine is a sound method. In other words, a characteristic found by Algorithm 2 could be invalid due to the independence assumptions, but a differential found impossible by the check subroutine is always impossible. While the check subroutine is a sound method, it is not complete; there are some impossible differentials that cannot be detected by the check subroutine.

While the check subroutine is fast, checking all differentials is infeasible and only a small subset can be checked with the method of [66]. Thus, we propose a new automated method to search for impossible differentials that does not restrict the search to any pre-defined small subset and lets the SMT solver efficiently search through the space of differentials. Our automated method proceeds as follows.

First, we split the cipher \(E = E_2 \circ E_1 \circ E_0\) into three parts \(E_2, E_1\) and \(E_0\). Let \(\Omega = (\Delta _{x_0}, \Delta _{x_1}, \Delta _{x_2}, \Delta _{x_3})\) denote a partial characteristic over E, that is, any characteristic verifying

$$\begin{aligned} \Pr (\Delta _{x_0} \xrightarrow {E_0} \Delta _{x_1}) = 1, \quad \Pr (\Delta _{x_2} \xrightarrow {E_2} \Delta _{x_3}) = 1\,. \end{aligned}$$

Note that no relation is imposed between \(\Delta _{x_1}\) and \(\Delta _{x_2}\).

Then, we search for all partial characteristics using our SMT-based method from Sect. 4.1. For each partial characteristic \(\Omega = (\Delta _{x_0}, \Delta _{x_1}, \Delta _{x_2}, \Delta _{x_3})\), we apply the check subroutine to the differential \((\Delta _{x_1}, \Delta _{x_2})\) over \(E_1\). If \((\Delta _{x_1}, \Delta _{x_2})\) is found to be impossible over \(E_1\), then \((\Delta _{x_0}, \Delta _{x_3})\) is an impossible differential over E, since \((\Delta _{x_0}, \Delta _{x_1})\) and \((\Delta _{x_2}, \Delta _{x_3})\) are differentials with probability one (see Fig. 3).

Fig. 3 The partial characteristic \(\Omega = (\Delta _{x_0}, \Delta _{x_1}, \Delta _{x_2}, \Delta _{x_3})\) over \(E = E_2 \circ E_1 \circ E_0\), alongside the condition that the inner part \((\Delta _{x_1}, \Delta _{x_2})\) over \(E_1\) is an impossible differential

Like the characteristic search method, we start searching for impossible differentials over a round-reduced version of the cipher and keep increasing the number of rounds iteratively. This procedure is described in Algorithm 3. While FindPartialCh in Algorithm 3 is responsible for finding partial characteristics, the IsImpossible subroutine checks whether the corresponding inner differential is impossible. Impossible differentials starting after a few rounds are useful in practice, and our method can easily be adapted by splitting the cipher into four parts, \(E = E_2 \circ E_1 \circ E_0 \circ E_{-1}\), where \(E_{-1}\) denotes the skipped rounds.

Algorithm 3 (pseudo-code listing)
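The structure of the method can be illustrated on a toy 4-bit cipher. In the Python sketch below, exhaustive enumeration replaces both the SMT-based search of partial characteristics and the SMT-based check subroutine, which is only feasible because the state is tiny. The cipher, the constant `A`, and all function names are hypothetical and chosen for illustration; the outer parts are rotations, which propagate XOR differences with probability one.

```python
N = 4
MASK = (1 << N) - 1
A = 0b0110  # hypothetical round constant

def rotl(x, r):
    return ((x << r) | (x >> (N - r))) & MASK

def e0(x): return rotl(x, 1)          # outer part: probability-one differentials
def e1(x): return (x + A) & MASK      # inner part: constant addition
def e2(x): return rotl(x, 3)          # outer part: probability-one differentials
def cipher(x): return e2(e1(e0(x)))

def is_impossible(f, d_in, d_out):
    """Exhaustive stand-in for the check subroutine (an SMT query in the paper)."""
    return all((f(x) ^ f(x ^ d_in)) != d_out for x in range(1 << N))

def find_partial_characteristics():
    """Yield (dx0, dx1, dx2, dx3) with no relation imposed between dx1 and dx2."""
    for dx0 in range(1, 1 << N):
        for dx2 in range(1, 1 << N):
            yield dx0, rotl(dx0, 1), dx2, rotl(dx2, 3)

def search_impossible_differentials():
    found = []
    for dx0, dx1, dx2, dx3 in find_partial_characteristics():
        if is_impossible(e1, dx1, dx2):
            found.append((dx0, dx3))  # impossible over the whole cipher
    return found

impossible = search_impossible_differentials()
```

Every pair in `impossible` can be confirmed by calling `is_impossible(cipher, d_in, d_out)`. For instance, `(0b1000, 0b0001)` is found because the inner input difference `0001` cannot lead to an output difference with a zero least significant bit.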

The main advantage of our method is that the subset of differentials to check does not need to be specified. Thus, it can find impossible differentials that other methods cannot. Moreover, the search of partial characteristics is quite fast, as for many operations f (including the modular addition and the constant addition) the constraint \({\textsf {Equals}}(0, {\textsf {weight}}_f(\Delta _x, \Delta _y))\) is much simpler than the constraint for the general case \({\textsf {Equals}}(w, {\textsf {weight}}_f(\Delta _x, \Delta _y))\).

As opposed to the search of characteristics, the search of \((r+1)\)-round impossible differentials cannot reuse information obtained from the search of r-round impossible differentials. In other words, Algorithm 2 exploits the fact that if no r-round characteristics were found with weight w, then no \((r+1)\)-round characteristics can be found with the same weight. However, for some key schedules, Algorithm 3 might find \((r+1)\)-round impossible differentials even if no r-round impossible differentials were found.

4.4 Implementation

We have developed an open-source tool, ArxPy, to find characteristics and impossible differentials of ARX ciphers, implementing the methods described earlier. Originally, ArxPy was a tool to search for rotational-XOR characteristics using SMT solvers [64]. However, we have extended it to support (related-key) differential characteristics and impossible differentials containing the constant addition. ArxPy provides high-level functions that automate the search, a simple interface to represent ARX ciphers, and complete documentation in HTML format, among other features.

The ArxPy workflow is represented in Fig. 4. The user first defines the ARX cipher using the interface provided by ArxPy and chooses the parameters of the search (e.g., the type of the characteristic to search, the SMT solver to use, etc.). Then, ArxPy automatically translates the Python implementation of the ARX cipher into SSA form, encodes the SMT problems associated with the type of search selected by the user, and solves the SMT problems by querying the SMT solver. When searching for characteristics, for each satisfiable SMT problem found, ArxPy reconstructs the characteristic from the assignment of the variables that satisfies the problem and empirically verifies the weight of the characteristic. Finally, ArxPy returns the results of the search to the user.

Fig. 4 Workflow of ArxPy

Internally, ArxPy is implemented in Python 3 and uses the libraries SymPy [55] to obtain the SSA representation through symbolic execution and PySMT [23] for the communication with the SMT solvers. Thus, all the SMT solvers supported by PySMT can be directly used for ArxPy.

5 Experiments

We have applied our methods for finding characteristics and impossible differentials to some ARX ciphers that include constant additions. In particular, we have searched for related-key characteristics and related-key impossible differentials of TEA, XTEA, HIGHT, LEA, SHACAL-1, and SHACAL-2.

Due to the difficulty of searching for characteristics of ciphers with constant additions thus far, cipher designers have avoided constant additions in the encryption functions so that they could search for single-key characteristics, the most threatening ones. Only a few ciphers include constant additions in the encryption function, and their ad-hoc structures make them more suitable to be analysed with other types of differences, such as additive differences in the case of TEA [11]. As a result, we have focused on searching related-key characteristics and impossible differentials of some well-known ciphers.

Regarding the search for characteristics, we used Algorithm 2 to find related-key characteristics starting from the first round of each cipher. For the case of impossible differentials, we applied Algorithm 3 to search for related-key impossible differentials but skipping the first rounds of the cipher. To this end, we repeatedly call Algorithm 3 while increasing the number of skipped rounds in each call.

For related-key characteristics, the usual assumptions (i.e., round independence and the hypothesis of stochastic equivalence) do not always hold. Thus, we empirically verified each characteristic and stopped each round-reduced search after the first valid characteristic was found.

To verify a related-key characteristic \(\Omega \), we split \(\Omega \) into smaller characteristics \(\Omega _i = (\Delta _{x_i} \rightarrow \dots \rightarrow \Delta _{y_i})\) with weight \(w_i\) lower than 20, and empirically compute the probability of each differential \((\Delta _{x_i}, \Delta _{y_i})\) by sampling a small multiple of \(2^{w_i}\) input pairs for \(2^{10}\) related-key pairs. After combining the probabilities of the differentials, we obtain \(2^{10}\) characteristic probabilities, one for each related-key pair. If the characteristic probability is non-zero for several key pairs, we consider the characteristic valid and we define its empirical probability (resp. weight) as the arithmetic mean of the \(2^{10}\) characteristic probabilities (resp. weights), excluding those key pairs with zero probability.
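The verification procedure above can be sketched as follows. This is only a minimal illustration and not the implementation used in ArxPy: the one-round map `toy_round`, the 16-bit word size, the use of the same key in every piece, and the factor 4 chosen as the "small multiple" of \(2^{w_i}\) are all assumptions made for the example.

```python
import math
import random

WORD = 16                      # toy word size (assumption)
MASK = (1 << WORD) - 1

def rotl(x, r):
    return ((x << r) | (x >> (WORD - r))) & MASK

def toy_round(x, k):
    # Hypothetical keyed ARX map: modular key addition followed by a rotation.
    return rotl((x + k) & MASK, 3)

def piece_probability(f, dx, dy, k, dk, samples, rng):
    """Fraction of sampled inputs for which dx -> dy holds under keys (k, k ^ dk)."""
    hits = 0
    for _ in range(samples):
        x = rng.randrange(1 << WORD)
        hits += (f(x, k) ^ f(x ^ dx, k ^ dk)) == dy
    return hits / samples

def verify(pieces, dk, key_pairs, rng):
    """pieces: list of (f, dx, dy, w), where w is the theoretical weight of the piece."""
    probs = []
    for _ in range(key_pairs):
        k = rng.randrange(1 << WORD)
        p = 1.0
        for f, dx, dy, w in pieces:
            # Sample a small multiple of 2**w input pairs for this piece.
            p *= piece_probability(f, dx, dy, k, dk, 4 * 2**w, rng)
        probs.append(p)
    valid = [p for p in probs if p > 0]   # key pairs with non-zero probability
    if not valid:
        return None                       # no key pair follows the characteristic
    mean_prob = sum(valid) / len(valid)
    mean_weight = sum(-math.log2(p) for p in valid) / len(valid)
    return mean_prob, mean_weight, len(valid) / len(probs)

rng = random.Random(0)
# Flipping the key MSB flips the MSB of x + k, which the rotation moves to bit 2.
print(verify([(toy_round, 0x0000, 0x0004, 0)], dk=0x8000, key_pairs=2**4, rng=rng))
```

The function returns the empirical probability, the empirical weight, and the fraction of key pairs with non-zero probability, mirroring the three quantities discussed in the text.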

Thus, for each characteristic that we have found, i.e. strong related-key characteristic (\(w_{{\textsf {KS}}}=0\)) and weak related-key characteristic (\(w_{{\textsf {KS}}}>0\)), Table 6 provides: (1) the theoretical key schedule and encryption weights \((w_{{\textsf {KS}}}, w_E)\), computed by summing the weight of each ARX operation; (2) the empirical key schedule and encryption weights \((\overline{w_{{\textsf {KS}}}}, \overline{w_E})\), computed by sampling input pairs as explained before; and (3) the percentage of the valid key pairs that empirically lead to non-zero probability in the weight verification. In the appendix, we provide the round weights and the round differences for the characteristics covering the most rounds.

Table 6 Best related-key differential characteristics of XTEA, HIGHT, LEA, SHACAL-1, and SHACAL-2
Table 7 Best related-key impossible differentials of XTEA, HIGHT, LEA, SHACAL-1, and SHACAL-2

When searching for impossible differentials with skipped rounds, Algorithm 3 splits the cipher into four parts. More specifically, the cipher E is represented as \(E = E_{2} \circ E_{1} \circ E_{0} \circ E_{-1}\), where \(E_{-1}\) denotes the skipped rounds, \(E_{1}\) stands for the rounds of the inner impossible differential, and \(E_{0}\) and \(E_{2}\) respectively denote the backward and the forward rounds of the partial characteristics. In Table 7 we provide the number of rounds of each part for the best related-key impossible differentials that we found, and in Table 8 we provide the input and output differences of our longest impossible differentials.
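As a small illustration of this four-part decomposition, the following sketch records the number of rounds of each part and derives the round ranges. The class `RoundSplit` and its field names are hypothetical helpers introduced here, not part of ArxPy; the example values are the ones reported for SHACAL-1 in Sect. 5.5 (2 backward rounds, a 16-round inner impossible differential, 12 forward rounds, starting at round 20).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RoundSplit:
    skipped: int    # rounds of E_{-1}
    backward: int   # rounds of E_0
    inner: int      # rounds of E_1 (inner impossible differential)
    forward: int    # rounds of E_2

    @property
    def covered(self):
        """Number of rounds covered by the impossible differential."""
        return self.backward + self.inner + self.forward

    def ranges(self):
        """Half-open round ranges (start, end) of E_0, E_1, and E_2."""
        a = self.skipped
        b = a + self.backward
        c = b + self.inner
        d = c + self.forward
        return (a, b), (b, c), (c, d)

shacal1 = RoundSplit(skipped=20, backward=2, inner=16, forward=12)
print(shacal1.covered, shacal1.ranges())
```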

In addition to the best-known impossible differentials indicated in Table 7, we also implemented the automated method of Sasaki and Todo [66] and used it to search for impossible differentials, in order to compare its results with those obtained by Algorithm 3. While for XTEA, LEA, and HIGHT both methods find impossible differentials with the same number of rounds, for SHACAL-1 and SHACAL-2 Algorithm 3 achieves impossible differentials covering more rounds. For XTEA and HIGHT, the longest impossible differentials found by Algorithm 3 include only a few active bits, and thus they could also be found by the other method. However, for SHACAL-1 and SHACAL-2, our algorithm found impossible differentials containing multiple active bits, which cannot be obtained by other methods that restrict the search to predefined differential subsets with a low number of active bits.

Table 8 Input, output, and key differences of our longest related-key impossible differentials

For the experiments, we have used ArxPy equipped with the SMT solver Boolector [62], winner of the SMT competition SMT-COMP 2019 in the bit-vector track [26]. We ran the characteristic search for one week on a single core of an Intel Xeon 6244 at 3.60GHz. The search for impossible differentials was also run for one week on similar hardware. Note that better characteristics and impossible differentials could be found if the round-reduced searches were not stopped after the first valid characteristic or if more time were employed.

5.1 TEA

Designed by Wheeler and Needham, TEA [73] is a block cipher with 64-bit block size and 128-bit key size. It iterates 64 times an ARX round function, including constant additions and logical shifts, depicted in Fig. 5. Since the logical shifts propagate XOR differences deterministically, the encoding method presented in Sect. 4.2 can be easily extended to include these operations.

Fig. 5 The i-th round of TEA, \(i = 0, 1 \dots , 63\). The master key mk is split into four 32-bit words \((mk_0, mk_1, mk_2, mk_3)\) and the i-th round key is defined as \((k_{i, 0}, k_{i, 1}) = (mk_0, mk_1)\) if i is even and \((k_{i, 0}, k_{i, 1}) = (mk_2, mk_3)\) if i is odd. The i-th round constant is defined as \(\Delta _i = \Delta _{i-2} \boxplus \Delta _0\), where \(\Delta _{-1} = \Delta _{0} = 2654435769\)

Kelsey et al. presented the best related-key characteristics of TEA in [35]. They found a 2-round iterative strong related-key characteristic \(\Omega \) with weights \((w_{{\textsf {KS}}}, w_E) = (0, 1)\), which they extended to a 60-round characteristic with weights (0, 30). They also discovered in [34] that each TEA key has three other equivalent keys.

Using ArxPy, we revisited the results by Kelsey et al. [34], but in a fully automated way. We found three related-key characteristics with weight zero over the full cipher, confirming that each key is equivalent to exactly three other keys. Excluding these three characteristics, we also obtained a 60-round strong related-key characteristic with weight (0, 30), and all the 60-round SMT problems with smaller weights were found to be unsatisfiable. Since a 60-round related-key characteristic is sufficient to mount a related-key differential attack on full-round TEA [35], there is no need to search for characteristics covering more rounds of TEA, and we stopped at 60 rounds.
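The equivalent keys of TEA can be checked directly. The sketch below follows the widely published reference implementation of TEA (32 cycles of two rounds each) rather than the round indexing of Fig. 5, and it assumes the key differences commonly reported in the literature: flipping the most significant bits of both \(mk_0\) and \(mk_1\), of both \(mk_2\) and \(mk_3\), or of all four words. The function names and the sample key and plaintext are arbitrary choices for the example.

```python
MASK32 = 0xFFFFFFFF
DELTA = 2654435769          # 0x9E3779B9
MSB = 0x80000000

def tea_encrypt(v0, v1, key, cycles=32):
    """TEA encryption; one cycle corresponds to two rounds of the cipher."""
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(cycles):
        s = (s + DELTA) & MASK32
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK32
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK32
    return v0, v1

def equivalent_keys(key):
    """The three keys obtained by flipping the MSBs of word pairs (assumed differences)."""
    k0, k1, k2, k3 = key
    return [
        (k0 ^ MSB, k1 ^ MSB, k2, k3),
        (k0, k1, k2 ^ MSB, k3 ^ MSB),
        (k0 ^ MSB, k1 ^ MSB, k2 ^ MSB, k3 ^ MSB),
    ]

key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
plaintext = (0xDEADBEEF, 0x0BADF00D)
reference = tea_encrypt(*plaintext, key)
# Each related key encrypts the plaintext to the same ciphertext as the original key.
print(all(tea_encrypt(*plaintext, k) == reference for k in equivalent_keys(key)))
```

The MSB flips cancel because each pair of key words enters the round function through two modular additions whose results are XORed together, so both flipped bits reach the same bit position of the XOR.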

There is also no need to search for related-key impossible differentials of TEA, as each of the three full-round zero-weight related-key characteristics induces roughly \(2 \times 2^{64}\) full-round related-key impossible differentials, simply by altering either the plaintext or the ciphertext difference.
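This count can be made explicit. Writing \((\Delta _p, \Delta _c)\) for the plaintext and ciphertext differences of one zero-weight characteristic (notation introduced here only for illustration), the characteristic holds with probability 1 and the cipher is a permutation for each key, so every pair that changes exactly one of the two differences is impossible:

$$\begin{aligned} \#\{(\Delta _p, \Delta '_c) : \Delta '_c \ne \Delta _c\} + \#\{(\Delta '_p, \Delta _c) : \Delta '_p \ne \Delta _p\} = 2 \times (2^{64} - 1) \approx 2 \times 2^{64}. \end{aligned}$$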

5.2 XTEA

The block cipher XTEA [60] was designed by the authors of TEA to fix the weakness of the former cipher in the related-key setting. XTEA has a 64-bit block size and a 128-bit key size, and it iterates 64 times the round function depicted in Fig. 6. Like TEA, the round function also includes logical shifts, but the constant additions are included in the key schedule.

Fig. 6 The i-th round of XTEA, \(i = 0, 1 \dots , 63\). The master key mk is split into four 32-bit words \((mk_0, mk_1, mk_2, mk_3)\) and the i-th round key is defined as \(k_i = s_i \boxplus mk_{s_i \wedge 3}\) if i is even and \(k_i = s_i \boxplus mk_{(s_i \gg 11) \wedge 3}\) if i is odd. The i-th constant \(s_i\) is defined as \(s_i = s_{i-2} \boxplus s_0\), where \(s_{-1} = s_{0} = 2654435769\)

The longest related-key characteristics found so far are the 16-round strong related-key differential with weight 32, manually found by Lu in [52], and the 18-round weak related-key characteristic with weights \((w_{{\textsf {KS}}}, w_E)=(19, 19)\), manually found by Lee et al. [45] but later improved to (17, 19) by Lu [52].

The results of our automated search for related-key characteristics are listed in Table 6. In the strong related-key search, we found an 18-round characteristic with weight 57; all the SMT problems for 19 rounds were found to be unsatisfiable. In the weak related-key search, we found characteristics up to 27 rounds, where the 27-round characteristic has total weight \(6 + 40 = 46\). No equivalent keys were found for XTEA.

In our automated search for related-key impossible differentials of XTEA, we found impossible differentials spanning 25 rounds, the same number of rounds as the best impossible differential found thus far by Darbuka [15]. Denoting the cipher XTEA with 25 rounds by \(E = E_2 \circ E_1 \circ E_0\), our related-key impossible differential contains a 13-round inner impossible differential over \(E_1\), extended by a deterministic 7-round backward trail over \(E_0\) and a deterministic 5-round forward trail over \(E_2\), as shown in Table 7. Our automated tool was also able to complete the search for related-key impossible differentials up to 31 rounds, but no impossible differentials spanning more than 25 rounds were found.

5.3 HIGHT

Adopted as an international standard by ISO/IEC [33], HIGHT [31] is a lightweight cipher with a block size of 64 bits and a key size of 128 bits. The encryption function performs initial and final key whitening transformations, and iterates 32 times a round function including XORs, 2-input additions and rotations; the constant additions are performed in the key schedule.

Fig. 7 The i-th round function of HIGHT, \(i = 0, 1 \dots , 31\) [31]. The i-th round key is denoted by \(k_i = (SK_{4i+3}, SK_{4i+2}, SK_{4i+1}, SK_{4i})\) and the functions \(F_0\) and \(F_1\) are defined as \(F_0(x) = (x \lll 1) \oplus (x \lll 2) \oplus (x \lll 7)\) and \(F_1(x) = (x \lll 3) \oplus (x \lll 4) \oplus (x \lll 6)\)

Fig. 8 The key schedule of HIGHT [31]. The round key words are denoted by \(SK_{i}\) and the key schedule constants are denoted by \(\delta _j\)
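The functions \(F_0\) and \(F_1\) of the round function are XOR-linear, so an XOR difference propagates through them deterministically and they contribute no weight to a characteristic. The following minimal sketch checks this exhaustively, assuming the 8-bit words on which HIGHT operates; the helper names are chosen for the example.

```python
def rotl8(x, r):
    """Rotate an 8-bit word left by r positions."""
    return ((x << r) | (x >> (8 - r))) & 0xFF

def f0(x):
    return rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 7)

def f1(x):
    return rotl8(x, 3) ^ rotl8(x, 4) ^ rotl8(x, 6)

def output_differences(f, dx):
    """All output XOR differences of f reachable from the input difference dx."""
    return {f(x) ^ f(x ^ dx) for x in range(256)}

# For every input difference there is a single output difference, namely f(dx).
print(all(output_differences(f, dx) == {f(dx)}
          for f in (f0, f1) for dx in range(256)))
```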

The longest related-key characteristics found for HIGHT are a 10-round strong characteristic with weight 12 found by Lu [50], and a 12-round weak characteristic with weights \((w_{{\textsf {KS}}}, w_E) = (2, 19)\) found by Koo et al. [40]. In our automated search, we found related-key characteristics up to 15 rounds, listed in Table 6. The longest strong related-key characteristic we found covered 15 rounds with weights (0, 45), whereas the longest weak related-key characteristic covered 14 rounds with total weight \(13 + 14 = 27\).

Özen et al. introduced the best-known 22-round related-key impossible differential for HIGHT [63]. Using ArxPy, we found a new impossible differential covering the same number of rounds, as shown in Table 7. Our impossible differential consists of a 14-round inner impossible differential, extended by two zero-weight 4-round backward and forward related-key trails. This 22-round impossible differential is the longest related-key impossible differential that ArxPy could obtain in one week by checking up to 32 rounds of HIGHT.

5.4 LEA

Among the family of ARX ciphers LEA [32], we analysed LEA-128, the version with 128-bit block size, 24 rounds, and 128-bit key size. The encryption round function of LEA performs 2-input additions, rotations, and XORs, whereas the key schedule contains constant additions and rotations.

Fig. 9 The i-th round function of the encryption (top) and the key schedule (bottom) of LEA-128, \(i = 0, 1 \dots , 23\). The tuple \((k_{i, 0}, \dots , k_{i, 3})\) denotes the i-th round key and \(\delta _i\) denotes the i-th key schedule constant

The designers of LEA found related-key characteristics up to 11 rounds, but they only specified that the 11-round characteristics are valid for a small part of the key space and did not provide the weights of such characteristics [32]. Excluding these characteristics, there are no other examples of related-key characteristics of LEA. Our automated search found weak related-key characteristics up to 7 rounds valid for the full key space, listed in Table 6. Strong characteristics with weight smaller than 128 were found up to 4 rounds, and all the strong related-key SMT problems for 5 rounds were found to be unsatisfiable. No equivalent keys were found for LEA.

We applied ArxPy to LEA to automatically search for related-key impossible differentials. While our method completed the search for a large number of rounds, only impossible differentials with non-zero key difference spanning up to four rounds were found. The lack of related-key impossible differentials, with low Hamming weight or with key schedule transitions with probability 1, seems to be due to the heavy and robust key schedule of LEA. By and large, ciphers with lightweight key schedule algorithms tend to have longer related-key impossible differentials in comparison to their single-key counterparts. However, the key schedule and the round function of LEA are of the same complexity. Thus, finding an impossible differential that covers more rounds in the related-key setting than in the single-key setting seems infeasible. We confirmed this by applying our tool to LEA in the single-key setting, obtaining multiple 10-round single-key impossible differentials within a few hours.

5.5 SHACAL-1

Based on the compression function of the NIST standard hash function SHA-1 [19], the block cipher SHACAL-1 was initially suggested in [27] and submitted by Handschuh and Naccache to the NESSIE project [61]. SHACAL-1 uses a 160-bit block size and 80 rounds, and its round function is similar to the SHA-1 compression function. The key size can vary from 0 to 512 bits, although a minimum key size of 128 bits is required in [28]. We analysed SHACAL-1 for 512-bit keys.

There are some ad-hoc differential characteristics presented in [18, 30, 36]. However, Wang et al. [72] indicated that many of the previous characteristics are invalid. The longest valid XOR differential characteristic is a 35-round weak related-key characteristic that appeared in [18] and was later corrected in [72] to obtain the corrected weights \((w_{{\textsf {KS}}}, w_E) = (10, 29)\). Moreover, the longest strong related-key XOR differential characteristic that is not found to be invalid by [72] spans 27 rounds of SHACAL-1 with the corresponding weights \((w_{{\textsf {KS}}}, w_E) = (0, 29)\) [36].

These characteristics do not necessarily start from the first round of SHACAL-1 since they are not used for a differential attack but rather for a rectangle attack [9]. Note that the round function of SHACAL-1 changes in different rounds (see Fig. 10), and these characteristics could take advantage of the variable definition of the round function. However, we only look for characteristics starting from the first round; thus, for the particular case of SHACAL-1 with its variable round functions, we do not necessarily obtain the best possible characteristics.

Fig. 10 The i-th round of SHACAL-1, \(i = 0, 1 \dots , 79\). The 160-bit input is divided into five 32-bit words \(A_i\), \(B_i\), \(C_i\), \(D_i\), and \(E_i\). The function \(f_i\) changes significantly depending on the round number. For a given 512-bit master key \(mk = (M_0, M_1, \cdots , M_{15})\), the round keys \(K_i\) are computed as described above, where \(W_i\) is the round constant

Our automated tool ArxPy obtained a weak-key 25-round characteristic with \((w_{{\textsf {KS}}}, w_E) = (1, 22)\). Moreover, the tool could also find a 30-round characteristic in the strong-key setting with the corresponding weights \((w_{{\textsf {KS}}}, w_E) = (0, 63)\). In our search, a large number of SHACAL-1 characteristics found by the SMT solver did not pass our empirical validation test, which significantly increased the running time for finding each valid characteristic. More specifically, in the weak-key setting we found more than 100 empirically invalid characteristics before detecting a valid 25-round one; we also obtained multiple 26-round weak characteristics within a week, but they were found invalid by our empirical test. In the strong-key setting, we discarded more than 900 empirically invalid characteristics before finding a valid 30-round characteristic, and none of the 31-round trails obtained in a week could pass the test.

One of the main reasons for the large number of empirically invalid characteristics is the 5-input modular addition within the round function of SHACAL-1. Since the differential model of the modular addition with three or more inputs is unknown, we had to approximate the differential model of the 5-input addition with a chain of 2-input addition models. In other words, to model the 5-input addition \(y = x_1 \boxplus x_2 \boxplus x_3 \boxplus x_4 \boxplus x_5\), we split it into four 2-input additions

$$\begin{aligned} z_1 = x_1 \boxplus x_2, \ z_2 = z_1 \boxplus x_3, \ z_3 = z_2 \boxplus x_4, \ y = z_3 \boxplus x_5, \end{aligned}$$

and we model the four 2-input additions independently. Thus, we are approximating the differential probability of the 5-input addition

$$\begin{aligned} \Pr [(\Delta _{x_1}, \dots , \Delta _{x_5}) \xrightarrow {\boxplus } \Delta _y] \end{aligned}$$

with the multiplication of the differential probabilities of the four 2-input additions

$$\begin{aligned} \Pr [(\Delta _{x_1}, \Delta _{x_2}) \xrightarrow {\boxplus } \Delta _{z_1}] \ \times&\Pr [(\Delta _{z_1}, \Delta _{x_3}) \xrightarrow {\boxplus } \Delta _{z_2}] \ \times \\&\Pr [(\Delta _{z_2}, \Delta _{x_4}) \xrightarrow {\boxplus } \Delta _{z_3}] \times \Pr [(\Delta _{z_3}, \Delta _{x_5}) \xrightarrow {\boxplus } \Delta _{y}]. \end{aligned}$$

For many differentials, this approximation is not accurate, and this caused the appearance of many empirically invalid characteristics in our search.
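The gap between the chained model and the actual behaviour can be observed exhaustively on a toy instance. The following sketch is an illustration under simplifying assumptions: it uses 4-bit words and a 3-input addition \(y = (x_1 \boxplus x_2) \boxplus x_3\) instead of the 32-bit 5-input addition of SHACAL-1, and the function names and the chosen differences are picked for the example. It compares the product of the two independent 2-input probabilities with the fraction of inputs that follow both links of the chain simultaneously.

```python
from itertools import product

N = 4                        # toy word size (assumption; SHACAL-1 uses 32-bit words)
MASK = (1 << N) - 1

def add_dp(da, db, dc):
    """Exact XOR differential probability of a 2-input addition modulo 2**N."""
    hits = sum(
        ((((a ^ da) + (b ^ db)) ^ (a + b)) & MASK) == dc
        for a, b in product(range(1 << N), repeat=2)
    )
    return hits / 4 ** N

def chain_dp(d1, d2, d3, dz, dy):
    """Fraction of inputs following both links of z = x1 + x2, y = z + x3."""
    hits = 0
    for x1, x2, x3 in product(range(1 << N), repeat=3):
        z = (x1 + x2) & MASK
        z_other = ((x1 ^ d1) + (x2 ^ d2)) & MASK
        y = (z + x3) & MASK
        y_other = (z_other + (x3 ^ d3)) & MASK
        hits += ((z ^ z_other) == dz) and ((y ^ y_other) == dy)
    return hits / 8 ** N

d1, d2, d3, dz, dy = 1, 1, 1, 0, 1
product_model = add_dp(d1, d2, dz) * add_dp(dz, d3, dy)
joint = chain_dp(d1, d2, d3, dz, dy)
print(product_model, joint)
```

For this chain the product model reports probability 1/4 (weight 2), whereas no input triple follows both links: the first transition forces the least significant bit of \(z_1\) to be 1, while the second transition requires it to be 0. A characteristic containing such a chain is therefore accepted by the independent model but is empirically invalid.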

As shown in Table 7, our automated tool found the first known related-key impossible differential of SHACAL-1, covering 30 rounds of the cipher, from round 20 to round 49. The backward and forward trails respectively traverse 2 and 12 rounds of SHACAL-1, delimiting the inner 16-round impossible differential. The search for 31-round impossible differentials did not finish after one week, and we stopped the search. Thus, we expect that dedicating more time to the search may result in longer impossible differentials.

Fig. 11 The i-th round of SHACAL-2, \(i = 0, 1 \dots , 63\). The 256-bit input is divided into eight 32-bit words \(A_i\), \(B_i\), \(C_i\), \(D_i\), \(E_i\), \(F_i\), \(G_i\), and \(H_i\). The special operators used in the round function of SHACAL-2 are \({\textsf {If}}\), \({\textsf {Maj}}\), \(\Sigma _0\), and \(\Sigma _1\), which are defined as above. For a given 512-bit master key \(mk = (M_0, M_1, \cdots , M_{15})\), the key schedule generates round keys \(K_i\) as described above, where \(W_i\) is the round constant

5.6 SHACAL-2

Similar to SHACAL-1, the block cipher SHACAL-2 [28] was designed based on the compression function of the NIST standard hash function SHA-256 [20]. The cipher was submitted to the NESSIE project [61] and was approved as one of the NESSIE final selections. SHACAL-2 is a 256-bit block cipher, has 64 rounds, and supports a variable key size up to 512 bits. We analysed SHACAL-2 for 512-bit keys.

The longest ad-hoc related-key XOR differential characteristic in the strong-key setting is a 24-round characteristic presented in [51] with \((w_{{\textsf {KS}}}, w_E) = (0, 38)\), which relies on some additional conditions on specific values alongside the differences to improve the weights. Moreover, in the weak-key setting, Biryukov et al. [12] provided two 24-round related-key XOR differential characteristics of SHACAL-2, each with encryption weight \(w_E = 52\). However, they did not explicitly mention the key schedule weight \(w_{{\textsf {KS}}}\) of either characteristic.

Our automated search resulted in a 23-round characteristic in the strong-key setting with encryption weight \(w_E = 58\) and a 22-round characteristic in the weak-key setting with total weight \(w_{{\textsf {KS}}} + w_E = 6 + 29 = 35\). Like SHACAL-1, our automated search could not find longer characteristics of SHACAL-2 within a week due to the large number of empirically invalid characteristics found. The round function of SHACAL-2 also contains a modular addition with multiple inputs (i.e., seven operands), and modelling it with 2-input additions is one of the main reasons for the inaccurate differential behaviour for some special differences.

Table 7 lists the best related-key impossible differentials of SHACAL-2. The 18-round impossible differential presented by Yang et al. in [75] has been the longest known related-key impossible differential for SHACAL-2 so far. Our automated tool ArxPy obtained a 24-round impossible differential, improving the previous best result by 6 rounds. This impossible differential includes a 12-round inner impossible differential, extended by a deterministic 1-round backward trail and a deterministic 11-round forward trail. We checked up to 28 rounds of SHACAL-2 in one week, and the longest impossible differential we found was the 24-round related-key impossible differential.

6 Conclusion

In this paper, we proposed the first bit-vector differential model of the n-bit modular addition with a constant. We described a bit-vector formula, with bit-vector complexity O(1), that determines whether a differential is valid and a bit-vector function, with complexity \(O(\log _2 n)\), that provides a close approximation of the differential weight. In this regard, we carefully studied our approximation error and obtained almost tight bounds. Moreover, we described two new SMT-based automated methods to search for characteristics and impossible differentials of ARX ciphers including constant additions, respectively.

Each of our methods formulates the search problem as a sequence of bit-vector SMT problems, encoded from the cipher’s SSA representation and the bit-vector differential models of each operation. We have implemented our methods in ArxPy, an open-source tool to find characteristics and impossible differentials of ARX ciphers in a fully automated way. To show some examples, we have applied our automated methods to search for equivalent keys, related-key characteristics, and related-key impossible differentials of TEA, XTEA, HIGHT, LEA, SHACAL-1, and SHACAL-2.

Regarding the characteristic results, for TEA we revisited previous results obtained in a manual approach. In contrast, for XTEA, HIGHT, and LEA, we improved the previous best-known related-key characteristics in both the strong-key and the weak-key settings. Our characteristic results of SHACAL-1 and SHACAL-2 did not outperform previous works in all settings due to the presence of modular additions with more than two inputs, for which no efficient differential model has been proposed yet.

Concerning the impossible differentials, our results for TEA, XTEA, and HIGHT cover the same number of rounds as the best-known related-key impossible differentials. On the other hand, we obtained the longest related-key impossible differentials for LEA, SHACAL-1, and SHACAL-2.

Our differential model relies on a bit-vector-friendly approximation of the binary logarithm. Thus, future work could explore other approximations improving the bit-vector complexity or the approximation error, which could also be applied to other SMT problems involving the binary logarithm. While we have focused on the modular addition by a constant, there are other simple operations for which no differential model has been proposed so far, such as the modular multiplication and the modular addition with more than two inputs. Obtaining differential models for more operations will allow designing ciphers with more flexibility, leading to new designs that are potentially more efficient.