Keywords

1 Introduction

Programming problem is a mathematical optimization which aims to achieve the minimal or maximal value of an objective function under certain constraints, and it has a wide range of applications from industry to academic community. Mixed Integer Linear Programming (MILP) is a kind of programming problem whose objective function and constraints are linear, and all or some of the variables involved in the problem are restricted to be integers. In recent years, MILP has found its applications in cryptographic community. Mouha et al. [11] and Wu et al. [21] applied MILP method to automatically count differential and linear active Sboxes for word-based block ciphers, which can be used to evaluate the resistance of block ciphers against differential and linear attacks. Later Sun et al. [13] extended this technique to count active Sboxes of SP-network block ciphers whose linear layer is a bit permutation.

Recently, this technique was improved [15] to search differential characteristics and linear trails with a minimal number of active Sboxes. They constructed the MILP model by a small number of linear inequalities chosen from the H-Representation of the convex hull of a set of points which are derived from the difference distribution (resp. linear approximation) table of Sbox. However, this method may result in invalid differential characteristics (resp. linear trails). Moreover, differential characteristic (resp. linear trail) with a minimal number of active Sboxes does not alway result in differential characteristic (resp. linear trail) with highest probability. To solve these problems, Sun et al. [14] encoded the probability of differentials (resp. linear approximations) of Sbox into the MILP model and they proved that it is always feasible to choose a set \(\mathcal {L}\) of linear inequalities from the H-Representation of the convex hull of a set of points A, such that the feasible solutions of \(\mathcal {L}\) are exactly the points in A. Thus, by adding \(\mathcal {L}\) into the model and setting the probability as objective function, the MILP optimizer will always return (if the MILP problem can be solved in limited time) a valid differential characteristic (resp. linear trail) with highest probability.

Division property is a generalized integral property introduced by Todo [18] at EUROCRYPT 2015 to search integral distinguishers of block cipher structures which is the core part of integral cryptanalysis [4, 7, 8, 10]. Todo studied propagation rules of division property through different block cipher operations and presented generalized algorithms to search integral distinguishers which only exploits the algebraic degree of nonlinear components of the block cipher. By using division property, Todo presented 10-, 12-, 12-, 14- and 14-roundFootnote 1 integral distinguishers for SIMON32, 48, 64, 96 and 128 respectively. For PRESENT cipher a 6-round integral distinguisher was found. Later at CRYPTO 2015 Todo [17] proposed a full-round integral attack of MISTY1 based on a 6-round integral distinguisher. Sun et al. [12] revisited division property, and they studied the property of a set (multiset) satisfying certain division property. At CRYPTO 2016, Boura and Canteaut [6] proposed a new notion which they called parity set to study division property, based on which they found better integral distinguisher for PRESENT cipher.

Very recently, Todo et al. [19] introduced bit-based division property at FSE 2016 which treats each bit independently in order to find better integral distinguishers. They applied this technique to SIMON32, and as a result a 14-round integral distinguisher for SIMON32 was found. However, as pointed out in [19], searching integral distinguisher by bit-based division property required much more time and memory. For a block cipher with block size n, the time and memory complexity is upper bounded by \(2^n\). Thus, bit-based division property can only apply to block ciphers with block size at most 32. For block ciphers with a much larger block size, searching integral distinguisher by bit-based division property under Todo et al.’s framework would be computationally infeasible. Thus, Xiang et al. [24] proposed a state partition to get a tradeoff between the time-memory complexity and the accuracy of the integral distinguisher, and they improved distinguishers of SIMON48 and SIMON64 by one round for both variants.

1.1 Our Contributions

In this paper, we present a novel technique to search integral distinguishers based on bit-based division property by using MILP method. First we propose a new notion that we call division trail to illustrate division property propagation. We show that each division property propagation can be represented by division trails, furthermore, we have proved that it is sufficient to check the last vectors of all division trails in order to estimate whether a useful distinguisher exists. Based on this observation we construct a linear inequality system for a given block cipher such that all feasible solutions of this linear inequality system are exactly all the division trails. Thus, the constructed linear inequality system is sufficient to describe the division property propagations. Then, we study the stopping rule in division property propagation. The stopping rule determines whether the resulting division property can be propagated further to find a longer integral distinguisher. It is observed that for a division property propagation, if the resulting vectors for the first time contain all the vectors of Hamming weight one after propagating \(r +1\) rounds, the propagation procedure should terminate and an r-round distinguisher can be derived. Hence, we set the sum of the coordinates of the last vector of r-round division trail as objective function. By combining this objective function and the linear inequality system derived from the division trails, we construct an MILP problem and present an algorithm to estimate whether r-round distinguisher exists given some initial division property. To illustrate our new technique, we run experiments (all the MILP problems in our experiments are solved by the openly available software Gurobi [1]) on SIMON, SIMECK, PRESENT, RECTANGLE, TWINE, LBlock:

  1. 1.

    For SIMON [3] family block ciphers, we first model division property propagations through Copy, And and Xor operations by linear inequalities, since those operations are the basic operations in SIMON family. By using these inequalities we construct an MILP problem and serve it in our search algorithm. As a result we found 14-, 16-, 18-, 22- and 26-round integral distinguishers for SIMON32, 48, 64, 96 and 128 respectively. For SIMON48, 64, 96 and 128, our results are 2, 1, 1 and 1 more rounds than the previous results in [27]. SIMECK [25] is a family of lightweight block ciphers whose round function is very similar to SIMON except the rotation constants. We applied our search technique to SIMECK and we found 15-, 18- and 21-round distinguishers for SIMECK32, 48 and 64 respectively.

  2. 2.

    PRESENT [5] and RECTANGLE [28] are two SP-network lightweight block ciphers whose linear layers are bit permutations. Unlike SIMON, these two ciphers are Sbox-based block ciphers. In [17, 18], Sbox is treated as a whole, that is for an n-bit Sbox the input value to the Sbox is viewed as a value of \(\mathbb {F}_2^n\). In this paper we study bit-based division property propagation of Sbox, and we present an algorithm to compute division trails of Sbox. We observed that, considering bit-based division property could preserve more integral property along with division property propagation through Sbox. By converting division trails of Sbox layer into a set of linear inequalities we construct MILP models for PRESENT and RECTANGLE, as a result, we found 9-round distinguishers for both ciphers which are two more rounds than the best integral distinguishers in the literature.

  3. 3.

    TWINE [16] and LBlock [23] are two generalized Feistel structure block ciphers. By modeling Sbox, Copy and Xor with linear inequalities, we apply our technique to these two ciphers and we found 16-round distinguishers which are in accordance with the results in [26].

Our results are listed in Table 1. All the ciphers explored above except SIMON32 have a block size larger than 32, and searching integral distinguishers by bit-based division property under Todo’s framework is computationally infeasible for those ciphers. Note that all our experiments are conducted on a desktop and the consuming time varies from seconds to minutes which is very efficient, the details are listed in Table 1. Moreover, by converting the search algorithm into MILP problems, we can find better integral distinguishers for SIMON48/64/96/128, SIMECK48/64, PRESENT and RECTANGLE.

Table 1. Results on some block ciphers.

The rest of the paper is organized as follows: In Sect. 2 we introduce some basic background which will be used later. Section 3 studies how to model some basic operations and components used in block cipher, and to construct a linear inequality system to accurately describe the division property propagations. Section 4 studies the stopping rule and a search algorithm will be presented in this section. Section 5 shows some applications of the technique, and we conclude in Sect. 6.

2 Preliminaries

2.1 Notations

Let \(\mathbb {F}_2\) denote the finite field with only two elements and \(\mathbb {F}_2^n\) denote the n-bit string over \(\mathbb {F}_2 \). Let \(\mathbb {Z}\) and \(\mathbb {Z}^n\) denote the integer ring and the set of all vectors whose coordinates are integers respectively. For any \(a\in \mathbb {F}_2^n \), let a[i] denote the i-th bit of a, and the Hamming weight of a is calculated as \(\sum _{i= 0}^{n-1} a[i] \). For any \(\varvec{a}= (a_0, \cdots , a_{m-1}) \in \mathbb {F}_2^{n_0}\times \dots \times \mathbb {F}_2^{n_{m-1}}\), the vectorial Hamming weight of \(\varvec{a}\) is defined as \(W(\varvec{a}) = (w(a_0), \cdots , w(a_{m-1}))\) where \(w(a_i)\) is the Hamming weight of \(a_i\). Let \(\varvec{k}=(k_0,k_1,\cdots ,k_{m-1})\) and \(\varvec{k^{*}}=(k^{*}_0,k_1^{*},\cdots ,k_{m-1}^{*})\) be two vectors in \(\mathbb {Z}^m\). Define \(\varvec{k} \succeq \varvec{k^{*}}\) if \(k_i \ge k^{*}_i\) holds for all \(i=0,1,\cdots ,m-1\). Otherwise we write \(\varvec{k} \nsucceq \varvec{k^{*}}\).

Bit Product Function \(\pi _u(x)\) and \(\varvec{\pi _u}(\varvec{x})\): For any \(u \in \mathbb {F}_2^n \), let \(\pi _u(x)\) be a function from \(\mathbb {F}_2^n\) to \(\mathbb {F}_2 \). For any \(x\in \mathbb {F}_2^n \), define \(\pi _u(x)\) as follows:

$$\pi _u(x) = \prod _{i=0}^{n-1} x[i]^{u[i]}$$

Let \(\varvec{\pi _u}(\varvec{x})\) be a function from \(( \mathbb {F}_2^{n_0} \times \mathbb {F}_2^{n_1} \times \cdots \times \mathbb {F}_2^{n_{m-1}})\) to \(\mathbb {F}_2\) for all \(\varvec{u} \in (\mathbb {F}_2^{n_0}\times \mathbb {F}_2^{n_1}\times ,\cdots ,\times \mathbb {F}_2^{n_{m-1}})\). For any \(\varvec{u} = (u_0, u_1, \cdots , u_{m-1}), \varvec{x}=(x_0, x_1,\cdots ,x_{m-1})\in (\mathbb {F}_2^{n_0}\times \mathbb {F}_2^{n_1}\times ,\cdots ,\times \mathbb {F}_2^{n_{m-1}}) \), define \(\varvec{\pi _u}\)(\(\varvec{x}\)) as follows:

$$\varvec{\pi _u}(\varvec{x}) = \prod _{i = 0}^{m-1}\pi _{u_i}(x_i) $$

2.2 Division Property

Division property [18] is a generalized integral property which can exploit the properties hidden between traditional integral properties \(\mathcal {A}\) and \(\mathcal {B}\). Thus, by propagating division property we desire to get some better distinguishers. In the following we will introduce division property and present some propagation rules.

Definition 1

(Division Property [17]). Let \(\mathbb {X}\) be a multiset whose elements take a value of \((\mathbb {F}_2^{n})^m\), and \(\varvec{k}\) be an m-dimensional vector whose coordinates take values between 0 and n. When the multiset \(\mathbb {X}\) has the division property \(\mathcal {D}_{\varvec{k^{(0)}}, \varvec{k^{(1)}}, \cdots , \varvec{k^{(q-1)}}}^{n,m}\), it fulfills the following conditions: The parity of \(\pi _{\varvec{u}}(\varvec{x} )\) over all \(\varvec{x}\in \mathbb {X}\) is always even when

$$\begin{aligned} \varvec{u}\in \left\{ (u_0, u_1, \cdots , u_{m-1})\in (\mathbb {F}_2^{n})^m| W(\varvec{u})\nsucceq \varvec{k^{(0)}},\cdots ,W(\varvec{u})\nsucceq \varvec{k^{(q-1)}} \right\} \end{aligned}$$

Proposition 1

(Copy [17]). Denote \(\mathbb {X}\) an input multiset whose elements belong to \(\mathbb {F}_2^n\), and let \(x\in \mathbb {X}\). The copy function creates \((y_0, y_1)\) from x where \(y_0 =x, y_1 = x\). Assuming the input multiset has division property \(\mathcal {D}_{k}^{n}\), let \(\mathbb {Y}\) be the corresponding output multiset, then \(\mathbb {Y}\) has division property \(\mathcal {D}_{(0,k),(1,k-1),\cdots ,(k,0)}^{n,2}\).

Proposition 2

(Compression by And [24]). Denote \(\mathbb {X}\) an input multiset whose elements belong to \(\mathbb {F}_2^n\times \mathbb {F}_2^n \), let \( (x_0,x_1)\in \mathbb {X}\) be an input to the compression function and denote the ouput value by y where \( y = x_0 \& x_1\). Let \(\mathbb {Y}\) be the corresponding output multiset. If input multiset \(\mathbb {X}\) has division property \(\mathcal {D}_{\varvec{k}}^{n,2}\) where \( \varvec{k}= (k_0,k_1) \), then the division property of \(\mathbb {Y}\) is \(\mathcal {D}_{k}^{n}\) where \( k = \max \{k_0, k_1\}\).

Proposition 3

(Compression by Xor [17]). Denote \(\mathbb {X}\) an input multiset whose elements belong to \(\mathbb {F}_2^n\times \mathbb {F}_2^n \), let \( (x_0,x_1)\in \mathbb {X}\) be an input to the compression function and denote the ouput value by y where \( y = x_0\oplus x_1\). Let \(\mathbb {Y}\) be the corresponding output multiset. If input multiset \(\mathbb {X}\) has division property \(\mathcal {D}_{\varvec{k}}^{n,2}\) where \( \varvec{k}= (k_0,k_1) \), then the division property of \(\mathbb {Y}\) is \(\mathcal {D}_{k_0 + k_1}^{n}\).

Proposition 4

(Substitution [17]). Denote \(\mathbb {X}\) an input multiset whose elements belong to \(\mathbb {F}_2^{n_1}\), let F be a substitution function (Sbox) with algebraic degree d and F maps an element in \(\mathbb {F}_2^{n_1}\) to an element in \(\mathbb {F}_2^{n_2}\), denote \(\mathbb {Y}\) the corresponding output multiset \( F(\mathbb {X}) \). Assuming the input multiset has division property \(\mathcal {D}_k^{n_1}\), then the output multiset has division property \(\mathcal {D}^{n_2}_{\left\lceil \frac{k}{d} \right\rceil }\). Moreover, if \( n_1 = n_2 \) and the substitution function is bijective, assuming the input multiset has division property \(\mathcal {D}^{n_1}_{n_1}\), then the output multiset has division property \(\mathcal {D}^{n_1}_{n_1}\)

For more details regarding division property we refer the readers to [1719].

2.3 Modeling a Subset in \(\{0,1\}^n\) by Linear Inequalities

Convex Hull and H-Representation: The convex hull of a set A of points is the smallest convex set that contains A, and the H-Representation of a convex set is a set of linear inequalities \(\mathcal {L}\) corresponding to the intersection of some halfspaces such that the feasible solutions of \(\mathcal {L}\) are exactly the convex set.

In [14, 15] Sun et al. treat a differential \( (x_{u-1},\cdots ,x_0)\rightarrow (y_{v-1},\cdots ,y_0) \) of an \( u\times v \) Sbox as an \( (u + v) \)-dimensional vector \( (x_{u-1},\cdots ,x_0,y_{v-1},\cdots ,y_0) \). By computing the H-Representation of the convex hull of all possible input-output differential pairs of an Sbox, a set of linear inequalities will be returned to characterize the differential propagation. Moreover, they proved that for a given subset A of \( \{0,1\}^n \), it is always feasible to choose a set of linear inequalities \(\mathcal {L}\) from the H-Representation of the convex hull of A, such that A represents all feasible solutions of \(\mathcal {L}\) restricted in \( \{0,1\}^n \).

Theorem 1

([14]). Let A be a subset of \( \{0,1\}^n \), and denote \(\mathrm {Conv}(A)\) the convex hull of A. For any \( x\in \{0,1\}^n \), \( x\in \mathrm {Conv}(A) \) if and only if \( x\in A \).

Thus, they first computed a set of vectors A which is composed of all differential pairs of a given Sbox, and then calculated the H-Representation of the convex hull of A by using the inequality_generator() function in the Sage [2] software, and this will return a set of linear inequalities \(\mathcal {L}\) which are the H-presentation of Conv(A). According to Theorem 1, \(\mathcal {L}\) is an accurate description of the difference propagations of the given Sbox, that is, all feasible solutions of \(\mathcal {L}\) restricted in \( \{0,1\}^n \) are exactly A. Since \(\mathcal {L}\) is the H-Representation of Conv(A), each possible differential characteristic corresponds to a point in A, thus, each possible differential characteristic satisfies the linear inequalities in \(\mathcal {L}\). On the other hand, for any impossible differential characteristic id, there always exists at least one linear inequality in \(\mathcal {L}\) such that id does not satisfy this inequality. Otherwise, if id satisfies all the inequalities in \(\mathcal {L}\) which indicates id belongs to Conv(A), and this is equivalent to \( id\in A \).

Since \(\mathcal {L}\) is an accurate description of A, adding all the linear inequalities in \(\mathcal {L}\) into the MILP problem when searching differential characteristics of a block cipher, it will always return valid differential characteristics. However, the number of linear inequalities in the H-Representation of Conv(A) is often very large such that adding all the inequalities into the MILP model will make the problem computationally infeasible. Thus, Sun et al. [14] proposed a greedy algorithm (See Algorithm 1) to select a subset of \(\mathcal {L}\) whose feasible solutions restricted in \( \{0,1\}^n \) are exactly A. This algorithm can greatly reduce the number of inequalities required to accurately describe A.

In order to illustrate the procedure of this section, we present a toy example in Appendix A.

figure a

3 Modeling Division Property Propagations of Basic Operations and Sbox by Linear Inequalities

In [18] Todo introduced division property by using some vectors in \(\mathbb {Z}^m \), and the propagation of division property through a round function of the block cipher is actually a transition of the vectors. Given an initial division property \(\mathcal {D}^{n, m}_{\varvec{k}}\), let \( f_r \) denote the round function of a block cipher, the division property of the state after one round \( f_r \) can be computed from \(\mathcal {D}^{n,m}_{\varvec{k}}\) by the rules introduced in [17, 18], and denote the division property after one round \( f_r \) by \(\mathcal {D}^{n,m}_{\mathbb {K}}\) where \(\mathbb {K}\) is a set of vectors in \(\mathbb {Z}^m \). Thus, the division property propagation through \( f_r \) is actually the transition from \( \varvec{k}\) to the vectors in \(\mathbb {K}\). Traditionally, if two vectors \( \varvec{k}_1 \) and \( \varvec{k}_2 \) in \(\mathbb {K}\) satisfying that \( \varvec{k}_1 \succeq \varvec{k}_2 \), then \( \varvec{k}_1 \) is redundant and will be removed from \(\mathbb {K}\). However, since the redundant vectors do not influence the division property, in this paper we do not remove redundant vectors in \(\mathbb {K}\), that is for any vector derived from \( \varvec{k}\) by using the propagation rules we add this vector into \(\mathbb {K}\). Moreover, for any vector \(\bar{\varvec{k}}\) in \(\mathbb {K}\), we call that \( \varvec{k}\) can propagate to \(\bar{\varvec{k}}\) through \( f_r \).

Definition 2 (Division Trail)

Let \( f_r \) denote the round function of an iterated block cipher. Assume the input multiset to the block cipher has initial division property \(\mathcal {D}^{n,m}_{\varvec{k}}\), and denote the division property after i-round propagation through \( f_r \) by \(\mathcal {D}^{n,m}_{\mathbb {K}_i}\). Thus, we have the following chain of division property propagations:

$$\begin{aligned} \{\varvec{k}\} \overset{def}{=} \mathbb {K}_0 \overset{f_r}{\longrightarrow } \mathbb {K}_1 \overset{f_r}{\longrightarrow } \mathbb {K}_2\overset{f_r}{\longrightarrow }\cdots \end{aligned}$$

Moreover, for any vector \( \varvec{k}_i^* \) in \(\mathbb {K}_i \) \( (i \ge 1) \), there must exist an vector \( \varvec{k}_{i-1}^* \) in \(\mathbb {K}_{i-1}\) such that \( \varvec{k}_{i-1}^* \) can propagate to \( \varvec{k}_i^* \) by division property propagation rules. Furthermore, for \( (\varvec{k}_0, \varvec{k}_1, \cdots ,\varvec{k}_r) \in \mathbb {K}_0\times \mathbb {K}_1\times \cdots \times \mathbb {K}_r\), if \( \varvec{k}_{i-1}\) can propagate to \( \varvec{k}_i \) for all \( i \in \{1,2,\cdots ,r\}\), we call \( (\varvec{k}_0, \varvec{k}_1,\cdots ,\varvec{k}_r) \) an r-round division trail.

Proposition 5

Denote the division property of input multiset to an iterated block cipher by \(\mathcal {D}^{n,m}_{\varvec{k}}\), let \( f_r \) be the round function. Denote

$$\begin{aligned} \{\varvec{k}\} \overset{def}{=} \mathbb {K}_0 \overset{f_r}{\longrightarrow } \mathbb {K}_1 \overset{f_r}{\longrightarrow } \mathbb {K}_2\overset{f_r}{\longrightarrow }\cdots \overset{f_r}{\longrightarrow }\mathbb {K}_r \end{aligned}$$

the r-round division property propagation. Thus, the set of the last vectors of all r-round division trails which start with \( \varvec{k}\) is equal to \(\mathbb {K}_r \).

Generally, given an initial division property \(\mathcal {D}_{\varvec{k}}^{n,m}\), and if one would like to check whether there exists useful integral property after r-round encryption, we have to propagate the initial division property for r rounds to get \(\mathcal {D}_{\mathbb {K}_r}^{n,m}\) and check all the vectors in \(\mathbb {K}_r \). According to Proposition 5, it is equivalent to find all r-round division trails which start with \( \varvec{k}\), and check the last vectors in the division trails to judge if any exploitable distinguisher can be extracted. Based on this observation, in the following we focus on how to accurately describe all division trails.

A linear inequality system will be adopted to describe division property propagations, that is we will construct a linear inequality system such that the feasible solutions represent all division trails. Since division property propagation is a deterministic procedure, the constructed linear inequality system must satisfy:

  • For each division trail, it must satisfy all linear inequalities in the linear inequality system. That is each division trail corresponds to a feasible solution of the linear inequality system.

  • Each feasible solution of the linear inequality system corresponds to a division trail. That is all feasible solutions of the linear inequality system do not contain any impossible division trail.

A linear inequality system satisfying the above two conditions is an accurate description of division property propagation. In the rest of the paper, we only consider bit-based division property. We start by modeling bit-based division property propagation of some basic operations and Sbox in block ciphers.

3.1 Modeling Copy, And and Xor

In this subsection, we show how to model bit-wise Copy, And and Xor operations by linear inequalities.

Modeling Copy. Copy operation is the basic operation used in Feistel block cipher. The left half of the input is copied into two equal parts, one of which is fed to the round function. Since we consider bit-based division property, the division property propagation of each bit is independent of each other. Thus, we consider only a single bit.

Let \(\mathbb {X}\) be an input multiset whose elements take a value of \(\mathbb {F}_2 \). The copy function creates \( y =(y_0, y_1) \) from \( x \in \mathbb {X}\) where \( y_0 = x \) and \( y_1 = x\). Assuming the input multiset has division property \(\mathcal {D}_k^1 \), then the corresponding output multiset has division property \(\mathcal {D}_{(0,k),\cdots , (k,0)}^1 \) from Proposition 1. Since we consider bit-based division property, the input multiset division property \(\mathcal {D}_k^1 \) must satisfy \( k\le 1 \). If \( k = 0 \), the output multiset has division property \(\mathcal {D}^1_{(0,0)}\), otherwise if \( k = 1 \), the output multiset has division property \(\mathcal {D}^1_{(0,1),(1,0)}\). Thus, \( (0)\overset{copy }{\longrightarrow } (0,0) \) is the only division trail given the initial division property \(\mathcal {D}^1_0 \), and \((1)\overset{copy }{\longrightarrow }(0,1) \), \((1)\overset{copy }{\longrightarrow }(1,0) \) are the two division trails given the initial division property \(\mathcal {D}_1^1 \).

Now we are ready to give a linear inequality description of these division trails. Denote \( (a)\overset{copy }{\longrightarrow } (b_0,b_1)\) a division trail of Copy function, the following inequalityFootnote 2 is sufficient to describe the division propagation of Copy.

$$\begin{aligned} {\left\{ \begin{array}{ll} a - b_0 - b_1 = 0\\ a, b_0, b_1 \text { are binaries}\\ \end{array}\right. } \end{aligned}$$
(1)

Apparently, all feasible solutions of the inequalities in (1) corresponding to \((a, b_0, b_1)\) are (0, 0, 0), (1, 0, 1) and (1, 1, 0), which are exactly the three division trails of Copy function described above.

Modeling And. Bit-wise And operation is a basic nonlinear function, it is the only nonlinear operation for SIMON family. Similar to the modeling procedure of Copy function, we can express its division property propagation as a set of linear inequalities.

Let \(\mathbb {X}\) be an input multiset whose elements take a value of \(\mathbb {F}_2\times \mathbb {F}_2 \). The And function creates \( y = x_0 \& x_1 \) from \( x = (x_0,x_1)\in \mathbb {X}\). Assuming the input multiset has division property \(\mathcal {D}_{\varvec{k}}^{1,2}\) where \( \varvec{k} = (k_0, k_1) \), the division property of the corresponding output multiset is \(\mathcal {D}_{k}^{1}\) where \(k = \max \{k_0, k_1\}\) according to Proposition 2. Since we consider bit-based division property here, \( \varvec{k}=(k_0,k_1) \) must satisfy \( 0\le k_0,k_1\le 1 \). Thus, there are four division trails for And function which are \((0,0)\overset{Xor }{\longrightarrow }(0), (0,1)\overset{Xor }{\longrightarrow }(1), (1,0)\overset{Xor }{\longrightarrow }(1)\) and \((1,1)\overset{Xor }{\longrightarrow }(1)\). Denote \( (a_0,a_1)\overset{and }{\longrightarrow }(b) \) a division trail of And function, the following linear inequalities are sufficient to describe this propagation features.

$$\begin{aligned} {\left\{ \begin{array}{ll} b - a_0 \ge 0\\ b - a_1 \ge 0\\ b - a_0 - a_1 \le 0\\ a_0,a_1,b \text { are binaries} \end{array}\right. } \end{aligned}$$
(2)

It is easy to check that all feasible solutions of the inequalities in (2) corresponding to \((a_0, a_1, b)\) are (0, 0, 0), (0, 1, 1), (1, 0, 1) and (1, 1, 1) , which are exactly the four division trails of And function described above.

Modeling Xor. Bit-wise Xor is another basic operation used in block ciphers. Similarly, a linear inequality system can be constructed to describe the division property propagation through Xor function.

Let \(\mathbb {X}\) denote an input multiset whose elements take a value of \(\mathbb {F}_2\times \mathbb {F}_2 \). The Xor function creates \( y = x_0\oplus x_1 \) from \( x=(x_0,x_1)\in \mathbb {X}\). Assuming the input multiset \(\mathbb {X}\) has division property \(\mathcal {D}_{\varvec{k}}^{1,2}\) where \( \varvec{k}=(k_0,k_1) \), thus, the corresponding output multiset \(\mathbb {Y}\) has division property \(\mathcal {D}^1_{k_0+k_1}\). Since we consider bit-based division property here, \( \varvec{k}=(k_0,k_1) \) must satisfy \( 0\le k_0,k_1\le 1 \). Moreover, the element of \(\mathbb {Y}\) takes a value in \(\mathbb {F}_2 \), the division property \(\mathcal {D}^1_{k_0+k_1}\) of \(\mathbb {Y}\) must satisfy \( k_0 + k_1 \le 1 \). That is, if \( (k_0, k_1)=(1, 1) \), the division property propagation will abort. Thus, there are three valid division trails: \( (0,0)\overset{Xor }{\longrightarrow }(0) \), \( (0,1)\overset{Xor }{\longrightarrow }(1) \) and \( (1,0)\overset{Xor }{\longrightarrow }(1) \). Let \( (a_0,a_1)\overset{Xor }{\longrightarrow }(b) \) denote a division trail through Xor function, the following inequality can describe the division trail through Xor function.

$$\begin{aligned} {\left\{ \begin{array}{ll} a_0 + a_1 - b = 0\\ a_0,a_1,b \text { are binaries} \end{array}\right. } \end{aligned}$$
(3)

We can check that all the feasible solutions of inequality (3) corresponding to \((a_0, a_1, b)\) are (0, 0, 0) , (0, 1, 1) and (1, 0, 1) , which are exactly the division trails described above.

3.2 Modeling Sbox

Sbox is an important component of block ciphers, for a lot of block ciphers it is the only non-linear part. In [17, 18], the Sbox is treated as a whole and the division property is considered while the element in the input multiset taking a value in \(\mathbb {F}_2^n \) for an n-bit Sbox. In [19] Todo et al. introduced bit-based division property, but they only applied their technique to non-Sbox based ciphers SIMON and SIMECK. In this section, we study bit-based division property propagation through Sbox.

Assume we are dealing with an n-bit Sbox, the input and output of the Sbox are elements in \( (\mathbb {F}_2)^n\). Suppose that the input multiset \(\mathbb {X}\) has division property \(\mathcal {D}_{\varvec{k}}^{1,n}\) where \( \varvec{k}=(k_0, k_1, \cdots ,k_{n-1}) \), that is for any \( \varvec{u}\in (\mathbb {F}_2)^n \) the parity of \( \varvec{\pi _u}(\varvec{x}) \) over \(\mathbb {X}\) is even only if \( W(\varvec{u})\nsucceq \varvec{k}\). Note that for bit-based division property it holds \( W(\varvec{u}) = \varvec{u}\), thus, we do not make a distinction between \( W(\varvec{u}) \) and \( \varvec{u}\) in the following. To compute the division property of the output multiset \(\mathbb {Y}\), we first consider a naive approach.

Previous Approach. First by Concatenation function, each element in \(\mathbb {X}\) can be converted into an element in \(\mathbb {F}_2^n \). Denote output multiset of Concatenation function as \(\mathbb {X}^* \), thus, the division property of \(\mathbb {X}^* \) is \(\mathcal {D}^{n}_{k_0+k_1+\cdots + k_{n-1}}\) according to Rule 5 in [17]. Secondly, we pass each element in \(\mathbb {X}^* \) to the Substitution function Sbox, and denote the output multiset by \(\mathbb {Y}^* \) whose elements take a value of \(\mathbb {F}_2^n \). According to Proposition 4, the division property of \(\mathbb {Y}^* \) is \(\mathcal {D}^n_{\left\lceil \frac{k_0 + k_1 + \cdots + k_{n-1}}{d}\right\rceil }\) where d is the algebraic degree of the Sbox. At last, for any value \( y^* = y_0||y_1||\cdots ||y_{n-1}\) in \(\mathbb {Y}^* \), a Split function creates \( y = (y_0, y_1, \cdots , y_{n-1}) \) from \( y^* \). Apparently, the output multiset of Split function equals to \(\mathbb {Y}\). According to Rule 4 in [17], the division property of \(\mathbb {Y}\) is \(\mathcal {D}^{1,n}_{\varvec{k^0}, \varvec{k^1},\cdots }\) where \( \varvec{k^i} = (k_0^i, k_1^i, \cdots , k_{n-1}^i) \) (\( i \ge 0 \)) denote all solutions of \( x_0 + x_1 + \cdots , x_{n-1} = \left\lceil \frac{k_0 + k_1 + \cdots + k_{n-1}}{d}\right\rceil \).

Example: Take the Sbox used in PRESENT as an example. The PRESENT Sbox is a \( 4\times 4 \) Sbox with algebraic degree three. Assume that the input multiset to the Sbox has division property \(\mathcal {D}^{1,4}_{(0,1,1,1)}\). To compute the output multiset division property, we can proceed in three steps as described above: First by a concatenation function we convert the input multiset into another multiset \(\mathbb {X}^* \) whose elements take a value in \(\mathbb {F}_2^4 \), thus the division property of \(\mathbb {X}^* \) is \(\mathcal {D}_3^4 \). Secondly, make each value in \(\mathbb {X}^* \) pass through the Sbox operation and this will result in a multiset \(\mathbb {Y}^* \) with division property \(\mathcal {D}^4_{\left\lceil \frac{3}{3}\right\rceil } = \mathcal {D}^4_1 \). Finally, we split each value in \(\mathbb {Y}^* \) into a value in \( (\mathbb {F}_2)^4 \), and we will get a multiset \(\mathbb {Y}\) with division property \(\mathcal {D}^{1,4}_{(0,0,0,1), (0,0,1,0),(0,1,0,0),(1,0,0,0)}\). Thus, we have obtained four division trails of Sbox: \( (0,1,1,1)\overset{Sbox }{\longrightarrow }(0,0,0,1) \), \( (0,1,1,1)\overset{Sbox }{\longrightarrow }(0,0,1,0) \), \( (0,1,1,1)\overset{Sbox }{\longrightarrow }(0,1,0,0) \) and \( (0,1,1,1)\overset{Sbox }{\longrightarrow }(1,0,0,0) \).

Note that only the algebraic degree is exploited to calculate the division trails of Sbox in this naive approach. From the example illustrated above, if the input multiset to the Sbox has division property \(\mathcal {D}^{1,4}_{(0,1,1,1)}\), the corresponding output multiset does not balance on any of the four output bits. However, this is not actually true. Denote the input to PRESENT Sbox as \( \varvec{x} = (x_3,x_2,x_1,x_0) \), and the corresponding output as \( \varvec{y} = (y_3,y_2,y_1,y_0) \), the algebraic normal form (ANF) of PRESENT Sbox is listed as follows:

$$\begin{aligned} {\left\{ \begin{array}{ll} y_3 = 1\oplus x_0\oplus x_1\oplus x_3\oplus x_1 x_2\oplus x_0x_1x_2\oplus x_0x_1x_3\oplus x_0x_2x_3\\ y_2 = 1\oplus x_2\oplus x_3\oplus x_0x_1\oplus x_0x_3\oplus x_1x_3\oplus x_0x_1x_3\oplus x_0x_2x_3\\ y_1 = x_1\oplus x_3\oplus x_1x_3\oplus x_2x_3\oplus x_0x_1x_2\oplus x_0x_1x_3\oplus x_0x_2x_3\\ y_0 = x_0\oplus x_2\oplus x_3\oplus x_1x_2 \end{array}\right. } \end{aligned}$$
(4)

Thus,

$$\begin{aligned} \varvec{\pi }_{(0,0,0,1)}((y_3,y_2,y_1,y_0)) = y_0 \end{aligned}$$

and

$$\begin{aligned} \begin{aligned}&\;\bigoplus _{x\in \mathbb {X}}\varvec{\pi }_{(0,0,0,1)}((y_3,y_2,y_1,y_0)) \\ =&\; \bigoplus _{x\in \mathbb {X}}y_0\\ =&\; \bigoplus _{x\in \mathbb {X}}(x_0 \oplus x_2 \oplus x_3\oplus x_1x_2)\\ =&\; \bigoplus _{x\in \mathbb {X}}\varvec{\pi }_{(0,0,0,1)}(\varvec{x})\oplus \bigoplus _{x\in \mathbb {X}}\varvec{\pi }_{(0,1,0,0)}(\varvec{x})\oplus \bigoplus _{x\in \mathbb {X}}\varvec{\pi }_{(1,0,0,0)}(\varvec{x})\oplus \bigoplus _{x\in \mathbb {X}}\varvec{\pi }_{(0,1,1,0)}(\varvec{x})\\ =&\; 0 + 0 + 0 +0\\ =&\; 0 \end{aligned} \end{aligned}$$

As illustrated above, the least significant bit \( y_0 \) of the output \( \varvec{y}\) is balanced. Similarly, we can check that \( y_2 \) and \( y_0y_2 \) are all balanced. Furthermore, it can be observed that the expressions of \( y_1 \) and \( y_3 \) all contain monomial \( x_0x_1x_2 \) whose parity over \(\mathbb {X}\) is undetermined according to the initial division property \(\mathcal {D}^{1,4}_{(0,1,1,1)}\), thus \( y_1 \) and \( y_3 \) are not balanced. Based on these observations, the division property of \(\mathbb {Y}\) should be \(\mathcal {D}^{1,4}_{(0,0,1,0),(1,0,0,0)}\). In this case we obtain two division trails of PRESENT Sbox, and what is more important is that \(y_0, y_2\) and \(y_0y_2\) are all balanced under this approach.

Our Improved Approach. Now we present a generalized algorithm to calculate division trails of an Sbox based on bit-based division property. In Algorithm 2, \( \varvec{x}=(x_{n-1}, \cdots , x_0) \) and \( \varvec{y}=(y_{n-1},\cdots ,y_0) \) denote the input and output to an n-bit Sbox respectively, and \( y_i \) is expressed as a boolean function of \( (x_{n-1}, \cdots , x_0) \).

figure b

We explain Algorithm 2 line by line:

  • Line 2–3 According to input division property \(\mathcal {D}^{1,n}_{\varvec{k}}\), the parity of monomial \( \varvec{\pi _{\bar{k}}}(\varvec{x}) \) with \(\bar{\varvec{k}} \succeq \varvec{k}\) over \(\mathbb {X}\) is undetermined, and we store these monomials in F(X). Thus, the parity of any monomial that does not belong to F(X) is zero.

  • Line 4 Initialize \(\mathbb {K}\) as an empty set.

  • Line 5–9 For any possible \( \varvec{u}\), if boolean function \( \varvec{\pi _u}(\varvec{y}) \) contains any monomial in F(X) , the parity of \( \varvec{\pi _u}(\varvec{y}) \) over \(\mathbb {X}\) is undetermined, and we store all these vectors in \(\bar{\mathbb {K}}\).

  • Line 10 SizeReduce() function removes all redundant vectors in \(\bar{\mathbb {K}}\). Since we are interested in finding a set \(\mathbb {K}\) such that for any \( \varvec{u}\in \{\varvec{u}| \varvec{u}\nsucceq \varvec{k} \text { for all } \varvec{k}\in \mathbb {K}\}\), the parity of \( \varvec{\pi _u}(\varvec{y}) \) is zero. Note that for any vector , the parity of \( \varvec{\pi _u}(\varvec{y}) \) is zero, thus, we must have , and if we let \(\mathbb {K}\)= SizeReduce(\(\bar{\mathbb {K}}\)) it will meet this condition. Otherwise, if there exists a vector \( \varvec{u}\in \{\varvec{u}\ |\ \varvec{u}\nsucceq \varvec{k} \text { for all } \varvec{k}\in \mathbb {K}\}\) such that , thus, we have \( \varvec{u}\in \bar{\mathbb {K}}\), which meants either \( \varvec{u}\in \mathbb {K}\) or there exists a vector \( \varvec{u}^* \in \mathbb {K}\) such that \( \varvec{u} \succeq \varvec{u}^* \) since \(\mathbb {K}\) = SizeReduce(\(\bar{\mathbb {K}}\)). In either case it won’t happen \( \varvec{u}\in \{\varvec{u}\ |\ \varvec{u}\nsucceq \varvec{k} \text { for all } \varvec{k}\in \mathbb {K}\}\), which leads to a contradiction. Therefore, \(\mathbb {K}\) is sufficient to characterize the division property of output multiset.

  • Line 11 Return \(\mathbb {K}\) as output.

Given an Sbox and an initial division property \(\mathcal {D}^{1,n}_{\varvec{k}}\), Algorithm 2 returns the output division property \(\mathcal {D}^{1,n}_{\mathbb {K}}\). Thus for any vector \( \varvec{k}^*\in \mathbb {K}\), \( (\varvec{k}, \varvec{k}^*) \) is a division trail of the Sbox. If we try all the \( 2^n \) possible input multiset division property, we will get a full list of division trails. Table 4 in Appendix B presents a complete list of all the 47 division trails of PRESENT Sbox.

Note that bit-based division property of an Sbox is closely related with Boura and Canteaut’s work [6]. However, Boura and Canteaut’s work is established on parity set, while our results are directly deduced from bit-based division property.

Representing the Division Trails of Sbox as Linear Inequalities. Each division trail of an n-bit Sbox can be viewed as a 2n-dimensional vector in \( \{0,1\}^{2n}\subset \mathbb {R}^{2n}\) where \(\mathbb {R}\) is the real numbers field. Thus, all division trails form a subset P of \( \{0,1\}^{2n}\). Next, we compute the H-Representation of Conv(P) by using the inequality_generator() function in the Sage [2] software, and this will return a set of linear inequalities \(\mathcal {L}\). However, \(\mathcal {L}\) contains too many inequalities which will make the size of corresponding MILP problem too large to solve. Fortunately, we can select a subset \(\mathcal {L}^* \) of \(\mathcal {L}\) by Algorithm 1 such that the feasible solutions of \(\mathcal {L}^* \) restricted in \( \{0,1\}^{2n}\) are exactly P.

Example: PRESENT Sbox contains 47 division trails which forms a subset P of \( \{0,1\}^8 \). By using the inequality_generator() function in the Sage software, a set of 122 linear inequalities will be returned. Furthermore, this set can be reduced by Algorithm 1 and we will get a set \(\mathcal {L}^* \) of only 11 inequalities. The 11 inequalities for PRESENT Sbox are listed in Appendix C. In order to get the solutions of \(\mathcal {L}^* \) restricted in \( \{0,1\}^8 \), we only need to specify that all variables can only take values in \( \{0,1\}\).

So far, we have studied calculating and modeling division trails of basic operations and Sbox, thus, for block ciphers based on these operations and (or) Sbox, we can construct a set of linear inequalities which characterize one round division property propagation. By repeating this procedure r times, we can get a linear inequality system \(\mathcal {L}\) such that all feasible solutions of \(\mathcal {L}\) are all r-round division trails.

3.3 Initial Division Property

Integral distinguisher search algorithm often has a given initial division property \(\mathcal {D}^{1,n}_{\varvec{k}}\). Even though \(\mathcal {L}\) is able to describe all division trails, we are interested in division trails starting from the given initial division property. Thus, we have to model the initial division property into the linear inequality system. Denote \( (a_{n-1}^0,\cdots , a^0_{0})\rightarrow \cdots \rightarrow (a^r_{n-1},\cdots ,a^r_{0})\) an r-round division trail, \(\mathcal {L}\) is thus a linear inequality system defined on variables \( a_i^j \) (\( i= 0, \cdots , n-1.\ \ j= 0, \cdots ,r \)) and some auxiliary variables. Let \(\mathcal {D}^{1,n}_{\varvec{k}}\) denote the initial input division property with \( \varvec{k}=(k_{n-1}, \cdots , k_0) \), we need to add \( a_i^0 = k_i \) (\( i = 0, \cdots n-1 \)) into \(\mathcal {L}\), and thus all feasible solutions of \(\mathcal {L}\) are division trails which start from vector \( \varvec{k}\).

4 Stopping Rule and Search Algorithm

In this section we first study the stopping rule in the search of integral distinguishers based on division property, and then we convert this stopping rule into an objective function of the MILP problem. At last, we propose an algorithm to determine whether an r-round integral distinguisher exists.

In the division property propagation, we note that only zero vector can propagate to zero vector. Thus if the given initial division property is \(\mathcal {D}^{1,n}_{\varvec{k}}\) with \(\varvec{k}\) a non-zero vector, and we denote the division property after r-round propagation by \(\mathcal {D}^{1,n}_{\mathbb {K}_r}\), then it holds that \(\mathbb {K}_r\) does not contain zero vector. In the following, we always assume \(\varvec{k}\ne \varvec{0}\), since \(\varvec{k}=\varvec{0}\) does not imply any integral property on the input multiset.

4.1 Stopping Rule

Let’s first consider a set \(\mathbb {X}\) with division property \(\mathcal {D}^{1,n}_{\mathbb {K}}\). If \(\mathbb {X}\) does not have any useful integral property, that is the Xor-sum of \(\mathbb {X}\) does not balance on any bit, thus we have \(\bigoplus _{\varvec{x}\in \mathbb {X}}\varvec{\pi _u}(\varvec{x})\) is unknown for any unit vector \(\varvec{u}\in (\mathbb {F}_2)^n\). Since \(\mathbb {X}\) has division property \(\mathcal {D}^{1,n}_{\mathbb {K}}\), there must exist a vector \(\varvec{k}\in \mathbb {K}\) such that \(\varvec{u}\succeq \varvec{k}\). Note that \(\varvec{u}\) is a unit vector, thus \(\varvec{u}=\varvec{k}\), which means \(\mathbb {K}\) contains all the n unit vectors. On the other hand, if \(\mathbb {K}\) contains all the n unit vectors, then for any \(\varvec{0}\ne \varvec{u}\in (\mathbb {F}_2)^n\) there must exist a unit vector \(\varvec{e}\in \mathbb {K}\) such that \(\varvec{u}\succeq \varvec{e}\), that is \(\bigoplus _{\varvec{x}\in \mathbb {X}}\varvec{\pi _u}(\varvec{x})\) is unknown. Thus, \(\mathbb {X}\) does not have any integral property.

Proposition 6 (Set without Integral Property)

Assume \(\mathbb {X}\) is a multiset with division property \(\mathcal {D}^{1,n}_{\mathbb {K}}\), then \(\mathbb {X}\) does not have integral property if and only if \(\mathbb {K}\) contains all the n unit vectors.

Denote the output division property after i-round encryption by \(\mathcal {D}^{1,n}_{\mathbb {K}_i}\), and the initial input division property by \(\mathcal {D}^{1,n}_{\varvec{k}}\overset{def}{=}\mathcal {D}^{1,n}_{\mathbb {K}_0}\). If \(\mathbb {K}_{r+1}\) for the first time contains all the n unit vectors, the division property propagation should stop and an r-round distinguisher can be derived from \(\mathcal {D}^{1,n}_{\mathbb {K}_r}\). In this case, \(\mathbb {K}_r \) does not contain all n unit vectors, thus we can always find a unit vector \( \varvec{e}\) such that \( \varvec{e}\notin \mathbb {K}_r \). Since \( \varvec{e}\) is a unit vector, it holds \( \varvec{e}\nsucceq \varvec{k}\) for all \( \varvec{k}\in \mathbb {K}_r \). Therefore, the parity of \(\varvec{\pi _e}(\varvec{x})\) over r-round outputs is even which is a zero-sum property, thus a balanced bit of the output is found. By repeating this process, all balanced bits can be found.

Based on this observation, we only need to detect whether \(\mathbb {K}_r \) contains all unit vectors. According to Proposition 5, in order to check the vectors in \(\mathbb {K}_r \), it is equivalent to check the last vectors of all r-round division trails. Denote \( (a_{n-1}^0,\cdots , a^0_{0})\rightarrow \cdots \rightarrow (a^r_{n-1},\cdots ,a^r_{0})\) an r-round division trail, and let \(\mathcal {L}\) denote a linear inequality system whose feasible solutions are all division trails which start with the given initial division property. It is clear that \(\mathcal {L}\) is a linear inequality system defined on variables \( a_i^j \) (\( i= 0, \cdots , n-1.\ \ j= 0, \cdots ,r \)) and some auxiliary variables. Thus, we can set the objective function as:

$$\begin{aligned} Obj: Min\{a^r_0+ a_1^r + \cdots a^r_{n-1}\} \end{aligned}$$
(5)

Now we get a complete MILP problem by setting \(\mathcal {L}\) as constraints and Obj as objective function. Note that \(\mathbb {K}_i\) does not contain zero vector, in this case, the objective function will never take a value of zero, and the MILP problem will return an objective value greater than zero (if the MILP problem has feasible solutions). In the following we show how to determine whether r-round integral distinguisher exists based on this MILP problem.

4.2 Search Algorithm

Denote \(\mathcal {L}\) a linear inequality description of all r-round division trails with the given initial input division property \(\mathcal {D}^{1,n}_{\varvec{k}}\). Let the sum of the coordinates of the last vector in the division trail be the objective function Obj as in Eq. (5). Denote \( M(\mathcal {L}, Obj) \) the MILP problem composed of \(\mathcal {L}\) and Obj. Algorithm 3 will return whether r-round integral distinguisher exists.

figure c

Our MILP problems are solved by the openly available MILP optimizer Gurobi [1], Algorithm 3 is presented with some Gurobi syntax. We denote the set of last vectors of all division trails by \(\mathbb {K}_r \).

  • Line 2 Initialize \(\mathbb {S}\) as all possible output bit positions.

  • Line 3–24 For an n-bit block cipher, check how many unit vectors there are in \(\mathbb {K}_r \). Moreover, we remove the bit position marked by the unit vectors in \(\mathbb {K}_r \) from \(\mathbb {S}\), and return \(\mathbb {S}\) as the output of the algorithm.

  • Line 4 Check whether the MILP problem has a feasible solution. Note that the initial MILP problem always has feasible solutions. However, along with the execution of the procedure, it will add some constraints (Line 13) in the model which will possibly make the MILP problem unsolvable.

  • Line 5 Optimize the MILP problem M by Gurobi.

  • Line 6–18 M.ObjVal is Gurobi syntax which returns the current value of the objective function after M has been optimized. \( M.ObjVal=1 \) means we have found a division trail which ends up with a unit vector \( \varvec{e}\), thus \( \varvec{e}\in \mathbb {K}_r \). M.getObjective() is a Gurobi function which returns the objective function of the model, which is \( a^r_0+\cdots +a^r_{n-1}\) in our case. The functionality of Line 8–17 is to choose which variable of \( (a^r_0,\cdots ,a^r_{n-1}) \) is equal to one in \( \varvec{e}\) and add a new constraint \( var = 0 \) into M, here var denotes the variable taking a value of one. obj.getVar(i) is used to return the i-th variable of obj which is \( a^r_i \) in this case. \( var.getAttr('x') \) retrieves the value of var under the current solution. Line 12 removes var from \(\mathbb {S}\), since we have found \( \varvec{e}\in \mathbb {K}_r \) whose nonzero position is var which means var can’t be a balanced bit position. \( M.addConstr(var = 0) \) adds a new constraints \( var = 0 \) into M, and this is used to rule out \(\varvec{e}\) from \(\mathbb {K}_r\). Line 14 updates the model since we have added a new constraint.

  • Line 19 This step returns \(\mathbb {S}\), the execution of this step means the objective value of M is larger than one, that is we can no longer find a division trail with the last vector being a unit vector. In this case, we have found all unit vectors in \(\mathbb {K}_r \) which represent undetermined bit positions, and thus we have ruled out all unbalanced bits and get an integral distinguisher.

  • Line 22 M do not have any feasible solutions means we have ruled out all units vectors of \(\mathbb {K}_r \) and made \(\mathbb {K}_r \) an empty set along with the execution. In this case, we can return \(\mathbb {S}\) as output since we have checked all vectors.

  • Line 25 If the for loop do not make the procedure exit, return \(\mathbb {S}\) as output. Usually, in this case \(\mathbb {S}\) is an empty set which means no distinguisher found.

Algorithm 3 always returns a set \(\mathbb {S}\) indicating balanced bit positions. For a block cipher with a given initial division property \(\mathcal {D}^{1,n}_{\varvec{k}}\), we can construct an r-round linear description of division property propagations and use Algorithm 3 to check whether a distinguisher exists. If for the first time the \( (r+1) \)-round model returns an empty set, then the longest distinguisher for the given initial division property is r-round.

5 Applications to SIMON, SIMECK, PRESENT, RECTANGLE, LBlock and TWINE

In this section, we show some applications of our technique. All the source codes are avaiable at https://github.com/xiangzejun/MILP_Division_Property. We applied our algorithm to SIMON, SIMECK, PRESENT, RECTANGLE, LBlock and TWINE block ciphers. The results are listed in Table 1. The Round (Previous) column and Round (Sect. 5) column list the number of rounds of the distinguishers of previous and our results. The Data column represents the number of active bits of the input pattern of the integral distinguisher, the data complexity of the distinguisher is determined by the initial input division property. Balanced bits column represents the number of balanced bits of the distinguisher we found. Time presents the time used by Algorithm 3 for searching the corresponding distinguishers, among which s is short for second and m is short for minute. All the experiments are conducted on the following platform: Intel Core i7-2600 CPU @3.40 GHz, 8.00G RAM, 64-bit Windows 7 system. Moreover, the distinguishes listed in Table 1 are presented in Appendix E. The table shows that we get improved distinguishers for SIMON48/64/96/128, SIMECK48/64, PRESENT and RECTANGLE. For SIMECK32, LBlock and TWINE our results are consistent with the previous best results. The result of SIMON32 is one round less than the result in [19]. However, we only use bit-based division property here, the 15-round distinguisher found in [19] for SIMON32 used bit-based division property using three subset. If bit-based division property is the only technique adopted, 14-round distinguisher is the longest distinguisher we can find.

5.1 Applications to SIMON and SIMECK

SIMON [3] is a family of lightweight block ciphers published by the U.S. National Security Agency (NSA) in 2013. SIMON adopts Fesitel structure and it has a very compact round function which only involves bit-wise And, Xor and circular shift operations. The structure of one round SIMON encryption is depicted in Fig. 1 where \( S^i \) denotes left circular shift by i bits.

Fig. 1.
figure 1

Feistel structure of SIMON round function

1-round Description of SIMON: Denote one round division trail of SIMON2n by \( (a^i_0, \cdots , a^i_{n-1}, b^i_0, \cdots , b^i_{n-1})\rightarrow (a^{i+1}_0,\cdots ,a^{i+1}_{n-1}, b^{i+1}_0, \cdots , b^{i+1}_{n-1}) \). In order to get a linear description of all possible division trails of one round SIMON, we introduce four vectors of auxiliary variables which are \( (u^i_0,\cdots ,u^i_{n-1}) \), \( (v^i_0, \cdots , v^i_{n-1}) \), \( (w^i_0,\cdots ,w^i_{n-1}) \) and \( (t^i_0,\cdots ,t^i_{n-1}) \). We denote \( (u^i_0,\cdots ,u^i_{n-1}) \) the input division property of \( S^1 \). Similarly, denote \( (v^i_0, \cdots , v^i_{n-1}) \) and \( (w^i_0,\cdots ,w^i_{n-1}) \) the input division property of \(S^8 \) and \( S^2 \) respectively. Let \( (t^i_0,\cdots ,t^i_{n-1}) \) denote the output division property of bit-wise And operation. Subsection 3.1 has modeled Copy, And and Xor functions. According to Eq. (1), the following inequalities are sufficient to model the Copy operation used in SIMON2n:

$$\begin{aligned} \mathcal {L}_1:a^i_j - u^i_j - v^i_j - w^i_j - b^{i+1}_j =0 \text { for } j \in \{0,1,\cdots , n-1\} \end{aligned}$$

Since we consider bit-based division property, division property propagation through circular shift is just a circular shift of the coordinates of the vector. Thus, the division property of the output of \( S^1 \) is \( (u^i_1, \cdots , u^i_{n-1}, u^i_0) \). Similarly, the division property of the output of \( S^8 \) and \( S^2 \) are \( (v^i_8,\cdots ,v^i_6,v^i_7) \) and \( (w^i_2,\cdots ,w^i_0,w^i_1) \) respectively. We can model bit-wise And operation used in SIMON by the following inequalities according to Eq. (2):

$$ \mathcal {L}_2:\left\{ \begin{aligned}&t^i_j - u^i_{j+1} \ge 0&\text { for }j \in \{0,1, \cdots , n-1\}\\&t^i_j - v^i_{j+8} \ge 0&\text { for }j \in \{0,1, \cdots , n-1\}\\&t^i_j - u^i_{j+1} - v^i_{j+8} \le 0&\text { for }j \in \{0,1, \cdots , n-1\} \end{aligned} \right. $$

At last, the Xor operations in SIMON2n can be modeled by the following inequalities according to Eq. (3):

$$\begin{aligned} \mathcal {L}_3:a^{i+1}_j - b^i_j - t^i_j - w^i_{j+2} =0 \text { for } j \in \{0,1,\cdots , n-1\} \end{aligned}$$

So far, we have modeled all operations used in SIMON, and get an accurate description \( \{\mathcal {L}_1,\mathcal {L}_2,\mathcal {L}_3\}\) of 1-round division trails. By repeating this procedure r times, we can get a linear inequality system \(\mathcal {L} \) for r-round division property propagation. Given some initial division property, we can add the corresponding constrains into \(\mathcal {L}\) and estimate whether a useful distinguisher exists by Algorithm 3. The results for SIMON family are listed in Table 1.

For SIMON48/64/96/128, we found the best distinguishers so far. Note that by using bit-based division property under the framework of [19], it is computationally impractical to search distinguishers for these versions. Using Algorithm 3, distinguishers can be searched in practical time.

SIMECK [25] is a family of lightweight block cipher proposed at CHES 2015. The round function of SIMECK is very like SIMON except the rotation constants. We applied our technique to SIMECK, and 15-, 18- and 21-round distinguishers are found for SIMECK32, SIMECK48 and SIMECK64 respectively, which shows that SIMON has better security than SIMECK with respect to division property based integral cryptanalysis.

We found that the 14-round distinguisher of SIMON32 we found is the same as the 14-round distinguisher of SIMON32 in [19] based on bit-based division property. Surprisingly, the 15-round distinguisher for SIMECK32 in [19] is found by bit-based division property using three subsets, however, we also find the same distinguisher for SIMECK32 by only using bit-based division property.

In [9], the authors investigated the differential and linear behavior of SIMON family regarding rotation parameters, and they presented some interesting alternative parameters among which (1, 0, 2) is optimal for the differential and linear characteristics with the restriction that the second rotation parameter is zero. In this paper, we investigated the integral property of this parameter by our technique. The results are listed in Table 2 (h in the time column represents hour). The third column lists the rounds of the distinguishers we found. The results show that (1, 0, 2) is a very bad choice with respect to division property based integral cryptanalysis.

Table 2. Results on SIMON(1,0,2).

5.2 Applications to PRESENT and RECTANGLE

PRESENT [5] and RECTANGLE [28] are two SP-network block ciphers, of which the linear layers are bit permutations. Figure 2 illustrates one round encryption of PRESENT.

Fig. 2.
figure 2

One round SP structure of PRESENT

1-round Description of PRESENT: Denote one round division trail of PRES-ENT by \( (a^i_{63},\cdots ,a^i_0)\rightarrow (a^{i+1}_{63}, \cdots , a^{i+1}_0) \). We first model the division property propagation of Sbox layer. Denote the division property of the output of Sbox by \( (b^i_{63}, \cdots , b^i_0) \). Subsection 3.2 has studied how to calculate the division trails of Sbox and model those trails by linear inequalities. Appendix C shows the 11 inequalities of PRESENT Sbox. For each of the 16 Sboxes of PRESENT, we introduce 11 inequalities and thus the Sbox layer of PRESENT can be modeled by \( 11\times 16 = 176 \) inequalities which is denoted by \(\mathcal {L}_1 \). The linear layer of PRESENT is a bit permutation, thus, the division property propagation through linear layer is just a permutation of the coordinates of the vector, that is

$$ \mathcal {L}_2:\left\{ \begin{aligned}&a^{i+1}_{16j\mod 63} = b^i_j&j\in \{0,1,\cdots ,62\}\\&a^{i+1}_j = b^i_j&j = 63 \end{aligned} \right. $$

Note that \(\mathcal {L}_1 \) is a linear inequality system defined on variables \( (a^i_{63}, \cdots , a^i_0) \) and \( (b^i_{63},\cdots ,b^i_0) \), we can use the equalities in \(\mathcal {L}_2 \) to replace the variables \( (b^i_{63},\cdots ,b^i_0) \) in \(\mathcal {L}_1 \) in order to save auxiliary variables.

Now we have get a linear inequality system to describe one round division propagation of PRESENT. By repeating this procedure, an r-round linear inequality system can be constructed. For a given initial division property \(\mathcal {D}^{1,64}_{\varvec{k}}\), we add this information into the linear inequality system and use Algorithm 3 to estimate whether there exists an integral distinguisher.

The result for PRESENT is listed in Table 1. We found a 9-round integral distinguisher for PRESENT which is two more rounds than the previous best results in [22].

The modeling procedure of RECTANGLE is very like to PRESENT, we only list the result here in Table 1. The previous longest integral distinguisher of RECTANGLE is found by the designers, and they gave a 7-round distinguisher. In this paper we find a 9-round distinguisher which is two more rounds.

5.3 Applications to LBlock and TWINE

This subsection applies our technique to two generalized Feistel block cipher LBlock and TWINE. The round function of these two ciphers are alike, and the round function composed of Copy, Sbox and Xor operations. We have showed how to model Copy and Xor operations in SIMON and Sbox in PRESENT, thus, we omit the details for these two ciphers due to the limit of space. The number of division trails and linear inequalities required to describe those division trails of LBlock and TWINE Sboxes are presented in Table 3. The \( \sharp \{D.C\}\) column represents the number of division trails of the corresponding Sbox, and the \( \sharp \{Ine\}\) column represents the number of linear inequalities we found to accurately describe the division trails. Note that we chose the first inequality in the sixth line of Algorithm 1, however, other choice rather than the first one may result in different set of inequlities.

Table 3. Sbox properties regrading division trails.

Our experimental results regarding LBlock and TWINE are listed in Table 1. The distinguishers found in this paper are the same as the distinguishers found for these two ciphers in [26].

Experiments. To illustrate the validity of the technique proposed in this paper, we presented some integral distinguishers found by our technique with a small number of active bits, and we run experiments on these distinguishers. The experiments are presented at Appendix D. Our experiments showed that the distinguishers found by our technique are sound. Moreover, the results on PRESENT and RECTANGLE illustrate that our technique can find quite accurate distinguishers, that is the balanced bits found by Algorithm 3 are exactly in accordance with experimental results. For PRESENT cipher, we retrieved and improved the 5-round distinguisher found in [22], our technique found all the four balanced bits of the outputs of the fifth round given the same input pattern as in [22], while Wu et al. could only prove the balancedness of only one bit.

6 Summary and Discussion

In this paper we introduced a new technique to search integral distinguishers based on bit-based division property. We first proposed a new notion division trail and used this new notion to characterize the division property propagation, then we showed that it is sufficient to check the last vectors of all r-round division trails in order to estimate whether an r-round distinguisher exists.

Based on the observations on division trails, we proposed to construct a linear inequality system to characterize the division property propagations. We first studied how to model division property propagations of Copy, And and Xor operations by linear inequalities. For another basic component Sbox used in block ciphers, we studied the bit-based division property propagations for the first time, and we proposed an algorithm to compute the division trails of an Sbox. Moreover, we used those division trails to derive a set of linear inequalities whose feasible solutions are exactly all division trails. Thus, for a block cipher we can construct a linear inequality system whose solutions are all r-round division trails of the cipher, and we used this linear inequality system as constraints of the MILP problem. Then, the stopping rule in the search of integral distinguisher were studied and we converted it into an objective function of an MILP problem. To be specific, we set the sum of the coordinates of the last vector in an r-round division trail as objective function. Thus, we can get a complete MILP problem, based on which we presented an algorithm to estimate whether an r-round integral distinguisher exists by checking how many unit vectors are contained in the last vectors of all division trails.

We applied our technique to SIMON, SIMECK, PRESENT, RECTANGLE, LBlock and TWINE. For SIMON48/64/96/128, SIMECK48/64, PRESENT and RECTANGLE, we get much longer distinguishers than previous results based on division property in the open literature. Moreover, our results on PRESENT and RECTANGLE show that we can get better integral distinguishers by using the algebraic normal form of the Sboxes. Our results show that, by using our technique, we can search integral distinguishers based on bit-based division property in practical time for block ciphers with block size larger than 32, which is impractical under the traditional framework.

In [19], Todo et al. also introduced bit-based division property using three subsets, and they found 15-round distinguisher for SIMON32. However, we have not found a way to model this framework by an MILP problem at present. A surprising result is, by using our technique we also derived the 15-round distinguisher of SIMECK32 which are constructed by bit-based division property using three subsets [19]. We also used our technique on some Sbox-based block ciphers such as PRESENT and RECTANGLE, note that their linear layers are all bit permutations. However, this technique can be easily extended to arbitrary linear layers as pointed out by the reviewers, since any linear layer can be viewed as bit-level linear layer which can be treated as bit-wise copy and Xor.