Optimizing Sbox Implementations for Several Criteria Using SAT Solvers
Abstract
We explore the feasibility of applying SAT solvers to optimizing implementations of small functions such as Sboxes for multiple optimization criteria, e.g., the number of nonlinear gates and the number of gates. We provide optimized implementations for the Sboxes used in Ascon, ICEPOLE, Joltik/Piccolo, Keccak/Ketje/Keyak, LAC, Minalpher, PRIMATEs, Prøst, and RECTANGLE, most of which are candidates in the second round of the CAESAR competition. We then suggest a new method to optimize for circuit depth and we make tooling publicly available to find efficient implementations for several criteria. Furthermore, we illustrate with the 5-bit Sbox of PRIMATEs how multiple optimization criteria can be combined.
Keywords
Sbox, SAT solvers, Implementation optimization, Multiplicative complexity, Circuit depth complexity, Shortest linear straight-line program
1 Introduction
Implementations of cryptographic algorithms are typically optimized for one or multiple criteria, such as latency, throughput, power consumption, memory consumption, etc., but also criteria such as the cost of adding masking countermeasures to protect against side-channel attacks. It is worthwhile to spend time on this optimization, as the implementations are typically used many times. It is usually a hard problem to find an implementation that is actually theoretically minimal with respect to the criteria, e.g., general circuit minimization is \(\Sigma_2^P\)-complete [10]. However, for small functions this is still possible, using, for instance, SAT solvers. Especially for building blocks that can be used in multiple cryptographic algorithms, such as Sboxes, it is useful to look at methods for finding minimal implementations with respect to some given criteria.
In Sect. 2, we first discuss the simpler problem of finding minimal implementations of linear functions. We give a brief overview of methods for finding the shortest linear straight-line program.
We then move towards Sboxes and in Sect. 3 we consider known methods [13, 20] that manage to find minimal implementations for the relevant optimization criteria of multiplicative complexity [9], bitslice gate complexity [12], and gate complexity. The definitions of these criteria are given in Sect. 3. We study how feasible the methods actually are by applying them to Sboxes that are used in recent cryptographic algorithms, such as several candidates in the CAESAR competition and lightweight block ciphers. Additionally, we provide tools that allow anyone to conveniently do the same to other small Sboxes.
Then we look at another optimization criterion: the circuit depth complexity. This is relevant in hardware implementations to decrease the delay and to be able to increase the clock frequency. We suggest a new method for encoding the circuit depth complexity decision problem in SAT and we show how feasible this method is in practice by providing efficient low-depth Sbox implementations for Joltik [17], Piccolo [22], LAC [23], Prøst [18], and RECTANGLE [24] in Sect. 3.5.
Finally, in Sect. 4 we discuss how several optimization criteria can be combined, by first optimizing the Sbox used by the PRIMATEs [2] for multiplicative complexity and then for gate complexity. This is done by taking the intermediate result after optimizing for multiplicative complexity, identifying its linear parts, and treating these as instances of the shortest linear straight-line program problem.
In summary, our contributions are as follows:

implementations of the Sboxes in Ascon, ICEPOLE, Joltik/Piccolo, Keccak/Ketje/Keyak, LAC, Minalpher, Prøst, and RECTANGLE with a provably minimal number of nonlinear gates;

a new method for encoding the circuit depth complexity decision problem as an instance of SAT;

optimized and sometimes even provably minimal implementations of the Sboxes in Joltik/Piccolo, LAC, Prøst, and RECTANGLE with respect to bitslice gate complexity, gate complexity, and circuit depth complexity;

a method to combine multiple optimization criteria;

an implementation of the Sbox used by the PRIMATEs that is first optimized for multiplicative complexity and then for (bitslice) gate complexity;

tools and documentation to optimize implementations of small nonlinear functions such as Sboxes using SAT solvers, with respect to multiplicative complexity, bitslice gate complexity, gate complexity, or circuit depth complexity, are put into the public domain. These tools are available online.
2 The Shortest Linear Straight-Line Program Problem
Before tackling the optimization of Sboxes, let us restrict ourselves to linear functions and let us consider the Shortest Linear Program (SLP) problem over \(GF(2)\). Let \({\varvec{A}}\) be an \(m \times n\) matrix of constants over \(GF(2)\) and let \({\varvec{x}}\) be a vector of \(n\) variables over \(GF(2)\). The SLP problem is to find the program with the smallest number of lines that computes \({\varvec{A}}{\varvec{x}}\), where every program line is of the form \(u := \lambda v + \mu w\), with \(\lambda , \mu \in GF(2)\) and \(v\) and \(w\) being input variables or variables assigned on earlier lines.
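As a small illustration (the matrix is a made-up example, not one taken from a cipher), consider

\[
{\varvec{A}} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}, \qquad {\varvec{A}}{\varvec{x}} = \begin{pmatrix} x_0 + x_1 \\ x_0 + x_1 + x_2 \end{pmatrix}.
\]

Computing both rows independently takes three additions. The two-line program \(t_0 := x_0 + x_1\), \(t_1 := t_0 + x_2\) computes both outputs, and no shorter program exists: the two outputs are distinct and neither is an input variable, so each needs its own line.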
Being able to find the shortest straight-line linear program has obvious applications to cryptology. Solving the SLP over \(GF(2)\) is equivalent to finding the shortest circuit to compute a function using only XOR gates. Optimizing implementations of linear operations, such as MixColumns in AES and the linear transformation in certain implementations of SubBytes, can therefore be seen as instances of the SLP problem over \(GF(2)\). However, this method does not apply to nonlinear operations such as Sboxes. We show in Sect. 3 what kind of methods can be used in such cases.
Solving the SLP Problem. Boyar, Matthews, and Peralta showed in [7] that the SLP problem over \(GF(2)\) is NP-hard. Off-the-shelf SAT solvers can be used to find solutions for small instances of this problem. Fuhs and Schneider-Kamp presented a method [16] to encode the SLP problem as an instance of SAT and they show how this can be used to optimize the affine transformation of AES’s SubBytes [15, 16].
For larger instances, exact methods will quickly become infeasible. Alternatively, Boyar and Peralta published an approach to solve the SLP problem over \(GF(2)\) based on a heuristic [8]. In short, the heuristic method uses a base vector set \(S\), initialized with unit vectors for all variables in \({\varvec{x}}\), and a distance vector Dist[] that keeps track of the minimal Hamming distance to \(S\) for each row in \({\varvec{A}}\). Repeatedly, the sum of the pair of base vectors in \(S\) that minimizes the sum of Dist[] is added to \(S\) and Dist[] is updated, until Dist[] is the all-zero vector. If there is a tie between two pairs of base vectors, the pair that maximizes the Euclidean length of the new Dist[] vector is chosen. This algorithm makes it possible to find solutions to larger instances of the SLP problem.
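The greedy loop described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the implementation from [8]: vectors are stored as integers, all rows are assumed to be nonzero, and each distance is taken to be the fewest additions of base vectors needed to reach a row, found by exhaustive search (so the sketch only works for tiny matrices). The names `distance` and `heuristic_slp` are hypothetical.

```python
from itertools import combinations


def xor_all(vectors):
    """XOR a collection of bit-vectors stored as Python ints."""
    acc = 0
    for v in vectors:
        acc ^= v
    return acc


def distance(base, target):
    """Fewest additions of base vectors needed to obtain target.

    Exhaustive search over subsets, so only suitable for tiny instances.
    """
    for size in range(1, len(base) + 1):
        for subset in combinations(base, size):
            if xor_all(subset) == target:
                return size - 1
    raise ValueError("target is not in the span of the base")


def heuristic_slp(rows, n):
    """Return (program, base); a line (new, i, j) means s[new] = s[i] + s[j]."""
    base = [1 << i for i in range(n)]          # unit vectors x_0 .. x_{n-1}
    dist = [distance(base, r) for r in rows]
    program = []
    while any(dist):
        best = None
        for i, j in combinations(range(len(base)), 2):
            candidate = base[i] ^ base[j]
            if candidate in base:
                continue
            new_dist = [distance(base + [candidate], r) for r in rows]
            # Primary key: smallest sum of distances.
            # Tie-break: largest Euclidean norm (negated for min-comparison).
            key = (sum(new_dist), -sum(d * d for d in new_dist))
            if best is None or key < best[0]:
                best = (key, i, j, candidate, new_dist)
        _, i, j, candidate, dist = best
        program.append((len(base), i, j))
        base.append(candidate)
    return program, base


# Rows x0+x1 and x0+x1+x2 of a made-up 2x3 matrix (bit i stands for x_i).
program, base = heuristic_slp([0b011, 0b111], 3)
print(program)
```

For this toy input the loop first adds \(x_0 + x_1\) and then adds \(x_2\) to it, giving a program with two XORs.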
3 Optimizing Sbox Implementations Using SAT Solvers
For nonlinear functions such as Sboxes, known approaches based on heuristics [8] all exploit additional algebraic structure that may be available, e.g., as for the Sbox of AES. However, in general this additional structure may not exist and one may need to fall back to generic methods such as SAT solvers.

Multiplicative complexity. The multiplicative complexity of a function [9] is defined as the smallest number of nonlinear gates with fan-in 2 required to compute this function. If we restrict our Sbox implementations to the \(\{\texttt {AND},\texttt {OR},\texttt {XOR},\texttt {NOT} \}\) operations, we only need to consider the number of ANDs and ORs. Optimizing for this goal is useful in the case of protecting against side-channel attacks using random masks, where nonlinear gates are typically more expensive to mask. There are also applications in multi-party computation and fully homomorphic encryption, where the cost of nonlinear operations is even more significant [1].

Bitslice gate complexity. The bitslice gate complexity of a function [12] is defined as the smallest number of operations in \(\{\texttt {AND},\texttt {OR},\texttt {XOR},\texttt {NOT} \}\) required to compute this function. This translates directly to efficient bitsliced software implementations, as on most common CPU architectures, there are no instructions for computing NAND, NOR, or XNOR immediately.

Gate complexity. The gate complexity of a function is defined as the smallest number of logic gates required to compute this function. Unlike for bitslice gate complexity, NAND, NOR, and XNOR gates are now also allowed. This translates to efficient hardware implementations, although the different amounts of area required by these types of gates and the different delays still need to be taken into account. Note that we only consider gates with a fan-in of at most 2.

Circuit depth complexity. The depth of a circuit is defined as the length of the longest path from an input gate to an output gate. If gates with unbounded fan-in are allowed, every function can be computed by a circuit with depth 2, e.g., by expressing the function in conjunctive or disjunctive normal form. However, this can lead to very wide circuits with a lot of gates, which is typically not desirable. There is a trade-off between circuit depth and number of gates. Still, optimizing for this goal is useful in the case of hardware implementations, to be able to decrease the total delay and therefore to be able to increase the clock frequency. Again, only gates with a fan-in of at most 2 are considered.
Each optimization goal gives rise to a decision problem. For the multiplicative complexity of a function \(f\), the decision problem is: “Is there a circuit that implements \(f\) and that uses at most \(k\) nonlinear operations?” The decision problems for the other three optimization goals can be defined analogously. Off-the-shelf SAT solvers can be used to solve these decision problems. When a SAT solver successfully finds a circuit for some value \(k\) but outputs UNSAT for \(k-1\), it is proven that \(k\) is the minimum value. Note that when a SAT solver outputs SAT for some value \(k\), it also provides a satisfying valuation that can be used to reconstruct an implementation of \(f\).
In order to use SAT solvers to solve these decision problems, the problems first have to be encoded in logical formulas in conjunctive normal form (CNF), because that is the input format that the SAT solver requires.
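The way a minimum is established from a sequence of decision problems can be sketched as a simple descending search. The callback `decide` is a hypothetical stand-in for "encode the problem for this \(k\), run a SAT solver, and return the satisfying valuation or nothing". It is not part of any tool mentioned here, and in practice a solver call may not terminate in reasonable time.

```python
def minimal_k(decide, k_start):
    """Decrease k from a satisfiable starting point until the oracle says UNSAT.

    decide(k) is a hypothetical callback: it returns a model (any non-None
    object) if a circuit within bound k exists, and None for UNSAT.
    """
    model = decide(k_start)
    if model is None:
        raise ValueError("k_start must be satisfiable")
    k = k_start
    while k > 0:
        smaller = decide(k - 1)
        if smaller is None:        # UNSAT for k-1 proves that k is minimal
            break
        k, model = k - 1, smaller
    return k, model


# Toy oracle standing in for a SAT solver: satisfiable exactly when k >= 4.
def toy_oracle(k):
    return {"bound": k} if k >= 4 else None


print(minimal_k(toy_oracle, 7))
```

The returned model belongs to the smallest satisfiable \(k\), so it can be decoded into a minimal implementation.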
3.1 Notation

Throughout this section, let:

\(x_i\) be variables representing Sbox inputs;

\(y_i\) be variables representing Sbox outputs;

\(q_i\) be variables representing gate inputs;

\(t_i\) be variables representing gate outputs;

\(a_i\) be variables representing wiring between gates;

\(b_i\) be variables representing wiring ‘inside’ gates. Their role becomes clearer when they are first used in Sect. 3.3.
In the implementations the logical connectives are used to denote the types of operations, i.e., let \(\wedge \), \(\vee \), \(\oplus \), \(\lnot \) denote AND, OR, XOR, NOT, respectively, and let \(\uparrow \), \(\downarrow \), \(\leftrightarrow \) denote NAND, NOR, XNOR, respectively.
3.2 Optimizing for Multiplicative Complexity

For an \(n \times m\) Sbox and a given \(k\), the decision problem is encoded by the following equations over \(GF(2)\):

\(\forall i \in \{0,\cdots ,k-1\}\): \(t_i = q_{2i} \cdot q_{2i+1}\), to encode the \(k\) AND gates.

\(\forall i \in \{0,\cdots ,2k-1\}\): \(q_i = a_{l} + \left( \sum _{j=0}^{n-1} a_{l + j + 1} \cdot x_j\right) + \left( \sum _{j=0}^{\left\lfloor \frac{i}{2}\right\rfloor - 1} a_{l + n + j + 1} \cdot t_j\right) \), where \(l = i(n+1) + \left\lfloor \frac{i^2-2i+1}{4}\right\rfloor \), to encode that the inputs of the AND gates can be any linear combination of Sbox inputs and previous AND gate outputs. The single \(a_l\) represents an optional NOT gate.

\(\forall i \in \{0,\cdots ,m-1\}\): \(y_i = \left( \sum _{j=0}^{n-1} a_{s + j} \cdot x_j\right) + \left( \sum _{j=0}^{k-1} a_{s + n + j} \cdot t_j\right) \), where \(s = 2k(n+1) + k(k-1) + i(n+k)\), to encode that the Sbox outputs can be any linear combination of Sbox inputs and AND gate outputs.
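To make the indexing in these equations concrete, the following sketch evaluates the circuit that a given assignment of the \(a_i\) selects. It follows the formulas for \(l\) and \(s\) literally. The function name `eval_mc_circuit` and the toy assignment are illustrative assumptions, not part of the published tools.

```python
def eval_mc_circuit(a, n, m, k, x):
    """Evaluate the circuit selected by the bits a on the input bits x.

    a: list of 0/1 wiring variables, x: list of n input bits.
    Returns the list of m output bits y.
    """
    t = []                                   # AND gate outputs
    q = []                                   # AND gate inputs
    for i in range(2 * k):
        l = i * (n + 1) + (i * i - 2 * i + 1) // 4
        value = a[l]                         # optional constant (NOT gate)
        for j in range(n):
            value ^= a[l + j + 1] & x[j]
        for j in range(i // 2):              # outputs of earlier AND gates
            value ^= a[l + n + j + 1] & t[j]
        q.append(value)
        if i % 2 == 1:                       # both inputs of gate i//2 known
            t.append(q[i - 1] & q[i])
    y = []
    for i in range(m):
        s = 2 * k * (n + 1) + k * (k - 1) + i * (n + k)
        value = 0
        for j in range(n):
            value ^= a[s + j] & x[j]
        for j in range(k):
            value ^= a[s + n + j] & t[j]
        y.append(value)
    return y


# Toy case n=2, m=1, k=1: q0 = x0, q1 = x1, y0 = t0, i.e. y0 = x0 AND x1.
a = [0, 1, 0,   0, 0, 1,   0, 0, 1]
print([eval_mc_circuit(a, 2, 1, 1, [x0, x1]) for x0 in (0, 1) for x1 in (0, 1)])
```

A SAT solver effectively searches for an assignment `a` for which this evaluation agrees with the Sbox on all \(2^n\) inputs.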
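Let \(C\) denote this system of equations. The system \(C'\) that is handed to the solver instantiates \(C\) once for each of the \(2^n\) Sbox inputs, with the \(x_i\) and \(y_i\) fixed according to the truth table of the Sbox, while the wiring variables \(a_i\) are shared between all instances. \(C'\) is in algebraic normal form (ANF). The method by Bard, Courtois, and Jefferson [3] for converting sparse systems of low-degree multivariate polynomials over \(GF(2)\) is used to convert \(C'\) to CNF, such that it is understood by the SAT solver.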
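The two basic ingredients of such an ANF-to-CNF conversion can be sketched as follows: a product of two variables is replaced by a fresh variable tied to it by three clauses, and a sum is turned into clauses that forbid every assignment of the wrong parity. This is a simplified illustration in the spirit of [3], not a reimplementation of it. In particular, cutting long sums into shorter ones with extra variables is omitted, and all function names are hypothetical. Literals are signed integers, as in the DIMACS format.

```python
from itertools import product


def and_to_cnf(t, u, v):
    """Clauses forcing t <-> (u AND v)."""
    return [[-t, u], [-t, v], [t, -u, -v]]


def xor_to_cnf(variables, rhs):
    """Clauses forcing the XOR of the variables to equal rhs (0 or 1).

    One clause per forbidden assignment, i.e. 2^(len-1) clauses; this is why
    long sums are normally cut into shorter ones first (omitted here).
    """
    clauses = []
    for bits in product((0, 1), repeat=len(variables)):
        if sum(bits) % 2 != rhs:
            # Forbid this assignment: at least one variable must differ.
            clauses.append([-v if b else v for v, b in zip(variables, bits)])
    return clauses


def satisfies(clauses, assignment):
    """assignment maps variable number -> 0/1."""
    return all(
        any(assignment[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)
        for clause in clauses
    )


# Encode t = q0*q1 and y = t + x, with variables 1=q0, 2=q1, 3=t, 4=x, 5=y.
cnf = and_to_cnf(3, 1, 2) + xor_to_cnf([3, 4, 5], 0)
print(len(cnf), "clauses")
```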
Results. This method makes it feasible to find the multiplicative complexity of several 4-bit and 5-bit Sboxes. Finding the multiplicative complexity comes with an actual implementation that uses this minimal number of nonlinear gates. After Courtois, Hulme, and Mourouzis applied this method to the Sboxes of PRESENT and GOST [12], we show that we can also find results for more recently introduced 4-bit and 5-bit Sboxes.
We consider the Sboxes, and if applicable, their inverses (denoted by \(^{-1}\)), in Ascon [14], ICEPOLE [19], Keccak [4]/Ketje [5]/Keyak [6], all PRIMATEs [2], Joltik [17]/Piccolo [22], LAC [23], Minalpher [21], Prøst [18], and RECTANGLE [24]. Minalpher’s and Prøst’s Sboxes are involutory, which is why their inverses are not listed separately. The inverse Sboxes in Ascon, ICEPOLE, Keccak, Ketje, and Keyak are not actually used in decryption and are therefore not considered.
Table 1. Multiplicative complexity of Sboxes
Sbox  Size \(n \times m\)  Multiplicative complexity 

Ascon  \(5 \times 5\)  5 
ICEPOLE  \(5 \times 5\)  6 
Keccak/Ketje/Keyak  \(5 \times 5\)  5 
PRIMATEs  \(5 \times 5\)  \(\in \{6,7\}\) 
PRIMATEs\(^{-1}\)  \(5 \times 5\)  \(\in \{6,7,8,9,10\}\) 
Joltik/Piccolo  \(4 \times 4\)  4 
Joltik\(^{-1}\)/Piccolo\(^{-1}\)  \(4 \times 4\)  4 
LAC  \(4 \times 4\)  4 
Minalpher  \(4 \times 4\)  5 
Prøst  \(4 \times 4\)  4 
RECTANGLE  \(4 \times 4\)  4 
RECTANGLE\(^{-1}\)  \(4 \times 4\)  4 
These and subsequent results are obtained using MiniSat 2.2.0 and CryptoMiniSat 2.9.10 with default parameters on a single core of an Intel Xeon E7-4870 v2 running at 2.30 GHz.

To settle the \(k=6\) case for the PRIMATEs Sbox, we tried the following:

reducing the CNF, e.g., using NICESAT [11];

fine-tuning SAT solver parameters;

trying other SAT solvers;

trying other SAT solvers that can run in parallel on many cores, such as Plingeling and Treengeling; and

letting all of this run for several months on a machine with 120 cores and 3 TB of RAM.
Unfortunately, none of these attempts resulted in an answer as no solver instance has terminated yet. As these SAT solvers typically have much more difficulty with proving the UNSAT case than proving the SAT case, and as the SAT proof for \(k=7\) was found in less than 40 hours, we expect the \(k=6\) case to yield UNSAT and we therefore conjecture the multiplicative complexity of the PRIMATEs Sbox to be 7. In Sect. 4 we go into more detail on optimizing the PRIMATEs Sbox. For the inverse Sbox, we did not manage to find solutions for \(k \in \{6,7,8,9\}\).
3.3 Optimizing for Bitslice Gate Complexity
In [13, 20], a method is also given to optimize for bitslice gate complexity. However, it is only applied to the small CTC2 toy cipher and therefore it remains unclear how practical this method is for real-world ciphers. We investigate this by applying the method to the same Sboxes as in the previous section.

The decision problem for \(k\) gates is encoded by the following equations over \(GF(2)\):

\(\forall i \in \{0,\cdots ,k-1\}\): \(t_i = b_{3i} \cdot q_{2i} \cdot q_{2i+1} + b_{3i+1} \cdot q_{2i} + b_{3i+1} \cdot q_{2i+1} + b_{3i+2} + b_{3i+2} \cdot q_{2i}\), to encode the \(k\) AND, OR, XOR or NOT gates. The \(b_i\) determine what kind of gate this will represent, as can be seen in Table 2.

\(\forall i \in \{0,\cdots ,k-1\}\): \(0 = b_{3i} \cdot b_{3i+2}\) and \(0 = b_{3i+1} \cdot b_{3i+2}\), to make sure that the gate is either a unary NOT or a binary AND/OR/XOR, but not the XOR of them. This excludes NAND/NOR/XNOR gates.

\(\forall i \in \{0,\cdots ,2k-1\}\): \(q_i = \left( \sum _{j=0}^{n-1} a_{l + j} \cdot x_j\right) + \left( \sum _{j=0}^{\left\lfloor \frac{i}{2}\right\rfloor - 1} a_{l + n + j} \cdot t_j\right) \), where \(l = in + \left\lfloor \frac{i^2-2i+1}{4}\right\rfloor \), to encode that the inputs of the gates can be any Sbox input bit or any previously computed bit.

\(\forall i \in \{0,\cdots ,2k-1\}\), \(\forall j \in \{l,\cdots ,l+n+\left\lfloor \frac{i}{2}\right\rfloor - 2\}\), \(\forall u \in \{j+1,\cdots , l+n+\left\lfloor \frac{i}{2}\right\rfloor - 1\}\): \(0 = a_j \cdot a_u\), to encode an ‘at most one’ constraint on the gate inputs.

\(\forall i \in \{0,\cdots ,m-1\}\): \(y_i = \left( \sum _{j=0}^{n-1} a_{s + j} \cdot x_j\right) + \left( \sum _{j=0}^{k-1} a_{s + n + j} \cdot t_j\right) \), where \(s = 2kn + k(k-1) + i(n+k)\), to encode that the Sbox output bit can be any Sbox input bit or any gate output.

\(\forall i \in \{0,\cdots ,m-1\}\), \(\forall j \in \{s,\cdots ,s+n+k-2\}\), \(\forall u \in \{j+1,\cdots ,s+n+k-1\}\): \(0 = a_j \cdot a_u\), to encode an ‘at most one’ constraint on the Sbox outputs.
Table 2. Encoding of different types of gates (bitslice gate complexity)
\(b_{3i} b_{3i+1} b_{3i+2}\)  Gate \(t_i\) function 

000  0 
001  \(\lnot q_{2i}\) 
010  \(q_{2i} \oplus q_{2i+1}\) 
011  Prevented by constraint on \(b_{3i+2}\) 
100  \(q_{2i} \wedge q_{2i+1}\) 
101  Prevented by constraint on \(b_{3i+2}\) 
110  \(q_{2i} \vee q_{2i+1}\) 
111  Prevented by constraint on \(b_{3i+2}\) 
Converting \(C\) to \(C'\) and then to CNF is the same process as with the multiplicative complexity decision problem. Note that the ‘constraint equations’ on \(a_i\) and \(b_j\) do not have to be duplicated \(2^n\) times for \(C'\), as they are not renumbered. This saves a lot of redundant clauses.
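The gate encoding for bitslice gate complexity can be checked mechanically by evaluating the polynomial for \(t_i\) on all inputs. The sketch below does this for every choice of the three \(b\) bits of one gate and applies the two constraints on \(b_{3i+2}\). The helper names are illustrative only.

```python
from itertools import product


def bitslice_gate(b, q0, q1):
    """Evaluate t = b0*q0*q1 + b1*q0 + b1*q1 + b2 + b2*q0 over GF(2)."""
    b0, b1, b2 = b
    return (b0 & q0 & q1) ^ (b1 & q0) ^ (b1 & q1) ^ b2 ^ (b2 & q0)


def allowed(b):
    """The two constraints b0*b2 = 0 and b1*b2 = 0."""
    b0, b1, b2 = b
    return (b0 & b2) == 0 and (b1 & b2) == 0


def truth_table(b):
    """Outputs for (q0, q1) = 00, 01, 10, 11."""
    return tuple(bitslice_gate(b, q0, q1) for q0, q1 in product((0, 1), repeat=2))


for b in product((0, 1), repeat=3):
    status = truth_table(b) if allowed(b) else "excluded"
    print("".join(map(str, b)), status)
```

The five patterns that survive the constraints yield the constant 0, NOT, XOR, AND, and OR, in agreement with the gate-type table for this encoding.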
Table 3. Bitslice gate complexity of Sboxes
Sbox  Size \(n \times m\)  Bitslice gate complexity  Implementation 

Keccak/Ketje/Keyak  \(5 \times 5\)  \(\le 13\)  3 AND, 2 OR, 5 XOR, 3 NOT 
Joltik/Piccolo  \(4 \times 4\)  10  1 AND, 3 OR, 4 XOR, 2 NOT 
Joltik\(^{-1}\)/Piccolo\(^{-1}\)  \(4 \times 4\)  10  1 AND, 3 OR, 4 XOR, 2 NOT 
LAC  \(4 \times 4\)  11  2 AND, 2 OR, 6 XOR, 1 NOT 
Minalpher  \(4 \times 4\)  \(\ge 11\)  
Prøst  \(4 \times 4\)  8  4 AND, 4 XOR 
RECTANGLE  \(4 \times 4\)  \(\in \{11, 12\}\)  1 AND, 3 OR, 7 XOR, 1 NOT 
RECTANGLE\(^{-1}\)  \(4 \times 4\)  \(\in \{10, 11, 12\}\)  4 OR, 7 XOR, 1 NOT 
For Prøst and the (forward) Sbox of RECTANGLE, it is interesting to note that the SAT solvers are able to find the same implementations as the corresponding authors already suggested. We have proven that their bitsliced implementations are indeed minimal.
3.4 Optimizing for Gate Complexity
A method to encode the gate complexity decision problem was also provided in [13, 20], but again, actual results were only given for the CTC2 toy cipher. We show that it is feasible to compute the gate complexity for real-world 4-bit Sboxes as well.

The encoding is the same as for bitslice gate complexity, with two changes:

Instead of the previous rule for \(t_i\), the gates are encoded differently: \(\forall i \in \{0,\cdots ,k-1\}\): \(t_i = b_{3i} \cdot q_{2i} \cdot q_{2i+1} + b_{3i+1} \cdot q_{2i} + b_{3i+1} \cdot q_{2i+1} + b_{3i+2}\), to encode the \(k\) gates. The \(b_i\) determine what kind of gate this will represent, as can be seen in Table 4.

The additional constraints on the \(b_i\) are completely omitted.
Table 4. Encoding of different types of gates (gate complexity)
\(b_{3i} b_{3i+1} b_{3i+2}\)  Gate \(t_i\) function 

000  0 
001  1 
010  \(q_{2i} \oplus q_{2i+1}\) 
011  \(q_{2i} \leftrightarrow q_{2i+1}\) 
100  \(q_{2i} \wedge q_{2i+1}\) 
101  \(q_{2i} \uparrow q_{2i+1}\) 
110  \(q_{2i} \vee q_{2i+1}\) 
111  \(q_{2i} \downarrow q_{2i+1}\) 
Converting \(C\) to \(C'\) and then to CNF is similar to the previous optimization goals.
Table 5. Gate complexity of Sboxes
Sbox  Gate complexity  Implementation 

Joltik/Piccolo  8  2 OR, 1 XOR, 2 NOR, 3 XNOR 
Joltik\(^{-1}\)/Piccolo\(^{-1}\)  8  2 OR, 1 XOR, 2 NOR, 3 XNOR 
LAC  10  1 AND, 3 OR, 2 XOR, 4 XNOR 
Prøst  8  4 AND, 4 XOR 
RECTANGLE  \(\in \{10, 11\}\)  1 AND, 1 OR, 2 XOR, 1 NAND, 1 NOR, 5 XNOR 
RECTANGLE\(^{-1}\)  \(\in \{10, 11\}\)  1 AND, 1 OR, 6 XOR, 1 NAND, 1 NOR, 1 XNOR 
3.5 Optimizing for Depth Complexity
There are many situations in high-speed hardware implementations where the implementer wants to keep the depth of the circuit as low as possible, in order to be able to increase the clock frequency, without having to use significantly more gates. We provide a novel method to find low-depth implementations of small functions such as Sboxes using SAT solvers. This method is inspired by the encoding of the gate complexity decision problem, but modified in some important ways.
In the encoding of the gate complexity decision problem, we expressed that every gate can use the Sbox input and the outputs of previous gates as its input. The key idea here is to divide the circuit into depth layers and to encode the notion that a gate can only use the Sbox input and the output of gates in the previous layers as its input. This is made more precise later.
First we note that it is necessary to limit the potential increase of the number of gates when reducing the depth of a circuit. We introduce a fixed maximum layer width \(w\) to address this, so we allow at most \(w\) gates to be executed in parallel. For some function \(f\), we want to be able to answer questions such as: “is there a circuit implementing \(f\) with depth \(k\) and with at most \(w\) gates on each depth layer?”.

For depth \(k\) and layer width \(w\), the decision problem is encoded by the following equations over \(GF(2)\):

\(\forall i \in \{0,\cdots ,kw-1\}\): \(t_i = b_{3i} \cdot q_{2i} \cdot q_{2i+1} + b_{3i+1} \cdot q_{2i} + b_{3i+1} \cdot q_{2i+1} + b_{3i+2}\), to encode the \(kw\) gates. The \(b_i\) determine what kind of gate this will represent, as can be seen in Table 4.

\(\forall i \in \{0,\cdots ,2kw-1\}\): \(q_i = \left( \sum _{j=0}^{n-1} a_{l + j} \cdot x_j\right) + \left( \sum _{j=0}^{v-1} a_{l + n + j} \cdot t_j\right) \), where \(v = \left\lfloor \frac{i}{2w}\right\rfloor w\) and \(l = in + v\left( i - v - w\right) \), to encode that the inputs of the gates can be any Sbox input bit or any bit computed in a previous layer.

\(\forall i \in \{0,\cdots ,2kw-1\}\), \(\forall j \in \{l,\cdots ,l+n+v-2\}\), \(\forall u \in \{j+1,\cdots ,l+n+v-1\}\): \(0 = a_j \cdot a_u\), to encode an ‘at most one’ constraint on the gate inputs.

\(\forall i \in \{0,\cdots ,m-1\}\): \(y_i = \left( \sum _{j=0}^{n-1} a_{s + j} \cdot x_j\right) + \left( \sum _{j=0}^{kw-1} a_{s + n + j} \cdot t_j\right) \), where \(s = kw(2n + kw - w) + i(n+kw)\), to encode that the Sbox output bit can be any Sbox input bit or any gate output.

\(\forall i \in \{0,\cdots ,m-1\}\), \(\forall j \in \{s,\cdots ,s+n+kw-2\}\), \(\forall u \in \{j+1,\cdots ,s+n+kw-1\}\): \(0 = a_j \cdot a_u\), to encode an ‘at most one’ constraint on the Sbox outputs.
Converting \(C\) to \(C'\) and subsequently expressing this in CNF is again the same process as before.
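The closed formulas for \(v\), \(l\), and \(s\) in the depth encoding are easy to get wrong, so it can help to check them numerically. The sketch below assumes the reading \(l = in + v(i - v - w)\) and \(s = kw(2n + kw - w) + i(n+kw)\), and verifies that consecutive gate inputs use adjacent, non-overlapping blocks of \(a\) variables. The function names are hypothetical.

```python
def depth_offsets(n, k, w):
    """Return [(v, l)] for every gate input q_i, using the closed formulas."""
    offsets = []
    for i in range(2 * k * w):
        v = (i // (2 * w)) * w          # number of gates in strictly earlier layers
        l = i * n + v * (i - v - w)     # first a-index used by q_i
        offsets.append((v, l))
    return offsets


def check_contiguous(n, k, w):
    """Check that the a-blocks of consecutive q_i neither overlap nor leave gaps."""
    expected = 0
    for v, l in depth_offsets(n, k, w):
        if l != expected:
            return False
        expected += n + v               # q_i selects among n inputs and v gates
    # The output wiring (s for i = 0) must start right after the last gate input.
    s0 = k * w * (2 * n + k * w - w)
    return expected == s0


print(all(check_contiguous(n, k, w)
          for n in range(1, 6) for k in range(1, 5) for w in range(1, 5)))
```

Because \(v\) only counts gates in strictly earlier layers, a gate can never select the output of a gate in its own layer, which is what bounds the depth by \(k\).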
Table 6. Depth complexity of Sboxes
Sbox  Depth complexity  \(w\)  Implementation  UNSAT boundaries 

Joltik/Piccolo  4  2  2 OR, 1 XOR, 2 NOR, 3 XNOR  \(k=4,w=1\); \(k=3,w=10\) 
Joltik\(^{-1}\)/Piccolo\(^{-1}\)  4  3  3 OR, 5 XOR, 1 NOR, 3 XNOR  \(k=4,w=2\); \(k=3,w=10\) 
LAC  3  6  3 OR, 4 XOR, 4 NAND, 4 XNOR  \(k=3,w=4\); \(k=2,w=10\) 
Prøst  4  3  4 AND, 1 OR, 4 XOR, 1 NAND, 1 XNOR  \(k=4,w=2\); \(k=3,w=10\) 
RECTANGLE  3  6  2 AND, 3 OR, 5 XOR, 1 NAND, 1 NOR, 3 XNOR  \(k=3,w=4\); \(k=2,w=10\) 
RECTANGLE\(^{-1}\)  3  6  1 OR, 8 XOR, 3 NAND, 2 NOR, 2 XNOR  \(k=3,w=4\); \(k=2,w=10\) 
4 Combining Criteria: Optimizing the PRIMATEs Sbox
So far, we have seen how to optimize for one specific goal. However, a result that is optimized for multiplicative complexity may contain more XOR gates than is desired, and a result that is optimized for gate complexity may contain more nonlinear gates than is desired for a masked implementation. Here we show how multiple optimization goals can be combined by looking at the 5-bit PRIMATEs Sbox. We first optimize for multiplicative complexity to have a minimal number of nonlinear gates, and subsequently we minimize the number of linear gates. The result is an implementation that has 4 AND, 3 OR, 31 XOR, and 5 NOT gates.
The PRIMATEs Sbox is an almost bent permutation with a maximum linear and differential probability of \(2^{-4}\). It is chosen because of its low area consumption in hardware implementations.
It is not hard to see that there are a lot of redundant XOR operations in this implementation. We distinguish between XOR operations before the nonlinear gates (on \(x_i\)) and XOR operations after the nonlinear gates (on \(t_i\)). It is possible to see them as two straightline linear programs, where the first describes the linear part of the Sbox approached from the input and the second describes the linear part approached from the Sbox output.
We are able to decrease the previous result of 58 XOR gates to only 31 XOR gates.
Tools. We provide tools to generate \(C'\) in ANF for all discussed optimization goals and to convert a SAT solver solution back to an Sbox implementation. We place those tools into the public domain. They and additional documentation are available online at https://github.com/Ko/sboxoptimization.
5 Conclusion
SAT solvers can be used to find minimal implementations for small functions such as Sboxes with respect to criteria such as multiplicative complexity, bitslice gate complexity, gate complexity, and circuit depth complexity. We have shown how this can be done and how multiple criteria can be combined. However, for 8-bit Sboxes and larger functions these methods quickly become infeasible. One will then have to resort to approaches based on heuristics.
References
1. Albrecht, M.R., Rechberger, C., Schneider, T., Tiessen, T., Zohner, M.: Ciphers for MPC and FHE. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 430–454. Springer, Heidelberg (2015)
2. Andreeva, E., Bilgin, B., Bogdanov, A., Luykx, A., Mendel, F., Mennink, B., Mouha, N., Wang, Q., Yasuda, K.: PRIMATEs v1.02. CAESAR submission (2015). http://competitions.cr.yp.to/round2/primatesv102.pdf, http://primates.ae/
3. Bard, G.V., Courtois, N.T., Jefferson, C.: Efficient methods for conversion and solution of sparse systems of low-degree multivariate polynomials over GF(2) via SAT-solvers. IACR Cryptology ePrint Archive, Report 2007/024 (2007). http://eprint.iacr.org/
4. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: The Keccak reference, January 2011. http://keccak.noekeon.org/
5. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G., Van Keer, R.: Ketje v1. CAESAR submission (2014). http://competitions.cr.yp.to/round1/ketjev11.pdf, http://ketje.noekeon.org/
6. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G., Van Keer, R.: Keyak v2. CAESAR submission (2015). http://competitions.cr.yp.to/round2/keyakv2.pdf, http://keyak.noekeon.org/
7. Boyar, J., Matthews, P., Peralta, R.: On the shortest linear straight-line program for computing linear forms. In: Ochmański, E., Tyszkiewicz, J. (eds.) MFCS 2008. LNCS, vol. 5162, pp. 168–179. Springer, Heidelberg (2008)
8. Boyar, J., Peralta, R.: A new combinational logic minimization technique with applications to cryptology. In: Festa, P. (ed.) SEA 2010. LNCS, vol. 6049, pp. 178–189. Springer, Heidelberg (2010)
9. Boyar, J., Peralta, R., Pochuev, D.: On the multiplicative complexity of Boolean functions over the basis \((\wedge,\oplus,1)\). Theoret. Comput. Sci. 235(1), 43–57 (2000)
10. Buchfuhrer, D., Umans, C.: The complexity of Boolean formula minimization. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008, Part I. LNCS, vol. 5125, pp. 24–35. Springer, Heidelberg (2008)
11. Chambers, B., Manolios, P., Vroon, D.: Faster SAT solving with better CNF generation. In: Proceedings of the Conference on Design, Automation and Test in Europe, DATE 2009, 3001 Leuven, Belgium, pp. 1590–1595. European Design and Automation Association (2009)
12. Courtois, N., Hulme, D., Mourouzis, T.: Solving circuit optimisation problems in cryptography and cryptanalysis. Cryptology ePrint Archive, Report 2011/475 (2011). http://eprint.iacr.org/
13. Courtois, N., Mourouzis, T., Hulme, D.: Exact logic minimization and multiplicative complexity of concrete algebraic and cryptographic circuits. Int. J. Adv. Intell. Syst. 6(3 and 4), 165–176 (2013)
14. Dobraunig, C., Eichlseder, M., Mendel, F., Schläffer, M.: Ascon v1.1. CAESAR submission (2015). http://competitions.cr.yp.to/round2/asconv11.pdf, http://ascon.iaik.tugraz.at
15. Fuhs, C., Schneider-Kamp, P.: Optimizing the AES Sbox using SAT. In: IWIL@LPAR, pp. 64–70. Citeseer (2010)
16. Fuhs, C., Schneider-Kamp, P.: Synthesizing shortest linear straight-line programs over GF(2) using SAT. In: Strichman, O., Szeider, S. (eds.) SAT 2010. LNCS, vol. 6175, pp. 71–84. Springer, Heidelberg (2010)
17. Jean, J., Nikolic, I., Peyrin, T.: Joltik v1.3. CAESAR submission (2015). http://competitions.cr.yp.to/round2/joltikv13.pdf, http://www1.spms.ntu.edu.sg/~syllab/m/index.php/Joltik
18. Kavun, E.B., Lauridsen, M.M., Leander, G., Rechberger, C., Schwabe, P., Yalçın, T.: Prøst v1.1. CAESAR submission (2014). http://competitions.cr.yp.to/round1/proestv11.pdf
19. Morawiecki, P., Gaj, K., Homsirikamol, E., Matusiewicz, K., Pieprzyk, J., Rogawski, M., Srebrny, M., Wójcik, M.: ICEPOLE v2. CAESAR submission (2015). http://competitions.cr.yp.to/round2/icepolev2.pdf
20. Mourouzis, T.: Optimizations in Algebraic and Differential Cryptanalysis. PhD thesis, UCL (University College London) (2015)
21. Sasaki, Y., Todo, Y., Aoki, K., Naito, Y., Sugawara, T., Murakami, Y., Matsui, M., Hirose, S.: Minalpher v1.1. CAESAR submission (2015). http://competitions.cr.yp.to/round2/minalpherv11.pdf
22. Shibutani, K., Isobe, T., Hiwatari, H., Mitsuda, A., Akishita, T., Shirai, T.: Piccolo: an ultra-lightweight blockcipher. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 342–357. Springer, Heidelberg (2011)
23. Zhang, L., Wenling, W., Wang, Y., Shengbao, W., Zhang, J.: LAC: A lightweight authenticated encryption cipher. CAESAR submission (2014). http://competitions.cr.yp.to/round1/lacv1.pdf
24. Zhang, W., Bao, Z., Lin, D., Rijmen, V., Yang, B., Verbauwhede, I.: RECTANGLE: a bitslice ultra-lightweight block cipher suitable for multiple platforms. Cryptology ePrint Archive, Report 2014/084 (2014). http://eprint.iacr.org/