Changing of the Guards: A Simple and Efficient Method for Achieving Uniformity in Threshold Sharing
Abstract
Since they were first proposed as a countermeasure against differential power analysis (DPA) and differential electromagnetic analysis (DEMA) in 2006, threshold schemes have attracted a lot of attention from the community concentrating on cryptographic implementations. What makes threshold schemes so attractive from an academic point of view is that they come with an information-theoretic proof of resistance against a specific subset of side-channel attacks: first-order DPA. From an industrial point of view they are attractive because a careful threshold implementation forces adversaries to resort to DPA of higher order, with all its problems such as noise amplification. A threshold scheme that offers the mentioned provable security must exhibit three properties: correctness, incompleteness and uniformity. A threshold scheme becomes more expensive with the number of shares that must be implemented, and the required number of shares is bounded from below by the algebraic degree of the function being shared plus 1. Defining a correct and incomplete sharing of a function of degree d in \(d+1\) shares is straightforward. However, up to now there has been no generic method to achieve uniformity, and finding uniform sharings of degree-d functions with \(d+1\) shares has been an active research area. In this paper we present a generic, simple and potentially cheap method to find a correct, incomplete and uniform \(d+1\)-share threshold scheme of any S-box layer consisting of degree-d invertible S-boxes. The uniformity is not implemented in the sharings of the individual S-boxes but rather at the S-box layer level by the use of feed-forward and some expansion of shares. When applied to the Keccak-\(p\) nonlinear step \(\chi \), its cost is very small.
Keywords
Side-channel attacks, Threshold schemes, Uniformity, Keccak
1 Introduction
Systems such as digital rights management (DRM) or banking cards try to offer protection against adversaries that have physical access to platforms performing cryptographic computations, allowing them to measure computation time, power consumption or electromagnetic radiation. Adversaries can use this side-channel information to retrieve cryptographic keys. A particularly powerful attack against implementations of cryptographic algorithms is differential power analysis (DPA), introduced by Kocher et al. [20]. This attack can exploit even the weakest dependence of the power consumption (or electromagnetic radiation) on the value of the manipulated data by combining the measurements of many computations to improve the signal-to-noise ratio. The simplest form of DPA is first-order DPA, which exploits the correlation between the data and the power consumption. To make side-channel attacks impractical, system builders implement countermeasures, often several at the same time.
In threshold schemes, as proposed by Rijmen et al. [23, 24, 25], one represents each sensitive variable by a number of shares (typically denoted by \(d+1\)) such that their (usually) bitwise sum equals that variable. These shares are initially generated in such a way that any subset of d shares gives no information about the sensitive variable. Functions (S-boxes, mixing layers, round functions ...) are computed on the shares of the inputs, resulting in the output as a number of shares. Threshold schemes must be correct: the sum of the output shares equals the result of applying the implemented function to the sum of the input shares. Another essential property of a threshold implementation of a function is incompleteness: each output share shall be computed from at most d input shares, or equivalently, in the computation of each output share at least one input share is not used. Incompleteness guarantees that each individual output share computation cannot leak information about sensitive variables. The resulting output is then typically subject to some further computation, again in the form of separate and incomplete computation on shares. For these subsequent computations not to leak information about the sensitive variables, the output of the previous stage must still be uniform. Therefore, in an iterative cryptographic primitive such as a block cipher, we need a threshold implementation of the round function that yields a uniformly shared output if its input is uniformly shared. This property of the threshold implementation is called uniformity.
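As a small illustration of the initial sharing step described above, the following Python sketch splits an n-bit value into \(d+1\) shares whose bitwise sum is that value. The function names `share` and `unshare` are hypothetical and chosen for this example only; the first d shares are drawn uniformly at random, so any d of the \(d+1\) shares are independent of the shared value.

```python
import secrets
from functools import reduce
from operator import xor

def share(x, d, n):
    """Split the n-bit value x into d + 1 shares whose XOR equals x.

    The first d shares are uniformly random; the last one is fixed by x.
    """
    shares = [secrets.randbits(n) for _ in range(d)]
    shares.append(reduce(xor, shares, x))
    return shares

def unshare(shares):
    """Recombine shares by bitwise addition in GF(2)."""
    return reduce(xor, shares, 0)
```

For instance, `unshare(share(0b1011, 2, 4))` returns `0b1011` for every choice of the random shares.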
Threshold schemes form a good protection mechanism against DPA. In particular, using them allows building cryptographic hardware that is guaranteed to be unattackable with first-order DPA, assuming certain leakage models of the cryptographic hardware at hand and for a plausible definition of “first order”. De Cnudde et al. have an interesting work [13] on such assumptions and their validity in the real world. Still, threshold schemes remain a very attractive technique for building cipher implementations that offer a high level of resistance against DPA and differential electromagnetic analysis (DEMA).
Constructing an incomplete threshold implementation of a nonlinear function is rather straightforward and can be done in the following way. One can express the function algebraically as the sum of monomials. Then one replaces each shared variable by the sum of its shares. Subsequently, one can work out the expressions, resulting in a larger number of monomials, where the factors are bits (or in general, components) of the shares. A monomial of degree d can have factors from at most d shares. So if there are \(d+1\) shares, such a monomial is incomplete: there is at least one share missing. It follows that to build an incomplete sharing of a function of algebraic degree d, it suffices to take \(d+1\) shares. Clearly, the implementation cost of a function increases exponentially with its degree: a monomial of degree d requires \(d+1\) shares and explodes into the sum of \((d+1)^d\) monomials. To reduce the implementation cost, Stoffelen applies techniques for representing S-boxes with a minimum number of nonlinear operations [31]. Kutzner et al., on the other hand, factor S-boxes of some degree as the composition of functions of lower algebraic degree [21]. Such techniques, combined with tower field representation, are also applied in the sharing of the AES S-box, which natively has algebraic degree 7. We refer again to De Cnudde et al. for an example [14]. These publications demonstrate that these techniques are quite powerful, but serial composition comes at a price: it requires the insertion of registers (or latches) between the combinatorial circuits, which increases latency.
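The expansion described above can be sketched in Python for a single monomial of degree d. The helper names are hypothetical, and the rule used to assign each expanded term to an output share (the smallest share index absent from the term) is just one possible choice made for this illustration. It yields a correct and incomplete sharing, with no claim of uniformity.

```python
from itertools import product

def incomplete_monomial_sharing(d):
    """Expand a degree-d monomial over d + 1 shares into (d + 1)**d terms.

    Each term is a tuple whose k-th entry says which share of variable k it
    uses. The term is assigned to an output share whose index is absent from
    the term, so output share j never touches input share j.
    """
    shares = range(d + 1)
    assignment = {j: [] for j in shares}
    for term in product(shares, repeat=d):
        missing = min(set(shares) - set(term))  # d factors cannot cover d + 1 shares
        assignment[missing].append(term)
    return assignment

def evaluate(assignment, shared_inputs):
    """shared_inputs[k][s] is share s (a bit) of variable k; returns the output shares."""
    out = []
    for j in sorted(assignment):
        bit = 0
        for term in assignment[j]:
            prod_bit = 1
            for k, s in enumerate(term):
                prod_bit &= shared_inputs[k][s]
            bit ^= prod_bit
        out.append(bit)
    return out
```

For \(d=2\) this gives the 9 terms of a 3-share product, three per output share. For \(d=3\) it gives 64 terms over 4 output shares.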
Constructing a correct, incomplete and uniform sharing is widely perceived as a challenge and an important research problem. Several publications have been devoted to the classification of 3-, 4- and 5-bit S-boxes with respect to cryptographic properties, and the minimum number of shares for which a uniform sharing is known is an important criterion. Examples include the study of Bilgin et al. [8] and that of Božilov et al. [10]. Other papers propose solutions, sometimes only partial, for large classes of S-boxes. We refer again to Bilgin et al. [9], Kutzner et al. [21], and Beyne et al. [5]. A well-known example of an S-box that is problematic in this context is the Keccak S-box, known as \(\chi \). It has algebraic degree 2 and no uniform incomplete 3-share threshold implementation is known. We proposed a number of different solutions with varying degrees of efficiency in [6]. One solution is the transition from 3 to 4 or even 5 shares. Another is the compensation of loss of uniformity by injecting fresh randomness. As argued by Reparaz et al. [29], this technique brings the threshold scheme into the realm of private circuits as proposed by Ishai et al. [19].
Given a non-uniform threshold implementation, it is not immediate how to exploit its non-uniformity in an attack. We made a start in explorations in that direction in [16, 17]. However, uniformity of a threshold implementation is essential in its information-theoretical proof of resistance against first-order DPA. In short, if one has a uniform sharing, one does not have to give additional arguments why the threshold scheme would be secure against first-order DPA.
In this paper we present a simple and efficient technique for building a threshold implementation with \(d+1\) shares of any invertible S-box layer of degree d that is correct, incomplete and uniform. When applied to the nonlinear layer in Keccak, \(\chi \), it can be seen as the next logical step of the methods discussed in Sect. 3 of our paper [6]. In that method 4 fresh random bits must be introduced every round to restore uniformity. The added value of the technique in this paper is that it no longer needs any fresh randomness and that it can convert a correct and incomplete sharing of any S-box into a correct, incomplete and uniform sharing of a layer of such S-boxes.
1.1 The “Changing of the Guards” Idea in a Nutshell

The shared S-boxes are arranged in a linear array. These sharings must be correct and incomplete.

Each share at the output of S-box i is made uniform by bitwise adding to it one or two shares from the input of S-box \(i-1\).

The state is augmented with d dummy components, called guards, to be added to the output of the first S-box in the array.

The new values of the guards are taken from the input of the last S-box in the array.

Uniformity is proven by giving an algorithm that computes the shared input from the shared output of this mapping.
For threshold sharings that have a so-called multitransformation property, the guards can be reduced in size, and so can the number of bits fed forward.
1.2 Notation
Assume we have a nonlinear mapping that consists of a layer of invertible S-boxes. We denote the width of the S-boxes by n and their total number by m. So the layer operates on an array of \(n\times m\) bits. We denote the input as \(x = (x_1, x_2, x_3, \ldots x_m)\) and the output as \(X = (X_1, X_2, X_3, \ldots X_m)\), with each of the \(x_i\) and \(X_i\) an n-bit array.
In general the S-boxes can differ per position. We denote the S-box at position i by \(S_i\), so \(X_i = S_i(x_i)\).
We denote addition in GF(2) by \(+\).
1.3 Overview of the Paper
In Sect. 2 we explain and prove the soundness of the method applied to the simplest possible case. In Sect. 3 we formulate the method for a more general case and in Sect. 4 we apply it to the nonlinear layer used in Keccak, Keyak and Ketje. Finally in Sect. 5 we discuss some implementation aspects.
2 The Basic Method Applied to 3-Share Threshold Schemes
At the basis of our “Changing of the Guards” technique for achieving uniformity is the expansion of the shared representation. In particular, for the input we expand share b with an additional dummy component that we denote as \(b_0\) and do the same for c. In this sharing x is represented by (a, b, c) where a has m components and both b and c have \(m+1\) components. A triplet (a, b, c) is a uniform sharing of x if all possible values (a, b, c) compliant with x are equiprobable. As there are \(2^{(3m+2)n}\) possible triplets (a, b, c) and being compliant with x requires the satisfaction of mn independent linear binary equations, there are exactly \(2^{(3m+2)n - mn} = 2^{2(m+1)n}\) encodings (a, b, c) of any particular value x. The same holds for the sharing (A, B, C) of the output X.
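The count of encodings can be checked by brute force for very small parameters. The sketch below is illustrative only; `count_encodings` is a hypothetical helper that enumerates all triplets (a, b, c), with b and c each carrying one extra guard component at index 0, and counts those compliant with a given x.

```python
from itertools import product

def count_encodings(m, n, x):
    """Count the triplets (a, b, c) that share x = (x_1, ..., x_m), each x_i an n-bit value.

    a has m components; b and c have m + 1 components, component 0 being the guard,
    which is not constrained by x.
    """
    values = range(1 << n)
    total = 0
    for a in product(values, repeat=m):
        for b in product(values, repeat=m + 1):
            for c in product(values, repeat=m + 1):
                if all(a[i] ^ b[i + 1] ^ c[i + 1] == x[i] for i in range(m)):
                    total += 1
    return total
```

With \(m=2\), \(n=1\) there are \(2^{8}\) triplets, of which \(2^{2(m+1)n} = 2^{6}\) are compliant with any given x.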
Definition 1
We can now prove the following theorem.
Theorem 1
If S is an invertible S-box and \((S_a,S_b,S_c)\) is a correct and incomplete sharing of S, the sharing of Definition 1 is a correct, incomplete and uniform sharing of an S-box layer with S as component.
Proof
For uniformity, we observe that for each input x or each output X there are exactly \(2^{2(m+1)n}\) valid sharings. If the mapping of Definition 1 is an invertible mapping from (a, b, c) to (A, B, C), it implies that if (a, b, c) is a uniform sharing of x, then (A, B, C) is a uniform sharing of X. It is therefore sufficient to show that the mapping of Definition 1 is invertible. We will do that by giving a method to compute (a, b, c) from (A, B, C).
The term “guards” refers to the dummy components \(b_0\) and \(c_0\) that are there to guard uniformity and that are “changed” to \(B_0\) and \(C_0\) by the shared implementation of the S-box layer.
The cost of this method is the addition of 4 XOR gates per bit of x and the expansion of the representation by 2n bits. The cost of additional XOR gates is typically not negligible but still relatively modest compared to the gates in the S-box sharing. For a typical S-box layer the expansion of the state is very small.
When applying this method to an iterated cipher that has a round function consisting of an S-box layer and a linear layer, one can do the following. The sharing of the S-box layer maps (a, b, c) to (A, B, C) and the linear layer is applied to the shares separately. In the linear mapping the guard components \(B_0\) and \(C_0\) are simply mapped to the components \(b_0\) and \(c_0\) of the next round by the identity.
It is likely that the swapping that takes place between the guards is not necessary, but it does simplify the proof for the incompleteness aspect.
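The following Python sketch illustrates the 3-share method and the inversion argument behind Theorem 1. It rests on several assumptions. The feed-forward pattern (\(b_{i-1}+c_{i-1}\) added to \(A_i\), \(c_{i-1}\) to \(B_i\), \(b_{i-1}\) to \(C_i\), guards swapped) is a reconstruction chosen to be consistent with the stated cost of 4 XOR gates per bit, with the guard swap, and with the schedule of Sect. 3 for \(d=2\). The S-box is taken to be \(\chi \) on 5-bit rows with the usual direct 3-share sharing. All function names are hypothetical.

```python
import random

N = 5                       # S-box width n
MASK = (1 << N) - 1

def rot(u, k):
    """Bit i of the result is bit (i + k) mod N of u."""
    return ((u >> k) | (u << (N - k))) & MASK

def chi(x):
    """Unshared S-box: X^i = x^i + (x^{i+1} + 1) x^{i+2}."""
    return x ^ (~rot(x, 1) & rot(x, 2) & MASK)

def chi_share(u, v):
    """One output share, computed from only two of the three input shares."""
    return (u ^ (~rot(u, 1) & rot(u, 2) & MASK)
              ^ (rot(u, 1) & rot(v, 2)) ^ (rot(u, 2) & rot(v, 1)))

def S_a(b, c): return chi_share(b, c)
def S_b(a, c): return chi_share(c, a)
def S_c(a, b): return chi_share(a, b)

CHI_INV = {chi(x): x for x in range(1 << N)}   # chi is invertible on 5 bits

def guards_forward(a, b, c):
    """a[1..m] (a[0] unused); b[0..m], c[0..m] with b[0], c[0] the guards."""
    m = len(a) - 1
    A, B, C = [None] * (m + 1), [0] * (m + 1), [0] * (m + 1)
    for i in range(1, m + 1):
        A[i] = S_a(b[i], c[i]) ^ b[i - 1] ^ c[i - 1]
        B[i] = S_b(a[i], c[i]) ^ c[i - 1]
        C[i] = S_c(a[i], b[i]) ^ b[i - 1]
    B[0], C[0] = c[m], b[m]                     # new guards, swapped
    return A, B, C

def guards_inverse(A, B, C):
    """Recover (a, b, c) from (A, B, C), working from S-box m down to 1."""
    m = len(A) - 1
    a, b, c = [None] * (m + 1), [0] * (m + 1), [0] * (m + 1)
    b[m], c[m] = C[0], B[0]
    for i in range(m, 0, -1):
        x_i = CHI_INV[A[i] ^ B[i] ^ C[i]]       # the fed-forward terms cancel in the sum
        a[i] = x_i ^ b[i] ^ c[i]
        b[i - 1] = S_c(a[i], b[i]) ^ C[i]
        c[i - 1] = S_b(a[i], c[i]) ^ B[i]
    return a, b, c
```

Under these assumptions, `guards_inverse(*guards_forward(a, b, c))` returns `(a, b, c)` for any input, which is the invertibility that the uniformity argument relies on. Each output share also avoids one input share entirely: \(A\) uses no component of a, \(B\) none of b, and \(C\) none of c.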
3 Generalization to Any Invertible S-box Layer
Here we give a method for an S-box layer with the only restriction that the component S-boxes have the same width and are all invertible. So this includes the case that the S-boxes are different and even the case that they have different algebraic degrees. We assume the maximum degree over all S-boxes of the layer is d and so we can produce a correct and incomplete threshold scheme with \(d+1\) shares. We denote the shares by \(x^0\) to \(x^d\) and component i of share j by \(x^j_i\).
In the generalization there are d guard components instead of two. Similarly to the three-share implementation, there is no guard for the first share (a or \(x^{0}\)). The schedule for adding shares from the neighboring S-box is somewhat more complicated. There are four cases, depending on the index j of the output share considered:
\(j>2\): add input shares \(j-1\) and \(j-2\) of its neighboring S-box;
\(j=2\): add input share 1 of its neighboring S-box;
\(j=1\): add input share d of its neighboring S-box;
\(j=0\): add input shares d and \(d-1\) of its neighboring S-box.
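The four cases above can be written as a small Python function. The names `feedforward_schedule` and `feed_counts` are hypothetical, and the sketch assumes \(d \ge 2\). Counting how often each input share is fed forward shows two things: shares 1 to d are each added to exactly two output shares, so the additions cancel in the sum of the output shares and correctness is preserved, and share 0 is never fed forward, in line with there being no guard for the first share.

```python
from collections import Counter

def feedforward_schedule(j, d):
    """Indices of the neighboring S-box's input shares added to output share j."""
    if d < 2 or not 0 <= j <= d:
        raise ValueError("expected d >= 2 and 0 <= j <= d")
    if j > 2:
        return (j - 1, j - 2)
    if j == 2:
        return (1,)
    if j == 1:
        return (d,)
    return (d, d - 1)          # j == 0

def feed_counts(d):
    """How many output shares each input share index is added to."""
    counts = Counter()
    for j in range(d + 1):
        counts.update(feedforward_schedule(j, d))
    return counts
```

For \(d=2\) the schedule gives shares (2, 1) for \(j=0\), share 2 for \(j=1\) and share 1 for \(j=2\). In no case is share j itself added to output share j.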
Definition 2
We can now prove the following theorem.
Theorem 2
Let \(\mathcal {S}\) be an S-box layer consisting of invertible n-bit S-boxes \(S_i\) with \(1 \le i \le m\), where the S-boxes \(S_i\) may be different and where d is the maximum degree over all these S-boxes. Let \((S^0_i,S^1_i,\ldots S^d_i)\) with \(1 \le i \le m\) be a correct and incomplete sharing of \(S_i\) with \(d+1\) shares. Then the sharing of Definition 2 is a correct, incomplete and uniform sharing of the S-box layer with \(S_i\) as components.
Proof
For uniformity, it is again sufficient to show that the mapping of Definition 2 is invertible. We give a method to compute \((x^0,x^1,x^2,\ldots , x^d)\) from \((X^0,X^1,X^2,\ldots , X^d)\).
As said, our method applies also to heterogeneous S-box layers, i.e., S-box layers with different S-boxes. Such layers are quite rare in modern cryptography, especially after the benefits of symmetry became clear. The block cipher DES [26] is a notable exception to this, but one may argue whether that is a modern cipher. In any case, one may ask how the method applies to the S-box layer in the DES F-function, as it consists of non-invertible S-boxes. Remarkably, as was stated by Boss et al. [11] and mathematically explained by the same team [12], in Feistel networks where the S-box layer is embedded in a function whose output is (bitwise) added to part of the state, uniformity is achieved automatically. Basically, thanks to the Feistel construction the shared round function is a permutation and hence uniform. So, if the algebraic degree of the S-box layer is d, it is sufficient to represent the state by \(d+1\) shares and have a threshold implementation for the S-boxes that is correct and incomplete.
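The Feistel argument can be illustrated with a toy example in Python. Everything here is an assumption made for illustration: a 2-bit branch width, the non-invertible function \(F(r) = (r_0 r_1, r_0 r_1)\), and a direct 3-share sharing of the product. The point is only that the shared round maps shared states to shared states bijectively, even though F is not invertible.

```python
from itertools import product

def and_share(u, v):
    """One output share of r0*r1, computed from two of the three shares of r = (r0, r1)."""
    (u0, u1), (v0, v1) = u, v
    return (u0 & u1) ^ (u0 & v1) ^ (u1 & v0)

def shared_F(R):
    """Shares of F(r) = (r0*r1, r0*r1); output share j does not use input share R[j]."""
    out = []
    for j in range(3):
        t = and_share(R[(j + 1) % 3], R[(j + 2) % 3])
        out.append((t, t))
    return tuple(out)

def shared_feistel_round(L, R):
    """Share-wise Feistel round: (L, R) -> (R, L + F(R)), with + the bitwise XOR."""
    F = shared_F(R)
    new_R = tuple((L[j][0] ^ F[j][0], L[j][1] ^ F[j][1]) for j in range(3))
    return R, new_R

def all_shared_states():
    """All 2**12 values of a state of two 2-bit branches in 3 shares each."""
    two_bits = list(product((0, 1), repeat=2))
    branches = list(product(two_bits, repeat=3))
    return [(L, R) for L in branches for R in branches]
```

Collecting `shared_feistel_round(L, R)` over `all_shared_states()` gives 4096 distinct images, so the shared round is a permutation of the shared state in this toy setting.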
4 Application to the Sharing \(\chi '\) for Keccak
Keccak-\(p\) is the permutation underlying our hash function Keccak [2, 28] and our authenticated encryption schemes Keyak [4] and Ketje [3]. It is defined in the Keccak reference [2] and the NIST standard [28].
4.1 The Sharing \(\chi '\) of the Nonlinear Layer in Keccak
4.2 The Multitransformation Property
The mapping \(\chi '\) has a remarkable property that we can exploit to reduce the overhead due to the “Changing of the Guards” method. We call this a multitransformation property, inspired by the concept of multipermutations proposed by Schnorr and Vaudenay [30]. Loosely speaking, an n-bit transformation has a multitransformation property of order r if for any input, the bits in r specific positions in the input and the bits in \(n-r\) specific positions in the output, with \(r<n\), together fully determine the remaining \(n-r\) bits of the input. We now give a more rigorous definition.
Definition 3 (Transformation property with respect to an index subset)
Clearly, any n-bit transformation f has a transformation property of order n with respect to \(S= \mathbb {Z}_n\). So if it has the transformation property with respect to an additional set \(S\), we call it a multitransformation. Note that any permutation f has a transformation property of order 0 with respect to \(S= \mathbb {Z}_{2n} \setminus \mathbb {Z}_{n}\). In the context of this paper we are interested in finding a multitransformation property in S-box threshold implementations that are not uniform and hence are not permutations.
4.3 Using the Multitransformation Property of \(\chi '\)
The mapping \(\chi '\) restricted to a single row is a transformation operating on 15 bits. We can show it has a transformation property of order 6. The consequence of this is that we can reduce the size of the guards from 10 bits to 4 bits and the number of bitwise addition operations per row to 8.
We first need to introduce some notation. For a 5-bit vector s, let \(\mathrm {L}(s) \triangleq (s^0,s^1,s^2)\) and \(\mathrm {R}(s) \triangleq (s^3,s^4)\). Similarly, we define \(\mathrm {L}(a,b,c) \triangleq (a^0,b^0,c^0,a^1,b^1,c^1,a^2,b^2,c^2)\) and \(\mathrm {R}(a,b,c) \triangleq (a^3,b^3,c^3,a^4,b^4,c^4)\).
Lemma 1
For any of the \(2^{15}\) choices of \(\mathrm {L}(A,B,C),\mathrm {R}(a,b,c)\), there is exactly one solution \(\mathrm {L}(a,b,c),\mathrm {R}(A,B,C)\) such that \((A,B,C) = \chi '(a,b,c)\).
Proof
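Independently of the proof, the claim of Lemma 1 can be checked exhaustively over all \(2^{15}\) inputs. The Python sketch below assumes that \(\chi '\) is the direct 3-share sharing in which each output share is computed as \(u^i + (u^{i+1}+1)u^{i+2} + u^{i+1}v^{i+2} + u^{i+2}v^{i+1}\) from two input shares u and v, with indices taken modulo 5 and bit i of an integer standing for position i of a row. The function names are hypothetical.

```python
N = 5
MASK = (1 << N) - 1
L_MASK = 0b00111            # positions 0, 1, 2
R_MASK = 0b11000            # positions 3, 4

def rot(u, k):
    """Bit i of the result is bit (i + k) mod N of u."""
    return ((u >> k) | (u << (N - k))) & MASK

def chi_share(u, v):
    """One output share of the assumed sharing, from two input shares."""
    return (u ^ (~rot(u, 1) & rot(u, 2) & MASK)
              ^ (rot(u, 1) & rot(v, 2)) ^ (rot(u, 2) & rot(v, 1)))

def chi_prime(a, b, c):
    """(A, B, C) for one row: A from (b, c), B from (c, a), C from (a, b)."""
    return chi_share(b, c), chi_share(c, a), chi_share(a, b)

def lemma1_keys():
    """All values of (L(A,B,C), R(a,b,c)) reached over the 2**15 inputs (a, b, c)."""
    keys = set()
    for a in range(1 << N):
        for b in range(1 << N):
            for c in range(1 << N):
                A, B, C = chi_prime(a, b, c)
                keys.add((A & L_MASK, B & L_MASK, C & L_MASK,
                          a & R_MASK, b & R_MASK, c & R_MASK))
    return keys
```

If `lemma1_keys()` has \(2^{15}\) elements, every choice of \(\mathrm {L}(A,B,C),\mathrm {R}(a,b,c)\) is reached by exactly one input, which is the statement of the lemma. The reason is visible in the formula: output position 2 involves only input positions 2, 3 and 4, so \(\mathrm {L}(a,b,c)\) can be solved for position by position, from 2 down to 0.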
We can use Lemma 1 to apply a variant of the “Changing of the Guards” method to \(\chi '\) that requires less state expansion and XOR gates due to the feedforward. We call it \(\chi ''\).
Definition 4
Note that \(\mathrm {L}(b_0)\), \(\mathrm {L}(c_0)\), \(\mathrm {L}(B_0)\) and \(\mathrm {L}(C_0)\) do not occur in the computations. We can therefore reduce the guards to their 2-bit right parts: \(\mathrm {R}(b_0)\), \(\mathrm {R}(c_0)\), \(\mathrm {R}(B_0)\) and \(\mathrm {R}(C_0)\).
The total expansion of the state reduces from 2 times the S-box width (totalling 10 bits) to 4 bits. Moreover, there are only 8 XOR gates per S-box, i.e. 1.6 per native bit instead of 4 additional XOR gates per native bit. In the context of the \(\chi '\) sharing the computational overhead is very small, as implementing Eq. (1) requires 9 XOR gates and 9 (N)AND gates per native bit. Note that the multitransformation technique can be applied to other primitives that use a variant of \(\chi \) as nonlinear layer.
We can now prove the following theorem.
Theorem 3
\(\chi ''\) as defined in Definition 4 is a correct, incomplete and uniform sharing of \(\chi \).
Proof

\(\mathrm {R}(a_i) = \mathrm {R}(S^{-1}(A_i + B_i + C_i)) + \mathrm {R}(b_i) + \mathrm {R}(c_i)\)

compute \(\mathrm {L}(a_{i},b_{i},c_{i})\) from \(\mathrm {L}(A_{i},B_{i},C_{i}), \mathrm {R}(a_{i},b_{i},c_{i})\) using Lemma 1

\(\mathrm {R}(b_{i-1}) = \mathrm {R}(S_c(a_i,b_i)) + \mathrm {R}(C_{i})\)

\(\mathrm {R}(c_{i-1}) = \mathrm {R}(S_b(a_i,c_i)) + \mathrm {R}(B_{i})\). \(\square \)
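The inversion steps above suggest that only the right parts of the guards need to be fed forward. The Python sketch below checks this exhaustively for a single row (\(m=1\)). It is a reconstruction under assumptions: the feed-forward pattern is inferred from the inversion steps (right part of \(b_{i-1}+c_{i-1}\) added to \(A_i\), of \(c_{i-1}\) to \(B_i\), of \(b_{i-1}\) to \(C_i\), new guards \(\mathrm {R}(c_m)\) and \(\mathrm {R}(b_m)\)), \(\chi '\) is taken to be the direct 3-share sharing, and all names are hypothetical.

```python
N = 5
MASK = (1 << N) - 1
R_MASK = 0b11000            # positions 3, 4: the part of a guard that is kept

def rot(u, k):
    """Bit i of the result is bit (i + k) mod N of u."""
    return ((u >> k) | (u << (N - k))) & MASK

def chi(x):
    """Unshared chi on one 5-bit row."""
    return x ^ (~rot(x, 1) & rot(x, 2) & MASK)

def chi_share(u, v):
    """One output share of the assumed sharing, from two input shares."""
    return (u ^ (~rot(u, 1) & rot(u, 2) & MASK)
              ^ (rot(u, 1) & rot(v, 2)) ^ (rot(u, 2) & rot(v, 1)))

def chi_pp_single(a, b, c, rb0, rc0):
    """One row with 2-bit guards rb0, rc0 (5-bit integers, only positions 3, 4 used)."""
    A = chi_share(b, c) ^ rb0 ^ rc0
    B = chi_share(c, a) ^ rc0
    C = chi_share(a, b) ^ rb0
    return A, B, C, c & R_MASK, b & R_MASK      # last two: new guards R(B_0), R(C_0)

def chi_pp_image_size():
    """Number of distinct outputs over all 2**19 shared inputs; also checks correctness."""
    guards = (0b00000, 0b01000, 0b10000, 0b11000)
    images = set()
    for a in range(1 << N):
        for b in range(1 << N):
            for c in range(1 << N):
                for rb0 in guards:
                    for rc0 in guards:
                        out = chi_pp_single(a, b, c, rb0, rc0)
                        assert out[0] ^ out[1] ^ out[2] == chi(a ^ b ^ c)
                        images.add(out)
    return len(images)
```

An image size of \(2^{19}\) means the mapping on 15 state bits plus 4 guard bits is a permutation in this single-row setting. Per row, the feed-forward costs 2 positions times 4 additions, i.e. the 8 XOR gates counted in the text.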
5 Implementation Aspects
In this section we discuss the suitability of our method for decomposed S-boxes and for parallel and serial architectures.
5.1 Compatibility with Serial Decomposition of S-boxes
To reduce the number of shares, the serial decomposition of S-boxes into S-boxes of lower degree has been proposed. Notably, Kutzner et al. decomposed all 4-bit S-boxes of algebraic degree 3 into component degree-2 mappings [21] in such a way that for each of the components a correct, incomplete and uniform 3-share threshold scheme can be found. One may wonder whether our method can be combined with such decomposition.
As a matter of fact, when “Changing of the Guards” is applied, the requirements on the decomposition due to sharing vanish: it suffices to find a decomposition of an invertible S-box as the series of two degree-2 S-boxes. If such a decomposition exists, but no uniform sharing for one or both component S-boxes can be found, our method comes to the rescue. In Fig. 3 we illustrate it for the case that the “Changing of the Guards” is applied to both layers. Note that the uniformity of the composed mapping follows directly from the uniformity of the component mappings.
In the case of more complex decompositions that combine serial and parallel composition, our “Changing of the Guards” cannot be readily applied, especially if the decomposition contains building blocks that are not permutations. Well-known examples of such decompositions are the ones applied to the S-box of our cipher Rijndael [15] by Moradi et al. [22] and Bilgin et al. [7]. As the Rijndael S-box has algebraic degree 7 in GF(2) and hence would require 8 shares, a straightforward implementation of our proposed method would be very expensive. Due to the status of Rijndael as worldwide block cipher standard [27], it would be interesting further work to find a decomposition of the Rijndael S-box in terms of components that are all permutations of low algebraic degree.
5.2 Implementation Cost in Parallel Architectures
5.3 Implementation Cost in Serial Architectures

The S-box input arrives in the boxes indicated by in. Depending on the architecture these can be registers, the output of another combinatorial block or a multiplexer.
 The operation of the guard registers:

At the beginning of the computation, they are initialized to random values (not depicted).

While processing an S-box layer, they get their input from the in boxes.

After processing the last S-box of a layer, they keep their value but swap contents (not depicted).


The S-box output is presented in the boxes indicated by out for further processing or storage. The guards never leave the guard registers.
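The single-stage operation described above can be modelled in Python as a unit that processes one shared S-box input per cycle. This is a behavioural sketch under assumptions, not a description of the depicted circuit: the S-box is \(\chi \) on 5-bit rows with the direct 3-share sharing, the feed-forward pattern is the 3-share one with full-width guards, and the class and method names are hypothetical.

```python
N = 5
MASK = (1 << N) - 1

def rot(u, k):
    """Bit i of the result is bit (i + k) mod N of u."""
    return ((u >> k) | (u << (N - k))) & MASK

def chi(x):
    """Unshared chi on one 5-bit row."""
    return x ^ (~rot(x, 1) & rot(x, 2) & MASK)

def chi_share(u, v):
    """One output share of the assumed sharing, from two input shares."""
    return (u ^ (~rot(u, 1) & rot(u, 2) & MASK)
              ^ (rot(u, 1) & rot(v, 2)) ^ (rot(u, 2) & rot(v, 1)))

class SerialGuardedLayer:
    """Processes the shared S-boxes of a layer one per cycle, using two guard registers."""

    def __init__(self, guard_b, guard_c):
        # Initialized to random values at the beginning of the computation.
        self.guard_b, self.guard_c = guard_b, guard_c

    def step(self, a, b, c):
        """One cycle: compute the output shares, then load the guards from the input."""
        A = chi_share(b, c) ^ self.guard_b ^ self.guard_c
        B = chi_share(c, a) ^ self.guard_c
        C = chi_share(a, b) ^ self.guard_b
        self.guard_b, self.guard_c = b, c
        return A, B, C

    def end_of_layer(self):
        """After the last S-box of a layer the guard registers swap contents."""
        self.guard_b, self.guard_c = self.guard_c, self.guard_b
```

Calling `step` once per S-box and `end_of_layer` after the last one leaves the registers holding the guards for the next layer, so the guards never have to leave the unit.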

During operation the first stage will always be one S-box ahead of the second stage. This implies that the processing of a layer of m S-boxes will take \(m+1\) cycles.

The guard registers of the first stage operate similarly to the single-stage case. The only difference is that after the last S-box of a layer has been processed, they get their values from the guard registers of the second stage (not depicted to not overload the figure).
 The operation of the guard registers of the second stage:

At the beginning of the computation, they are initialized to random values (not depicted).

While processing an S-box layer, they get their input from the registers or latches, indicated by reg, in between the two stages.

After processing the last S-box of a stage, they get their value from the guard registers of the first stage.

In a serial implementation the guard registers have a higher relative overhead when compared to the combinatorial circuit alone. However, when the real estate for keeping the state is also counted, an additional share is much more expensive than some additional XOR gates and guard registers. The exercise by Bilgin et al. [6] reports on 4-share serialized architectures that are 30% more expensive than a 3-share guards-like one.
6 Conclusions
In this paper we introduce a simple and low-cost technique for achieving a 3-share correct, incomplete and uniform threshold implementation of the nonlinear layer in Keccak. We have generalized this to a generic technique for achieving a \(d+1\)-share correct, incomplete and uniform threshold implementation of any S-box layer of invertible S-boxes that have degree at most d. Looking for S-boxes with uniform threshold implementations with the minimum (\(d+1\)) number of shares has therefore lost relevance. On the other hand, it now becomes interesting to look for S-boxes that have \(d+1\)-share implementations with a suitable multitransformation property, such as observed in the nonlinear layer of Keccak.
Notes
Acknowledgements
I thank Gilles Van Assche, Vincent Rijmen, Begül Bilgin, Svetla Nikova and Ventzi Nikov for working with me on the paper [6], that already contained an idea very close to the “Changing of the Guards” technique, Guido Bertoni for inspiring discussions and finally Lejla Batina and Amir Moradi for useful feedback on earlier versions of this text.
References
1. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: Building power analysis resistant implementations of Keccak. In: Second SHA-3 Candidate Conference, August 2010
2. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: The Keccak reference, January 2011. http://keccak.noekeon.org/
3. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G., Van Keer, R.: CAESAR submission: Ketje v2, September 2016. http://ketje.noekeon.org/
4. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G., Van Keer, R.: CAESAR submission: Keyak v2, document version 2.2, September 2016. http://keyak.noekeon.org/
5. Beyne, T., Bilgin, B.: Uniform first-order threshold implementations. IACR Cryptology ePrint Archive 2016:715 (2016)
6. Bilgin, B., Daemen, J., Nikov, V., Nikova, S., Rijmen, V., Van Assche, G.: Efficient and first-order DPA resistant implementations of Keccak. In: Francillon, A., Rohatgi, P. (eds.) CARDIS 2013. LNCS, vol. 8419, pp. 187–199. Springer, Cham (2014). doi: 10.1007/978-3-319-08302-5_13
7. Bilgin, B., Gierlichs, B., Nikova, S., Nikov, V., Rijmen, V.: A more efficient AES threshold implementation. In: Pointcheval, D., Vergnaud, D. (eds.) AFRICACRYPT 2014. LNCS, vol. 8469, pp. 267–284. Springer, Cham (2014). doi: 10.1007/978-3-319-06734-6_17
8. Bilgin, B., Nikova, S., Nikov, V., Rijmen, V., Stütz, G.: Threshold implementations of all 3 \(\times \) 3 and 4 \(\times \) 4 S-boxes. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, pp. 76–91. Springer, Heidelberg (2012). doi: 10.1007/978-3-642-33027-8_5
9. Bilgin, B., Nikova, S., Nikov, V., Rijmen, V., Tokareva, N.N., Vitkup, V.: Threshold implementations of small S-boxes. Cryptogr. Commun. 7(1), 3–33 (2015)
10. Božilov, D., Bilgin, B., Sahin, H.: A note on 5-bit quadratic permutations’ classification. IACR Trans. Symmetric Cryptol. 2017(1), 398–404 (2017)
11. Boss, E., Grosso, V., Güneysu, T., Leander, G., Moradi, A., Schneider, T.: Strong 8-bit S-boxes with efficient masking in hardware. In: Gierlichs, B., Poschmann, A.Y. (eds.) [18], pp. 171–193 (2016)
12. Boss, E., Grosso, V., Güneysu, T., Leander, G., Moradi, A., Schneider, T.: Strong 8-bit S-boxes with efficient masking in hardware, extended version. J. Cryptogr. Eng. 7(2), 149–165 (2017)
13. De Cnudde, T., Bilgin, B., Gierlichs, B., Nikov, V., Nikova, S., Rijmen, V.: Does coupling affect the security of masked implementations? IACR Cryptology ePrint Archive 2016:1080 (2016)
14. De Cnudde, T., Reparaz, O., Bilgin, B., Nikova, S., Nikov, V., Rijmen, V.: Masking AES with d + 1 shares in hardware. In: Gierlichs, B., Poschmann, A.Y. (eds.) [18], pp. 194–212 (2016)
15. Daemen, J., Rijmen, V.: The Design of Rijndael: AES - The Advanced Encryption Standard. Springer, Heidelberg (2002)
16. Daemen, J.: Spectral characterization of iterating lossy mappings. IACR Cryptology ePrint Archive 2016:90 (2016)
17. Daemen, J.: Spectral characterization of iterating lossy mappings. In: Carlet, C., Hasan, M.A., Saraswat, V. (eds.) SPACE 2016. LNCS, vol. 10076, pp. 159–178. Springer, Cham (2016). doi: 10.1007/978-3-319-49445-6_9
18. Gierlichs, B., Poschmann, A.Y. (eds.): Cryptographic Hardware and Embedded Systems - CHES 2016, Proceedings of the 18th International Conference, Santa Barbara, CA, USA, 17–19 August 2016. LNCS, vol. 9813. Springer (2016)
19. Ishai, Y., Sahai, A., Wagner, D.: Private circuits: securing hardware against probing attacks. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 463–481. Springer, Heidelberg (2003). doi: 10.1007/978-3-540-45146-4_27
20. Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999). doi: 10.1007/3-540-48405-1_25
21. Kutzner, S., Nguyen, P.H., Poschmann, A.: Enabling 3-share threshold implementations for all 4-bit S-boxes. In: Lee, H.-S., Han, D.-G. (eds.) ICISC 2013. LNCS, vol. 8565, pp. 91–108. Springer, Cham (2014). doi: 10.1007/978-3-319-12160-4_6
22. Moradi, A., Poschmann, A., Ling, S., Paar, C., Wang, H.: Pushing the limits: a very compact and a threshold implementation of AES. In: Paterson, K.G. (ed.) EUROCRYPT 2011. LNCS, vol. 6632, pp. 69–88. Springer, Heidelberg (2011). doi: 10.1007/978-3-642-20465-4_6
23. Nikova, S., Rechberger, C., Rijmen, V.: Threshold implementations against side-channel attacks and glitches. In: Ning, P., Qing, S., Li, N. (eds.) ICICS 2006. LNCS, vol. 4307, pp. 529–545. Springer, Heidelberg (2006). doi: 10.1007/11935308_38
24. Nikova, S., Rijmen, V., Schläffer, M.: Secure hardware implementation of non-linear functions in the presence of glitches. In: Lee, P.J., Cheon, J.H. (eds.) ICISC 2008. LNCS, vol. 5461, pp. 218–234. Springer, Heidelberg (2009). doi: 10.1007/978-3-642-00730-9_14
25. Nikova, S., Rijmen, V., Schläffer, M.: Secure hardware implementation of nonlinear functions in the presence of glitches. J. Cryptol. 24(2), 292–321 (2011)
26. NIST: Federal information processing standard 46, data encryption standard (DES), October 1999
27. NIST: Federal information processing standard 197, advanced encryption standard (AES), November 2001
28. NIST: Federal information processing standard 202, SHA-3 standard: permutation-based hash and extendable-output functions, August 2015. doi: 10.6028/NIST.FIPS.202
29. Reparaz, O., Bilgin, B., Nikova, S., Gierlichs, B., Verbauwhede, I.: Consolidating masking schemes. In: Gennaro, R., Robshaw, M. (eds.) CRYPTO 2015. LNCS, vol. 9215, pp. 764–783. Springer, Heidelberg (2015). doi: 10.1007/978-3-662-47989-6_37
30. Schnorr, C.-P., Vaudenay, S.: Parallel FFT-hashing. In: Anderson, R.J. (ed.) FSE 1993. LNCS, vol. 809, pp. 149–156. Springer, Heidelberg (1994). doi: 10.1007/3-540-58108-1_18
31. Stoffelen, K.: Optimizing S-box implementations for several criteria using SAT solvers. In: Peyrin, T. (ed.) FSE 2016. LNCS, vol. 9783, pp. 140–160. Springer, Heidelberg (2016). doi: 10.1007/978-3-662-52993-5_8