1 Introduction

High-Order Masking. Side-channel analysis is a class of attacks which exploits the physical environment of a cryptosystem during its execution, to reveal the secrets being manipulated. The masking countermeasure is an efficient technique to protect sensitive data against this threat. To protect a sensitive data x, the masking technique consists in generating a random variable r and manipulating the masked variable and the random r separately, instead of x directly. In that case, every intermediate variable has the uniform distribution and any first-order attack is thwarted. However by combining information from both leakage points \(x'\) and r, a second-order attack can still be feasible (see for example [OMHT06]).

A natural countermeasure against high-order attacks is to use a high-order masking, where each variable x is split into n Boolean shares \({x = x_1 \oplus x_2 \oplus \cdots \oplus x_n}\), with \(n > t\) for security against t probes. Initially the shares are generated uniformly at random under this condition; for example one can generate \(x_1,\ldots ,x_{n-1}\) randomly and let \({x_n=x \oplus x_1 \oplus \cdots \oplus x_{n-1}}\). The shares are then processed separately in masked operations (also called gadgets) that enable to compute the underlying secret variables in a secure way.

The study of circuits resistant against probing attacks was initiated by Ishai, Sahai and Wagner in [ISW03]. They showed how to transform any circuit of size s into a circuit of size \(\mathcal{O}(t^2s)\) secure against any adversary who can probe at most t wires. The ISW construction is based on secret sharing every variable x into \(x= x_1 \oplus x_2 \oplus \cdots \oplus x_n\) as above, with \(n=2t+1\) shares to guarantee security against t probes. Processing a XOR gate is straightforward as the shares can be xored separately. For processing an AND gate \(z=xy\), one computes all cross-products \(x_i y_j\) in Eq. (1) below, and then uses a randomized algorithm to recombine the \(n^2\) cross-products into an n-sharing of the output z.

(1)

Every AND gate is then expanded into a gadget of size \(\mathcal{O}(t^2)\) and the resulting circuit has size \(\mathcal{O}(t^2s)\).

The ISW construction was adapted to AES by Rivain and Prouff in [RP10], by working in \(\mathbb {F}_{2^8}\) instead of \(\mathbb {F}_{2}\). The authors observed that the non-linear part \(S(x)=x^{254}\) of the AES SBox can be efficiently evaluated with only 4 non-linear multiplications over \(\mathbb {F}_{2^8}\), and a few linear squarings. Each of those 4 multiplications can in turn be evaluated with the previous ISW gadget based on Eq. (1), by working over \({\mathbb F}_{2^8}\) instead of \({\mathbb F}_2\).

Proving Security. The approach initiated in [ISW03] for proving security against a t-probing adversary is based on simulation; one must show that the view of an adversary probing at most t wires can be perfectly simulated without knowing the secret variables from the original circuit. To this aim, one shows that any set of t probed variables can be perfectly simulated from the knowledge of at most \(n-1\) input shares. Since any subset of \(n-1\) input shares is uniformly and independently distributed, this ensures that the adversary learns nothing from the t probes, since he could simulate them by himself. It was shown in [DDF14] that security against t probes implies security against noisy leakage, under the assumption that every variable leaks independently.

Recently, the notions of (Strong) Non-Interference (NI/SNI) were introduced by Barthe et al. in [BBD+16], to allow easy composition of gadgets. The authors showed that the ISW multiplication gadget does satisfy the stronger t-SNI security definition. They also showed that with some additional mask refreshing, the Rivain-Prouff countermeasure for the full AES can be made secure with \(n= t+1\) shares only, instead of \(n =2t+1\) shares in [ISW03].

More recently, a new security notion was introduced by Cassiers and Standaert in [CS18], called PINI, that allows even simpler composition of gadgets. Namely it suffices to ensure that all gadgets are PINI, and the composite gadget is then also PINI, which also implies security against t probes. With its power and simplicity, the PINI definition appears to be the “right” notion for gadget security and composition; therefore we will use this definition in this paper, either by proving the PINI property of a gadget directly, or by first proving the t-SNI property and then PINI.

Minimizing Randomness Complexity. High-order masking countermeasures against side-channel attacks usually require plenty of randomness during their execution. The secure AND operation from [ISW03] with \(t+1\) shares requires \({t(t+1)/2}\) random bits, and therefore the randomness complexity of the ISW countermeasure is \(\mathcal{O}(t^2s)\), where s is the circuit size. More concretely, the evaluation of the AES SBox in Rivain-Prouff [RP10] requires the execution of 4 secure multiplications and 2 mask refreshing; each of those 6 gadgets requires \(t(t+1)/2\) fresh random bytes. For the 16 SBoxes and the 10 rounds of the AES, this amounts to generating \(6 \times 16 \times 10 \times t(t+1)/2 = 480t(t+1)\) random bytes, which gives 2880 bytes for second-order security (\(t=2)\).

However running a True Random Number Generator (TRNG) can be costly in practice and become a major bottleneck on embedded devices such as smart-cards. Thus, high-order resistant algorithms can rapidly become impractical when the number of shares grows. The main question is therefore how to minimize the number of TRNG calls while still guaranteeing t-probing security as in [ISW03].

Several attempts have been made to reduce the randomness complexity of private circuits. In [BBP+16], the authors showed a variant of the ISW multiplication with roughly \(t^2/4\) randoms instead of \( t^2/2\) in ISW. In [FPS17], the authors showed how to re-use randomness within several gadgets, thereby reducing the total amount of randomness needed, for small values of t (\(t \le 7\)). However the two above approaches only reduce the randomness complexity by a constant factor; that is, their asymptotic complexity is still \(\mathcal{O}(t^2s)\) for circuit size s, as in the original ISW countermeasure.

A natural idea to reduce the number of calls to the TRNG is to use a pseudo-random generator (PRG) to generate all randoms in the circuit, while only a small seed will be generated by the TRNG. Obviously the PRG circuit should also be secure against probing attacks. We recall below that such approach, initiated by Ishai et al. in [IKL+13] with the concept of robust PRG, enables to reduce the randomness complexity of t-private circuits from \(\mathcal{O}(t^2s)\) to \(\mathcal{O}(t^4(\log s+ \log t))\); with respect to the circuit size s, this is therefore an exponential improvement. Our main contribution is this paper will be to reduce this complexity further down to \( \mathcal{O}(t^2(\log s+ \log t))\), and to describe a concrete implementation of AES based on this approach. We refer to Table 2 below for the number of bytes required to protect AES against t-th order attacks; we see that for small values of t, we obtain almost two orders of magnitude improvement compared to previous methods.

Robust PRGs and Private Circuits. In [IKL+13], the authors introduced the notion of robust pseudo-random number generator (PRG). A robust PRG must remain secure even if an adversary can probe at most t intermediate variables in the PRG circuit. The authors showed that such robust PRG can be used in the ISW countermeasure to minimize the randomness complexity. Namely the resulting circuit uses a short random seed only, and remains secure against t-th order attacks.

Recall that the original ISW countermeasure requires \(\mathcal{O}(t^2 s)\) bits of randomness, where s is the circuit size. Following [IKL+13], we first recall how this can be reduced to \(\mathcal{O}(t^4(\log t+ \log s))\), using a trivial construction of robust PRG. More precisely, the construction is based on r-wise independent PRG. A PRG is said to be r-wise independent if any subset of at most r output bits of the PRG is uniformly and independently distributed. The authors show that the ISW countermeasure can be adapted so that any wire in the ISW circuit depends on at most \(\ell =\mathcal{O}(t^2)\) bits of randomness; such parameter \(\ell \) is called the locality of the randomness and will play a crucial role in this paper. Since the adversary can probe at most t wires, the adversary’s side-channel observation can then depend on at most \(t \cdot \ell =\mathcal{O}(t^3)\) bits of randomness. Therefore, instead of using a TRNG, it is sufficient to use an r-wise independent PRG with parameter \(r=t \cdot \ell =\mathcal{O}(t^3)\); if the r-wise PRG is secure against t probes, as shown in [IKL+13] the resulting circuit will remain secure against t probes.

It is easy to obtain an r-wise independent PRG by evaluating a degree \(r-1\) polynomial on distinct inputs in a finite field \({\mathbb F}\); the r coefficients of the polynomials are initially generated at random in \({\mathbb F}\); this is the seed of the PRG. From r fresh randoms in \({\mathbb F}\), one can then obtain m pseudo-randoms with the r-wise independence property, as long as \(m \le |{\mathbb F}|\). To obtain an r-wise independent PRG with robustness against t probes, as observed in [IKL+13] a trivial construction consists in xoring the output of \(t+1\) PRGs, so that at least one PRG has not been probed. One can therefore obtain an r-wise independent PRG robust against t probes by using \(r \cdot (t+1)=\mathcal{O}(t^4)\) fresh randoms in \({\mathbb F}\) as input, and such PRG can then generate \(m \le |{\mathbb F}|\) pseudo-randoms in \({\mathbb F}\). Since the original ISW countermeasure requires \(m=\mathcal{O}(t^2s)\) randoms (where s is the circuit size), using \({\mathbb F}=\mathbb {F}_{2^k}\) one can take \(k=\mathcal{O}(\log m)=\mathcal{O}(\log t + \log s)\). One therefore needs \(\mathcal{O}(t^4(\log t+ \log s))=\mathcal{\tilde{O}}(t^4)\) bits of randomnessFootnote 1, instead of \(\mathcal{O}(t^2s)\). The number of input random bits is then independent of the circuit size s (up to some logarithmic factor). In summary, any t-private circuit in which each wire depends on at most \(\ell \) bits of randomness can be converted into a t-private circuit using roughly \(t^2 \ell \) bits of randomness via the use of robust r-wise PRGs. As written by the authors: “Improving the randomness locality \(\ell \) of private circuits would immediately yield a corresponding improvement [in the number of input random bits].”.

In [IKL+13], the authors describe an improved construction of robust PRG, based on unbalanced bipartite expander graphs. Using the Guruswami-Umans-Vadhan construction of expander graphs [GUV09], they obtain r-wise independence and resistance against \(t=r\) probes with \(r^{1+\eta }\) bits of true randomness as input, for any \(\eta >0\). In the context of the ISW countermeasure, this enables to use \(\mathcal{\tilde{O}}(t^{3+\varepsilon })\) random bits as input for any \(\varepsilon >0\), instead \(\mathcal{\tilde{O}}(t^4)\).

Our Contribution. Our main contribution is a countermeasure against side-channel attacks where the number of random bits is only \(\mathcal{\tilde{O}}(t^2)\) for security against t probes, independently of the circuit size (up to a logarithmic factor), and without using expander graphs. Moreover the running time of pseudo-random generation goes down from \(\mathcal{\tilde{O}}(t^{4})\) to \(\mathcal{\tilde{O}}(t)\). We summarize in Table 1 below the asymptotic complexities of existing techniques and our new techniques. We proceed in two steps.

In the first step, we show how to improve the locality \(\ell \) of private circuits from \(\ell =\mathcal{O}(t^2)\) down to \(\ell =\mathcal{O}(t)\). As illustrated in the third line of Table 1 below, reducing \(\ell \) from \( \mathcal{O}(t^2)\) to \(\mathcal{O}(t)\) enables to reduce the r-wise independence parameter from \(r=\mathcal{O}(t^3)\) down to \(r=\mathcal{O}(t^2)\); the number of input random bits is then now decreased from \(\mathcal{\tilde{O}}(t^4)\) to \(\mathcal{\tilde{O}}(t^3)\) with the trivial construction (and from \(\mathcal{\tilde{O}}(t^{3+\varepsilon })\) to \(\mathcal{\tilde{O}}(t^{2+\varepsilon })\) with expander graphs). Our technique is as follows. The authors of [IKL+13] obtain \(\ell =\mathcal{O}(t^2)\) by performing a mask locality refreshing at the end of each ISW multiplication gadget. Instead we modify the ISW multiplication by performing a series of internal locality refreshing. For this we consider successive \(i \times i\) ISW submatrices and perform a mask refreshing after the processing of each submatrix; these internal mask refreshing enable to bring the locality down to \(\ell =\mathcal{O}(t)\). We have also performed a formal verification of our new algorithms, using the CheckMasks tool [Cor18], for both the locality and the security properties; we provide the source code in [Cor19a]. This first step is described in Sect. 3.

In the second step, our technique consists in using multiple independent PRGs instead of a single one. This has two main advantages. The first advantage is that for ISW circuits, one can show that the robustness property of the PRG is not required anymore; this implies that we can use a very simple PRG based on polynomial evaluation as above. The second advantage is that the locality with respect to each subset of randoms generated by each PRG becomes \(\ell =\mathcal{O}(1)\). Therefore each independent PRG can be r-wise independent with a much smaller parameter \(r=\mathcal{O}(t)\) instead of \(r=\mathcal{O}(t^3)\), and therefore requires only \(r=\mathcal{O}(t)\) randoms in the finite field (since robustness is not needed). In that case, we need \(\mathcal{O}(t^2)\) independent PRGs and therefore the size of the input randomness is \(\mathcal{\tilde{O}}(t^3)\); see Line 4 of Table 1. Finally, when using internal locality refreshing as in the first step above, we only need \(\mathcal{O}(t)\) independent PRGs, and eventually the number of input random bits is reduced to \(\mathcal{\tilde{O}}(t^2)\), instead of \(\mathcal{\tilde{O}}(t^{3+\varepsilon })\) with expander graphs in [IKL+13] (see Line 5 of Table 1). We stress that this asymptotic improvement over [IKL+13] is obtained without using expander graphs, that is we can use a simple PRG based on polynomial evaluation in a finite field (see Sect. 4).Footnote 2

As mentioned previously, we found that expander graphs PRG are impractical for minimizing the amount of input randomness. However expander graphs can still be useful for optimizing the time generation of each pseudo-random; namely the output locality of an expander graph PRG (i.e., the number of inputs on which each output depends) can be at most polylogarithmic in the seed length (as opposed to linear for a PRG based on polynomial evaluation); hence in Table 1 the pseudo-random time generation is always \(\mathcal{\tilde{O}}(1)\). In Sect. 2.3 we give an example of a simple construction based on expander graph that achieves very fast pseudo-random generation, at the cost of significantly more input randomness.

Table 1. Asymptotic efficiency of various constructions. The Locality Refreshing (LR) is performed either at the end of each gadget (Line 2 and Line 4), or sequentially within each gadget (Line 3 and Line 5). The trivial construction of PRG is based on xoring \(t+1\) linear PRGs to get robustness against t probes.

Finally, we describe in Sect. 5 an application of our countermeasure to AES. We show that for AES we only need 48 bytes of randomness to get second-order security (\(t=2\)), instead of 2880 in the original Rivain-Prouff countermeasure. We see in Table 2 below that for small values of t, our construction reduces the randomness complexity of masking AES by almost 2 orders of magnitude. In Sect. 5, we also provide the results of a concrete implementation. When implemented on an ARM-based embedded device with a relatively slow TRNG, we obtain a \(50\%\) speed-up compared to Rivain-Prouff for \(t=2\). We provide the source code in C in [Cor19b]. Needless to say, we do not claim that in practice our implementation would be secure against a t-th order attack. Namely the implementation is only provided for illustrative purpose, and timing comparisons. Obtaining a secure implementation would require to (at least) carefully examine the assembly code, and perform a leakage test with concrete acquisitions from an oscilloscope.

Table 2. Number of bytes of randomness to get t-th order security for AES.

2 Definitions and Previous Work

2.1 Private Circuits

In 2003, Ishai, Sahai and Wagner [ISW03] initiated the study of securing circuits against an attacker who can probe a fraction of its wires. They showed how to transform any circuit of size |C| into a larger circuit of size \(O(|C| \cdot t^2)\) with the same functionality but secure against a t-probing adversary, based on splitting each variable x into \(n=2t+1\) shares with \(x = x_1 \oplus x_2 \oplus \cdots \oplus x_n\).

Definition 1

(Private circuit). A private circuit for \(f :\{0,1\}^{n_i} \rightarrow \{0,1\}^{n_o}\) is a triple (ICO) where \(I:\{0,1\}^{n_i} \rightarrow \{0,1\}^{\hat{n}_i}\) is a randomized input encoder, C is a randomized boolean circuit with input \({\hat{\omega }} \in \{0,1\}^{\hat{n}_i}\), output \({\hat{y}} \in \{0,1\}^{\hat{n}_o}\), and randomness \(\rho \in \{0,1\}^m\), and \(O: \{0,1\}^{\hat{n}_o} \rightarrow \{0,1\}^{n_o}\) is an output decoder, such that for any input \(\omega \in \{0,1\}^{n_i}\) we have \(\Pr [O(C(I(\omega ),\rho ))=f(\omega )]=1\), where the probability is over the randomness of I and \(\rho \).

For I and O we consider the canonical encoder and decoder: I encodes each input bit \(\omega _i\) by a vector of \(2t+1\) random bits with parity \(\omega _i\), and O takes the parity of each block of \(2t+1\) bits.

Definition 2

(t-privacy). We say that C is a t-private implementation of f with encoder I and decoder O is t-private (or t-probing secure) if for any \(\omega ,\omega ' \in \{0,1\}^{n_i}\) and any set P of t wires in C, the distributions \(C_P(I(\omega ),\rho )\) and \(C_P(I(\omega '), \rho )\) are identical, where \(C_P\) denotes the set of t values on the wires from P.

2.2 PINI and t-SNI Security

The Probe Isolating Non-Interference (PINI) security notion was introduced in [CS18] to enable easy composition of gadgets. Let n be the number of shares. We let \(x_\star =(x_i)_{i=1,\ldots n}\) be an n-sharing of x if \(x=\bigoplus _{i=1}^n x_i\). Given a subset \(I \subset [1,n]\) of share indices, we denote by \(x_{|I}:=\{x_i : i \in I \}\) the corresponding subset of shares. A gadget with m inputs and \(\ell \) outputs is a circuit with mn input shares grouped into m n-sharings denoted \((x_{\star ,1},\ldots x_{\star ,m})\), and similarly \(\ell n\) output shares denoted \((y_{\star ,1},\ldots y_{\star ,\ell })\). For a given share index i, we also use the notation \(x_{i,\star }=\{x_{i,j} : 1 \le j \le m \}\) to denote all shares with index \(1 \le i \le n\); similarly, we also write \(x_{|I,\star }=\{x_{i,\star } : i \in I\}\). Below we recall the Probe Isolating Non-Interference (PINI) definition from [CS18]; we actually use a slightly simplified (and equivalent) definition compared to [CS18]; we explain the difference in the full version of our paper [CGZ19].

Definition 3

(PINI [CS18] (adapted)). Let G be a gadget with input shares \(x_{i,\star }\) and output shares \(y_{i,\star }\) for \(1 \le i \le n\). The gadget G is PINI if for any \(t _1\in {\mathbb N}\), any set of \(t_1\) intermediate variables and any subset \(\mathcal{O}\) of output indices, there exists a subset \(I \subset [1,n]\) of input indices with \(|I| \le t_1\) such that the \(t_1\) intermediate variables and the output shares \(y_{|\mathcal{O},\star }\) can be perfectly simulated from the input shares \(x_{|I \cup \mathcal{O},\star }\).

It is straightforward to show that a PINI gadget with n shares is secure against \(t=n-1\) probes. We recall the proof of PINI composition (under our slightly modified definition) in the full version of our paper [CGZ19].

Proposition 1

(PINI security [CS18]). Any PINI gadget with n shares is \((n-1)\)-probing secure.

Proposition 2

(PINI composition [CS18]). Any composite gadget made of PINI composing gadgets is PINI.

Below we recall the SNI security notion introduced in [BBD+15]. We consider a gadget taking as input two n-tuples \((x_i)_{1\le i\le n}\) and \((y_i)_{1\le i\le n}\) of shares, and outputting a single n-tuple \((z_i)_{1 \le i \le n}\). As previously, given a subset \(I \subset [1,n]\), we denote by \(x_{|I}\) all elements \(x_i\) such that \(i \in I\).

Definition 4

(t-SNI security). Let G be a gadget taking as input n shares \((x_i)_{1 \le i \le n}\) and n shares \((y_i)_{1\le i\le n}\), and outputting n shares \((z_i)_{1 \le i \le n}\). The gadget G is said to be t-SNI secure if for any set of \(t_1\) probed intermediate variables and any subset \(\mathcal {O}^{}\) of output indices, such that \(t _1+ |\mathcal {O}^{} |\le t\), there exist two subsets \(I\) and \(J\) of input indices which satisfy \(|I| \le t_1\) and \(|J| \le t_1\), such that the \(t_1\) intermediate variables and the output variables \(z_{|\mathcal {O}^{}}\) can be perfectly simulated from \(x_{|I}\) and \(y_{|J}\).

Intuitively, the t-SNI security definition provides an “isolation” between the output shares and the input shares, so that the number of input variables required for the simulation is upper-bounded by the number of internal probes \(t_1\), and does not depend on the number of output variables that must be simulated, as long as \(t _1+ |\mathcal {O}^{}| \le t\). There is an analogous definition for a gadget with a single input \((x_i)_{1 \le i \le n}\); in that case, the simulation is performed from \(x_{|I}\) with \(|I| \le t_1\).

It is easy to see that for a single input gadget, \((n-1)\)-SNI security implies PINI security. Moreover, for a 2-input \((n-1)\)-SNI gadget as considered in Definition 4, as shown in [CS18] we can obtain a PINI gadget by pre-refreshing one of the inputs with a \((n-1)\)-SNI mask refreshing algorithm; this is the double-SNI approach (see Fig. 1). A mask refreshing gadget takes as input the n-sharing of a value x and outputs a randomized n-sharing of the same value x. Therefore, in this paper, our strategy for proving gadget security is either to directly prove the PINI property, or to first prove the t-SNI property and then apply the “double-SNI” strategy. Note that for specific circuits such as the AES SBox, one can use some optimization; for example the full SBox computation can be proven t-SNI and therefore PINI with 4 multiplications and 2 mask refreshing only (instead of 4 mask refreshing as in the naive “double-SNI” strategy).

Proposition 3

(Double-SNI [CS18]). Let G be a \((n-1)\)-SNI gadget taking as input \((a_i)_{1 \le i \le n}\) and \((b_i)_{1\le i\le n}\), and outputting \((c_i)_{1 \le i \le n}\). Let R be a \((n-1)\)-SNI gadget taking as input \((x_i)_{1 \le i \le n}\) and outputting \((y_i)_{1 \le i \le n}\). The composite gadget \(G'\) taking as input \((x_i)_{1 \le i \le n}\) and \((b_i)_{1\le i\le n}\), and outputting \((c_i)_{1 \le i \le n}\), with \(G'((x_i),(b_i))=G(R((x_i)),(b_i))\) is PINI.

Fig. 1.
figure 1

The double-SNI approach: when both gadgets G and R are \((n-1)\)-SNI, the composite gadget \(G'\) is PINI.

Finally, we recall in Appendix B the \(\mathsf {SecMult}\) gadget used in [RP10] for protecting AES against t-th order attacks. It is an extension to \(\mathbb {F}_{2^k}\) of the original ISW countermeasure [ISW03] described in \(\mathbb {F}_{2}\). The \(\mathsf {SecMult}\) gadget was proven t-SNI in [BBD+16]. We also recall in the full version of our paper [CGZ19] the mask refreshing gadget FullRefresh introduced by Duc et al. in [DDF14], based on \(\mathsf {SecMult}\); it was also proven t-SNI in [BBD+16]. We can therefore use the FullRefresh gadget to apply the above “double-SNI” strategy. Moreover, in this paper, when we describe a variant of \(\mathsf {SecMult}\), we apply the same modifications to the FullRefresh gadget; this is straightforward, since the FullRefresh gadget can be seen as a SecMult with one input equal to \((1,0,\ldots ,0)\).

2.3 r-wise Independent PRG: Definition and Construction

We recall the definition of an r-wise independent pseudo-random generator (PRG). We denote by \(U_n\) the uniform distribution in \(\{0,1\}^n\).

Definition 5

(r-wise independent PRG). A function \(G: \{0,1\}^n \rightarrow \{0,1\}^m\) is an r-wise independent pseudo-random generator if any subset of r bits of G(x) is uniformly and independently distributed when \(x \leftarrow U_n\).

We can construct an r-wise independent PRG via polynomial evaluation in a finite field \({\mathbb F}\). Letting \(\mathbf {a}=(a_0,\ldots ,a_{r-1} )\in {\mathbb F}^r\), we consider the polynomial:

$$ h_{\mathbf {a}}(x)=\sum \limits _{i=0}^{r-1} a_i x^i$$

For any \(m \le |{\mathbb F}|\), we can define the function \(G:{\mathbb F}^r \rightarrow {\mathbb F}^m\) by letting:

$$ G(\mathbf {a})=(h_{\mathbf {a}} (0),\ldots ,h_{\mathbf {a}}(m-1))$$

where we assume that we have some indexing of the field elements in \({\mathbb F}\). The function G is an r-wise independent PRG because there is a bijection between the r coefficients of a polynomial of degree at most \(r-1\) and its evaluation at r distinct points \(x_i\).

For \({\mathbb F}={\mathbb F}_{2^k}\), this gives an r-wise independent PRG taking as input rk bits and outputting at most \(k \cdot 2^k\) bits. Namely when working over \({\mathbb F}_{2^k}\) and generating k-bit pseudo-randoms, we can use each individual bit of the k-bit pseudo-random, and the PRG function remains r-wise independent. The parameter k determines the expansion factor of the PRG. For our application to AES in Sect. 5, for simplicity we will work over \({\mathbb F}_{2^{16}}\), using \({\mathbb F}_{2^{8}}\) as a subfield. For a block-cipher using single bits, one would work in \({\mathbb F}_{2^k}\) and use each of the k bits of \({\mathbb F}_{2^k}\) separately.

A Simple 3-wise Independent PRG. We also consider a very simple PRG that achieves 3-wise independence only. We consider a set of 2d random bits \(x_i\) and \(y_i\) for \(1 \le i \le d\). We define the following function \(G: \{0,1\}^{2d} \rightarrow \{0,1\}^{d^2}\):

$$ G(x_1,\ldots ,x_d,y_1,\ldots ,y_d)=(x_i \oplus y_j)_{1 \le i,j \le d}$$

The function G can be seen as a PRG based on expander graph; see the full version of our paper [CGZ19].

Lemma 1

The function G is a 3-wise independent PRG.

Proof

We must show that any 3 variables \((x_{i_1} \oplus y_{j_1})\), \((x_{i_2} \oplus y_{j_2})\) and \((x_{i_3} \oplus y_{j_3})\) are uniformly and independently distributed.

We distinguish 3 cases. If \(\#\{i_1,i_2,i_3\}=3\), then the three values are independent thanks to randoms \(x_{i_1}\), \(x_{i_2}\) and \(x_{i_3}\). If \(i_1 = i_2 = i_3\), then we must have \(\#\{j_1,j_2,j_3\}=3\) and the three values are independent thanks to randoms \(y_{j_1}\), \(y_{j_2}\) and \(y_{j_3}\). Eventually, if exactly two indices among \(i_1\), \(i_2\) and \(i_3\) are equal, say wlog \(i_1 = i_2 \ne i_3\), then we must have \(j_1 \ne j_2\) and the randoms \(y_{j_1}\), \(y_{j_2}\) and \(x_{i_3}\) ensure the independence of the three values.   \(\square \)

2.4 Robust PRG: Definition and Trivial Construction

In [IKL+13], the authors introduced the notion of robust pseudo-random number generator (PRG), which should remain secure even if an adversary can probe at most k intermediate variables in the PRG circuit. We recall the definition of (strongly) robust PRG from [IKL+13] below. Under this definition, the output bits of the PRG must remain r-wise independent outside some set T of bounded size, conditioned on the values of any set S of at most k probes in the PRG circuit and the outputs in T.

In this paper we actually use a slightly weaker definition of strong robustness compared to [IKL+13], in which we allow the output bits outside the set T to be only \((r-q|S|)\)-wise independent, instead of r-wise independent, where \(|S| \le k\) is the number of probes and q a parameter. In other words, we allow the r-wise independence of the PRG to degrade gracefully with the number of probes. This will give slightly more efficient constructions; in particular, the trivial construction of xoring \(k+1\) PRGs will only require the r-wise independence of each PRG, instead of the \((r+k)\)-wise independence in [IKL+13]. Obviously we need to ensure that a robust PRG under our definition can still be embedded in a private circuit with the same parameters as in [IKL+13]; see Theorem 1 below.

Definition 6

(Strong robust PRG [IKL+13] (adapted)). A circuit implementation C of a PRG \(G:\{0,1\}^n \rightarrow \{0,1\}^m\) is strong (rkq)-robust if given \(Y=G(X)\) where \(X \leftarrow \{0,1\}^n\), for any set S of at most k probes in C, there is a set T of at most q|S| output bits such that conditioned on any fixing of the values \(C_S\) of the wires in S and of \(Y_T\), the values \(Y_{\bar{T}}\) of the output bits not in T are \((r-q|S|)\)-wise independent and uniformly distributed.

Trivial Construction. As noted in [IKL+13], we can obtain a strong (rk, 1)-robust PRG by taking the xor of \(k+1\) PRGs, each with the r-wise independence property. More precisely, letting \(g :\{0,1\}^n \rightarrow \{0,1\}^m\), we let \(G : \{0,1\}^{n \cdot (k+1)} \rightarrow \{0,1\}^m\):

$$ G(x_1,\ldots ,x_{k+1})=g(x_1) \oplus g(x_2) \oplus \cdots \oplus g(x_{k+1})$$

where the xors are performed from left to right.

Lemma 2

(Strong robustness of G). If g is an r-wise independent PRG, then G is a strong (rk, 1)-robust PRG.

Proof

Since there are at most k probes and \(k+1\) PRGs, there exists an index \(i^\star \) such that \(g(x_{i^\star })\) has not been probed. In the following, we fix all inputs \(x_i\) except \(x_{i^\star }\).

Let \( t\le k\) be the number of probes. We consider the set T of indices \(j \in [1,m]\) such that the j-th bit of any partial sum \(g(x_1) \oplus \dots \oplus g(x_i)\) is probed. We must have \(|T| \le t\). Since g is an r-wise independent PRG, by definition any set of r output bits of \(g(x_i^\star )\) is uniformly and independently distributed; this implies that any set of \(r-t\) output bits of \(g(x_i^\star )\) with indices outside T are uniformly and independently distributed, even conditioned on the output bits in T and the other probes. Since we have fixed the inputs of all other PRGs, this also applies for the output of G. Therefore G is a strong (rk, 1)-robust PRG.   \(\square \)

Expander Graph Construction. Using an explicit construction of a bipartite expander graph [GUV09], the authors of [IKL+13] obtain a construction of a strong (rkq)-robust PRG with \(r,k=n^{1-\eta }\) where n is the number of random input bits, for any \(\eta >0\). In the full version of our paper [CGZ19] we provide a simplified proof of strong robustness for expander graph based PRG, based on the proof of weak robustness from [IKL+13]. We also argue that for minimizing the amount of input randomness, while asymptotically better than the trivial construction, expander graph based constructions are actually impractical. Namely in our analysis the expander graph PRG construction based on [GUV09] becomes better than the trivial construction only for \(r \ge 2^{18}\) and at least \(2^{36}\) random input bits.

2.5 Application to Private Circuits

We recall below the main theorem from [IKL+13], showing that we can plug a robust PRG in a private circuit to generate all randomness from a small random seed, and the resulting construction remains secure against probing attacks. Firstly an important parameter is the locality \(\ell \) of the randomness in the circuit.

Definition 7

(Randomness locality [IKL+13]). A circuit C is said to make an \(\ell \)-local use of its randomness if the value of each of its wires is determined by its (original, unmasked) input and at most \(\ell \) bits of the randomness used in the circuit.

Theorem 1

(Private circuit with PRG [IKL+13] (adapted)). Suppose \(C(\hat{\omega },\rho )\) is a qk-private implementation of f with encoder I and decoder O, where C makes an \(\ell \)-local use of its randomness, and uses at most m bits of randomness. Let \(G:\{0,1\}^n \rightarrow \{0,1\}^m\) be a strong (rkq)-robust linear PRG with \(r \ge k \cdot \max (\ell ,q)\). Then, the circuit \(C'\) defined by \(C'(\hat{\omega },\rho ')= C(\hat{\omega },G(\rho '))\) is a k-private implementation of f with encoder I and decoder O which uses n random bits.

The proof of Theorem 1 is based on showing that the view of any adversary who attacks with t probes an implementation in which the randomness is generated by a PRG, can be simulated given the view of an adversary with at most qt probes who attacks an implementation with a true source of randomness; see Fig. 3 for an illustration.

In the full version of our paper [CGZ19] we provide a proof that is essentially the same as in [IKL+13, Theorem 30], except that we use our slightly weaker definition of robustness. We recall the main steps of the proof below. We start with the following Lemma, which is similar to [IKL+13, Lemma 29]. As illustrated in Fig. 2, any output of at most \(r-q|S|\) bits of the robust PRG can be replaced by a TRNG and any set S of at most k probes in the PRG can be perfectly simulated using a subset T of the output with \(|T| \le q|S|\). This means that probing |S| probes within the PRG is not better for the adversary than probing q|S| outputs of the TRNG. To simplify notation, we will use G to denote both the function computed by a robust PRG and its circuit implementation. For a set S of k wires in G, we denote by \(G_S\) the value of these wires; similarly, for a subset T of output bits of G, we denote by \(G_T\) the values of these output bits.

Fig. 2.
figure 2

With a strong (rkq)-robust PRG, any output of at most \(r-q|S|\) bits of the PRG can be replaced by a TRNG and any set S of at most k probes can be perfectly simulated using a subset T of the output with \(|T| \le q|S|\).

Lemma 3

(Robust PRG). Let \(G :\{0,1\}^n \rightarrow \{0,1\}^m\) be a strong (rkq)-robust linear PRG with \(r \ge kq\). Let S be any set of at most k wires in G. Let \(L \subset [m]\) be any subset of \(r-q|S|\) bits. There exists a subset T with \(|T| \le q|S|\) such that the distribution of \(Y=G_{L \cup T}(X)\) is uniform in \(\{0,1\}^{|L \cup T|}\) when \(X \leftarrow \{0,1\}^n\) and moreover \(G_S(X)\) can be efficiently simulated given \(Y_T\) only.

Thanks to Lemma 3 we can now prove Theorem 1. As illustrated in Fig. 3, we can simulate any t probes within the PRG with a simulator SIM that uses qt random bits from the TRNG (see Fig. 2); these qt random bits can actually be queried by probing the original circuit C. This shows that when probing the PRG in \(C'\) the adversary does not learn more than by probing the circuit C with true randomness, as required; see the full version of our paper [CGZ19] for the details.

Fig. 3.
figure 3

Security proof when plugging a PRG into a private circuit.

2.6 Locality Refreshing

As recalled in Theorem 1, the r-wise independence parameter r of the PRG depends on the randomness locality \(\ell \) of the circuit (see Definition 7). The goal is therefore to minimize the parameter \(\ell \). In the original ISW construction, the parameter \(\ell \) would grow linearly with the circuit size; namely some wires can depend on almost all the randoms used in the circuit. To keep a small \(\ell =\mathcal{O}(t^2)\), the authors of [IKL+13] use a mask refreshing at the end of each ISW gadget. Such locality refreshing, that we denote by \(\mathsf{LR}\), proceeds as described in Algorithm 1; see Fig. 4 for an illustration.

figure a

At the end of the algorithm, we have \(y_i=s_i\) for all \(1 \le i \le n-1\), and \(y_n= x \oplus s_1 \oplus \cdots \oplus s_{n-1}\) for the secret \(x=x_1 \oplus \cdots \oplus x_n\). Therefore one can show recursively over the circuit that the internal variables of the ISW multiplication depend on at most \(\ell =\mathcal{O}(t^2)\) randoms, and this actually holds for any variable in the circuit. The following Lemma shows that the LR gadget is PINI, so that it can be included in a circuit without degrading its security.

Fig. 4.
figure 4

Locality refreshing algorithm.

Lemma 4

(PINI security of LR). Let \((x_i)_{1 \le i \le n}\) be the input shares of the mask refreshing Algorithm LR. For any \(t \in {\mathbb N}\), any set of t intermediate variables and any subset \(\mathcal {O}^{}\) of output indices, there exists a subset \(I \subset [1,n]\) of indices such the t intermediate variables and the output shares \(y_{|\mathcal {O}^{}}\) can be perfectly simulated from the input shares \(x_{|I \cup \mathcal {O}^{}}\), with \(|I| \le t\).

Proof

We consider the following simple gadget G: \((x_1,x_n) \rightarrow (s_1, x_n \oplus (x_1 \oplus s_1))\), where \(s_1\) is a random value. We start by showing that in Gadget G, we can always simulate t probes and \(|\mathcal {O}^{}|\) output variables from the input shares \(x_{|I \cup \mathcal {O}^{}}\), with \(|I| \le t\).

If \(t + |\mathcal {O}^{}| \ge 2\), we can let \(I=\{1,n\} \setminus \mathcal {O}^{}\) which gives \(I \cup \mathcal {O}^{}=\{1,n\}\) and all variables can be simulated from the input shares \(x_{|I \cup \mathcal {O}^{}}\). Moreover we have \(|I| = |\{1,n\} \setminus \mathcal {O}^{}| \le 2 - |\mathcal {O}^{}| \le t\). If \(t + |\mathcal {O}^{}|=1\), we distinguish two cases. If \(|\mathcal {O}^{}|=1\) and \(t=0\), then we can simulate either \(s_1\) or \(x_n \oplus (x_1 \oplus s_1)\) by generating a random value. If \(t=1\) and \(|\mathcal {O}^{}|=0\), we can simulate \(x_1\) or \(x_n\) with \(I=\{1\}\) or \(I=\{n\}\); the other variables can be simulated by a random value.

We now consider the following gadget \(G_i\) for \(1 \le i \le n-1\):

$$G_i: (x_1,\dots , x_i, \dots , x_n) \rightarrow (x_1, \dots , x_{i-1},s_i,x_{i+1}, \dots , x_n \oplus (x_i \oplus s_i))$$

which is similar to Gadget G, but with n input shares instead of 2, and \(n-2\) unmodified input shares. As previously, we can always simulate t probes and \(|\mathcal {O}^{}|\) output variables from the input shares \(x_{|I \cup \mathcal {O}^{}}\), with \(|I| \le t\). This implies that the gadget \(G_i\) is PINI. Since the LR gadget is the composition of \(G_1,\ldots ,G_{n-1}\), from Proposition 2 the LR gadget is also PINI.   \(\square \)

In [IKL+13] the LR algorithm is then applied after each ISW gadget. In particular, for the SecMult gadget recalled in Appendix B, we obtain the following \({\mathsf{SecMultFLR}}\) gadget. Since the original SecMult is t-SNI, the \(\mathsf{SecMultFLR}\) gadget is also t-SNI. The same LR algorithm is applied after the \(\mathsf{Xor}\) gadget and the \(\mathsf{FullRefresh}\) gadgets (see the full version of our paper [CGZ19]).

figure b

Application to Private Circuits. We recall Claim 31 and Corollary 32 from [IKL+13]; we also recall the proof in the full version of our paper [CGZ19]. We use the notation \(f(\lambda )=\mathcal{\tilde{O}}(g(\lambda ))\) if \(f(\lambda )=\mathcal{O}(g(\lambda ) \log ^k \lambda )\) for some \(k \in {\mathbb N}\). We assume that the circuit size \(s(\lambda )\) and the number of probes \(t(\lambda )\) are both polynomial in the security parameter \(\lambda \).

Lemma 5

(Private circuit with PRG [IKL+13]). Any function f with circuit size s admits a t-private implementation (ICO) with the canonical encoder I and decoder O, where C uses \(\mathcal{O}(t^2s)\) random bits and makes an \(\ell =\mathcal{O}(t^2)\)-local use of its randomness. Consequently, f admits a t-private implementation \((I,C',O)\), where \(C'\) uses \(\mathcal{\tilde{O}}(t^4)\) bits of randomness, and runs in time \(\mathcal{\tilde{O}}(t^6 s)\), using the trivial construction. Using the expander graph construction, for any \(\varepsilon >0\), it uses \(\mathcal{\tilde{O}}(t^{3+\varepsilon })\) random bits and runs in time \(\mathcal{\tilde{O}}(t^2s)\).

2.7 Composing \(\ell \)-local Gadgets

In this section we provide an explicit definition of locality for a gadget, so that the locality property can be composed over a full circuit (as for the PINI definition for security against probing). As in [IKL+13], the basic technique is to perform a locality refresh (such as Algorithm 1) of the output of each gadget. We say that a set of wires \((y_i)_{1 \le i \le n}\) is locality refreshed if \(y_i=s_i\) for all \(1 \le i \le n-1\), for randoms \(s_i\), and \(y_n=y \oplus s_1 \oplus \cdots \oplus s_{n-1}\), where y is the original unmasked variable. In the definition below of gadget locality, we take into account the randomness of the (locality refreshed) inputs.

Definition 8

(\(\ell \)-local gadget). Let G be a gadget whose output is locality refreshed. Consider the circuit C where G is given locality refreshed inputs \(x_{\star ,\star }\). Let \(\rho \) be the randomness used by C, including the randomness from the inputs. The gadget G is said to make an \(\ell \)-local use of its randomness if C makes an \(\ell \)-local use of its randomness \(\rho \).

Theorem 2

(Composition of \(\ell \)-local gadgets). Any composite gadget made of \(\ell \)-local gadgets is \(\ell \)-local.

Proof

We consider m gadgets \(G_1, \cdots , G_m\) that we order as a direct acyclic graph from output to input in a reverse topological sort order. We assume that each gadget \(G_i\) makes an \(\ell \)-local use of its randomness, with locality refreshed outputs. We prove by recurrence on n that the composition of \(\ell \)-local gadgets is \(\ell \)-local.

If \(n=1\), then there is only one gadget and this is straightforward since by assumption the gadget is \(\ell \)-local. Now we assume that the composition of gadgets \(G_1, \cdots , G_n\) is \(\ell \)-local and we prove that the composition of gadgets \(G_1, \cdots , G_{n+1}\) is still \(\ell \)-local. Since the composition of gadgets \(G_1, \cdots , G_n\) is \(\ell \)-local, and since by definition the inputs of the gadget \(G_n\) are locality refreshed because they correspond to outputs of Gadget \(G_{n+1}\) which are locality refreshed, we get that the composition of both parts \(G_{n+1}\) and \(G_1, \cdots , G_n\) does not increase the global locality. Namely, the global locality corresponds to the maximum locality between both parts. Since the composition of gadgets \(G_1, \cdots , G_n\) is \(\ell \)-local and since Gadget \(G_{n+1}\) is also \(\ell \)-local, the maximum locality is \(\ell \) and the composition of gadgets \(G_1, \cdots , G_{n+1}\) is \(\ell \)-local.   \(\square \)

In the above definition, in order to determine the locality \(\ell \) of a gadget, we must therefore assume that it receives locality refreshed inputs, and the randomness from this locality refreshed inputs must be taken into account when computing \(\ell \). Below we provide an example with the \(\mathsf{Xor}\) gadget; the Xor gadget takes as input \(a_i\) and \(b_i\) for \(1 \le i \le n\), and returns \(c_i= a_i \oplus b_i\) for all \(1 \le i \le n\).

Lemma 6

(Locality of Xor). The Xor gadget followed by a locality refresh makes an \(\ell \)-local use of its randomness, with \(\ell =2(n-1)\).

Proof

The gadget takes as input \(a_i\) and \(b_i\) for \(1 \le i \le n\), and then computes \(c_i= a_i \oplus b_i\) for all \(1 \le i \le n\), and finally \(d_{n,j}=c_n \oplus (\oplus _{i=1}^j a_i \oplus b_i \oplus s_i)\) for \(1 \le j \le n-1\), with outputs \(d_i=s_i\) for \(1 \le i \le n-1\) and \(d_n=d_{n,n-1}\). We must consider \(a_i=s^{(a)}_i\) for \(1 \le i \le n-1\) and \(a_n=a \oplus s^{(a)}_1 \oplus \cdots \oplus s^{(a)}_{n-1}\), and similarly for \(b_i\). Therefore \(c_n\) depends on \(2(n-1)\) randoms, while \(d_{n,j}\) depends on \(2(n-1)-j\) randoms, which proves the lemma.   \(\square \)

We also compute the concrete locality \(\ell \) of the \(\mathsf{SecMultFLR}\) algorithm introduced above; in [IKL+13] only the asymptotic bound \(\ell =\mathcal{O}(n^2)\) was proved. Such concrete locality computations will be important when implementing the countermeasure for AES in Sect. 5; namely for a locality \(\ell \), from Theorem 1 the r-wise independence parameter of the PRG must be set to \(r=\ell t\) for security against t probes. We refer to the full version of our paper [CGZ19] for the proof.

Lemma 7

(Locality of \(\mathsf{SecMultFLR}\)). The SecMult algorithm followed by a final locality refresh (\(\mathsf{SecMultFLR}\)) is an \(\ell \)-local gadget with \(\ell =n^2/4+5n/2-c\), where \(c=3\) for even n, and \(c=11/4\) for odd n.

3 Improving the Locality of the Multiplication Gadget

In this section we describe two variants of the SecMult algorithm that improve the randomness locality of t-private circuits from \(\ell =\mathcal{O}(t^2)\) to \(\ell =\mathcal{O}(t)\). We show that this decreases the randomness complexity of private circuits from \(\mathcal{\tilde{O}}(t^4)\) to \(\mathcal{\tilde{O}}(t^3)\) using the trivial robust PRG construction. For our two new algorithms \(\mathsf {SecMultILR}\) and \(\mathsf {SecMultILR}2\), we summarize in Table 3 below the number of required randoms and their locality \(\ell \). Since these randoms are eventually generated by a PRG, one should minimize their locality \(\ell \). We introduce \(\mathsf {SecMultILR}\) first because the t-SNI proof of \(\mathsf {SecMultILR}2\) is significantly more complex.

Table 3. Summary of the multiplication gadgets, their locality and security. We have \(c=3\) for even n, and \(c=11/4\) for odd n.

3.1 First Construction with Internal Locality Refreshing (\(\mathsf {SecMultILR}\))

We describe below a variant of the \(\mathsf{SecMultFLR}\) algorithm with locality \(\ell =\mathcal{O}(t)\) instead of \(\ell =\mathcal{O}(t^2)\). Our new \(\mathsf {SecMultILR}\) is described below. The idea is to process the ISW matrix differently. In the original SecMult the final encoding is obtained by summing over all rows of the \(n \times n\) ISW matrix. Instead we compute the partial sums over the rows of the successive \(j \times j\) submatrices for \(2 \le j \le n\). At each step we perform a locality refreshing of the j shares of the partial sum. In particular, the output of the algorithm is locality refreshed, so there is no need to apply the LR algorithm again.

figure c

We see that lines 6 to 9 are the same as in the original SecMult (see Appendix B), except that they are processed in a different order, since the loop starts with j instead of i. This implies that at Step 10 we have processed the \(j \times j\) submatrix of the ISW matrix, and therefore the first j shares \(c_i\) must satisfy the equality:

$$\begin{aligned} c_1 \oplus \cdots \oplus c_j=(a_1 \oplus \cdots \oplus a_j) \cdot (b_1 \oplus \cdots \oplus b_j) \end{aligned}$$
(2)

From lines 11 to 15 we then perform a locality refresh of these j shares \((c_i)_{i=1}^j\) using new randoms \(s_{ij}\); therefore after the locality refresh the new shares \(c_i\) satisfy the same equality (2), but now they only depend on the \(j-1\) randoms \(s_{ij}\) for \(1 \le i \le j-1\), and not on the \(r_{ij}\)’s. This implies that at the next step of the loop (for index \(j+1\)), the shares \(c_i\) will only depend on a linear number of randoms \(r_{ij}\), instead of quadratic in the original SecMult. Thanks to these internal locality refreshings, the new locality parameter becomes \(\ell =\mathcal{O}(t)\) instead of \(\ell =\mathcal{O}(t^2)\).

Lemma 8

(Locality of \(\mathsf {SecMultILR}\)). The \(\mathsf {SecMultILR}\) algorithm is an \(\ell \)-local gadget with \(\ell =4n-5\) for \(n \ge 3\).

Theorem 3

(Completeness of \(\mathsf {SecMultILR}\)). The \(\mathsf {SecMultILR}\) algorithm, when taking \(a_1,\ldots ,a_n\) and \(b_1,\ldots ,b_n\) as inputs, outputs \(c_1,\ldots ,c_n\) such that \(c_1 \oplus \cdots \oplus c_n= (a_1 \oplus \ldots \oplus a_n) \cdot (b_1 \oplus \ldots \oplus b_n)\).

Theorem 4

(t-SNI of \(\mathsf {SecMultILR}\)). The \(\mathsf {SecMultILR}\) algorithm is t-SNI for any \(1 \le t \le n-1\).

One can therefore use a robust PRG with r-wise independence parameter \(r= \ell \cdot t=\mathcal{O}(t^2)\) instead of \(r=\mathcal{O}(t^3)\) in [IKL+13]. With the trivial construction of xoring \(t+1\) PRGs, the number of input randoms in the finite field becomes \(r \cdot (t+1)=\mathcal{O}(t^3)\) instead of \(\mathcal{O}(t^4)\). This gives the following lemma, which improves over Lemma 5 from [IKL+13].

Lemma 9

(Efficiency properties of \(\mathsf {SecMultILR}\)). Any function of circuit size s admits a t-private implementation (ICO) with the canonic encoder I and decoder O, where C uses \(\mathcal{\tilde{O}}(t^3)\) bits of randomness using the trivial construction, and runs in time \(\mathcal{\tilde{O}}(s \cdot t^5)\).

3.2 Second Construction with Less Randomness (\(\mathsf {SecMultILR}2\))

We describe in the full version of our paper [CGZ19] a variant called \(\mathsf {SecMultILR}2\) of the previous algorithm, that achieves the same locality \(\ell \) as \(\mathsf {SecMultILR}\) but with roughly half as many randoms. It uses the same number of randoms as \(\mathsf{SecMultFLR}\) from [IKL+13], but with locality \(\mathcal{O}(t)\) instead of \(\mathcal{O}(t^2)\). Therefore it is strictly better than both \(\mathsf{SecMultFLR}\) and \(\mathsf {SecMultILR}\); see Table 3.

Lemma 10

(Locality of \(\mathsf {SecMultILR}2\)). The SecMultILR2 gadget uses \(\ell \)-local randomness, with \(\ell = 4n-6\) for \(n \ge 3\).

Theorem 5

(Completeness of \(\mathsf {SecMultILR}2\)). The \(\mathsf {SecMultILR}2\) algorithm, when taking \(a_1,\ldots ,a_n\) and \(b_1,\ldots ,b_n\) as inputs, outputs \(c_1,\ldots ,c_n\) such that \(c_1 \oplus \cdots \oplus c_n= (a_1 \oplus \ldots \oplus a_n) \cdot (b_1 \oplus \ldots \oplus b_n)\).

Theorem 6

(t-SNI of \(\mathsf {SecMultILR}2\)). The \(\mathsf {SecMultILR}2\) is t-SNI for any \(1 \le t \le n-1\).

3.3 Formal Verification of Locality and Security

We have performed a formal verification of the above locality and security lemmas, using the CheckMasks tool [Cor18]. We refer to the full version of this paper [CGZ19] for the details.

4 Private Circuits with Multiple PRGs Without Robustness

In the previous section we have described two variants of \(\mathsf{SecMult}\) where following the [IKL+13] paradigm a single robust PRG is used to generate all the randoms from the circuit; by improving the locality parameter from \(\ell =\mathcal{O}(t^2)\) to \(\ell =\mathcal{O}(t)\), we have decreased the number of input random bits from \(\mathcal{\tilde{O}}(t^4)\) to \(\mathcal{\tilde{O}}(t^3)\), that is independent of the circuit size s (up to logarithmic factors). In this section, we show that by using multiple independent PRGs instead of a single one, the robustness property of the PRG is not required anymore, and therefore much more efficient PRG constructions can be used; this allows to decrease the randomness complexity of private circuits down to \(\mathcal{\tilde{O}}(t^2)\).

We start with a simple observation. In the security proof of ISW, if the attacker probes a given random \(r_{ij}\) in some SecMult gadget, then it is easy to see that we could give away to the attacker not only the probed \(r_{ij}\), but actually all randoms \(r^{(k)}_{ij}\) for the same ij in all other SecMult gadgets k; namely in the ISW security proof with global index I, one would have \(i \in I\), and therefore each \(r^{(k)}_{ij}\) would then be simulated by letting \(r^{(k)}_{ij} \leftarrow {\mathbb F}\) as in the original circuit, so it could be given to the attacker without requiring the knowledge of more input shares.

Now assume that for every pair (ij) we use an independent PRG to generate the randoms \(r^{(k)}_{ij}\) for all gadgets k. In that case the attacker has no advantage in probing the intermediate variables of the PRG circuit, since in our extended probing model he could get all corresponding randoms \(r^{(k)}_{ij}\) with a single probe anyway. Therefore when each \(r_{ij}\) has a dedicated PRG (see Fig. 5 for an illustration), the robustness property of the PRG is not required anymore, and we can use a simple PRG with r-wise independence only, as for example the PRG based on polynomial evaluation from Sect. 2.3.

Fig. 5.
figure 5

In Construction 1, each \(r_{ij}\) has its dedicated PRG across all gadgets, from a random seed \(\rho '_{ij}\).

Moreover, if a mask locality refreshing is performed at the end of each multiplication gadget, it is easy to see that any intermediate variable of the circuit can depend on at most a single random \(r_{ij}^{(k)}\) for a fixed ij, and therefore the locality with respect to each randomness subset \(\rho _{ij}=\{r_{ij}^{(k)} : 1 \le k \le s \}\) is \(\ell =1\); this is because the locality refresh at the end of each multiplication gadget cancels the dependence on the internal \(r_{ij}^{(k)}\). In that case, with t probes on intermediate variables the adversary can get information on at most t randoms within such set. Therefore these randoms can be generated by a PRG with r-wise independence parameter \(r=t\). Since the robustness property is not required, we can use a PRG based on polynomial evaluation that requires only \(r=t\) coefficients in a finite field, and therefore \(\mathcal{\tilde{O}}(t)\) random bits per PRG. Since there are \(\mathcal{O}(t^2)\) randoms \(r_{ij}\), we need \(\mathcal{O}(t^2)\) independent PRGs to generate all of them, and the total number of input random bits is therefore \(\mathcal{\tilde{O}}(t^3)\), as in our single PRG constructions from Sect. 3. Note that the time to generate a pseudo-random is now \(\mathcal{\tilde{O}}(t)\), instead of \(\mathcal{\tilde{O}}(t^3)\) in Sect. 3.

We can improve the above randomness complexity as follows. Firstly, we observe as previously that in the security proof of ISW, whenever the attacker probes a random \(r_{ij}\), we can actually give to the attacker the complete row of \(r_{ij}\)’s, that is for a given i, all \(r_{ij}\) with \(i<j \le n\); and more generally, for a fixed i, all randoms \(r^{(k)}_{ij}\) with \(i <j \le n\) in all SecMult gadgets k. Therefore as previously we can use for each \(1 \le i <n\) a dedicated PRG to generate all \(r^{(k)}_{ij}\) for all \(i <j \le n\) in all gadgets k, without needing the robustness property. Since we generate the complete row of \(r_{ij}\)’s (see Fig. 8 for an illustration), we only need \(\mathcal{O}(t)\) independent PRGs, instead of \(\mathcal{O}(t^2)\).

Moreover, if we perform internal mask refreshing as in the \(\mathsf {SecMultILR}\) algorithm from Sect. 3 (instead of only at the end of the SecMult gadget), then no intermediate variable can depend on two distinct \(r_{ij}\)’s in the same row i. This implies that the locality with respect to the randomness subset \(\rho _{i}= \{r^{(k)}_{ij} : i < j \le n, 1 \le k \le s \}\) is still equal to 1. Therefore a PRG can be used to generate all \(r^{(k)}_{ij}\) from a given row i in all gadgets k, still with r-wise independence parameter \(r=t\). Since we need only \(\mathcal{O}(t)\) independent PRGs instead of \(\mathcal{O}(t^2)\) previously, the number of input random bits goes down to \(\mathcal{\tilde{O}}(t^2)\), while the time to generate a pseudo-random is still \(\mathcal{\tilde{O}}(t)\). Asymptotically this is the most efficient technique (see Table 1), and also the most efficient in practice (see Sect. 5 for our implementation results on AES).

4.1 Security with Multiple PRGs

The following lemma shows that the PRG robustness is not needed when the PRG generates only a subset \(\rho \) of the randomness, and the adversary can get \(\rho \) with a single probe; the lemma is analogous to Theorem 1 for a single robust PRG. We first consider a circuit C where we split the randomness in two parts \(\rho \) and \(\bar{\rho }\), where only the randomness \(\rho \) will be replaced by pseudo-randoms. We consider an extended security model in which the attacker can get \(\rho \) with a single probe. Intuitively probing the PRG that generates \(\rho \) does not help the attacker, since in the extended security model he can get \(\rho \) with a single probe.

Lemma 11

(Security from r-wise independent PRG). Suppose C is a t-private implementation of f with encoder I and decoder O, where \(C(\hat{\omega },\rho ,\bar{\rho })\) uses m random bits \(\rho \) and makes an \(\ell \)-local use of its randomness \(\rho \), and the adversary can obtain \(\rho \) with a single probe. Let \(G:\{0,1\}^{n_r} \rightarrow \{0,1\}^m\) be a linear \(\ell t\)-wise independent PRG. Then, the circuit \(C'\) defined by \(C'(\hat{\omega },\rho ',\bar{\rho })= C(\hat{\omega },G(\rho '),\bar{\rho })\) is a t-private implementation of f with encoder I and decoder O which uses \(n_r\) random bits \(\rho '\) and random \(\bar{\rho }\).

Proof

We show that the view of an adversary \(A'\) who attacks \(C'(\hat{\omega },\rho ',\bar{\rho })\) by probing a set S of \(t' \le t\) wires in G and a set of P of \(t-t'\) wires in C is independent of the secret input \(\omega \). Since C is t-private, it suffices to show that the view of \(A'\) can be simulated given the view of an adversary A who probes at most t wires in \(C(\hat{\omega },\rho ,\bar{\rho })\), and who can obtain the randomness \(\rho \) with a single probe.

Since C makes an \(\ell \)-local use of its randomness \(\rho \), the \(t-t'\) probes from the set P in the circuit C can depend on at most \(\ell (t-t') \le \ell t\) bits of \(\rho \). More precisely, for any \({\hat{\omega }}\) and \(\bar{\rho }\), let \(Q_{{\hat{\omega }},\bar{\rho }}(\rho )=C_P(\hat{\omega },\rho ,\bar{\rho })\) be the value of these probes; the function \(Q_{{\hat{\omega }},\bar{\rho }}\) depends on at most \(\ell t\) bits of \(\rho \). Let \(T \subset [1,m]\) be the corresponding subset of bits of \(\rho \) on which \(Q_{{\hat{\omega }},\bar{\rho }}\) depends, with \(|T| \le \ell t\); we can write \(Q_{{\hat{\omega }},\bar{\rho }}(\rho )=Q'(\rho _T)\), where \(\rho _T\) is the corresponding subset of \(\rho \).

We now proceed as follows. Instead of generating the PRG seed \(X \leftarrow \{0,1\}^{n_r}\) and then the PRG output \(G_T(X)\) corresponding to T, we can first generate the PRG output \(\rho _T \leftarrow \{0,1\}^{|T|}\) and then sample the PRG seed; this is possible because G is a linear \(\ell t\)-wise independent PRG, and moreover \(|T| \le \ell t\). More precisely, since G is a linear \(\ell t\)-wise PRG, there exists a randomized simulator Sim that can perfectly sample the PRG input and therefore the probes within the PRG, given at most \(\ell t\) bits of PRG output; formally this means \((G_S(X),G_T(X)) \equiv (\mathsf{Sim}(\rho _T),\rho _T)\) where \(X \leftarrow \{0,1\}^{n_r}\) and \(\rho \leftarrow \{0,1\}^m\). We obtain:

$$(G_S(X),Q'(G_T(X))) \equiv (\mathsf{Sim}(\rho _T),Q'(\rho _T)) $$

We now distinguish two cases. If the number of probes within the PRG is such that \(t' \ge 1\), we let \(\mathsf{Sim}'(\rho _T,v)=(\mathsf{Sim}(\rho _T),v)\) and we obtain:

$$ (G_S(X),Q'(G_T(X))) \equiv (\mathsf{Sim}(\rho _T),Q'(\rho _T)) \equiv \mathsf{Sim}'(\rho _T,Q'(\rho _T)) $$

which gives \((G_S(X),Q_{I(\omega ),{\bar{\rho }}}(G(X))) \equiv \mathsf{Sim}'(\rho _T,Q_{I(\omega ),{\bar{\rho }}}(\rho )).\) In this case, the distribution to which Sim’ is applied captures the view of an adversary A who corrupts a set \(T \cup P\) of wires in C, where \(|P| \le t-t'\) and by definition \(\rho _T\) can be obtained with a single probe, which gives a total of at most \(t-t'+1 \le t\) probes in C. Since by assumption C is t-private, this view is independent of the secret \(\omega \). Since the distribution on the left hand side captures the view of \(A'\), it follows that the view of \(A'\) is also independent of \(\omega \), as required.

In the second case, G is not probed by the adversary \(A'\). Since G is \(\ell t\)-wise independent and the view of \(A'\) depends on at most \(\ell t\) bits of \(\rho \), the view of \(A'\) is the same as the view of an adversary A probing the same wires in C. More precisely, we have from \(G_T(X) \equiv \rho _T\):

$$ Q_{I(\omega ),{\bar{\rho }}}(G(X)) \equiv Q_{I(\omega ),{\bar{\rho }}}(\rho ) $$

As previously, the right hand side corresponds to the view of an adversary A who corrupts a set P of at most t wires in C and the distribution of the left hand side captures the view of \(A'\); therefore the view of \(A'\) is independent of \(\omega \) also in the second case.   \(\square \)

We now consider the main theorem where the circuit randomness \(\rho \) can be split into \((\rho _i)_{i=1}^k\), and when considering each \(\rho _i\) separately, the circuit C makes an \(\ell \)-local use of \(\rho _i\); moreover we assume that C remains t-private even if the adversary can obtain each \(\rho _i\) with a single probe. The proof follows from a recursive application of Lemma 11.

Theorem 7

(Security with multiple PRGs). Suppose C is a t-private implementation of f with encoder I and decoder O, where the circuit \(C(\hat{\omega },\rho _1,\ldots ,\rho _k)\) uses for each \(1 \le i \le k\), m random bits \(\rho _i\), and makes an \(\ell \)-local use of \(\rho _i\), and the adversary can obtain each \(\rho _i\) with a single probe. Let \(G:\{0,1\}^{n_r} \rightarrow \{0,1\}^m\) be a linear \(\ell t\)-wise independent PRG. Then, the circuit \(C'\) defined by \(C'(\hat{\omega },\rho '_1,\ldots ,\rho '_k)= C(\hat{\omega },G(\rho '_1),\ldots ,G(\rho '_k))\) is a t-private implementation of f with encoder I and decoder O which uses \(k \cdot n_r\) random bits.

4.2 Extended Security Model: PINI-R

In Theorem 7 above we have considered an extended model of security, where the adversary can get any randomness subset \(\rho _i\) in the circuit with a single probe. Therefore, we define a variant of the PINI notion from [CS18], called PINI-R, in which the adversary can also get access to a subset of the randoms in a gadget, using a single probe.

Definition 9

(PINI-R). Let G be a gadget with input shares \(x_{i,\star }\) and output shares \(y_{i,\star }\). Let \((\rho _i)_{1 \le i \le n}\) be a partition of the randoms used by G. The gadget G is PINI-R if for any \(t _1\in {\mathbb N}\), any set of \(t_1\) intermediate variables, any subset \(\mathcal{O}\) of output indices and any subset \(R \subset [1,n]\), there exists a subset \(I \subset [1,n]\) of input indices with \(|I| \le t_1\) such that the \(t_1\) intermediate variables, the output shares \(y_{|\mathcal{O} \cup R,\star }\) and the randoms \(\rho _i\) for \(i \in R\) can be perfectly simulated from the input shares \(x_{|I \cup \mathcal{O} \cup R,\star }\).

The following proposition is analogous to Proposition 1. It shows that if a gadget with \(n=t+1\) shares is PINI-R, then a t-probing adversary learns nothing about the underlying secrets, even in an extended model of security where the adversary can get each randomness subset \(\rho _i\) with a single probe. We provide the proof in the full version of our paper [CGZ19].

Proposition 4

(PINI-R security). Let G be a gadget with input shares \(x_{i,\star }\) and output shares \(y_{i,\star }\) for \(1 \le i \le n\). Let \((\rho _i)_{1 \le i \le n}\) be a partition of the randomness used by G. If G is PINI-R, then G is \((n-1)\)-probing secure in an extended model of security where the adversary can get each \(\rho _i\) with a single probe.

In the composition theorem below, the attacker can get the union of all corresponding subsets of randoms from all gadgets, still with a single probe; see the full version of our paper [CGZ19] for the proof.

Theorem 8

(Composition of PINI-R). Any composite gadget made of PINI-R composing gadgets \(G_i\) for \(i \in K\) is PINI-R, where for the composite gadget we take the randomness partition \(\rho _i=\bigcup _{k \in K} \rho ^{(k)}_i\) for \(1 \le i \le n\).

It is straightforward to prove the PINI-R property of the locality refreshing algorithm from Sect. 2.6, with the randomness partition \(\rho _i=\{s_i\}\) for \(1 \le i \le n-1\). In the full version of our paper [CGZ19] we consider an analogous extension of the t-SNI property, called t-SNI-R, which we prove for the SecMult and \(\mathsf {SecMultILR}\) constructions, and the corresponding \(\mathsf{FullRefresh}\). More precisely, we show that those gadgets remain secure in an extended model of security where the adversary can get all randoms \(r_{ij}\) (and all randoms \(s_{ij}\) for \(\mathsf {SecMultILR}\)) for a given i with a single probe. Moreover the “double-SNI” approach still works for the t-SNI-R and PINI-R notions. This implies that we can base our construction on t-SNI-R and PINI-R gadgets, and the resulting construction will be PINI-R. Note that the t-SNI security proof of \(\mathsf {SecMultILR}2\) is already complex, so we will not try to prove the t-SNI-R property of \(\mathsf {SecMultILR}2\); therefore we will use the multiple PRGs approach for \(\mathsf{SecMultFLR}\) and \(\mathsf {SecMultILR}\) only.

4.3 Constant Locality with Respect to a Randomness Subset

In this section we show that we can achieve constant locality, even \(\ell =1\), when we consider different subsets of randomness. Therefore we first provide a definition of gadget locality with respect to a subset of the gadget randomness only (and excluding the randomness of the inputs, as opposed to Sect. 2.7), and then a locality composition theorem as in Sect. 2.7.

Definition 10

(\(\ell \)-local gadget with randomness subset). Let G be a gadget and let \(\rho \) be a subset of the randomness used by G. The gadget G is said to make an \(\ell \)-local use of its randomness \(\rho \) if any intermediate variable of G depends on at most \(\ell \) bits of \(\rho \).

For example, the SecMult gadget makes a 1-local use of its randomness \(\rho =\{r_{ij}\}\) for any \(1 \le i < j \le n\); this is obvious, since \(\rho \) contains a single random bit. We can now state our composition theorem for locality with respect to a randomness subset. It shows that the gadget locality \(\ell \) is kept the same in the composite gadget, while the locality of the randoms used for output refreshing is equal to 3 with respect to each subset \(\{s^{(k)}_i , k \in K \}\) for \(1 \le i \le n-1\). We refer to the full version of our paper [CGZ19] for the proof.

Theorem 9

(Locality composition with randomness subset). Let \(G_k\) for \(k \in K\) be a set of fan-in 2 gadgets which all make an \(\ell \)-local use of a subset \(\rho _k\) of their randomness. Consider the gadgets \(G'_k\) for \(k \in K\) where the output of \(G_k\) is locality refreshed with randoms \(s^{(k)}_i\) for \(1 \le i \le n-1\). Any composite gadget made of \(G'_k\) makes an \(\ell \)-local use of the randomness \(\bigcup _{k \in K} \rho _k\), and for any \(1 \le i \le n-1\), it makes a 3-local use of the randoms in \(\{s^{(k)}_i : k \in K \}\).

For example if we compose a number of SecMultFLR gadgets, in the composite gadget the locality with respect to the randoms \(r^{(k)}_{ij}\) for fixed ij is \(\ell =1\), while the locality with respect to the randoms \(s^{(k)}_i\) for fixed i from the output locality refreshing is \(\ell =3\). We stress that in the final implementation all the randomness (including the randomness from the locality refreshing) will be generated by the PRGs. Finally, we show in the full version of our paper [CGZ19] that the latter locality can be brought down to 1; for this it suffices to additionally perform a locality refreshing of the two inputs of each gadget, with independent sets of PRGs for the two inputs.

4.4 First Construction: Multiple PRGs with \(\mathsf{SecMultFLR}\)

Our first construction is described in Fig. 6. It consists in using the SecMult algorithm and perform a locality refresh after each gadget; this includes the SecMult gadget, the Xor gadget and the FullRefresh gadget. For every \(1 \le i <j \le n\), an independent PRG generates all randoms \(r^{(k)}_{ij}\) in the SecMult and FullRefresh gadgets. Similarly, for each \(1 \le i \le n-1\), an independent PRG generates all randoms \(s^{(k)}_{i}\) in all locality refreshing gadgets.

Fig. 6.
figure 6

Private circuit construction with multiple PRGs with \(\mathsf{SecMultFLR}\).

From the locality composition theorem (Theorem 9), in the global construction the locality with respect to the randoms \(\{ r^{(k)}_{ij} : k \in K\}\) is \(\ell _r=1\), while the locality with respect to the randoms \(\{ s^{(k)}_i : k \in K\}\) is \(\ell _s=3\). From the PINI-R property of the gadgets and Theorem 8, the full circuit is PINI-R. Therefore, from Proposition 4, it is secure in an extended model of security in which the adversary can get the previous randomness subsets with a single probe. From Lemma 11, the PRGs for the \(r_{ij}\)’s must be t-wise independent, while the PRGs for the \(s_i\)’s must be 3t-wise independent. Since one requires \(n(n-1)/2\) independent PRGs for the \(r_{ij}\)’s, and \(n-1\) independent PRGs for the \(s_i\)’s, the number of input randoms in the finite field is therefore, with \(n\,=\,t\,+\,1\), \( n_r\,=\, n(n\,-\,1)\,/\,2 \cdot t \,+\, (n\,-\,1) \cdot 3t \,=\, \mathcal{O}(t^3).\) Thus we have shown the following lemma. Compared to Lemma 9 for a single robust PRG with our \(\mathsf {SecMultILR}\) algorithm, the randomness complexity is the same but the total running time goes down from \(\mathcal{\tilde{O}}(s t^5)\) to \(\mathcal{\tilde{O}}(s t^3)\).

Lemma 12

(multiple PRGs with \(\mathsf{SecMultFLR}\)). Any function of circuit size s admits a t-private implementation (ICO) with the canonic encoder I and decoder O, where C uses \(\mathcal{O}(t^3 \cdot \log (st) )\) bits of randomness, and runs in time \(\mathcal{O}(s \cdot t^3 \cdot \log ^2 (st)) \).

4.5 Second Construction: Multiple PRGs with \(\mathsf {SecMultILR}\)

Our second construction is described in Fig. 7, based on the \(\mathsf {SecMultILR}\) algorithm. As illustrated in Fig. 8, a dedicated PRG generates the \(r_{ij}\)’s for a given row i, in all gadgets. We first show that the \(\mathsf {SecMultILR}\) algorithm makes a 1-local use of each row of randoms \(r_{ij}\) and a 2-local use of each row of randoms \(s_{ij}\); see the full version of our paper [CGZ19] for the proof.

Lemma 13

(Locality of \(\mathsf {SecMultILR}\)). The \(\mathsf {SecMultILR}\) algorithm makes a 1-local use of each randomness set \(\rho _i=\{r_{ij} : i < j \le n\}\) and a 2-local use of each randomness set \(\rho '_i=\{s_{ij} : i < j \le n\}\).

From Lemma 13 and Theorem 9, in the global construction the locality with respect to the subsets of randoms \(\rho _i= \{ r^{(k)}_{ij} : i<j \le n, \,k \in K\}\) is equal to 1, the locality with respect to the subsets of randoms \(\rho '_i=\{s^{(k)}_{ij} : i<j \le n, \,k \in K\}\) is equal to 2, and the locality with respect to the subsets of randoms \(\rho ''_i=\{s^{(k)}_{i} : k \in K\}\) is still equal to 3, for each \(1 \le i <n\). As previously, from the PINI-R property of the gadgets and Proposition 8, the full circuit is PINI-R. Therefore, it is secure in an extended model of security in which the adversary can get the previous randomness subsets with a single probe. From Lemma 11, the corresponding PRGs must therefore have r-wise independence parameter \(r=t\), \(r=2t\) and \(r=3t\) respectively. The main difference is that now there are only \(n-1\) independent PRGs to generate the \(r^{(k)}_{ij}\) (instead of \(n(n-1)/2\) previously), because a given PRG generates those randoms for all indices j. The total number of input randoms in the finite field is therefore \( n_r= (n-1) \cdot t + (n-1) \cdot 2t + (n-1) \cdot 3t = \mathcal{O}(t^2).\) Thus we have shown the following lemma. Asymptotically this is the most efficient technique (see Table 1 for a comparison), and also the most efficient in practice (see the next section for our implementation results on AES).

Fig. 7.
figure 7

Private circuit construction with multiple PRGs with \(\mathsf {SecMultILR}\).

Lemma 14

(multiple PRGs with \(\mathsf {SecMultILR}\)). Any function of circuit size s admits a t-private implementation (ICO) with the canonic encoder I and decoder O, where C uses \(\mathcal{O}(t^2 \cdot \log (st))\) bits of randomness, and runs in time \(\mathcal{O}(s \cdot t^3 \cdot \log ^2 (st))\).

Fig. 8.
figure 8

In Construction 2, a dedicated PRG generates the \(r_{ij}\)’s for a given row i in all gadgets, from a random seed \(\rho '_{i}\).

5 Application to AES

In this section we describe a concrete implementation of our techniques for AES; the goal is to minimize the total amount of randomness used to protect AES against t-th order attack. We provide the source code in C in [Cor19b].

5.1 The AES Circuit and the Rivain-Prouff Countermeasure

To implement the AES SBox, we need to perform 4 multiplications, and 2 mask refreshing per byte; see [RP10] for the sequence of operations. For the mask refreshing, we use the multiplication based refreshing FullRefresh recalled in the full version of our paper [CGZ19]. We refer to [BBD+16] for the proof that the \(x^{254}\) gadget is \((n-1)\)-SNI; this implies that the gadget is PINI. Thus, this amounts to performing 6 multiplications per byte. Since there are 16 bytes to process per round, the number of required multiplications is \(6 \times 16 =96\) per round. Thus for the 10 rounds of the AES, one will perform \(96 \times 10 = 960\) multiplications.

5.2 Implementation with Single Robust PRG

We first consider an implementation with a single robust PRG as in Sect. 3, with 3 possible algorithms: the original [IKL+13] construction with a locality refresh after each multiplication gadget (\(\mathsf{SecMultFLR}\)), and our new SecMultILR and SecMultILR2 algorithms. For those three algorithms, we provide in Table 4 the total number of pseudo-randoms to be generated for the AES circuit, the corresponding locality parameter \(\ell \), and the number of 8-bit randoms from the TRNG to generate the seed of the PRG, as a function of the number of shares n, for security against t probes with \(n=t+1\).

Table 4. For AES, total number of pseudo-randoms and number of 8-bit TRNG calls, for a single robust PRG, as a function of the number of shares n. We have \(c=3\) for even n, and \(c=11/4\) for odd n. We assume that \(n \le 12\).

We now explain the content of Table 4. For each of the 3 algorithms, the number of pseudo-randoms is the number of randoms from Table 3 in Sect. 3, multiplied by 960, since one must perform 960 multiplications. Furthermore, the MixColumns operation requires 48 xors. Normally we should perform a locality refresh after each xor, but in the particular case of the AES, we can do the locality refresh only after the 3 xors of the MixColumns for each byte. In that case, the locality parameter with respect to MixColumns is then \(4(n-1)\), instead of \(2(n-1)\) for a single xor. The locality of the global circuit is then the max of locality parameter \(\ell \) from Table 3 and \(4(n-1)\). Equivalently, we can perform such locality refresh as input of the SubByte operation, which enables to keep the MixColumns unmodified. For the MixColumns, one therefore needs to perform 16 locality refresh per round, which gives a total of 160 locality refresh for the 10 rounds of the AES, which requires \(160(n-1)\) pseudo-randoms. Finally, we assume that the round keys are already masked without PRG, and so we don’t need to perform a locality refreshing after the AddRoundKey.

Let m the total number of pseudo-randoms over \({\mathbb F}_{2^8}\) that must be generated. To determine the finite field \({\mathbb F}={\mathbb F}_{2^{8 k}}\) used by the PRG, we must ensure \( m \le k \cdot |{\mathbb F}_{2^{8k}}| = k \cdot 2^{8 k}\). Namely a single polynomial evaluation over \({\mathbb F}_{2^{8 k}}\) generates k bytes of pseudo-random. One must then use a PRG with r-wise independence parameter \(r=\ell \cdot (n-1)\). Using the trivial construction with the xor of \(n=t+1\) polynomial evaluations (to provide resistance against t probes), the total number of fresh random values over \({\mathbb F}_{2^8}\) is then \( n_r=k\cdot n \cdot r= k\cdot n(n-1) \cdot \ell \).

For the three algorithms one can work over \({\mathbb F}_{2^{16}}\) for \(n \le 12\); therefore for simplicity we take \(k=2\) in Table 4. For SecMultILR and SecMultILR2, the total number of TRNG calls over \({\mathbb F}_{2^8}\) is then \(n_r=k \cdot n(n-1) \cdot 4(n-1)=4k \cdot n(n-1)^2\) with \(k=2\) for \(n \le 12\), and \(k=3\) for \(13 \le n \le 229\), instead of \(480n(n-1)\) for the original Rivain-Prouff countermeasure; therefore one needs fewer TRNG calls than Rivain-Prouff for \(n \le 40\). We summarize in Table 6 below the number of input random bytes required for AES for small values of n, compared with the original Rivain-Prouff countermeasure.

5.3 Implementation with Multiple PRGs

We now consider an implementation of AES with multiple PRGs, as in Sect. 4. We consider the \(\mathsf{SecMultFLR}\) algorithm corresponding to Construction 1, and the \(\mathsf {SecMultILR}\) algorithm corresponding to Construction 2. As previously, we provide in Table 5 the total number of pseudo-randoms to be generated for the AES circuit, and the number of 8-bit randoms from the TRNG.

As previously, we only perform a locality refresh after the 3 xors of the MixColumns (equivalently, before each SubByte). Moreover we don’t perform the LR algorithm after \(\mathsf {SecMultILR}\) as in Construction 2, since the output of \(\mathsf {SecMultILR}\) is already locality refreshed. Therefore the number of pseudo-randoms is the same as in the previous section. We use two classes of independent PRGs. The first class of independent PRGs is used to generate the \(r_{ij}\)’s from \(\mathsf{SecMultFLR}\) and \(\mathsf {SecMultILR}\) algorithms, with locality \(\ell _r=1\); therefore the PRGs must be \(\ell _r t\)-wise independent. We need \(n(n-1)/2\) such PRGs for \(\mathsf{SecMultFLR}\), and only \(n-1\) for \(\mathsf {SecMultILR}\). Working over \({\mathbb F}_{2^{16}}\), each PRG requires \(2 \ell _r t=2(n-1)\) random bytes. Similarly, the second class of PRGs is used to generate randoms \(s_i\) from the locality refresh, and also the randoms \(s_{ij}\) for the internal locality refresh in \(\mathsf {SecMultILR}\), with locality \(\ell _s=5\). Namely we only perform the locality refresh after the 3 xors of the MixColumns, and therefore the locality is \(\ell _s=5\) (instead of \(\ell _s=3)\). Note that for \(\mathsf {SecMultILR}\) we can use the same class of PRGs to generate the randoms \(s_{ij}\)’s from \(\mathsf {SecMultILR}\) and the randoms \(s_i\)’s from LR, instead of two classes in Construction 2 from Sect. 4; namely it is easy to see that the locality with respect to the corresponding randomness subsets is still equal to 5. Therefore the PRGs must be \(\ell _s t\)-wise independent; working over \({\mathbb F}_{2^{16}}\), each PRG requires \(10(n-1)\) bytes of TRNG.

In summary, for \(\mathsf{SecMultFLR}\), the total number of 8-bit TRNG calls is therefore \(n_r=n(n-1)/2 \cdot 2(n-1) + (n-1) \cdot 10 (n-1)=(n+10)(n-1)^2\) and for \(\mathsf {SecMultILR}\), we get \(n_r=(n-1) \cdot 2(n-1) + (n-1) \cdot 10 (n-1) =12(n-1)^2\) instead of \(480n(n-1)\) in the original Rivain-Prouff countermeasure.

Table 5. For AES, total number of Pseudo-random and True-random values to generate with the multiple PRGs approach, as a function of the number of shares n. Values for the Rivain-Prouff countermeasure are also recalled for comparison.

A Simple 3-wise Independent PRG. Finally, we consider the simple 3-wise independent PRG from Sect. 2.3:

$$ G(x_1,\ldots ,x_d,y_1,\ldots ,y_d)=(x_i \oplus y_j)_{1 \le i,j \le d}$$

Since the PRG function G expands from 2d to \(d^2\) bits (or bytes), the number of input randoms becomes \(\mathcal{O}(\sqrt{s})\) instead of \(\mathcal{O}(s)\), where s is the circuit size. Note that this is worse than the polynomial-based PRG used previously that requires only \(\mathcal{O}(\log s)\) randoms, but the above function G is very fast since generating a pseudo-random only takes a single xor.

Since the above PRG only achieves 3-wise independence, we want to minimize the locality. Therefore, we perform a locality refresh of the 2 inputs of each gadget (with two distinct sets of independent PRGs), and we perform a locality refresh of the outputs of each gadget (SecMult, Xor and FullRefresh), using another distinct set of independent PRGs. As shown in the full version of our paper [CGZ19], the locality with respect to each subset of randoms is then always \(\ell =1\); therefore, we can use a PRG with r-wise independence \(r=t=n-1\). This implies that this specific PRG only works for \(n=3\) and \(n=4\) shares. We argue in the full version of our paper [CGZ19] that the total number of input bytes for AES is 642 for \(n=3\) and 1056 for \(n=4\), instead of 2880 and 5760 respectively for the original Rivain-Prouff countermeasure.

Table 6. For AES, total number of TRNG bytes to generate for single and multiple PRGs methods, depending of the number of shares n. We also provide the number of TRNG bytes for the original Rivain-Prouff countermeasure.

Summary. We summarize in Table 6 the number of input random bytes required for AES for all previous methods, as a function of the number of shares n, in order to achieve t-th order security, with \(t=n-1\). We see that the most efficient method (in terms of minimizing the number of TRNG calls) is the \(\mathsf {SecMultILR}\) algorithm with multiple PRGs. Namely for small values of t we obtain almost two orders of magnitude improvement compared to the original Rivain-Prouff countermeasure. We provide in Appendix A the results of an implementation of our countermeasure on an ARM-based embedded device. We provide the source code in [Cor19b].