1 Introduction

Physical side-channel attacks that exploit leakage emanating from devices pose an important threat to cryptographic implementations. Prominent sources of such physical leakage include the running time of an implementation [20], its power consumption [21] or the electromagnetic radiation it emits [30]. A large body of recent applied and theoretical research attempts to incorporate the information an adversary obtains from the leakage into the security analysis and develops countermeasures to defeat common side-channel attacks [2, 6, 12, 17, 24, 34, 35]. While there is still a large gap between what theoretical models can achieve and what side-channel information is measured in practice, some recent important works propose models that align better with the perspective of cryptographic engineering [28, 33, 34]. Our work follows this line of research by analyzing the security of a common countermeasure—the so-called masking countermeasure—in the model of Prouff and Rivain [28]. Our analysis works by showing that security in certain theoretical leakage models implies security in the model of [28] and hence may be seen as a first attempt to unify the large class of different leakage models used in recent results.

The Masking Countermeasure. A large body of work on cryptographic engineering has developed countermeasures to defeat side-channel attacks (see, e.g., [22] for an overview). While many countermeasures are specifically tailored to protect particular cryptographic implementations (e.g., key updates or shielded hardware), a method that works generically for most cryptographic schemes is masking [4, 16, 27, 35]. The basic idea of a masking scheme is to secretly share all sensitive information, including the secret key and all intermediate values that depend on it, thereby making the leakage independent of the secret data. The most prominent masking scheme is Boolean masking: a bit b is encoded by a random bit string \((b_1, \ldots , b_n)\) such that \(b = b_1 \oplus \cdots \oplus b_n\). The main difficulty in designing masking schemes is to develop masked operations, which securely compute on encoded data and ensure that all intermediate values are protected.
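To make the encoding concrete, the following minimal Python sketch implements Boolean masking of a single bit (the function names are ours, purely for illustration):

```python
import secrets
from functools import reduce

def encode_bit(b: int, n: int) -> list[int]:
    """Encode bit b as n random shares with b = b_1 XOR ... XOR b_n."""
    shares = [secrets.randbits(1) for _ in range(n - 1)]
    shares.append(reduce(lambda x, y: x ^ y, shares, b))  # last share fixes the XOR
    return shares

def decode_bit(shares: list[int]) -> int:
    return reduce(lambda x, y: x ^ y, shares)

assert decode_bit(encode_bit(1, 16)) == 1  # any n-1 shares alone are uniform
```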

Masking Against Noisy Leakages. Besides the fact that masking can be used to protect arbitrary computation, it has the advantage that it can be analyzed in formal security models. The first work that formally studies the soundness of masking in the presence of leakage is the seminal work of Chari et al. [6]. The authors consider a model where each share \(b_i\) of an encoding is perturbed by Gaussian noise and show that the number of noisy samples needed to recover the encoded secret bit b grows exponentially with the number of shares. As stated in [6], this model matches real-world physical leakages, which are inherently noisy. Moreover, many practical solutions exist to amplify leakage noise (see, for instance, [7, 8, 22]).

One limitation of the security analysis given in [6] is the fact that it does not consider leakage emitted by masked computation. This shortcoming has been addressed in the recent important work of Prouff and Rivain [28], who at Eurocrypt 2013 extended the noisy leakage model of Chari et al. [6] to also include leakage from the masked operations. Specifically, they show that a variant of the construction of Ishai et al. [17] is secure even when there is noisy leakage from all the intermediate values that are produced during the computation. The authors of [28] also generalize the noisy leakage model of Chari et al. [6] to a wider range of leakage functions instead of considering only the Gaussian one. While noisy leakage is clearly closer to physical leakage occurring in the real world, the security analysis of [28] has a number of shortcomings that strongly limit the settings in which the masking countermeasure can be used and still achieve the proven security statements. In particular, like earlier works on leakage resilient cryptography [10, 13], the security analysis of Prouff and Rivain relies on so-called leak-free gates. Moreover, security is shown in a restricted adversarial model that assumes that plaintexts are chosen uniformly during an attack and that the adversary does not exploit joint information from the leakages and, e.g., the ciphertext. We discuss these shortcomings in more detail in the next section.

1.1 The Work of Prouff and Rivain [28]

Prouff and Rivain [28] analyze the security of a block-cipher implementation that is masked with an additive masking scheme working over a finite field \({\mathbb F}\). More precisely, let t be the security parameter; then a secret \(s \in {\mathbb F}\) is represented by an encoding \((X_1,\ldots , X_t)\) such that each \(X_i \leftarrow {\mathbb F}\) is uniformly random subject to \(s = X_1 \oplus \cdots \oplus X_t\). As discussed above, the main difficulty in designing secure masking schemes is to devise masked operations that work on masked values. To this end, Prouff and Rivain use the original scheme of Ishai et al. [17] augmented with some techniques from [5, 31] to work over larger fields and to obtain a more efficient implementation. The masked operations are built out of several smaller components. First, a leak-free operation that refreshes encodings, i.e., it takes as input an encoding \((X_1,\ldots , X_t)\) of a secret s and outputs a freshly and independently chosen encoding of the same value. Second, a number of leaky elementary operations that work on a constant number of field elements. For each of these elementary operations the adversary is given leakage f(X), where X are the inputs of the operation and f is a noisy function. Clearly, the noise level has to be high enough so that given f(X) the value of X is not completely revealed. To this end, the authors introduce the notion of a bias, which informally says that the statistical distance between the distribution of X and the conditional distribution X|f(X) is bounded by some parameter.
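For concreteness, here is a minimal sketch of this additive encoding and of the input-output behavior of the refreshing operation, instantiated over \(\mathrm {GF}(2^8)\) where field addition is byte-wise XOR (the function names are ours; the actual refreshing operation of [28] is assumed leak-free, and the gadgets of [17, 31] are specified at the level of elementary operations):

```python
import secrets
from functools import reduce

def encode(s: int, t: int) -> list[int]:
    """Additive sharing over GF(2^8): s = X_1 + ... + X_t, where '+' is XOR."""
    shares = [secrets.randbits(8) for _ in range(t - 1)]
    shares.append(reduce(lambda a, b: a ^ b, shares, s))
    return shares

def decode(shares: list[int]) -> int:
    return reduce(lambda a, b: a ^ b, shares)

def refresh(X: list[int]) -> list[int]:
    """Functionality of the refreshing operation: output a fresh, independent
    encoding of the same secret by adding an encoding of zero share-by-share."""
    Z = encode(0, len(X))
    return [x ^ z for x, z in zip(X, Z)]

X = encode(0x2A, 4)
assert decode(refresh(X)) == 0x2A  # the secret is unchanged, the shares are new
```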

While noisy leakages are certainly a step in the right direction to model physical leakage, we detail below some of the limitations of the security analysis of Prouff and Rivain [28]:

  1. Leak-Free Components. The assumption of leak-free computation has been used in earlier works on leakage resilient computation [10, 13]. It is a strong assumption on the physical hardware and, as stated in [28], an important limitation of the current proof approach. The leak-free component of [28] is a simple operation that takes as input an encoding and refreshes it. While the computation of this operation is supposed to be completely shielded against leakage, the inputs and the outputs of this computation may leak. Notice that the leak-free component of [28] depends on the computation that is carried out in the circuit, since it takes inputs from it. In particular, this means that the computation of the leak-free component depends on secret information, which makes it harder to protect in practice and differs from earlier works that use leak-free components [10, 13].

  2. Random Message Attacks. The security analysis is given only for random (known) message attacks. This is in contrast to most works in cryptography, which usually consider at least a chosen message attack. Hence, the proof does not cover chosen plaintext or chosen ciphertext attacks. We notice, however, that it is not clear whether chosen message attacks can improve the adversary’s success probability in the context of DPA attacks [36].

  3. Mutual-Information-Based Security Statement. The final statement of Theorem 4 in [28] only gives a bound on the mutual information between the key and the leakages from the cipher. While a mutual information analysis is a common method in side-channel analysis to evaluate the security of countermeasures [33], it has important shortcomings, such as not including information that an adversary may learn from exploiting joint information from the leakages and plaintext/ciphertext pairs. Notice that such use of mutual information becomes particularly problematic under continuous leakage attacks, since multiple plaintext/ciphertext pairs information theoretically reveal the secret key completely. The more standard security notion used in cryptography and also for the analysis of masking schemes, e.g., in the work of Ishai et al., uses a simulation-based approach and does not have these drawbacks.

1.2 Our Contribution

We show in this work how to eliminate limitations 1–3 by a simple and elegant simulation-based argument and a reduction to the so-called t-probing adversarial setting [17] (which in this paper we call the t-threshold-probing model to emphasize the difference between this model and the random-probing model defined later). The t-threshold-probing model considers an adversary that can learn the values of t intermediate values produced during the computation and is often considered a good approximation for modeling higher-order attacks. We notice that a further assumption made in [28], namely that the noise is independent for each elementary operation, is what enables our security analysis: this independence allows us to formally prove security under an identical noise model as [28], but using a simpler and improved analysis. In particular, we are able to show that the original construction of Ishai et al. satisfies the standard simulation-based security notion under noisy leakages without relying on any leak-free components. We emphasize that our techniques are very different from (and much simpler than) the recent breakthrough result of Goldwasser and Rothblum [15], who show how to eliminate leak-free gates in the bounded leakage model. We will further discuss related works in Sect. 1.3.

Our proof considers three different leakage models and shows connections between them. One may view our work as a first attempt to “reduce” the number of different leakage models, which is in contrast to many earlier works that introduced new leakage settings. Eventually, we are able to reduce security in the noisy leakage model to security in the t-threshold-probing model. This shows that, for the particular choice of parameters given in [28], security in the t-threshold-probing model implies security in the noisy leakage model. This is in line with the common approach of showing security against t-order attacks, which usually requires proving security in the t-threshold-probing model. Moreover, it shows that the original construction of Ishai et al. that has been used in many works on masking (including the work of Prouff and Rivain) is indeed a sound approach for protecting against side-channel leakages when assuming that they are sufficiently noisy. We give some more details on our techniques below.

From Noisy Leakages to Random Probes. As a first step in our security proof we show that we can simulate any adversary in the noisy leakage model of Prouff and Rivain by an adversary in a simpler noise model that we call the random-probing model, which is similar to a model introduced in [17]. In this model, an adversary recovers each intermediate value with probability \(\epsilon \) and obtains a special symbol \(\bot \) with probability \(1-\epsilon \). This reduction shows that the random-probing model is worth studying, although from the engineering perspective it may seem unnatural.

From Random Probes to the t-Threshold-Probing Model. We show how to go from the random-probing adversary setting to the more standard t-threshold-probing adversary of Ishai et al. [17]. This step is rather easy: due to the independence of the noise we can apply the Chernoff bound almost immediately. One technical difficulty is that the work of Prouff and Rivain considers joint noisy leakage from elementary operations, while the standard t-threshold-probing setting only talks about leakage from wires. Notice, however, that the elementary operations of [28] only depend on two inputs and, hence, it is not hard to extend the result of Ishai et al. to a “gate probing adversary” by tolerating a loss in the parameters. Finally, our analysis enables us to show security of the masking-based countermeasure without the limitations 1–3 discussed above.

Leakage Resilient Circuits with Simulation-Based Security. In our security analysis we use the framework of leakage resilient circuits introduced in the seminal work of Ishai et al. [17]. A circuit compiler takes as input the description of a cryptographic scheme C with secret key K, e.g., a circuit that describes a block cipher, and outputs a transformed circuit \(C'\) and a corresponding key \(K'\). The circuit \(C'[K']\) shall implement the same functionality as C running with key K, but additionally is resilient to certain well-defined classes of leakage. Notice that while the framework of [17] talks about circuits, the same approach applies to software implementations; we only follow this notation to abstract our description.

Moreover, our work uses the well-established simulation paradigm to state the security guarantees we achieve. Intuitively, simulation-based security says that whatever attack an adversary can carry out when knowing the leakage, he can also run (with similar success probability) by just having black-box access to C. In contrast to the approach based on Shannon information theory, our analysis includes attacks that exploit joint information from the leakage and plaintext/ciphertext pairs. It seems impossible to us to incorporate the plaintext/ciphertext pairs into an analysis based on Shannon information theory. To see this, consider a block-cipher execution where, clearly, when given a couple of plaintext/ciphertext pairs, the secret key is information theoretically revealed. The authors of [28] are well aware of this problem and explicitly exclude such joint information. A consequence of the simulation-based security analysis is that we require an additional mild assumption on the noise—namely, that it is efficiently computable (see Sect. 3.1 for more details). While this is a standard assumption made in most works on leakage resilient cryptography, we emphasize that we can easily drop the assumption of efficiently computable noise (and hence consider the same noise model as [28]) when we only want to achieve the weaker security notion considered in [28]. Notice that in this case we are still able to eliminate limitations 1 and 2 mentioned above.

1.3 Related Work

Masking & Leakage Resilient Circuits. A large body of work has proposed various masking schemes and studied their security in different security models (see, e.g., [4, 16, 27, 31, 35]). The already mentioned t-threshold-probing model has been considered in the work of Rivain and Prouff [31], who show how to extend the work of Ishai et al. to larger fields and propose efficiency improvements. In [29] it was shown that techniques from multiparty computation can be used to show security in the t-threshold-probing model. The work of Standaert et al. [35] studies masking schemes using the information theoretic framework of [33] by considering the Hamming weight model. Many other works analyze the security of the masking countermeasure, and we refer the reader to [28] for further details.

With the emergence of leakage resilient cryptography [2, 12, 24] several works have proposed new security models and alternative masking schemes. The main difference between these new security models and the t-threshold-probing model is that they consider joint leakages from large parts of the computation. The work of Faust et al. [13] extends the security analysis of Ishai et al. beyond the t-threshold-probing model by considering leakages that can be described by low-depth circuits (so-called \(AC^0\) leakages). Faust et al. use leak-free components, which have been eliminated by Rothblum in [32] using computational assumptions. The recent work of Miles and Viola [25] proposes a new circuit transformation using alternating groups and shows security with respect to \(AC^0\) and \(TC^0\) leakages.

Another line of work considers circuits that are provably secure in the so-called continuous bounded leakage model [10, 14, 15, 18]. In this model, the adversary is allowed to learn arbitrary information from the computation of the circuit as long as the amount of information is bounded. The proposed schemes rely additionally on the assumption of “only computation leaks information” of Micali and Reyzin [24].

Noisy Leakage Models. The work of Faust et al. [13] also considers circuit compilers for noisy models. Specifically, they propose a construction with security in the binomial noise model, where each value on a wire is flipped independently with probability \(p \in (0, 1/2)\). In contrast to the work of [28] and our work, the noise model is restricted to binomial noise, but the noise rate is significantly better (constant instead of linear noise). Similar to [28], the work of Faust et al. also uses leak-free components. Besides these works on masking schemes, several works consider noisy leakages for concrete cryptographic schemes [12, 19, 26]. Typically, the noise model considered in these works is significantly stronger than the noise model that is considered for masking schemes. In particular, no strong assumption about the independence of the noise is made.

2 Preliminaries

We start with some standard definitions and lemmas about the statistical distance. If \({\mathcal A}\) is a set, then \(U \leftarrow {\mathcal A}\) denotes a random variable sampled uniformly from \({\mathcal A}\). Recall that if A and B are random variables over the same set \({\mathcal A}\) then the statistical distance between A and B is denoted by \(\varDelta (A;B)\), and defined as \(\varDelta (A ; B) = \frac{1}{2}\sum _{a \in {\mathcal A}} |{\mathbb P}\left( A = a\right) - {\mathbb P}\left( B = a\right) | = \sum _{a\in {\mathcal A}}\max \{0,{\mathbb P}\left( A = a\right) - {\mathbb P}\left( B = a\right) \}\). If \({\mathcal X},{\mathcal Y}\) are some events, then by \(\varDelta ((A | {\mathcal X}) \ ;\ (B | {\mathcal Y}))\) we will mean the distance between variables \(A'\) and \(B'\), distributed according to the conditional distributions \(P_{A|{\mathcal X}}\) and \(P_{B | {\mathcal Y}}\). If \({\mathcal X}\) is an event of probability 1, then we also write \(\varDelta (A \ ;\ (B | {\mathcal Y}))\) instead of \(\varDelta ((A | {\mathcal X}) \ ;\ (B | {\mathcal Y}))\). If C is a random variable, then by \(\varDelta (A \ ;\ (B | C))\) we mean \(\sum _{c} {\mathbb P}\left( C = c\right) \cdot \varDelta (A \ ; \ (B | (C=c)))\).

If A, B, and C are random variables, then \(\varDelta ((B;C)\ | \ A)\) denotes \(\varDelta ((B,A); (C,A))\). It is easy to see that it is equal to \(\sum _{a} {\mathbb P}\left( A = a\right) \cdot \varDelta ((B|A=a)\ ; \ (C|A=a))\). If \(\varDelta (A;B) \le \epsilon \), then we say that A and B are \(\epsilon \)-close. The “\(\,{\buildrel d \over =}\,\)” symbol denotes the equality of distributions, i.e., \(A \,{\buildrel d \over =}\,B\) if and only if \(\varDelta (A;B)=0\).
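For concreteness, the statistical distance of two finitely supported distributions can be computed directly from the definition; a small sketch (our own, with distributions represented as dictionaries):

```python
def statistical_distance(P: dict, Q: dict) -> float:
    """Delta(A;B) = (1/2) * sum_a |P(A=a) - P(B=a)|."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(a, 0.0) - Q.get(a, 0.0)) for a in support)

# Example: a fair coin vs. a 0.6-biased coin has distance |0.5 - 0.6| = 0.1.
assert abs(statistical_distance({0: 0.5, 1: 0.5}, {0: 0.4, 1: 0.6}) - 0.1) < 1e-12
```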

2.1 Basic Probability-Theoretic Facts

Here, we state some basic lemmas that will be used later in some proofs.

Lemma 1

Let A, B be two (possibly correlated) random variables. Let \(B'\) be a variable distributed identically to B but independent of A. We have

$$\begin{aligned} \varDelta (A;(A|B)) = \varDelta ((B;B') \ | \ A). \end{aligned}$$
(1)

Proof

We have

$$\begin{aligned}&\varDelta (A; (A | B)) \nonumber \\&\quad = \sum _{b} \frac{1}{2}\cdot {\mathbb P}\left( B = b\right) \cdot \sum _{a} \left| {\mathbb P}\left( A = a\right) - {\mathbb P}\left( A=a | B=b\right) \right| \nonumber \\&\quad = \frac{1}{2} \sum _{a,b} \left| {\mathbb P}\left( B = b\right) \cdot {\mathbb P}\left( A = a\right) - {\mathbb P}\left( B= b\right) \cdot {\mathbb P}\left( A= a| B = b\right) \right| \nonumber \\&\quad = \frac{1}{2} \sum _{a,b} \left| {\mathbb P}\left( B' = b \wedge A = a\right) - {\mathbb P}\left( B= b \wedge A = a\right) \right| \nonumber \\&\quad = \varDelta ((B;B') \ | \ A), \end{aligned}$$
(2)

where in (2) we used the fact that \(B'\) is a variable distributed identically to B and it is independent from A. \(\square \)

Lemma 2

For any random variables A and B and an event \({\mathcal E}\) we have

$$\begin{aligned} \varDelta ((A \, | \, \lnot {\mathcal E}) ; B) \le \varDelta (A;B) + {\mathbb P}\left( {\mathcal E}\right) , \end{aligned}$$

where \(\lnot {\mathcal E}\) denotes the negation of \({\mathcal E}\).

Proof

We have

$$\begin{aligned}&\varDelta ((A \, | \, \lnot {\mathcal E}) ; B) \nonumber \\&\quad = \frac{1}{2} \sum _a |{\mathbb P}\left( A = a \,| \, \lnot {\mathcal E}\right) - {\mathbb P}\left( B=a\right) | \nonumber \\&\quad \le \underbrace{\frac{1}{2} \sum _a |{\mathbb P}\left( A = a\right) - {\mathbb P}\left( A = a \,| \, \lnot {\mathcal E}\right) |}_{=:(*)} + \underbrace{\frac{1}{2} \sum _a |{\mathbb P}\left( A=a\right) - {\mathbb P}\left( B=a\right) |}_{=\varDelta (A;B)}, \qquad \end{aligned}$$
(3)

where (3) comes from the triangle inequality. Now, \((*)\) is equal to

$$\begin{aligned} \sum _{a \in {\mathcal A}} {\mathbb P}\left( A = a \right) - {\mathbb P}\left( A = a \,| \, \lnot {\mathcal E}\right) , \end{aligned}$$
(4)

where \({\mathcal A}\) is the set of all a’s such that \({\mathbb P}\left( A = a \right) \ge {\mathbb P}\left( A = a \,| \, \lnot {\mathcal E}\right) \). Clearly we have that \({\mathbb P}\left( A = a \,| \, \lnot {\mathcal E}\right) \ge {\mathbb P}\left( A = a \wedge \lnot {\mathcal E}\right) \), and hence (4) is at most equal to \(\sum _{a \in {\mathcal A}} {\mathbb P}\left( A = a\right) - {\mathbb P}\left( A = a \wedge \lnot {\mathcal E}\right) \). We therefore have

$$\begin{aligned}&\sum _{a \in {\mathcal A}} {\mathbb P}\left( A = a \right) - {\mathbb P}\left( A = a \wedge \lnot {\mathcal E}\right) \\&\quad = {\mathbb P}\left( A \in {\mathcal A}\right) - {\mathbb P}\left( A \in {\mathcal A}\wedge \lnot {\mathcal E}\right) \\&\quad \le {\mathbb P}\left( {\mathcal E}\right) \end{aligned}$$

Thus, altogether, \((*)\) is at most equal to \({\mathbb P}\left( {\mathcal E}\right) \), and hence \(\varDelta ((A \, | \, \lnot {\mathcal E}) ; B) \le {\mathbb P}\left( {\mathcal E}\right) + \varDelta (A;B)\). This finishes the proof. \(\square \)

We will also need the following standard fact.

Lemma 3

(Chernoff bound, see, e.g., [9], Theorem 1.1) Let \(Z = \sum _{i=1}^{n} Z_{i}\), where \(Z_{i}\)’s are random variables independently distributed over [0, 1]. Then for every \(\xi \in [0,1]\) we have

$$\begin{aligned} {\mathbb P}\left( Z \ge (1 + \xi ) {\mathbb E}\left( Z\right) \right) \le \exp \left( - \frac{\xi ^{2}}{3} {\mathbb E}\left( Z\right) \right) . \end{aligned}$$
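As a quick numeric illustration of how we will use this bound (a Monte Carlo sketch with parameters of our own choosing): for a sum of 100 independent Bernoulli(0.1) variables, \({\mathbb E}\left( Z\right) = 10\) and the lemma with \(\xi = 1\) gives \({\mathbb P}\left( Z \ge 20\right) \le e^{-10/3} \approx 0.036\).

```python
import math, random

n, p, trials = 100, 0.1, 20_000           # Z = sum of 100 Bernoulli(0.1) variables
bound = math.exp(-(p * n) / 3)            # Lemma 3 with xi = 1: about 0.036
exceed = sum(sum(random.random() < p for _ in range(n)) >= 2 * p * n
             for _ in range(trials)) / trials
assert exceed <= bound                    # empirically exceed is about 0.002
```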

3 Noise from Set Elements

We start by describing the basic framework for reasoning about the noise from elements of a finite set \({\mathcal X}\). Later, in Sect. 4, we will consider the leakage from vectors over \({\mathcal X}\), and then, in Sect. 5, from the entire computation. The reason why we can smoothly use the analysis from Sect. 3.1 in the later sections is that, as in the work of Prouff and Rivain, we require that the noise is independent for all elementary operations. By elementary operations, [28] means the basic operations over the underlying field \({\mathcal X}\) used in a masked implementation. In this work, we consider the same setting and type of underlying operations (in fact, our construction is identical to theirs—except that we eliminate the leak-free gates and prove a stronger statement). Notice that instead of talking about elementary operations, we use the more standard term “gates” that was used in the work of Ishai et al. [17].

3.1 Modeling Noise

Let us start by defining what it means for a randomized function \({ Noise}: {\mathcal X}\rightarrow {\mathcal Y}\) to be “noisy”. We will assume that \({\mathcal X}\) is finite and rather small: typical choices for \({\mathcal X}\) would be \(\mathrm {GF}(2)\) (the “Boolean case”), or \(\mathrm {GF}(2^{8})\), if we want to deal with the AES circuit. The set \({\mathcal Y}\) corresponds to the set of all possible noise measurements and may be infinite, except when we require the “efficient simulation” (we discuss it further at the end of this section). As already informally described in Sect. 1.1, our basic definition is as follows: we say that the function \({ Noise}\) is \(\delta \)-noisy if

$$\begin{aligned} \delta = \varDelta (X; (X | { Noise}(X))). \end{aligned}$$
(5)

Of course for (5) to be well-defined we need to specify the distribution of X. The idea to define noisy functions by comparing the distributions of X and “X conditioned on \({ Noise}(X)\)” comes from [28], where it is argued that the most natural choice for X is a random variable distributed uniformly over \({\mathcal X}\). We also adopt this convention and assume that \(X \leftarrow {\mathcal X}\). We would like to stress, however, that in our proofs we will apply \({ Noise}\) to inputs \(\hat{X}\) that are not necessarily uniform, and in this case the value of \(\varDelta (\hat{X}; (\hat{X}| { Noise}(\hat{X})))\) may obviously be some non-trivial function of \(\delta \). Of course if \(X \leftarrow {\mathcal X}\) and \(X' \leftarrow {\mathcal X}\), then \({ Noise}(X')\) is distributed identically to \({ Noise}(X)\), and hence, by Lemma 1, Eq. (5) is equivalent to:

$$\begin{aligned} \delta = \varDelta (({ Noise}(X);{ Noise}(X')) \ |\ X), \end{aligned}$$
(6)

where X and \(X'\) are uniform over \({\mathcal X}\). Note that at first this definition may be a bit counter-intuitive, as smaller \(\delta \) means more noise: in particular we achieve “full noise” if \(\delta = 0\), and “no noise” if \(\delta \approx 1\). Let us compare this definition with the definition of [28]. In a nutshell: the definition of [28] is similar to ours, the only difference being that instead of the statistical distance \(\varDelta \) the authors of [28] use a distance based on the Euclidean norm. More precisely, they start with defining \(\mathrm {d}\) as: \( \mathrm {d}(X;Y) := \sqrt{\sum _{x \in {\mathcal X}} ({\mathbb P}\left( X = x\right) - {\mathbb P}\left( Y = x\right) )^{2}}, \) and using this notion they define \(\beta \) as:

$$\begin{aligned} \beta (X|{ Noise}(X)) := \sum _{y \in {\mathcal Y}} {\mathbb P}\left( { Noise}(X) = y\right) \cdot \mathrm {d}(X \ ; \ (X | { Noise}(X) = y)) \end{aligned}$$

(where X is uniform). In the terminology of [28] a function \({ Noise}\) is “\(\delta \)-noisy” if \(\delta = \beta (X|{ Noise}(X))\). Observe that the right hand side of our noise definition in Eq. (5) can be rewritten as:

$$\begin{aligned} \sum _{y \in {\mathcal Y}} {\mathbb P}\left( { Noise}(X) = y\right) \cdot \varDelta (X \ ; \ (X | { Noise}(X) = y)), \end{aligned}$$

hence the only difference between their approach and ours is that we use \(\varDelta \) where they use the distance \(\mathrm {d}\). The authors do not explain why they choose this particular measure. We believe that our choice to use the standard definition of statistical distance \(\varDelta \) is more natural in this setting, since, unlike the “\(\mathrm {d}\)” distance, it has been used in hundreds of cryptographic papers in the past. The popularity of the \(\varDelta \) distance comes from the fact that it corresponds to an intuitive concept of the “indistinguishability of distributions”: it is well-known, and simple to verify, that \(\varDelta (X;Y) \le \delta \) if and only if no adversary can distinguish between X and Y with advantage better than \(\delta \). Hence, e.g., (6) can be interpreted as:

\(\delta \) is the maximum probability, over all adversaries \({\mathcal A}\), that \({\mathcal A}\) distinguishes between the noise from a uniform X that is known to him, and a uniform \(X'\) that is unknown to him.

It is unclear to us if the \(\mathrm {d}\) distance has a similar interpretation. We emphasize, however, that the choice whether to use \(\varDelta \) or \(\beta \) is not too important, as the following inequalities between these measures hold for every X and Y distributed over \({\mathcal X}\) (cf. [28]):

$$\begin{aligned} \frac{1}{2} \cdot \mathrm {d}(X;Y) \le \varDelta (X;Y) \le \frac{\sqrt{\left| {\mathcal X}\right| }}{2} \cdot \mathrm {d}(X;Y), \end{aligned}$$

and consequently

$$\begin{aligned} \frac{1}{2} \cdot \beta (X|{ Noise}(X)) \le \varDelta (X; (X | { Noise}(X))) \le \frac{\sqrt{\left| {\mathcal X}\right| }}{2} \cdot \beta (X|{ Noise}(X)). \end{aligned}$$
(7)

Hence, we decide to stick to the “\(\varDelta \) distance” in this paper. However, to allow for comparison between our work and the one of [28] we will at the end of the paper present our results also in terms of the \(\beta \) measure. (This translation will be straightforward, thanks to the inequalities in (7).) In [28] (cf. Theorem 4) the result is stated in the language of Shannon information theory. While such an information theoretic approach may be useful in certain settings [33], we follow the more “traditional” approach and provide an efficient simulation argument. As discussed in the introduction, this also covers a setting where the adversary exploits joint information of the leakage and, e.g., the plaintext/ciphertext pairs. We emphasize, however, that our results can easily be expressed in the information theoretic language, thanks to the following bound: \(I(A;B) \le (2N / \ln 2) \cdot \varDelta (A;(A|B))\), where A is uniformly distributed over a set of cardinality N. This result comes from Proposition 1 in [28] combined with the inequality \(\beta (A|B) \le 2\varDelta (A;(A|B))\), cf. (7).
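To illustrate the two measures on a toy example of our own: take \({\mathcal X}= \mathrm {GF}(2)\) and let \({ Noise}\) flip its input with probability p. A direct computation gives \(\delta = 1/2 - p\) (so \(p = 1/2\) is “full noise”) and \(\beta = \sqrt{2}\,(1/2 - p)\), consistent with (7):

```python
import math

def delta(P, Q):
    xs = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in xs)

def d(P, Q):  # the Euclidean distance used in [28]
    xs = set(P) | set(Q)
    return math.sqrt(sum((P.get(x, 0.0) - Q.get(x, 0.0)) ** 2 for x in xs))

p = 0.4                                              # flip probability
X = {0: 0.5, 1: 0.5}                                 # X uniform over GF(2)
cond = {y: {y: 1.0 - p, 1 - y: p} for y in (0, 1)}   # P(X = . | Noise(X) = y)

dlt = sum(0.5 * delta(X, cond[y]) for y in (0, 1))   # P(Noise(X) = y) = 1/2
bta = sum(0.5 * d(X, cond[y]) for y in (0, 1))
assert abs(dlt - (0.5 - p)) < 1e-12                  # delta = 1/2 - p
assert bta / 2 <= dlt <= (math.sqrt(2) / 2) * bta + 1e-12   # inequality (7)
```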

3.1.1 The Issue of “Efficient Simulation”

To achieve the strong simulation-based security notion, we need an additional requirement on the leakage, namely, that the leakage can efficiently be “simulated”—which typically requires that the noise function is efficiently computable. In fact, for our proofs to go through we actually need something slightly stronger, namely that \({ Noise}\) is efficiently decidable, by which we mean that (a) there exists a randomized poly-time algorithm that computes it, and (b) the set \({\mathcal Y}\) is finite and for every x and y the value of \({\mathbb P}\left( { Noise}(x) = y\right) \) is computable in polynomial time. While (b) may look like a strong assumption we note that in practice for most “natural” noise functions (like the Gaussian noise with a known parameter, measured with a very good, but finite, precision) it is easily satisfiable.

Recall that the results of [28] are stated without taking into consideration the issue of “efficient simulation”. Hence, if one wants to compare our results with [28], then one can simply drop the efficient decidability assumption on the noise. To keep our presentation concise and clean, also in this case the results will be presented in the form “for every adversary \({\mathcal A}\) there exists an (inefficient) simulator \({\mathcal S}\)”. Here the “inefficient simulator” can be an arbitrary machine, capable, e.g., of sampling elements from any probability distribution.

3.2 Simulating Noise by \(\epsilon \)-Identity Functions

Lemma 4 below is our main technical tool. Informally, it states that every \(\delta \)-noisy function \({ Noise}: {\mathcal X}\rightarrow {\mathcal Y}\) can be represented as a composition \({ Noise}' \circ \varphi \) of efficiently computable randomized functions \({ Noise}'\) and \(\varphi \), where \(\varphi \) is a “\(\delta \cdot \left| {\mathcal X}\right| \)-identity function”, defined in Definition 1 below.

Definition 1

A randomized function \(\varphi : {\mathcal X}\rightarrow {\mathcal X}\cup \{\bot \}\) is an \(\epsilon \)-identity if for every x we have that either \(\varphi (x) = x\) or \(\varphi (x) = \bot \), and \({\mathbb P}\left( \varphi (x) \ne \bot \right) = \epsilon \).

This will allow us to reduce the “noisy attacks” to the “random-probing attacks”, where the adversary learns each wire (or gate, see Sect. 5.5) of the circuit with probability \(\epsilon \). Observe also that, thanks to the assumed independence of the noise, the events that the adversary learns the individual elements are independent, which, in turn, will allow us to use the Chernoff bound to prove that with good probability the number of wires that the adversary learns is small.
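An \(\epsilon \)-identity function is trivial to implement; we will use the following one-liner (our own, with None standing for \(\bot \)) in the sketches below:

```python
import random

def phi(x, eps):
    """eps-identity function: reveal x with probability eps, erase otherwise."""
    return x if random.random() < eps else None   # None plays the role of bot
```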

Lemma 4

Let \({ Noise}: {\mathcal X}\rightarrow {\mathcal Y}\) be a \(\delta \)-noisy function. Then there exist \(\epsilon \le \delta \cdot \left| {\mathcal X}\right| \) and a randomized function \({ Noise}' : {\mathcal X}\cup \{\bot \} \rightarrow {\mathcal Y}\) such that for every \(x \in {\mathcal X}\) we have

$$\begin{aligned} { Noise}(x) \,{\buildrel d \over =}\,{ Noise}'(\varphi (x)), \end{aligned}$$
(8)

where \(\varphi : {\mathcal X}\rightarrow {\mathcal X}\cup \{\bot \}\) is the \(\epsilon \)-identity function. Moreover, if \({ Noise}\) is efficiently decidable then \({ Noise}'(\varphi (x))\) is computable in time that is expected polynomial in \(\left| {\mathcal X}\right| \).

Before we proceed to the proof let us remark that the “\(|{\mathcal X}|\) factor” loss in this lemma (when going from \(\delta \) to \(\epsilon \)) is in general unavoidable. More concretely, in the subsequent work [11] (Sect. 5) it is shown that there exist \(\delta \)-noisy functions that can be reduced (in the sense of Lemma 4) to an \(\epsilon \)-identity function only if \(\epsilon \) is (at least) approximately equal to \(\delta \cdot |{\mathcal X}|/2\).

Proof of Lemma 4

We consider only the case when \({ Noise}\) is efficiently decidable, and hence the \({ Noise}'\) function that we construct will be efficiently computable. The case when \({ Noise}\) is not efficiently decidable is handled in an analogous way (the proof is actually simpler as the only difference is that we do not need to argue about the efficiency of the sampling algorithms).

Very informally speaking, our proof is based on an extension of the standard observation that for any two random variables A and B one can find two events \({\mathcal A}\) and \({\mathcal B}\) such that the conditional distributions \(P_{A|{\mathcal A}}\) and \(P_{B|{\mathcal B}}\) are equal and \({\mathbb P}\left( {\mathcal A}\right) = {\mathbb P}\left( {\mathcal B}\right) = 1 - \varDelta (A;B)\) (see, e.g., [23, Section 1.3]).

Let X and \(X'\) be uniform over \({\mathcal X}\). For every \(y \in {\mathcal Y}\) define

$$\begin{aligned} \pi (y) = \min _{x\in {\mathcal X}}({\mathbb P}\left( { Noise}(x) = y\right) ). \end{aligned}$$

Clearly \(\pi \) is computable in time polynomial in \(\left| {\mathcal X}\right| \). Obviously \(\pi \) is usually not a probability distribution since it does not sum up to 1. The good news is that it sums up “almost” to 1 provided \(\delta \) is sufficiently small. This is shown below. Let \(\epsilon := 1 - \sum _{y \in {\mathcal Y}}\pi (y)\). We now have

$$\begin{aligned} \epsilon= & {} \overbrace{\sum _{y \in {\mathcal Y}} {\mathbb P}\left( { Noise}(X') = y\right) }^{=1} - \sum _{y \in {\mathcal Y}}\pi (y) \nonumber \\= & {} \sum _{y \in {\mathcal Y}} {\mathbb P}\left( { Noise}(X') = y\right) - \min _{x\in {\mathcal X}}({\mathbb P}\left( { Noise}(x) = y\right) ) \nonumber \\= & {} \sum _{y \in {\mathcal Y}} \max _{x\in {\mathcal X}}({\mathbb P}\left( { Noise}(X') = y\right) - {\mathbb P}\left( { Noise}(x) = y\right) ) \nonumber \\\le & {} \sum _{y \in {\mathcal Y}} \sum _{x\in {\mathcal X}}\max (0,{\mathbb P}\left( { Noise}(X') = y\right) - {\mathbb P}\left( { Noise}(x) = y\right) ) \end{aligned}$$
(9)
$$\begin{aligned}= & {} \sum _{x \in {\mathcal X}} \varDelta ({ Noise}(x);{ Noise}(X')) \nonumber \\= & {} \left| {\mathcal X}\right| \cdot \varDelta (({ Noise}(X);{ Noise}(X')) \ | \ X) \nonumber \\= & {} \delta \cdot \left| {\mathcal X}\right| , \end{aligned}$$
(10)

where (9) comes from the fact that the maximum of positive values cannot be larger than their sum, and (10) follows from the assumption that the \({ Noise}\) function is \(\delta \)-noisy. Our construction of the \({ Noise}'\) function is based on the standard technique of rejection sampling. Let \({ Noise}'(x)\) be a distribution defined as follows: for every \(y \in {\mathcal Y}\) and every \(x \ne \bot \) let:

$$\begin{aligned} {\mathbb P}\left( { Noise}'(x) = y\right) = ({\mathbb P}\left( { Noise}(x) = y\right) - \pi (y)) / \epsilon , \end{aligned}$$
(11)

and otherwise:

$$\begin{aligned} {\mathbb P}\left( { Noise}'(\bot ) = y\right) = \pi (y)/(1 - \epsilon ). \end{aligned}$$
(12)

We will later show how to sample \({ Noise}'\) efficiently. Obviously this will automatically imply that (11) and (12) define probability distributions over \({\mathcal Y}\) (which may not be obvious at first sight). First, however, let us show (8). To this end take any \(x \in {\mathcal X}\) and \(y \in {\mathcal Y}\) and observe that

$$\begin{aligned}&{\mathbb P}\left( { Noise}'(\varphi (x)) = y\right) \\&\quad = {\mathbb P}\left( \varphi (x) = x\right) \cdot {\mathbb P}\left( { Noise}'(x) = y\right) + {\mathbb P}\left( \varphi (x) = \bot \right) \cdot {\mathbb P}\left( { Noise}'(\bot ) = y\right) \\&\quad = \epsilon \cdot ({\mathbb P}\left( { Noise}(x) = y\right) - \pi (y)) / \epsilon + (1 - \epsilon ) \cdot \pi (y) / (1 - \epsilon )\\&\quad = {\mathbb P}\left( { Noise}(x) = y\right) - \pi (y) + \pi (y) \\&\quad = {\mathbb P}\left( { Noise}(x) = y\right) . \end{aligned}$$

This implies (8). What remains is to show how to sample \({ Noise}'\) efficiently. Let us first show an efficient algorithm \(\mathsf {Alg_{1}}(x)\) for computing \({ Noise}'(x)\) for \(x \ne \bot \):

\(\mathsf {Alg_{1}}(x):\)

  1. Sample y from \({ Noise}(x)\).

  2. With probability \(\pi (y)/{\mathbb P}\left( { Noise}(x)=y\right) \) resample y, i.e., go back to Step 1.

  3. Output y.

We now argue that \(\mathsf {Alg_{1}}(x)\) indeed computes \({ Noise}'(x)\) efficiently. Let \(R_1 \in \{1,2,\ldots \}\) be a random variable denoting the number of times the algorithm \(\mathsf {Alg_{1}}(x)\) performed Step 1. First observe that the probability of jumping back to Step 1 in Step 2 is equal to

$$\begin{aligned} \sum _{y} {\mathbb P}\left( { Noise}(x) = y\right) \cdot \pi (y)/{\mathbb P}\left( { Noise}(x)=y\right)= & {} \sum _{y} \pi (y) \end{aligned}$$
(13)
$$\begin{aligned}= & {} 1 - \epsilon \end{aligned}$$
(14)

Therefore the probability of not jumping back to Step 1 in Step 2 is \(\epsilon \), and hence the expected number \({\mathbb E}\left( R_1\right) \) of the executions of Step 1 in \(\mathsf {Alg_{1}}(x)\) is equal to \( \sum _{i=1}^{\infty } i \cdot (1 - \epsilon )^{i-1} \cdot \epsilon = 1/\epsilon \). Moreover for every \(i=1,2,\ldots \) we have:

$$\begin{aligned}&{\mathbb P}\left( \mathsf {Alg_{1}}(x) = y \wedge R_1 = i \ | \ R_1 \ge i\right) \\&\quad = {\mathbb P}\left( { Noise}(x) = y\right) \cdot (1 - (\pi (y)/{\mathbb P}\left( { Noise}(x)=y\right) ))\\&\quad = {\mathbb P}\left( { Noise}(x) = y\right) - \pi (y) \end{aligned}$$

Hence

$$\begin{aligned}&{\mathbb P}\left( \mathsf {Alg_{1}}(x) = y\right) \\&\quad =\sum _{i=1}^{\infty }{\mathbb P}\left( \mathsf {Alg_{1}}(x) = y \wedge R_1 = i\right) \\&\quad = \sum _{i=1}^{\infty }{\mathbb P}\left( \mathsf {Alg_{1}}(x) = y \wedge R_1 = i\ |\ R_1 \ge i\right) \cdot {\mathbb P}\left( R_1 \ge i\right) \\&\quad = \left( {\mathbb P}\left( { Noise}(x) = y\right) - \pi (y) \right) \cdot \sum _{i=1}^{\infty } {\mathbb P}\left( R_1 \ge i\right) \\&\quad = \left( {\mathbb P}\left( { Noise}(x) = y\right) - \pi (y) \right) \cdot {\mathbb E}\left( R_1\right) \\&\quad = ({\mathbb P}\left( { Noise}(x) = y\right) - \pi (y)) / \epsilon , \end{aligned}$$

as required in (11). We now present an efficient algorithm \(\mathsf {Alg_{2}}\) for computing \({ Noise}'(\bot )\). Fix an arbitrary element \(x_{0} \in {\mathcal X}\), and execute the following.

\(\mathsf {Alg_{2}}:\)

  1. Sample y from \({ Noise}(x_{0})\).

  2. With probability \(1 - (\pi (y)/{\mathbb P}\left( { Noise}(x_{0}) = y\right) )\) resample y, i.e., go back to Step 1.

  3. Output y.

By a similar argument as in the case of \(\mathsf {Alg_{1}}\) we obtain that the expected number \({\mathbb E}\left( R_{2}\right) \) of times the algorithm \(\mathsf {Alg_{2}}\) performs Step 1 is equal to \(1/(1 - \epsilon )\). Moreover for every \(i=1,2,\ldots \) we have:

$$\begin{aligned} {\mathbb P}\left( \mathsf {Alg_{2}}= y \wedge R_{2} = i \ | \ R_{2} \ge i\right)= & {} \pi (y), \end{aligned}$$

which, in turn, implies that \({\mathbb P}\left( \mathsf {Alg_{2}}= y\right) = \pi (y)/(1-\epsilon )\), and hence the output of \(\mathsf {Alg_{2}}\) satisfies (12). Clearly, the expected running time of both algorithms is polynomial in \(\left| {\mathcal X}\right| \) and \({\mathbb E}\left( R\right) \), where R is the number of executions of Step 1 in \(\mathsf {Alg_{1}}\) or \(\mathsf {Alg_{2}}\). We obviously have

$$\begin{aligned} {\mathbb E}\left( R\right)= & {} {\mathbb E}\left( R_{1} | \varphi (x) \ne \bot \right) \cdot {\mathbb P}\left( \varphi (x) \ne \bot \right) + {\mathbb E}\left( R_{2} | \varphi (x) = \bot \right) \cdot {\mathbb P}\left( \varphi (x) = \bot \right) \\= & {} (1/\epsilon ) \cdot \epsilon + (1/(1-\epsilon )) \cdot (1 - \epsilon )\\= & {} 2. \end{aligned}$$

Hence, the expected running time of \({ Noise}'(\varphi (x))\) is polynomial in \(\left| {\mathcal X}\right| \). \(\square \)
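The following self-contained sketch (with toy parameters of our own choosing) implements the decomposition of the proof for a noise function over \({\mathcal X}= \{0,1\}\) given by an explicit conditional distribution table, and checks by Monte Carlo that \({ Noise}'(\varphi (x))\) is distributed as \({ Noise}(x)\):

```python
import random

# P(Noise(x) = y) given as an explicit table (a toy example of our choosing).
noise = {0: {"lo": 0.7, "hi": 0.3},
         1: {"lo": 0.4, "hi": 0.6}}
pi = {y: min(noise[x][y] for x in noise) for y in ("lo", "hi")}  # pi(y) = min_x
eps = 1.0 - sum(pi.values())                                     # here eps = 0.3

def sample(dist):
    r, acc = random.random(), 0.0
    for y, p in dist.items():
        acc += p
        if r < acc:
            return y
    return y  # guard against floating-point round-off

def noise_prime(x):
    """Rejection sampling: Alg_1 for x != bot, Alg_2 (with x_0 = 0) for bot."""
    while True:
        if x is not None:                                # Alg_1
            y = sample(noise[x])
            if random.random() >= pi[y] / noise[x][y]:   # keep w.p. 1 - pi(y)/P
                return y
        else:                                            # Alg_2
            y = sample(noise[0])
            if random.random() < pi[y] / noise[0][y]:    # keep w.p. pi(y)/P
                return y

def phi(x):                                              # the eps-identity
    return x if random.random() < eps else None

# Noise'(phi(1)) should equal Noise(1), i.e., output "hi" with probability 0.6.
N = 200_000
freq = sum(noise_prime(phi(1)) == "hi" for _ in range(N)) / N
assert abs(freq - 0.6) < 0.01
```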

4 Leakage from Vectors

In this section we describe the leakage models relevant to this paper. We start by describing the models abstractly, by considering leakage from an arbitrary sequence \((x_{1},\ldots ,x_{\ell }) \in {\mathcal X}^{\ell }\), where \({\mathcal X}\) is some finite set and \(\ell \) is a parameter. The adversary \({\mathcal A}\) will be able to obtain some partial information about \((x_{1},\ldots ,x_{\ell })\) via the games described below. Note that we do not specify the computational power of \({\mathcal A}\), as the definitions below make sense for both computationally bounded and infinitely powerful \({\mathcal A}\).

Noisy Model. For \(\delta \ge 0\) a \(\delta \)-noisy adversary on \({\mathcal X}^{\ell }\) is a machine \({\mathcal A}\) that plays the following game against an oracle that knows \((x_{1},\ldots ,x_{\ell }) \in {\mathcal X}^{\ell }\):

  1. \({\mathcal A}\) specifies a sequence \(\{{ Noise}_{i} : {\mathcal X}\rightarrow {\mathcal Y}\}_{i=1}^{\ell }\) of noisy functions such that every \({ Noise}_{i}\) is \(\delta '_{i}\)-noisy for some \(\delta '_{i} \le \delta \), and the noises are mutually independent.

  2. \({\mathcal A}\) receives \({ Noise}_{1}(x_{1}),\ldots ,{ Noise}_\ell (x_{\ell })\) and outputs some value \( out _{{\mathcal A}}(x_{1},\ldots ,x_{\ell })\).

If \({\mathcal A}\) works in polynomial time and the noise functions specified by \({\mathcal A}\) are efficiently decidable, then we say that \({\mathcal A}\) is poly-time-noisy.

Random-Probing Model. For \(\epsilon \ge 0\) an \(\epsilon \)-random-probing adversary on \({\mathcal X}^{\ell }\) is a machine \({\mathcal A}\) that plays the following game against an oracle that knows \((x_{1},\ldots ,x_{\ell }) \in {\mathcal X}^{\ell }\):

  1. \({\mathcal A}\) specifies a sequence \((\epsilon _{1},\ldots ,\epsilon _{\ell })\) such that each \(\epsilon _{i} \le \epsilon \).

  2. \({\mathcal A}\) receives \(\varphi _{1}(x_{1}),\ldots ,\varphi _\ell (x_{\ell })\) and outputs some value \( out _{{\mathcal A}}(x_{1},\ldots ,x_{\ell })\), where each \(\varphi _{i}\) is the \(\epsilon _{i}\)-identity function with mutually independent randomness.

A similar model was introduced in the work of Ishai, Sahai and Wagner [17] to obtain a circuit compiler that blows up the size of the circuit linearly in the security parameter d. Also, the work of Ajtai [1] considers the random-probing model and constructs a compiler that for sufficiently large security parameter d achieves security in the random-probing model for a small (but constant) probability \(\epsilon \). [1], however, does not give concrete parameters for \(\epsilon \) and d, and circuits produced by the compiler of [1] suffer a huge blow-up in size (\(O(d^4)\) with large hidden constants).

Threshold-Probing Model. For \(t = 0,\ldots ,\ell \) a t-threshold-probing adversary on \({\mathcal X}^{\ell }\) is a machine \({\mathcal A}\) that plays the following game against an oracle that knows \((x_{1},\ldots ,x_{\ell }) \in {\mathcal X}^{\ell }\):

  1. \({\mathcal A}\) specifies a set \(\mathcal{I}= \{i_{1},\ldots ,i_{\left| \mathcal{I}\right| }\} \subseteq \{1,\ldots ,\ell \}\) of cardinality at most t.

  2. \({\mathcal A}\) receives \((x_{i_{1}},\ldots ,x_{i_{\left| \mathcal{I}\right| }})\) and outputs some value \( out _{{\mathcal A}}(x_{1},\ldots ,x_{\ell })\).

4.1 Simulating the Noisy Adversary by a Random-Probing Adversary

The following lemma shows that every \(\delta \)-noisy adversary can be simulated by a \(\delta \cdot \left| {\mathcal X}\right| \)-random-probing adversary.

Lemma 5

Let \({\mathcal A}\) be a \(\delta \)-noisy adversary on \({\mathcal X}^{\ell }\). Then there exists a \(\delta \cdot \left| {\mathcal X}\right| \)-random-probing adversary \({\mathcal S}\) on \({\mathcal X}^{\ell }\) such that for every \((x_{1},\ldots ,x_{\ell })\) we have

$$\begin{aligned} out _{{\mathcal A}}(x_{1},\ldots ,x_{\ell }) \,{\buildrel d \over =}\, out _{{\mathcal S}}(x_{1},\ldots ,x_{\ell }). \end{aligned}$$
(15)

Moreover, if \({\mathcal A}\) is poly-time-noisy, then \({\mathcal S}\) works in time polynomial in \(\ell \cdot \left| {\mathcal X}\right| \).

Proof

Without loss of generality assume that \({\mathcal A}\) simply outputs all the information that he gets. Thus (15) can be rewritten as:

$$\begin{aligned} ({ Noise}_{1}(x_{1}),\ldots ,{ Noise}_{\ell }(x_{\ell })) \,{\buildrel d \over =}\, out _{{\mathcal S}}(x_{1},\ldots ,x_{\ell }), \end{aligned}$$
(16)

where \({ Noise}_{i}\)’s are the \(\delta _{i}\)-noisy functions chosen by \({\mathcal A}\). By Lemma 4 for each i there exists \(\epsilon _{i} \le \delta _{i} \cdot \left| {\mathcal X}\right| \le \delta \cdot \left| {\mathcal X}\right| \) and a randomized function \({ Noise}'_{i} : {\mathcal X}\cup \{\bot \} \rightarrow {\mathcal Y}\), such that for every \(x \in {\mathcal X}\) we have

$$\begin{aligned} { Noise}_{i}(x) \,{\buildrel d \over =}\,{ Noise}'_{i}(\varphi _{i}(x)), \end{aligned}$$
(17)

where \(\varphi _{i} : {\mathcal X}\rightarrow {\mathcal X}\cup \{\bot \}\) is the \(\epsilon _{i}\)-identity function and \({ Noise}'_{i}(\varphi _{i}(x))\) is computable in time polynomial in \(\left| {\mathcal X}\right| \). We now describe the actions of \({\mathcal S}\). The sequence that he specifies is \((\epsilon _{1},\ldots ,\epsilon _{\ell })\). After receiving \((y_{1},\ldots ,y_{\ell })\) (equal to \((\varphi _{1}(x_{1}),\ldots ,\varphi _{\ell }(x_{\ell }))\)) he outputs

$$\begin{aligned} out _{{\mathcal S}}(x_{1},\ldots ,x_{\ell }) := ({ Noise}'_{1}(y_{1}),\ldots ,{ Noise}'_{\ell }(y_{\ell })) \end{aligned}$$

(this clearly takes time that is expected polynomial in \(\ell \cdot \left| {\mathcal X}\right| \)). We now have

$$\begin{aligned}&({ Noise}'_{1}(y_{1}),\ldots ,{ Noise}'_{\ell }(y_{\ell })) \nonumber \\&\quad \,{\buildrel d \over =}\,({ Noise}'_{1}(\varphi _{1}(x_{1})),\ldots ,{ Noise}'_{\ell }(\varphi _{\ell }(x_{\ell }))) \nonumber \\&\quad \,{\buildrel d \over =}\,({ Noise}_{1}(x_{1}),\ldots ,{ Noise}_{\ell }(x_{\ell })) \end{aligned}$$
(18)

where (18) comes from (17). This implies (16) and hence it finishes the proof. \(\square \)

Intuitively, this lemma easily follows from Lemma 4 applied independently to each element of \((x_{1},\ldots ,x_{\ell })\).

4.2 Simulating the Random-Probing Adversary by a Threshold-Probing Adversary

In this section we show how to simulate every \(\epsilon \)-random-probing adversary by a threshold adversary. This simulation, unlike the one in Sect. 4.1, will not be perfect in the sense that the distribution output by the simulator will be identical to the distribution of the original adversary only when conditioned on some event that happens with a large probability. We start with the following lemma, whose proof is a straightforward application of the Chernoff bound.

Lemma 6

Let \({\mathcal A}\) be an \(\epsilon \)-random-probing adversary on \({\mathcal X}^{\ell }\). Then there exists a \((2 \epsilon \ell -1)\)-threshold-probing adversary \({\mathcal S}\) on \({\mathcal X}^{\ell }\) operating in time linear in the working time of \({\mathcal A}\) such that for every \((x_{1},\ldots ,x_{\ell })\) we have

$$\begin{aligned} \varDelta ( out _{{\mathcal A}}(x_{1},\ldots ,x_{\ell }) \ ; \ out _{{\mathcal S}}(x_{1},\ldots ,x_{\ell })\ |\ out _{{\mathcal S}}(x_{1},\ldots ,x_{\ell }) \ne \bot ) = 0, \end{aligned}$$
(19)

where

$$\begin{aligned} {\mathbb P}\left( out _{{\mathcal S}}(x_{1},\ldots ,x_{\ell }) = \bot \right) \le \exp \left( -\frac{\epsilon \ell }{3}\right) . \end{aligned}$$
(20)

Proof

As in the proof of Lemma 5 we assume that the simulated adversary \({\mathcal A}\) outputs all the information that he received. Moreover, since for \(\epsilon ' \le \epsilon \) every \(\epsilon '\)-identity function \(\varphi '\) can be simulated by the \(\epsilon \)-identity function \(\varphi \), we can assume that each \(\epsilon _{i}\) specified by \({\mathcal A}\) is equal to \(\epsilon \). Thus, we need to show a \((2 \epsilon \ell - 1)\)-threshold-probing simulator \({\mathcal S}\) such that for every \((x_{1},\ldots ,x_{\ell }) \in {\mathcal X}^{\ell }\) we have

$$\begin{aligned} \varDelta (\varphi _{1}(x_{1}),\ldots ,\varphi _{\ell }(x_{\ell }) \ ; \ out _{{\mathcal S}}(x_{1},\ldots ,x_{\ell })\ |\ out _{{\mathcal S}}(x_{1},\ldots ,x_{\ell }) \ne \bot ) = 0, \end{aligned}$$
(21)

(where each \(\varphi _{i}\) is the \(\epsilon \)-identity function) and (20) holds. The simulator \({\mathcal S}\) proceeds as follows. First he chooses a sequence \((Z_{1},\ldots ,Z_{\ell })\) of independent random variables by setting, for each i:

$$\begin{aligned} Z_{i} := \left\{ \begin{array}{ll} 1 &{} \text{ with } \text{ probability } \epsilon _{i}\\ 0 &{} \text{ otherwise }. \end{array} \right. \end{aligned}$$

Let Z denote the number of \(Z_{i}\)’s equal to 1, i.e., \(Z := \sum _{i=1}^{\ell } Z_{i}\). If \(Z \ge 2 \ell \epsilon \), then \({\mathcal S}\) outputs \(\bot \). Otherwise, he specifies the set \(\mathcal{I}\) as \(\mathcal{I}:= \{i : Z_{i} = 1\}\). He receives \((x_{i_{1}},\ldots ,x_{i_{\left| \mathcal{I}\right| }})\). For all the remaining i’s (i.e., those not in the set \(\mathcal{I}\)) the simulator sets \(x_{i} := \bot \). He outputs \((x_{1},\ldots ,x_{\ell })\). It is straightforward to see that \({\mathcal S}\) is \((2\epsilon \ell - 1)\)-threshold-probing and that (21) holds. What remains is to show (20). Since \({\mathbb E}\left( Z\right) = \epsilon \ell \),

$$\begin{aligned} {\mathbb P}\left( Z \ge 2 \ell \epsilon \right)= & {} {\mathbb P}\left( Z \ge 2 {\mathbb E}\left( Z\right) \right) \nonumber \\\le & {} \exp \left( -\frac{\epsilon \ell }{3}\right) , \end{aligned}$$
(22)

where (22) comes from the Chernoff bound with \(\xi = 1\) (cf. Lemma 3). This finishes the proof. \(\square \)
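A sketch of the simulator from the proof (our own rendering, with None playing the role of \(\bot \) both for the abort output and for unprobed positions):

```python
import math, random

def threshold_simulator(xs, eps):
    """Choose Z_i ~ Bernoulli(eps) independently; abort if sum Z_i >= 2*eps*ell,
    otherwise probe exactly the positions with Z_i = 1."""
    ell = len(xs)
    Z = [random.random() < eps for _ in range(ell)]
    if sum(Z) >= 2 * eps * ell:
        return None                                     # the bot output
    return [x if z else None for x, z in zip(xs, Z)]    # erased where Z_i = 0

# The abort probability is at most exp(-eps*ell/3); e.g., for eps = 0.01 and
# ell = 10_000 it is below exp(-100/3) < 4e-15.
assert math.exp(-0.01 * 10_000 / 3) < 4e-15
```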

The following corollary combines Lemmas 5 and 6 together, and will be useful in the sequel.

Corollary 1

Let \(d, \ell \in \mathbb {N}\) with \(\ell > d\) and let \({\mathcal A}\) be a \(d/(4 \ell \cdot \left| {\mathcal X}\right| )\)-noisy adversary on \({\mathcal X}^{\ell }\). Then there exists a \((d/2 - 1)\)-threshold-probing adversary \({\mathcal S}\) such that

$$\begin{aligned} \varDelta ( out _{{\mathcal A}}(x_{1},\ldots ,x_{\ell }) \ ; \ out _{{\mathcal S}}(x_{1},\ldots ,x_{\ell }) \ |\ out _{{\mathcal S}}(x_{1},\ldots ,x_{\ell }) \ne \bot ) = 0 \end{aligned}$$
(23)

and \({\mathbb P}\left( out _{{\mathcal S}}(x_{1},\ldots ,x_{\ell }) = \bot \right) \le \exp (-d/12)\). Moreover, if \({\mathcal A}\) is poly-time-noisy then \({\mathcal S}\) works in time polynomial in \(\ell \cdot \left| {\mathcal X}\right| \).

Proof

By Lemma 5 there exists a \(d/(4 \ell )\)-random-probing adversary \({\mathcal A}'\) whose output is distributed identically to the output of \({\mathcal A}\). In turn, by Lemma 6 for \(t = 2 \cdot (d/(4\ell )) \cdot \ell = d/2\) there exists a \((t-1)\)-threshold-probing adversary \({\mathcal S}\) whose output, conditioned on not being equal to \(\bot \), is distributed identically to the output of \({\mathcal A}'\), and such that \({\mathbb P}\left( out _{{\mathcal S}}(x_{1},\ldots ,x_{\ell }) = \bot \right) \le \exp (-d/12)\).

If \({\mathcal A}\) is poly-time-noisy then clearly the expected working time of \({\mathcal A}'\) is polynomial in \(\ell \cdot \left| {\mathcal X}\right| \). Since the working time of \({\mathcal S}\) is linear in the working time of \({\mathcal A}'\), this finishes the proof. \(\square \)
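To see what the corollary gives quantitatively (with illustrative numbers of our own choosing, not from [28]): for the AES field \(\left| {\mathcal X}\right| = 256\), a circuit with \(\ell = 10{,}000\) leaking wires and \(d = 120\), every \(\delta \)-noisy adversary with \(\delta \le d/(4\ell \cdot \left| {\mathcal X}\right| ) \approx 1.2 \cdot 10^{-5}\) is simulated by a 59-threshold-probing adversary, up to failure probability \(e^{-10} \approx 4.5 \cdot 10^{-5}\):

```python
import math

field_size, ell, d = 256, 10_000, 120      # illustrative parameters
delta = d / (4 * ell * field_size)         # tolerated per-wire noise level
t = d // 2 - 1                             # threshold of the simulating adversary
fail = math.exp(-d / 12)                   # simulation-failure probability
print(f"delta <= {delta:.2e}, t = {t}, failure <= {fail:.2e}")
# -> delta <= 1.17e-05, t = 59, failure <= 4.54e-05
```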

5 Leakage from Computation

In this section we address the main topic of this paper, which is the noise-resilience of cryptographic computations. Our main model will be the model of arithmetic circuits over a finite field. First, in Sect. 5.1 we present our security definitions, and then, in Sect. 5.2 we describe a secure “compiler” that transforms any cryptographic scheme secure in the “black-box” model into one secure against the noisy leakage (it is essentially identical to the transformation of [17] later extended in [31]). Finally, in the last section we present our security results.

5.1 Definitions

A (stateful arithmetic) circuit \(\varGamma \) over a field \({\mathbb F}\) is a directed graph whose nodes are called gates. Each gate \(\gamma \) can be of one of the following types: an input gate \(\gamma ^{ inp }\) of fan-in zero, an output gate \(\gamma ^{ out }\) of fan-out zero, a random gate \(\gamma ^{ rand }\) of fan-in zero, a multiplication gate \(\gamma ^{\times }\) of fan-in 2, an addition gate \(\gamma ^{+}\) of fan-in 2, a subtraction gate \(\gamma ^{-}\) of fan-in 2, a constant gate \(\gamma ^{ const }\) of fan-in zero, and a memory gate \(\gamma ^{ mem }\) of fan-in 1. Following [17] we assume that the fan-out of every gate is at most 3. Every cycle that is allowed in \(\varGamma \) must contain exactly 1 memory gate. The size \(\left| \varGamma \right| \) of the circuit \(\varGamma \) is defined to be the total number of its gates. The numbers of input gates, output gates and memory gates will be denoted \(\left| \varGamma . inp \right| , \left| \varGamma . out \right| \), and \(\left| \varGamma . mem \right| \), respectively.

The computation of \(\varGamma \) is performed in several “rounds” numbered \(1,2,\ldots \). In each of them the circuit will take some input, produce an output and update the memory state. Initially, the memory gates of \(\varGamma \) are preloaded with some initial “state” \(k_{0} \in {\mathbb F}^{\left| \varGamma . mem \right| }\). At the beginning of the ith round the input gates are loaded with elements of some vector \(a_{i} \in {\mathbb F}^{\left| \varGamma . inp \right| }\) called the input for the ith round. The computation of \(\varGamma \) in the ith round depends on \(a_{i}\) and on the memory state \(k_{i-1}\). It proceeds in a straightforward way: if all the input wires of a given gate are known then the value on its output wire can be computed naturally: if \(\gamma \) is a multiplication gate with input wires carrying values a and b, then its output wire will carry the value \(a \cdot b\) (where “\(\cdot \)” is the multiplication operation in \({\mathbb F}\)), and the addition and the subtraction gates are handled analogously. We assume that the random gates produce a fresh random field element in each round. The output of the ith round is read off from the output gates and denoted \(b_{i} \in {\mathbb F}^{\left| \varGamma . out \right| }\). The state after the ith round is contained in the memory gates and denoted \(k_{i}\). For \(k \in {\mathbb F}^{\left| \varGamma . mem \right| }\) and a sequence of inputs \((a_{1},\ldots ,a_{m})\) (where each \(a_{i} \in {\mathbb F}^{\left| \varGamma . inp \right| }\)) let \(\varGamma (k,a_{1},\ldots ,a_{m})\) denote the sequence \((B_{1},\ldots ,B_{m})\) where each \(B_{i}\) is the output of \(\varGamma \) with \(k_{0} = k\) and inputs \(a_{1},\ldots ,a_{m}\) in rounds \(1,2,\ldots \). Observe that, since \(\varGamma \) is randomized, \(\varGamma (k,a_{1},\ldots ,a_{m})\) is a random variable.
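The round structure can be summarized in code; below is a toy evaluator for one round over \(\mathrm {GF}(2)\) (the gate encoding is our own, purely to illustrate the model):

```python
import random

def eval_round(gates, state, inputs):
    """One round: consume the round inputs and the previous memory state,
    return (outputs b_i, new memory state k_i). Gates are topologically
    ordered tuples (wire_name, kind, *args)."""
    wires, outputs, new_state = {}, [], []
    inp, mem = iter(inputs), iter(state)
    for name, kind, *args in gates:
        if kind == "inp":     wires[name] = next(inp)
        elif kind == "mem":   wires[name] = next(mem)           # k_{i-1}
        elif kind == "rand":  wires[name] = random.randint(0, 1)
        elif kind == "const": wires[name] = args[0]
        elif kind == "+":     wires[name] = wires[args[0]] ^ wires[args[1]]
        elif kind == "*":     wires[name] = wires[args[0]] & wires[args[1]]
        elif kind == "out":   outputs.append(wires[args[0]])
        elif kind == "store": new_state.append(wires[args[0]])  # becomes k_i
    return outputs, new_state

# b = a XOR k, with the key bit k kept in memory across rounds:
gates = [("k", "mem"), ("a", "inp"), ("c", "+", "a", "k"),
         ("o", "out", "c"), ("s", "store", "k")]
assert eval_round(gates, state=[1], inputs=[0]) == ([1], [1])
```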

A black-box circuit adversary \({\mathcal A}\) is a machine that adaptively interacts with a circuit \(\varGamma \) via the input and output interface. Then \( out \left( {\mathcal A}\mathop {\leftrightarrows }\limits ^{bb}\varGamma (k)\right) \) denotes the output of \({\mathcal A}\) after interacting with \(\varGamma \) whose initial memory state is \(k_{0} = k\). A \(\delta \)-noisy circuit adversary \({\mathcal A}\) is an adversary that has the following additional ability: after each ith round \({\mathcal A}\) gets some partial information about the internal state of the computation via the noisy leakage functions. More precisely: let \((X_{1},\ldots ,X_{\ell })\) be the random variable denoting the values on the wires of \(\varGamma (k)\) in the ith round. Then \({\mathcal A}\) plays the role of a \(\delta \)-noisy adversary in a game against \((X_{1},\ldots ,X_{\ell })\) (cf. Sect. 4), namely: he chooses a sequence \(\{{ Noise}_{i} : {\mathbb F}\rightarrow {\mathcal Y}\}_{i=1}^{\ell }\) of functions such that every \({ Noise}_{i}\) is \(\delta _{i}\)-noisy for some \(\delta _{i} \le \delta \), and he receives \({ Noise}_{1}(X_{1}),\ldots ,{ Noise}_\ell (X_{\ell })\). Let \( out \left( {\mathcal A}\,{\buildrel noisy \over \leftrightarrows }\,\varGamma (k)\right) \) denote the output of such an \({\mathcal A}\) after interacting with \(\varGamma \) whose initial memory state is \(k_{0} = k\).

We can also replace, in the above definition, the “\(\delta \)-noisy adversary” with the “\(\epsilon \)-random probing adversary”. In this case, after the ith round \({\mathcal A}\) chooses a sequence \((\epsilon _{1},\ldots ,\epsilon _{\ell })\) such that each \(\epsilon _{i} \le \epsilon \), and he learns \(\varphi _{1}(X_{1}),\ldots ,\varphi _\ell (X_{\ell })\), where each \(\varphi _{i}\) is the \(\epsilon _{i}\)-identity function. Let \( out \left( {\mathcal A}\,{\buildrel rnd \over \leftrightarrows }\,\varGamma (k)\right) \) denote the output of such an \({\mathcal A}\) after interacting with \(\varGamma \) whose initial memory state is \(k_{0} = k\).

Analogously, we can replace the “\(\delta \)-noisy adversary” with the “t-threshold probing adversary”, obtaining an adversary \({\mathcal A}\) that after each round learns t elements of \((X_{1},\ldots ,X_{\ell })\). Let \( out \left( {\mathcal A}\,{\buildrel thr \over \leftrightarrows }\,\varGamma (k)\right) \) denote the output of such an \({\mathcal A}\) after interacting with \(\varGamma \) whose initial memory state is \(k_{0} = k\).
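The three leakage interfaces can be illustrated with a short sketch. For the \(\delta \)-noisy case we show only one admissible choice of \({ Noise}\): the \(\epsilon \)-identity function used by the random-probing adversary, which leaks the exact value with probability \(\epsilon \) and nothing otherwise; a general \(\delta \)-noisy function is any function satisfying the noisiness condition of Sect. 4. The function names are ours.

```python
# Illustrative sketch (names ours): leakage from one round's wire values.
import random

def eps_identity(x, eps):
    """The eps-identity function: returns x with probability eps, else None."""
    return x if random.random() < eps else None

def random_probing_leakage(wires, eps):
    # an eps-random-probing adversary may pick eps_i <= eps per wire;
    # for simplicity we use the same eps everywhere
    return [eps_identity(x, eps) for x in wires]

def threshold_probing_leakage(wires, probes):
    """A t-threshold-probing adversary: probes is a set of at most t indices."""
    return {i: wires[i] for i in probes}
```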

Definition 2

Consider two stateful circuits \(\varGamma \) and \(\varGamma '\) (over some field \({\mathbb F}\)) and a randomized encoding function \( Enc \). We say that \(\varGamma '\) is a \((\delta ,\xi )\)-noise-resilient implementation of a circuit \(\varGamma \) w.r.t. \( Enc \) if the following holds for every \(k \in {\mathbb F}^{\left| \varGamma . mem \right| }\):

  1.

    the input-output behavior of \(\varGamma (k)\) and \(\varGamma '( Enc (k))\) is identical, i.e.: for every sequence of inputs \(a_{1},\ldots ,a_{m}\) and outputs \(b_{1},\ldots ,b_{m}\) we have

    $$\begin{aligned}&{\mathbb P}\left( \varGamma (k,a_{1},\ldots ,a_{m}) = (b_{1},\ldots ,b_{m})\right) \\&\quad = {\mathbb P}\left( \varGamma '( Enc (k),a_{1},\ldots ,a_{m}) = (b_{1},\ldots ,b_{m})\right) \end{aligned}$$

    and

  2.

    for every \(\delta \)-noisy circuit adversary \({\mathcal A}\) there exists a black-box circuit adversary \({\mathcal S}\) such that

    $$\begin{aligned} \varDelta \left( out \left( {\mathcal S}\mathop {\leftrightarrows }\limits ^{bb}\varGamma (k)\right) \ ; \ out \left( {\mathcal A}\,{\buildrel noisy \over \leftrightarrows }\,\varGamma '( Enc (k))\right) \right) \le \xi . \end{aligned}$$
    (24)

The definition of \(\varGamma '\) being an \((\epsilon ,\xi )\)-random-probing resilient implementation of a circuit \(\varGamma \) is identical to the one above, except that Point 2 is replaced with:

  2’.

    for every \(\epsilon \)-random-probing circuit adversary \({\mathcal A}\) there exists a black-box circuit adversary \({\mathcal S}\) such that

    $$\begin{aligned} \varDelta \left( out \left( {\mathcal S}\mathop {\leftrightarrows }\limits ^{bb}\varGamma (k)\right) \ ; \ out \left( {\mathcal A}\,{\buildrel rnd \over \leftrightarrows }\,\varGamma '( Enc (k))\right) \right) \le \xi . \end{aligned}$$

The definition of \(\varGamma '\) being a \((t,\xi )\)-threshold-probing resilient implementation of a circuit \(\varGamma \) is identical to the one above, except that Point 2 is replaced with:

  2”.

    for every t-threshold-probing circuit adversary \({\mathcal A}\) there exists a black-box circuit adversary \({\mathcal S}\) such that

    $$\begin{aligned} \varDelta \left( out \left( {\mathcal S}\mathop {\leftrightarrows }\limits ^{bb}\varGamma (k)\right) \ ; \ out \left( {\mathcal A}\,{\buildrel thr \over \leftrightarrows }\,\varGamma '( Enc (k))\right) \right) \le \xi . \end{aligned}$$

In all cases above we will say that \(\varGamma '\) is an implementation of \(\varGamma \) with efficient simulation if the simulator \({\mathcal S}\) works in time polynomial in \(\left| \varGamma '\right| \cdot \left| {\mathbb F}\right| \) as long as \({\mathcal A}\) is poly-time and the noise functions specified by \({\mathcal A}\) are efficiently decidable.

5.2 The Implementation

In this section we describe the circuit compiler of [17], generalized to larger fields in [31]. Let \(\varGamma \) be a stateful arithmetic circuit and let \(d \in \mathbb {N}\) be a parameter. The encoding function \( Enc _{+}\) that we use is also standard and is often called “additive masking”. It is defined as \( Enc _{+}(x) := (X_{1},\ldots ,X_{d})\), where \(X_{1},\ldots ,X_{d}\) are uniformly random subject to \(X_{1} + \cdots + X_{d} = x\).
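As a concrete illustration, here is a minimal Python sketch of \( Enc _{+}\) and the corresponding decoding \( Dec _{+}\) over a prime field. The prime P and the helper names enc and dec are our own choices; the compiler itself works over any finite field.

```python
# A minimal sketch, assuming a prime field F_P (helper names ours).
import random

P = 2**31 - 1  # a toy prime; any finite field works

def enc(x, d):
    """Enc_+(x): d uniformly random shares that sum to x (mod P)."""
    shares = [random.randrange(P) for _ in range(d - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def dec(shares):
    """Dec_+: recombine an encoding by summing its shares."""
    return sum(shares) % P

assert dec(enc(42, 8)) == 42
```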

At a high level, each wire w in the original circuit \(\varGamma \) is represented by a wire bundle in \(\varGamma '\), consisting of d wires \(\overrightarrow{w} = (w_1, \ldots , w_d)\) that carry an encoding of w. The gates in \(\varGamma \) are replaced gate-by-gate with so-called gadgets, which compute on encoded values. The main difficulty is to construct gadgets that remain “secure” even when their internals leak.

Because the transformed gadgets in \(\varGamma '\) operate on encodings, \(\varGamma '\) needs to have a subcircuit at the beginning that encodes the inputs and another subcircuit at the end that decodes the outputs. We will deal with the output decoding later. The input encoding is easy to implement for our encoding function \( Enc _{+}\): to encode an input x one simply uses the random gates to generate \(d-1\) field elements \(x_{1},\ldots ,x_{d-1}\) and then computes \(x_{d}\) as \(x - (x_{1} + \cdots + x_{d-1})\). Clearly this can be done using \(d-1\) addition and subtraction gates. Recall that the memory gates of \(\varGamma '\) are assumed to be preloaded with field elements that already encode k using the encoding \( Enc _{+}\) [cf. (24)]; hence, there is no need to encode k.

Each constant gate \(\gamma ^{ const }_{c}\) in \(\varGamma \) can be transformed into d constant gates in \(\varGamma '\), the first of them being \(\gamma _c^{ const }\) and the remaining ones being \(\gamma _{0}^{ const }\). This is trivially correct as \(c = c + 0 + \cdots + 0\). Every random gate \(\gamma ^{ rand }\) in \(\varGamma \) is transformed into d random gates in \(\varGamma '\). This works since, clearly, a uniformly random encoding \((X_{1},\ldots ,X_{d})\) encodes a uniformly random element of \({\mathbb F}\).

What remains to show is how the operation gates (addition, subtraction, and multiplication) are handled. Consider a gate \(\gamma \) in \(\varGamma \). Let a and b be its input wires and let \(\overrightarrow{a} = (a_{1},\ldots ,a_{d})\) and \(\overrightarrow{b} = (b_{1},\ldots ,b_{d})\) be their corresponding wire bundles in \(\varGamma '\). Let the output wire bundle in \(\varGamma '\) be \((c_{1},\ldots ,c_{d})\). The cases when \(\gamma \) is an addition or subtraction gate are easy to deal with, thanks to the linearity of the encoding function. For example, if \(\gamma \) is an addition gate \(\gamma ^{+}\) then each \(c_{i}\) can be computed using an addition gate \(\gamma ^{+}\) in \(\varGamma '\) with input wires \(a_{i}\) and \(b_{i}\) (this is correct as \((a_{1} + b_{1}) + \cdots + (a_{d} + b_{d}) = (a_{1} + \cdots + a_{d}) + (b_{1} + \cdots + b_{d})\)). The subtraction is handled analogously. The only tricky case is when \(\gamma \) is the multiplication gate. In this case the circuit \(\varGamma '\) generates, for every \(1 \le i < j \le d\), a random field element \(z_{i,j}\) (this is done using the random gates in \(\varGamma '\)). Then, for every \(1 \le j < i \le d\) it computes \( z_{i,j} := a_{i}b_{j} + a_{j}b_{i} - z_{j,i}, \) and finally it computes each \(c_{i}\) (for \(i=1,\ldots ,d\)) as \( c_{i} := a_{i}b_{i} + \sum _{j \ne i} z_{i,j}. \) To see why this computation is correct, consider the sum \(c = c_{1} + \cdots + c_{d}\) and observe that every \(z_{i,j}\) in it appears exactly once with a plus sign and once with a minus sign, and hence cancels out. Moreover each term \(a_{i} b_{j}\) appears in the formula for c exactly once. Hence c is equal to \( \sum _{i,j \in \{1,\ldots ,d\}} a_{i}b_{j} = \left( \sum _{i=1}^{d} a_{i} \right) \left( \sum _{j=1}^{d} b_{j} \right) = ab. \) It is straightforward to verify that the total number of gates in this gadget is at most \(3.5 \cdot d^{2}\). This finishes the description of the compiler.
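The multiplication gadget can be rendered directly in code. The sketch below reuses enc, dec, P and the import of random from the snippet above; the name mul_gadget and the Python packaging are ours, while the computation follows the description of [17] given in the text.

```python
def mul_gadget(a, b):
    """Multiply two share vectors of equal length d, as described in the text."""
    d = len(a)
    c = [(a[i] * b[i]) % P for i in range(d)]          # the terms a_i * b_i
    z = {}
    for i in range(d):
        for j in range(i + 1, d):
            z[(i, j)] = random.randrange(P)            # fresh z_{i,j} for i < j
            # z_{j,i} := a_j b_i + a_i b_j - z_{i,j}   (the case j < i in the text)
            z[(j, i)] = (a[j] * b[i] + a[i] * b[j] - z[(i, j)]) % P
    for i in range(d):                                 # c_i := a_i b_i + sum_{j != i} z_{i,j}
        for j in range(d):
            if i != j:
                c[i] = (c[i] + z[(i, j)]) % P
    return c

x, y = enc(3, 4), enc(5, 4)
assert dec(mul_gadget(x, y)) == 15
```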

The multiplication gadget above turns out to be useful as a building block for “refreshing” an encoding. More concretely, suppose we have a wire bundle \(\overrightarrow{a} = (a_{1},\ldots ,a_{d})\) and we wish to obtain another bundle \(\overrightarrow{b} = (b_{1},\ldots ,b_{d})\) such that \(\overrightarrow{b}\) is a fresh encoding of \( Dec _{+}(\overrightarrow{a})\). This can be achieved by a \( Refresh \) sub-gadget constructed as follows. First, create an encoding \((1,0,\ldots ,0)\) of 1 (using d constant gates), and then multiply \((1,0,\ldots ,0)\) and \(\overrightarrow{a}\) using the multiplication gadget above. Since \((1,0,\ldots ,0)\) is an encoding of 1, the result is an encoding of \(1 \cdot a = a\). The multiplication can be done with \(3.5 \cdot d^{2}\) gates, and hence altogether this gadget uses \(3.5 \cdot d^{2} + d\) gates.
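In code, the \( Refresh \) sub-gadget is a one-liner on top of the multiplication sketch above (again reusing enc, dec and mul_gadget; the name refresh is ours):

```python
def refresh(a):
    """Re-randomize a share vector without changing the encoded value."""
    one = [1] + [0] * (len(a) - 1)   # the encoding (1, 0, ..., 0) of 1
    return mul_gadget(one, a)

a = enc(9, 4)
assert dec(refresh(a)) == dec(a) == 9   # same value, freshly randomized shares
```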

We can now use the \( Refresh \) sub-gadget to construct the output gadgets in \(\varGamma '\). Let \(\gamma ^{ out }\) be an output gate in \(\varGamma \) with input wire a, and let \(\overrightarrow{a}\) be the corresponding wire bundle in \(\varGamma '\). In \(\varGamma '\) the gate is transformed as follows: first apply the \( Refresh \) sub-gadget to \(\overrightarrow{a}\), then compute the sum \(b_{1} + \cdots + b_{d}\) (where \((b_{1},\ldots ,b_{d})\) is the output of \( Refresh \)) and output the result.

The refreshing gadget is also useful for protecting the memory encoding in the multi-round scenario. More precisely, we assume that every memory state is refreshed at the end of each round by the \( Refresh \) procedure. It is easy to see that without this “refreshing” the contents of the memory would eventually leak completely to the adversary, even if he probes a very limited number (say, 1) of wires in each round. For more details see [17].

5.3 Security in the Probing Model [17]

In [17] it is shown that the compiler from the previous section is secure against probing attacks in which the adversary can probe at most \(\lfloor (d-1)/2\rfloor \) wires in each round. This bound may be a bit disappointing, as the number of probes that can be tolerated does not grow with the size of the circuit, which may seem particularly unrealistic for large circuits \(\varGamma \). Fortunately, [17] also shows a small modification of the construction from Sect. 5.2 that is resilient to a larger number of probes, provided that the number of probes on each gadget is bounded. Before we present it, let us argue why the original construction is not secure against such attacks. To this end, assume that our circuit \(\varGamma \) has a long sequence of wires \(a_{1},\ldots ,a_{m}\), where each \(a_{i}\) (for \(i > 1\)) is the result of adding to \(a_{i-1}\) (using an addition gate) the constant 0 (generated by a \(\gamma ^{ const }_{0}\) gate). It is easy to see that in the circuit \(\varGamma '\) all the wire bundles \(\overrightarrow{a_{1}},\ldots ,\overrightarrow{a_{m}}\) (where each \(\overrightarrow{a_{i}}\) corresponds to \(a_{i}\)) will be identical. Hence, an adversary that probes even a single wire in each addition gadget in \(\varGamma '\) learns the encoding of \(a_{1}\) completely as long as \(m \ge d\). Fortunately, one can deal with this problem by “refreshing” the encoding after each addition and subtraction gate exactly as before, i.e., by using the \( Refresh \) sub-gadget.
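This attack is easy to reproduce with the sketches above (enc, dec and P are reused; the toy parameters are ours):

```python
# Without refreshing, all bundles in the chain of "+0" gadgets are identical,
# so one probe per gadget recovers the whole encoding once m >= d.
d = 4
a1 = enc(7, d)
zero = [0] * d                                   # constant-0 bundle
bundles = [a1]
for _ in range(d):                               # m = d additions of 0
    bundles.append([(u + v) % P for u, v in zip(bundles[-1], zero)])
probed = [bundles[i + 1][i] for i in range(d)]   # wire i of the ith gadget
assert dec(probed) == 7                          # the secret is reconstructed
```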

Lemma 7

([17]) Let \(\varGamma \) be an arbitrary stateful arithmetic circuit over some field \({\mathbb F}\). Let \(\varGamma '\) be the circuit that results from the procedure described above. Then \(\varGamma '\) is a \((\lfloor (d-1)/2\rfloor \cdot \left| \varGamma \right| ,0)\)-threshold-probing resilient implementation of a circuit \(\varGamma \) (with efficient simulation), provided that the adversary does not probe each gadget more than \(\lfloor (d-1)/2\rfloor \) times in each round.

We note that [17] also contains a second transformation with blow-up \(\tilde{O}(d \left| \varGamma \right| )\). It is possible that this transformation provides better noise parameters than those achieved by Theorem 2. However, due to the hidden parameters in the \(\tilde{O}\)-notation we do not get a straightforward improvement of our result. In particular, with this transformation the size of the transformed circuit also depends on an additional statistical security parameter, which affects the tolerated noise level.

5.4 Resilience to Noisy Leakage from the Wires

We now show that the construction from Sect. 5.3 is secure against noisy leakage. More precisely, we show the following.

Theorem 1

Let \(\varGamma \) be an arbitrary stateful arithmetic circuit over some field \({\mathbb F}\). Let \(\varGamma '\) be the circuit that results from the procedure described in Sect. 5.3. Then \(\varGamma '\) is a \((\delta ,\left| \varGamma \right| \cdot \exp (-d/12))\)-noise-resilient implementation of \(\varGamma \) (with efficient simulation), where

$$\begin{aligned} \delta := \left( (28d + 8)\left| {\mathbb F}\right| \right) ^{-1} = O(1/(d\cdot \left| {\mathbb F}\right| )). \end{aligned}$$

Proof

Let \({\mathcal A}\) be a \(\delta \)-noisy circuit adversary attacking \(\varGamma '\). We construct an efficient black-box simulator \({\mathcal S}\) such that for every k it holds that

$$\begin{aligned} \varDelta \left( out \left( {\mathcal S}\mathop {\leftrightarrows }\limits ^{bb}\varGamma (k)\right) \ ; \ out \left( {\mathcal A}\,{\buildrel noisy \over \leftrightarrows }\,\varGamma '( Enc (k))\right) \right) \le \left| \varGamma \right| \cdot \exp (-d/12). \end{aligned}$$
(25)

Observe that in our construction every gate gets transformed into a gadget of at most \(3.5 \cdot d^{2} + d\) gates. Since each gate can have at most 2 inputs, the total number of wires in a gadget is at most \(\ell := 7 \cdot d^{2} + 2\cdot d\). Let \(\gamma ^{1},\ldots ,\gamma ^{\left| \varGamma \right| }\) be the gates of \(\varGamma \). For each \(i = 1,\ldots ,\left| \varGamma \right| \) let the wires in the gadget in \(\varGamma '\) that corresponds to \(\gamma ^{i}\) be denoted \((x^{i}_{1},\ldots ,x^{i}_{\ell })\). Since \(\delta = d/(4 \ell \left| {\mathbb F}\right| )\), we can use Corollary 1 and simulate the noise from each \((x^{i}_{1},\ldots ,x^{i}_{\ell })\) by a \((d/2 - 1)\)-threshold-probing adversary \({\mathcal S}^{i}\) working in time polynomial in \(\ell \cdot \left| {\mathcal X}\right| \). The simulation is perfect unless \({\mathcal S}^{i}\) outputs \(\bot \), which, by Corollary 1, happens with probability at most \(\exp (-d/12)\). Hence, by the union bound, the probability that some \({\mathcal S}^{i}\) outputs \(\bot \) is at most \(\left| \varGamma \right| \cdot \exp (-d/12)\). Denote this event \({\mathcal E}\).

From Lemma 7 we know that every probing adversary that attacks \(\varGamma '\) by probing at most \(\lfloor (d-1)/2\rfloor \ge d/2 - 1\) wires in each gadget can be perfectly simulated in polynomial time by an adversary \({\mathcal S}\) with black-box access to \(\varGamma \). Hence, conditioned on \({\mathcal E}\) not occurring, \({\mathcal A}\) can also be simulated perfectly given black-box access to \(\varGamma \), and we get

$$\begin{aligned} \varDelta \left( out \left( {\mathcal S}\mathop {\leftrightarrows }\limits ^{bb}\varGamma (k)\right) | \lnot {\mathcal E}\ ; \ out \left( {\mathcal A}\,{\buildrel noisy \over \leftrightarrows }\,\varGamma '( Enc (k))\right) \right) = 0. \end{aligned}$$

This, by Lemma 2 (Sect. 2.1), implies (25). Obviously \({\mathcal S}\) works in time polynomial in \(\left| \varGamma \right| \cdot d^{2} \cdot \left| {\mathbb F}\right| \), which is polynomial in \(\left| \varGamma '\right| \cdot \left| {\mathbb F}\right| \). This finishes the proof. \(\square \)

In short, this theorem is proven by combining Corollary 1, which reduces the noisy adversary to a probing adversary, with Lemma 7, which shows that the construction from Sect. 5.3 is secure against probing.
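To get a feeling for the parameters of Theorem 1, the following snippet evaluates \(\delta \) and the simulation error for one illustrative choice of circuit size, field and number of shares (the concrete numbers are ours, chosen only for illustration):

```python
import math

def theorem1_params(d, field_size, circuit_size):
    """Tolerated noise rate delta and simulation error xi from Theorem 1."""
    delta = 1.0 / ((28 * d + 8) * field_size)
    xi = circuit_size * math.exp(-d / 12)
    return delta, xi

# e.g. d = 128 shares, |F| = 256, a circuit of 10^4 gates:
delta, xi = theorem1_params(d=128, field_size=256, circuit_size=10**4)
print(f"delta ~ {delta:.2e}, xi ~ {xi:.2e}")   # delta ~ 1.09e-06, xi ~ 2.33e-01
```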

5.5 Resilience to Noisy Leakage from the Gates

The model of Prouff and Rivain is actually slightly different from the one considered in the previous section: they assume that the noise is generated by the gates, not by the wires. This can be formalized by assuming that each noise function \({ Noise}\) is applied to the “contents of a gate”. We do not need to specify exactly what we mean by this. It is enough to observe that the contents of each gate \(\gamma \) can be described by at most 2 field elements: if \(\gamma \) is a random gate, output gate, or memory gate then its entire state in a given round can be described by one field element, and if \(\gamma \) is an operation gate then it can be described by the two field elements that correspond to \(\gamma \)’s inputs. Hence, without loss of generality, we can assume that the noise function is defined over the domain \({\mathbb F}\times {\mathbb F}\).

Formally, we define a \(\delta \)-gate-noisy circuit adversary \({\mathcal A}\) as a machine that, besides having black-box access to a circuit \(\varGamma (k)\), can, after each round, get some partial information about the internal state of the computation via the \(\delta \)-noisy leakage functions applied to the gates (in the model described above). Let \( out \left( {\mathcal A}\mathop {\leftrightarrows }\limits ^{g-noisy}\varGamma (k)\right) \) denote the output of such an \({\mathcal A}\) after interacting with \(\varGamma \) whose initial memory state is \(k_{0} = k\).

We can accordingly modify the definition of noise-resilient circuit implementations (cf. Definition 2). We say that \(\varGamma '\) is a \((\delta ,\xi )\)-gate-noise-resilient implementation of a circuit \(\varGamma \) w.r.t. \( Enc \) if for every k and every \(\delta \)-gate-noisy circuit adversary \({\mathcal A}\) described above there exists a black-box circuit adversary \({\mathcal S}\) working in time polynomial in \(\left| \varGamma '\right| \cdot \left| {\mathbb F}\right| \) such that

$$\begin{aligned} \varDelta \left( out \left( {\mathcal S}\mathop {\leftrightarrows }\limits ^{bb}\varGamma (k)\right) \ ; \ out \left( {\mathcal A}\mathop {\leftrightarrows }\limits ^{g-noisy}\varGamma '( Enc (k))\right) \right) \le \xi . \end{aligned}$$
(26)

It turns out that the transformation from Sect. 5.3 also works in this model, although with different parameters. More precisely, we have the following theorem.

Theorem 2

Let \(\varGamma \) be an arbitrary stateful arithmetic circuit over some field \({\mathbb F}\). Let \(\varGamma '\) be the circuit that results from the procedure described in Sect. 5.3. Then \(\varGamma '\) is a \((\delta ,\left| \varGamma \right| \cdot \exp (-d/24))\)-gate-noise-resilient implementation of \(\varGamma \) (with efficient simulation), where

$$\begin{aligned} \delta := \left( \left( 28d + 8\right) \cdot \left| {\mathbb F}\right| ^{2}\right) ^{-1} = O(1/(d\cdot \left| {\mathbb F}\right| ^{2})). \end{aligned}$$
(27)

Proof

The proof is similar to that of Theorem 1, so we only describe the key differences. Let \({\mathcal A}\) be a \(\delta \)-gate-noisy adversary. The number \(\ell \) now corresponds to the number of gates in each gadget, and hence it is equal to \(3.5 \cdot d^{2} + d\). It is straightforward to calculate that \(\delta \) defined in (27) is equal to \((d/2)/(4 \ell \cdot \left| {\mathbb F}\right| ^{2})\). Since the \({ Noise}\) functions now have a domain of size \(\left| {\mathbb F}\right| ^{2}\), we can use Corollary 1 to obtain that \({\mathcal A}\) can be simulated by an adversary \({\mathcal S}\) that probes each gadget in fewer than d / 2 positions. Since each “position” now corresponds to a gate in the circuit, the adversary needs to probe up to two wires to determine its value. Therefore \({\mathcal S}\) probes fewer than d wires in each gadget. Since Corollary 1 is now applied with parameter d / 2 instead of d, the error probability becomes \(\exp (-(d/2)/12) = \exp (-d/24)\). \(\square \)
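Before turning to [28], it is instructive to compare the noise rates tolerated per wire (Theorem 1) and per gate (Theorem 2); the snippet below does this for the same illustrative parameters as before (choices ours):

```python
d, F = 128, 256
delta_wire = 1 / ((28 * d + 8) * F)       # Theorem 1
delta_gate = 1 / ((28 * d + 8) * F ** 2)  # Theorem 2, cf. (27)
# The ratio is exactly |F|: in the gate-noise model the tolerated noise
# rate delta is a factor |F| smaller, i.e., more noise is required.
print(delta_wire / delta_gate)            # 256.0
```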

Comparison with [28]. As described in the introduction, our main advantages over [28] are the removal of the assumption that leak-free gates exist, a stronger security model (a chosen-message attack instead of a random-message attack), and a more meaningful security statement. Still, it is interesting to compare our noise parameters with those of [28]. Let us analyze how much noise is needed by [28] to ensure that the adversary obtains exponentially small information from the leakage. The reader should keep in mind that both in our paper and in [28], “more noise” means that a certain quantity (\(\delta \) in our case) is smaller. Hence, the larger \(\delta \) is, the stronger the result becomes (as it means that less noise is required for the security to hold).

The main result of [28] is Theorem 4 on page 154. Unfortunately, the statement of this theorem is asymptotic, treating \(\left| {\mathbb F}\right| \) as a constant, and hence to get a precise bound on how much noise is required one needs to inspect the proof. The bound on the noise can be deduced from the part of the proof entitled “Security of Type 3 Subsequences”, where the required noise is inversely proportional to “\(\lambda (d)\)”, and this last value is linear in \(d \cdot \left| {\mathbb F}\right| ^{3}\) for general d and linear in \(d \cdot \left| {\mathbb F}\right| ^{3/2}\) for large d (note that \(\left| {\mathbb F}\right| \) is denoted by N in [28], and d is a security parameter identical to ours). Hence, for general d, their \(\delta \) is \(O(1/(d \cdot \left| {\mathbb F}\right| ^{3}))\).

However, as explained in Sect. 3.1, the notion of distance in [28] is slightly different from the standard “statistical distance” that we use. Fortunately, one can use (7) to translate our bound into their language. It turns out that in this case our bound and theirs are asymptotically identical for general d, i.e., \(O(1/(d\cdot \left| {\mathbb F}\right| ^{3}))\). This is shown in Corollary 2 below. Note that this translation is unidirectional, in the sense that their “\(O(1/(d \cdot \left| {\mathbb F}\right| ^{3}))\)” bound does not imply a bound of “\(O(1/(d \cdot \left| {\mathbb F}\right| ^{2}))\)” in our sense.

Corollary 2

Let \(\varGamma \) be an arbitrary stateful arithmetic circuit over some field \({\mathbb F}\). Let \(\varGamma '\) be the circuit that results from the procedure described in Sect. 5.3. Then \(\varGamma '\) is a \((\delta ',\left| \varGamma \right| \cdot \exp (-d/24))\)-gate-noise-resilient implementation of \(\varGamma \) (with efficient simulation) when the noise is defined using the \(\beta \) distance, where

$$\begin{aligned} \delta ' = \left( \left( 14d + 4\right) \cdot \left| {\mathbb F}\right| ^{3}\right) ^{-1} = O(1/(d\cdot \left| {\mathbb F}\right| ^{3})). \end{aligned}$$

Proof

From (7) with \({\mathcal X}= {\mathbb F}\times {\mathbb F}\), it follows that if \({ Noise}\) is \(\delta '\)-noisy with respect to the \(\beta \) distance, then it is \((\left| {\mathbb F}\right| \cdot \delta '/2)\)-noisy in the standard sense. Since \(\left| {\mathbb F}\right| \cdot \delta '/2 = \left| {\mathbb F}\right| /\left( 2\left( 14d + 4\right) \left| {\mathbb F}\right| ^{3}\right) = \left( \left( 28d + 8\right) \cdot \left| {\mathbb F}\right| ^{2}\right) ^{-1}\), which is exactly the \(\delta \) defined in (27), we can use Theorem 2, obtaining that \(\varGamma '\) is a \((\delta ',\left| \varGamma \right| \cdot \exp (-d/24))\)-gate-noise-resilient implementation of \(\varGamma \) when the noise is defined using the \(\beta \) distance. \(\square \)