Abstract
This article analyzes in depth high-order (HO) Boolean masking countermeasures against side-channel attacks in contexts where the shares are manipulated simultaneously and the correlation coefficient is used as a statistical distinguisher. Such attacks are sometimes referred to as zero-offset high-order correlation power analysis (HO-CPA). In particular, the main focus is to get the most out of a single mask (i.e., masking schemes with two shares). The relationship between the leakage characteristics and the attack efficiency is thoroughly studied. Our main contribution is to link the minimum attack order (called HO-CPA immunity) to the amount of information leaked. Interestingly, the HO-CPA immunity can be much larger than the number of shares in the masking scheme. This is made possible by leakage squeezing, a variant of Boolean masking in which the masks are suitably recoded by bijections. This technique and others from the state of the art (namely leak-free masking and wire-tap codes) are reviewed and put in perspective.
1 Introduction
Masking [15, 24] is a countermeasure against observation attacks, also known as side-channel attacks (SCA), that is suitable for both hardware and software cryptographic implementations. It consists in changing the variable representation into randomized shares [15], and can thus be qualified as a logical countermeasure. Notably, masking does not rely on specific hardware properties, as opposed to dual-rail protection [36, Chp. 7] that demands some physical indiscernibility. However, the level of provided security depends on some hardware specificities (noise, glitches, etc.).
Nonetheless, masked implementations can always be attacked in theory, since the tuple of all the shares unambiguously leaks information about the sensitive variable (nota bene: this was first shown by Thomas Messerges). In practice, however, the difficulty of performing an attack involving several shares increases exponentially with the number of shares, the base of the exponentiation being the variance of the noise. This statement was first highlighted in Chari et al.’s paper [15], then complemented by Prouff and Rivain at EUROCRYPT 2013 [44], and confirmed several times by real experiments. As a consequence, and as already observed in many papers, it makes sense to define the security of a masked implementation in terms of the number of shares per internal variable. A scheme where every intermediate variable is split into \(d+1\) shares is called a \(d\)th-order masking scheme. For such a scheme, we usually say that security is reached at order \(d\) if and only if any combination of \(d\) shares conveys no information about the sensitive variables.
Admittedly, computing with \(d+1\) shares without revealing information from any set of size \(d\) of intermediate values is difficult. In fact, many purported solutions have been defeated [18, 43, 46]. One sound solution, based on secure schemes to process additions and multiplications in a field, has been put forward recently in [49] and generalized in [14]. Another solution, based on extending the principle of look-up table recomputation introduced in [29], also exists [17]. They can be applied to either software or hardware, and their security is stated under the following two assumptions:
- H1: only the manipulated values leak, and
- H2: manipulated values leak at different times without interfering.
These assumptions impose special constraints on the hardware. For instance, it must be ascertained that it does not cache the values. One example of variable caching is actually inherent to the leakage of CMOS technology. In this context, the leakage is caused by the transitions and hence involves two consecutive variables (see, e.g., [6]).
Another example is the fork of a value that causes it to be latched in two or more different registers. Thus, a sensitive value can show up many times during the computation, and the hypothesis of independent execution contexts does not hold either.
The two examples above alert us to the importance of analyzing the hardware. In particular, it must be carefully checked against caching and forking interferences when applying the existing masking schemes. Alternatively, the hardware can be designed to avoid them, for instance by enforcing a variable “wainscoting” (i.e., physical separation), as proposed at the gate level (for masked logic styles) in [21], or more recently at the algorithmic level in [41, 47]. This is the point of view that we adopt in the sequel. More precisely, here are the hardware constraints we set to satisfy the assumptions H1 and H2:
- The leakage comes mainly from the registers update. To ensure that the computation part does not leak (via glitches that carry information about a sensitive variable [37]), the designer can decide to tabulate it in memories. For instance, in an FPGA design, the operations on the data stored in registers can be implemented by accesses to read-only memories, termed BRAM (Block RAM). As characterized for instance on Xilinx FPGAs [5], memories leak only their input and output data. But those data are exactly the ones that are already in a register (input) and that will be updated in another register (output). The paper [20] shows, still on Xilinx FPGAs, that a full-fledged AES can be implemented mostly with BRAM (and DSP, that can easily be traded for BRAMs). Based on these remarks, we will thus consider in the sequel only the leakage from the state registers.
- The decoupling of variables (especially those belonging to the same sharing of a variable) is ensured by the storage of each share in a distinct hardware resource. In this article, we focus on a scenario where all the shares leak at the same time.
Eventually, we notice that in hardware, some further simplification hypotheses can be made:
- The leakage is additive, and the designer can arrange that each manipulated bit leaks in the same way. This means that the Hamming weight/distance model is suitable.
- The algorithmic noise is large, because the many variables processed in parallel with the targeted resource are unrelated to the attack, and thus act as independent noise sources, that can be modeled as a binomial law.
Contribution of the Paper Masking countermeasures are customarily characterized by the number of shares in which sensitive variables are split. The purpose of this paper is to look more in detail (with assumptions on the leakage), and to explain that the minimum order of a successful attack can be made greater than the number of shares. This is achieved by an encoding of the shares, termed leakage squeezing. In addition, we link the minimum attack order to the amount of information leaked (Theorem 1).
Outline The rest of the paper is organized as follows. A brief overview of masking theory is described in Sect. 2. The notion of minimum HO-CPA attack order (called HO-CPA immunity) is related to the amount of information leaked (even independently of any masking scheme) in Sect. 3. An extensive overview of the known techniques to reduce the leakage and to increase the HO-CPA immunity (noted \(\mathsf {HCI}\)) is given in Sect. 4. Then, Sect. 5 provides a concrete example of HO-CPA immunity optimization thanks to a “leakage squeezing” followed by some simulation results in Sect. 6. Finally, Sect. 7 concludes the paper and opens some further research perspectives. The value of a typical signal-to-noise ratio (SNR) of a side-channel signal measured on an FPGA is given in Appendix A.
2 State-of-the-art: masking
2.1 Definitions and modeling
We use capital letters (e.g., \(X\)) for random variables (RVs), and small letters (e.g., \(x\)) for their realizations. Moreover, we denote by \(\mathcal {X}\) the support of RV \(X\). The probability that \(X\) is equal to \(x\) is noted \(\mathsf {P}[X=x ]\) or \(\mathsf {P}[x ]\) when there is no ambiguity. Scalar (resp. vectorial) variables are noted in “thin” (resp. “bold”) font.
A \(d\)th-order masking scheme consists in splitting a sensitive variable \(Z \in \mathbb {F}_2^n\) (that can be deduced from either the plaintext or the ciphertext through few sub-key hypotheses) into \(d+1\) random shares, noted \(\varvec{S}=(S_i)_{i \in |[ 0, d |]}\), in such a way that the relation \(S_0 \perp \cdots \perp S_d = Z\) is satisfied for a group operation \(\perp \) (e.g., the XOR operation in additive Boolean masking). We recall hereafter the definition of the masking soundness.
Definition 1
(masking dth-order soundness) The masking is sound at \(d\)th-order if
- \(Z\) can be deterministically reconstructed knowing the \(d+1\) shares, while
- no information about \(Z\) can be extracted from the knowledge of strictly less than \(d+1\) shares.
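As a minimal sketch of Definition 1 for additive Boolean masking (\(\perp \) taken as XOR), the following Python code splits a value into \(d+1\) shares and checks both items of the definition by exhaustive enumeration. The helper names (`share`, `unshare`, `subset_distribution`, `is_sound`) and the toy sizes are illustrative assumptions, not part of the original scheme description.

```python
import secrets
from collections import Counter
from functools import reduce
from itertools import combinations, product
from operator import xor


def share(z, d, n):
    """Split the n-bit value z into d+1 Boolean shares (hypothetical helper)."""
    masks = [secrets.randbelow(1 << n) for _ in range(d)]
    return [reduce(xor, masks, z)] + masks


def unshare(shares):
    """Reconstruct Z as the XOR of all the shares."""
    return reduce(xor, shares, 0)


def subset_distribution(z, d, n, subset):
    """Exact distribution of the shares indexed by `subset`, given Z = z."""
    counts = Counter()
    for masks in product(range(1 << n), repeat=d):
        shares = (reduce(xor, masks, z),) + masks
        counts[tuple(shares[i] for i in subset)] += 1
    return counts


def is_sound(d, n):
    """Check both items of Definition 1 by enumeration (toy sizes only)."""
    values = range(1 << n)
    # Item 2: any subset of at most d shares is distributed independently of Z.
    for size in range(1, d + 1):
        for subset in combinations(range(d + 1), size):
            reference = subset_distribution(0, d, n, subset)
            if any(subset_distribution(z, d, n, subset) != reference
                   for z in values):
                return False
    # Item 1: the d+1 shares always reconstruct Z.
    return all(unshare(share(z, d, n)) == z for z in values)
```

For instance, `is_sound(d=2, n=2)` enumerates all mask pairs on 2-bit values; the enumeration grows as \(2^{nd}\), so it is only meant for toy parameters.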
In order to study the resistance of a masking scheme in an SCA context, one usually associates each share with a noisy observation of it, modeled by a noisy function \(\ell _i:X \mapsto f_i(X) + N_i\) where \(N_i\) is an independent Gaussian noise and where \(f_i\) is a deterministic but unknown function sometimes approximated by the Hamming weight. For the sake of simplicity, we assume that \(N_i\) is a centered Gaussian RV of standard deviation \(\sigma _i\). We denote by \(L_i\) the RV \(\ell _i(S_i)\) and summarize by \(\varvec{L}\) the tuple \((L_i)_{i \in |[ 0, d |]}\). This definition forbids the modeling of glitches (such as those put forward in [37]), which can be unexpected functions of different shares \(S_i\) and \(S_j\) (\(i\ne j\)). But it complies with our strategy of allocating one dedicated hardware resource to each share. In addition, the usage of block RAMs allows glitches to be removed. This strategy is illustrated by the snapshots from the Cadence SOC-Encounter place-and-route tool in Fig. 1. The left-hand side shows that, without placement constraints, the registers that hold the two shares of a first-order masked sbox can be intertwined in the circuit, and hence coupled. But if the shares are constrained to be placed at different locations, then they remain clearly separated in space, as shown on the right-hand side of Fig. 1. Incidentally, the physical separation helps to make the device’s leakage the sum of the \(L_i\) (linear, no coupling).
When all the shares are manipulated at the same time, the attacker observes \(\mathcal {C}_\text {device}(\varvec{L})\), typically the sum of all the \(L_i\). In this case, \(\mathcal {C}_\text {device}(\varvec{L}) | \varvec{S}\) follows a normal law, of variance equal to
$$\begin{aligned} \sigma ^2 \doteq \sum _{i=0}^d \sigma _i^2 . \end{aligned}$$
This is equivalent to saying that
$$\begin{aligned} \mathcal {C}_\text {device}(\varvec{L}) = \sum _{i=0}^d f_i(S_i) + N , \end{aligned}$$
where \(N=\sum _{i=0}^d N_i \sim \mathcal {N}(0, \sigma ^2)\). After collecting the side-channel leakage, the attacker can apply a pre-processing of her choice before performing any statistical analysis. It is denoted by \(\mathcal {C}_\text {attacker}:\mathbb {R} \rightarrow \mathbb {R}\). Eventually, the exploited leakage is the RV \(\mathcal {C}_\text {total}(\varvec{L})\), which is equal to \(\mathcal {C}_\text {attacker}(\mathcal {C}_\text {device}(\varvec{L}))\). Thus, the function \(\mathcal {C}_\text {total} \doteq \mathcal {C}_\text {attacker} \circ \mathcal {C}_\text {device}\) can be developed as a polynomial in \(\mathbb {R}[L_0, \ldots , L_d] = \mathbb {R}[\varvec{L}]\). The reason is that any pseudo-Boolean function (i.e., a function \(\mathbb {F}_2^n\rightarrow \mathbb {R}\)) can be written uniquely as a multi-linear polynomial. This polynomial takes the form:
$$\begin{aligned} \mathcal {C}_\text {total}(\varvec{L}) = \sum _{\varvec{\alpha } \in \mathbb {N}^{d+1}} a_{\varvec{\alpha }} \, \varvec{L}^{\varvec{\alpha }} , \end{aligned}$$
where \(\varvec{L}^{\varvec{\alpha }}\) denotes the monomial term \(\prod _{i=0}^{d} L_i^{\alpha _i}\) and \(a_{\varvec{\alpha }}\) is a real coefficient. We recall that the degree \(\text {d}^\circ \) of a multivariate polynomial is defined as:
$$\begin{aligned} \text {d}^\circ (\mathcal {C}_\text {total}) = \max _{\varvec{\alpha } \, : \, a_{\varvec{\alpha }} \ne 0} \; \sum _{i=0}^{d} \alpha _i . \end{aligned}$$
That is, \(\text {d}^\circ \) is the greatest sum of exponents among all the monomials that make up the polynomial.
2.2 Notation and basics on statistics, with application to HO-CPA
We introduce some notations which will be useful in the sequel. We denote by \(\mu _z\) and \(\sigma ^2_z\) the mean and the variance of the conditional RV \(\mathcal {C}_\text {total}(\varvec{L})\mid Z=z\). We call \(\mu _\text {tot} = \sum _z \mathsf {P}[Z=z] \cdot \mu _z\) the mean of \(\mathcal {C}_\text {total}(\varvec{L})\). The total variance \(\sigma ^2_\text {tot}\) of \(\mathcal {C}_\text {total}(\varvec{L})\) decomposes into the sum of inter- and intra-class variance, denoted by \(\sigma ^2_\text {inter}\) and \(\sigma ^2_\text {intra}\), respectively. Those quantities are defined using the law of total variance as: \(\sigma ^2_\text {inter} \doteq \sum _z \mathsf {P}[Z=z] \cdot {\left( \mu _z - \mu _\text {tot} \right) }^2\) and \(\sigma ^2_\text {intra} \doteq \sum _z \mathsf {P}[Z=z] \cdot \sigma ^2_z\).
In the presence of countermeasures, the central moment \(\mu _i(\mathcal {C}_\text {total}(\varvec{L})\mid Z=z) \doteq \mathbb {E}[\left( \mathcal {C}_\text {total}(\varvec{L}) - \mu _\text {tot}\right) ^i \mid Z=z ]\) of order \(i\) can be constant with respect to \(z\). In practice, the attacker will typically try to compute the moments starting from low orders \(i \ge 1\), because their estimation is less affected by the measurement noise.
2.3 Information-theoretic characterization of the masking
In order to characterize the polynomial \(\mathcal {C}_\text {total}(\varvec{L})\) from an information-theoretic point of view, we introduce the notion of multivariate degree. The multivariate degree \(\text {d}_\text {alg}\) of a monomial \(\prod _{i=0}^{d} L_i^{\alpha _i}\) is equal to the number of non-zero exponents \(\alpha _i\). We also emphasize that the multivariate degree is at most equal to the degree, since each non-zero exponent contributes at least \(1\) to the sum of exponents.
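To make the difference between the two degrees concrete, the sketch below expands \((L_0 + \cdots + L_d)^k\), i.e., the case where the device leaks the sum of the \(L_i\) and the attacker raises it to the power \(k\), and reports both degrees. The representation of a polynomial as a dictionary from exponent tuples \(\varvec{\alpha }\) to coefficients \(a_{\varvec{\alpha }}\), and the function names, are assumptions made for illustration.

```python
from itertools import product
from math import factorial


def power_of_sum(num_shares, k):
    """Expand (L_0 + ... + L_d)^k as {exponent tuple alpha: coefficient}."""
    poly = {}
    for alpha in product(range(k + 1), repeat=num_shares):
        if sum(alpha) == k:
            coeff = factorial(k)  # multinomial coefficient k! / (a_0! ... a_d!)
            for a in alpha:
                coeff //= factorial(a)
            poly[alpha] = coeff
    return poly


def degree(poly):
    """Greatest sum of exponents over the monomials with non-zero coefficient."""
    return max(sum(alpha) for alpha, c in poly.items() if c != 0)


def multivariate_degree(poly):
    """Greatest number of non-zero exponents over the monomials."""
    return max(sum(1 for a in alpha if a) for alpha, c in poly.items() if c != 0)


# Two shares (d = 1): cubing the summed leakage gives degree 3,
# but no monomial can involve more than the 2 available leakages.
cube = power_of_sum(2, 3)
print(degree(cube), multivariate_degree(cube))
```

With three shares, squaring the summed leakage only produces monomials involving at most two of the \(L_i\), which is why a power of at least \(d+1\) is needed to combine all the shares.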
We start with the following basic lemma:
Lemma 1
(Soundness and mutual information) If a masking scheme is \(d\)th-order sound, then the mutual information between any monomial in \(\mathbb {R}[\varvec{L}]\), of multivariate degree lower than or equal to \(d\), and \(Z\) is null.
Proof
If a masking scheme is \(d\)th-order sound, then \(\mathsf {I}[Z;(S_i)_{i \in I} ]=0\) if \(\#I \le d\). Now, for any function \(\psi \), \(\mathsf {I}[Z;(S_i)_{i \in I} ] \ge \mathsf {I}[Z;\psi ((S_i)_{i \in I}) ]\). So, if \(\psi \) is taken as a monomial in \((S_i)_{i \in |[ 0, d |]}\) of multivariate degree less than or equal to \(d\), we have \(\mathsf {I}[Z;\psi ((S_i)_{i \in I}) ] \le 0\), hence \(\mathsf {I}[Z;\psi ((S_i)_{i \in I}) ] = 0\). \(\square \)
As a consequence of Lemma 1, for any sound \(d\)th-order masking scheme, every moment of \(\varvec{L}\) of order lower than or equal to \(d\) is constant with respect to \(Z\). Hence, the adversary must choose \(\mathcal {C}_\text {attacker}\) such that the multivariate degree \({d}_\text {alg}(\mathcal {C}_\text {total})\) of \(\mathcal {C}_\text {total}\) is at least \(d+1\). At the same time, the (regular) degree \({d}^\circ (\mathcal {C}_\text {total})\) must not be too high, otherwise the SCA attack efficiency decreases [15, 44].
2.4 Our goal
Considering Lemma 1, given \(\mathcal {C}_\text {device}\), it seems sufficient from an attacker viewpoint to choose \(\mathcal {C}_\text {attacker}\) such that \(\mathcal {C}_\text {total}=\mathcal {C}_\text {attacker} \circ \mathcal {C}_\text {device}\) contains at least one term depending on all \(d+1\) leakages \(L_i\). In this paper, we show that it is possible to devise \(d\)th-order masking schemes such that such a term of multivariate degree \(d+1\) of \(\mathcal {C}_{\text {device}}\) does not give enough information on \(Z\) for an attack to succeed, whatever the choice of \(\mathcal {C}_{\text {attacker}}\). In Sect. 5, we give examples where a \(d\)th-order masking scheme
- is not defeated even if the polynomial \(\mathcal {C}_\text {total}(\varvec{L})\) has a monomial of maximal multivariate degree \(d+1\), hence
- is not defeated by a \((d+1)\)th-order CPA.
Thus, the relationship between the number of masks and the order of the first HO-CPA to work is not trivial. In the next section, we formally prove this relationship.
3 HO-CPA attacks and HO-CPA immunity
There are two ways to address the security evaluation of a countermeasure [52]:
1. Estimate the efficiency of existing attacks (using metrics such as the success rate or the guessing entropy). Basically, the main higher-order attacks are HO-CPA [45], HO-MIA [2], template attacks [16] and the stochastic attack [51].
2. Leakage estimation with information-theoretic metrics, such as the mutual information between the leakage (observations) and the sensitive data.
In the next Sects. 3.1 and 3.2, we introduce a new metric that jointly covers the strength of the attacks and the amount of leaked information. In the sequel, when mentioning HO-CPA attacks, we mean the univariate attacks that focus on a higher-order moment of the leakage.
3.1 HO-CPA immunity
In this subsection, we define the notion of HO-CPA immunity to quantify the difficulty of an attack.
Definition 2
The HO-CPA immunity of RV \(\mathcal {C}_\text {total}(\varvec{L})\) is the order of the smallest (central) moment of \(\mathcal {C}_\text {total}(\varvec{L})\) which is dependent on \(Z\).
The HO-CPA immunity of \(\mathcal {C}_\text {total}\) is denoted by \(\mathsf {HCI}\) in the following. The minimal value of the HO-CPA immunity is \(1\) and it is reached when the distributions of the RV \(\mathcal {C}_\text {total}(\varvec{L})\mid Z=z\) do not have the same mean when \(z\) varies. This is the case of unprotected circuits, for which a first-order CPA works.
The HO-CPA immunity is larger than or equal to \(2\) when the distributions are balanced (i.e., \(\mu _z = \mu _\text {tot}\) for every \(z\)). In this case, the inter-class variance is null and the total variance \(\sigma ^2_\text {tot}\) is equal to the intra-class variance \(\sigma ^2_\text {intra} = \sum _z \mathsf {P}[Z=z ] \times \mu _2(\mathcal {C}_\text {total}(\varvec{L})\mid Z=z)\). If the central moments \(\mu _2(\mathcal {C}_\text {total}(\varvec{L})\mid Z=z)\) are not all equal, then \(\mathsf {HCI}=2\) and a second-order CPA using the moments of order \(2\) is possible.
The motivation of the HO-CPA immunity definition is thus straightforward. As argued in Definition 2, all HO-CPA using the moments of order \(i<\mathsf {HCI}\) will fail, because the moments are independent of \(Z\). Thus, the HO-CPA immunity is equal to the smallest order of the moments for which an HO-CPA attack can be successful.
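Definition 2 can be evaluated exactly on small examples. The sketch below is written under simplifying assumptions that are ours: the noise is omitted (an independent noise does not change the first order at which the moments depend on \(z\)), \(Z\) and the mask are uniform, and the leakage is the Hamming weight. The function name `hci_from_classes` is hypothetical.

```python
from fractions import Fraction


def hw(x):
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def hci_from_classes(classes, max_order=8):
    """Smallest order whose central moment (about mu_tot) depends on z.

    classes[z] lists the equiprobable noiseless leakage values given Z = z;
    all classes are assumed to have the same size and Z to be uniform.
    """
    pooled = [v for values in classes for v in values]
    mu_tot = Fraction(sum(pooled), len(pooled))
    for order in range(1, max_order + 1):
        moments = {
            Fraction(sum((v - mu_tot) ** order for v in values), len(values))
            for values in classes
        }
        if len(moments) > 1:  # the moment is not the same for every z
            return order
    return None  # no dependence found up to max_order


n = 2
unprotected = [[hw(z)] for z in range(1 << n)]
masked = [[hw(z ^ m) + hw(m) for m in range(1 << n)] for z in range(1 << n)]
print(hci_from_classes(unprotected), hci_from_classes(masked))
```

Under these assumptions, the unprotected leakage already differs in mean, whereas the two-share Boolean masking is balanced in mean and first differs in variance.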
3.2 Link between \(\mathsf {I}[\mathcal {C}_\text {total}(\varvec{L});Z ]\) and the HO-CPA immunity
HO-CPA exploits linear dependencies between the RVs \(\mathcal {C}_\text {total}(\varvec{L})\) and \(Z\). Unless the RVs \(\mathcal {C}_\text {total}(\varvec{L})\mid Z = z\) are identically distributed for every \(z\), the mutual information \(\mathsf {I}[\mathcal {C}_\text {total}(\varvec{L});Z ]\) will be non-zero. There is no such notion of “order” for MIA. Nonetheless, we show in the following theorem that \(\mathsf {HCI}\) is also relevant to quantify the efficiency of a mutual information attack with respect to the leakage noise.
In terms of mutual information, the impact of the noise \(N= \sum _{i \in |[0, d |]} N_i\) is quantified by Theorem 1.
Theorem 1
Let \(\sigma \) denote the standard deviation of the noise \(N\). The mutual information \(\mathsf {I}[\mathcal {C}_\text {total}(\varvec{L});Z ]\) is \(\mathcal {O}\left( \sigma ^{-2 \times \mathsf {HCI}}\right) \) when \(\sigma \) tends towards infinity.
Remark 1
Theorem 1 holds only asymptotically when \(\sigma \) tends to infinity. Nonetheless, as will be noticed in Fig. 5, the relationship between the logarithm of the mutual information between the leakage and the sensitive variable, and the noise standard deviation (\(\sigma \)), starts to be almost affine from \(\sigma \ge 4\), i.e., much less than the \(\sigma \approx 14\) found in our Appendix A.
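To prove the theorem, we recall the notion of cumulants of the RV \(X\), denoted by \(k_i(X)\), which are the coefficients of the Taylor series of the cumulant-generating function
$$\begin{aligned} t \mapsto \log \mathbb {E}\left[ e^{t X} \right] = \sum _{i=1}^{+\infty } k_i(X) \, \frac{t^i}{i!} . \qquad (1) \end{aligned}$$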
The proof will use the following lemma.
Lemma 2
If \(\mathcal {C}_\text {total}(\varvec{L})\) has an HO-CPA immunity equal to \(\mathsf {HCI}\), then for every \(i\) in \(|[ 0, \mathsf {HCI} |[\) and every \(z\) in \(\mathbb {F}_2^{n}\) we have, \(k_i(\mathcal {C}_\text {total}(\varvec{L})\mid Z=z) = k_i(\mathcal {C}_\text {total}(\varvec{L}))\).
Proof
First of all, we notice that \(\forall i \in |[ 0, \mathsf {HCI} |[\), the cumulants \(k_i(\mathcal {C}_\text {total}(\varvec{L})\mid Z=z)\) are equal for every \(z\in \mathbb {F}_2^n\). The reason is that for any law \(X\), \(k_j(X)\) can be expressed as a function of \(\mu _i(X)\) for \(0 \le i \le j\) (and reciprocally). For instance \(k_3(X)=\mu _3(X)\), \(k_4(X)=\mu _4(X) - 3\mu ^2_2(X)\), \(k_5(X)=\mu _5(X) - 10\mu _3(X)\mu _2(X)\), etc. Generally speaking, the relationship is \(k_i=\mu _i-\sum _{j=1}^{i-1} {i-1 \atopwithdelims ()j-1} \, k_j \, \mu _{i-j}\). Now, according to Definition 2, if \(\mathcal {C}_\text {total}(\varvec{L})\) has HO-CPA immunity \(\mathsf {HCI}\), then all the moments \(\mu _i(\mathcal {C}_\text {total}(\varvec{L})\mid Z=z)\) for \(0 \le i < \mathsf {HCI}\) are independent of \(z\). Consequently, the same holds for the cumulants of orders \(i \in |[0, \mathsf {HCI} |[\). Eventually, as \(\forall i <\mathsf {HCI}\), \(\forall z\), \(\mu _i(\mathcal {C}_\text {total}(\varvec{L})\mid Z=z) = \mu _i(\mathcal {C}_\text {total}(\varvec{L}))\), we also have \(k_i(\mathcal {C}_\text {total}(\varvec{L})\mid Z=z) = k_i(\mathcal {C}_\text {total}(\varvec{L}))\). \(\square \)
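The recursion between moments and cumulants used in this proof can be checked numerically. The sketch below applies it with moments about the origin of a finite, equiprobable list of values; this choice, as well as the function names, is an assumption made for illustration (for a centered variable the same recursion reproduces the examples \(k_3=\mu _3\) and \(k_4=\mu _4-3\mu _2^2\)).

```python
from fractions import Fraction
from math import comb


def raw_moments(values, max_order):
    """Moments about the origin, orders 0..max_order, of equiprobable values."""
    return [Fraction(sum(v ** i for v in values), len(values))
            for i in range(max_order + 1)]


def cumulants(values, max_order):
    """Cumulants k_1..k_max_order via k_i = mu_i - sum_j C(i-1, j-1) k_j mu_{i-j}."""
    mu = raw_moments(values, max_order)
    k = [Fraction(0)] * (max_order + 1)  # k[0] is unused
    for i in range(1, max_order + 1):
        k[i] = mu[i] - sum(comb(i - 1, j - 1) * k[j] * mu[i - j]
                           for j in range(1, i))
    return k


# Noiseless leakage of a 1-bit masked variable when z = 0: values 0 and 2.
k = cumulants([0, 2], 4)
print(k[1], k[2], k[3], k[4])
```

For this two-point distribution the mean and the variance are both \(1\), the third cumulant vanishes by symmetry, and the fourth cumulant is \(\mu _4 - 3\mu _2^2 = 1 - 3 = -2\) in terms of central moments.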
Besides, we need also to make use of this lemma.
Lemma 3
(Known as “small cumulant approximation to the Kullback–Leibler divergence”) Let \(P\) and \(Q\) be two distributions close to the standard normal distribution. Then, the small cumulant approximation to the Kullback–Leibler divergence \(\mathsf {D}_\mathsf {KL}\left[P;Q \right]\) writes
Proof
(Sketch) This lemma has been proved by Cardoso in Section 4 at page 1194 of [10]. The proof relies on the Gram–Charlier expansion around a standard normal reference distribution. In the demonstration of Cardoso, multivariate distributions for \(P\) and \(Q\) are considered; \(P\) and \(Q\) are \(n\)-dimensional probability density functions close to an \(n\)-dimensional standard Gaussian. Therefore, it involves cross-cumulants, i.e., cumulants between at least two different random variables. In our case, \(P\) and \(Q\) are scalar, thus cross-cumulants simplify to cumulants, as defined in Eq. (1). \(\square \)
We give hereafter the proof of Theorem 1.
Proof
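The mutual information between \(\mathcal {C}_\text {total}\left( \varvec{L}\right) \) and \(Z\) can be computed as follows:
$$\begin{aligned} \mathsf {I}[\mathcal {C}_\text {total}(\varvec{L});Z ] = \sum _z \mathsf {P}[Z=z ] \cdot \mathsf {D}_\mathsf {KL}\left[ \mathcal {C}_\text {total}(\varvec{L}) \mid Z=z \, ; \, \mathcal {C}_\text {total}(\varvec{L}) \right] . \qquad (3) \end{aligned}$$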
Under the Gaussian assumption, \(\mathcal {C}_\text {total}\left( \varvec{L}\right) \) is distributed like \(\mathcal {N}(\mu _\text {tot}, \sigma _\text {tot}^2 + \sigma ^2)\) and \((\mathcal {C}_\text {total}\left( \varvec{L}\right) |Z=z)\) like \(\mathcal {N}(\mu _z, \sigma _\text {z}^2 + \sigma ^2)\).
We distinguish two cases: \(\mathsf {HCI} \le 2\), and \(\mathsf {HCI} > 2\).
Case \(\mathsf {HCI} \le 2\). The Kullback–Leibler divergence of two Gaussians \(P_j \sim \mathcal {N}({\mu _j}, \sigma _j^2)\) (\(j\in \{1,2\}\)) has an analytic closed form; it is equal to
Therefore, the mutual information (refer to Eq. (3)) is equal to
Indeed, by the law of total variance, \(\sigma _\text {inter}^2+\sigma _\text {intra}^2=\sigma _\text {tot}^2\). The logarithm in Eq. (4) can be developed by a Taylor expansion at order \(2\), using \(\log _2(1+\epsilon ) = \frac{1}{\ln 2} \left( \epsilon - \epsilon ^2/2\right) + \mathcal {O}(\epsilon ^3)\), when \(\epsilon = \frac{1}{\sigma ^2} \rightarrow 0^+\). Consequently
Thus,
- if \(\mathsf {HCI}=1\), then \(\sigma _\text {inter}^2 \ne 0\), and thus \(\mathsf {I}[\mathcal {C}_\text {total}\left( \varvec{L}\right) ;Z ]\) tends to zero as \(1/\sigma ^2\) (the dominant term is proportional to \(\sigma _\text {inter}^2/\sigma ^2\));
- if \(\mathsf {HCI}=2\), then \(\sigma _\text {inter}^2 = 0\), and thus \(\mathsf {I}[\mathcal {C}_\text {total}\left( \varvec{L}\right) ;Z ]\) tends to zero as \(1/\sigma ^4\) (because \(\sum _z \mathsf {P}[z ] \sigma _z^4 - \sigma _\text {tot}^4 \ne 0\)).
Case \(\mathsf {HCI} > 2\). With the condition that \(\mathcal {C}_\text {total}\left( \varvec{L}\right) |Z=z\) are close enough to standard normal distributions, the Kullback–Leibler divergence between \(\mathcal {C}_\text {total}\left( \varvec{L}\right) |Z=z\) and \(\mathcal {C}_\text {total}\left( \varvec{L}\right) \) can be expanded according to Lemma 3.
Notice that the cumulants of the RV \(\mathcal {C}_\text {total}\left( \varvec{L}\right) \) (resp. \((\mathcal {C}_\text {total}\left( \varvec{L}\right) |Z=z)\)) and of the RV \(\mathcal {C}_\text {total}\left( \varvec{L}\right) +N\) (resp. \((\mathcal {C}_\text {total}\left( \varvec{L}\right) |Z=z)+N\)) are the same at any order strictly greater than two, whereas at order two the variances add up, which yields \(\sigma _\text {tot}^2 + \sigma ^2\) (resp. \(\sigma _\text {z}^2 + \sigma ^2\)). The reason is that the noise is independent of \(\mathcal {C}_\text {total}\left( \varvec{L}\right) \) (resp. \((\mathcal {C}_\text {total}\left( \varvec{L}\right) |Z=z)\)), and assumed Gaussian (i.e., its cumulants of order strictly greater than two are all zero).
Before using Lemma 3, the distributions shall be standardized. Under the assumption of \(2\)nd-order resistance (\(\mathsf {HCI}>2\)), we have \(\forall z, \sigma _z^2 = \sigma _\text {tot}^2\). Thus, \(\mathcal {C}_\text {total}\left( \varvec{L}\right) \) and \((\mathcal {C}_\text {total}\left( \varvec{L}\right) |Z=z)\) have identical variance. Now, the Kullback–Leibler divergence is invariant by a common scaling (specifically, a scaling by factor \(1/\sqrt{\sigma _\text {tot}^2 + \sigma ^2}\)). This means that, for all \(z\):
Besides, it is well known that the \(i\)th cumulant is homogeneous of degree \(i\), i.e., if \(c\) is any constant, \(k_i(c X) = c^i k_i(X)\).
So, by plugging Lemma 3 into Eq. (3), we get an expression of the mutual information between \(\mathcal {C}_\text {total}\left( \varvec{L}\right) \) and \(Z\) as a series:
Notice that the index \(i\) starts at \(3\) because the cumulants are balanced when \(i<\mathsf {HCI}\). Then, according to Lemma 2, the first non-zero term in the summation in (6) is at index \(i=\mathsf {HCI}\). So,
Indeed, when \(\sigma \rightarrow +\infty , \sigma _\text {tot}^2 + \sigma ^2 \approx \sigma ^2\). This proves Theorem 1. \(\square \)
Our main interest in Theorem 1 is that it gives the dependence among the leakage, the noise variance \(\sigma ^2\) and the \(\mathsf {HCI}\) order. It shows that the higher \(\mathsf {HCI}\), the less information is leaked by the device.
We notice that a recent paper by Grosso et al. at CARDIS 2013 [25] has empirically illustrated Theorem 1 on simulations.
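The asymptotic behavior stated in Theorem 1 can be observed numerically on one-bit toy examples. The sketch below is an illustration under assumptions that are ours: \(Z\) is a uniform bit, the noiseless leakage values of each class are equiprobable, the noise is Gaussian, and the integral defining the mutual information is approximated by a plain Riemann sum. The slope of \(\log \mathsf {I}\) versus \(\log \sigma \) between two large values of \(\sigma \) is then compared with \(-2 \times \mathsf {HCI}\). All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mutual_information(classes, sigma, points_per_sigma=50, span=10.0):
    """I[leakage; Z] in bits; classes[z] lists the noiseless leakage values."""
    low = min(min(values) for values in classes) - span * sigma
    high = max(max(values) for values in classes) + span * sigma
    step = sigma / points_per_sigma
    count = int((high - low) / step) + 1
    total = 0.0
    for index in range(count):
        x = low + index * step
        # Conditional densities p(x | z): Gaussian mixtures over the mask.
        cond = [sum(gaussian_pdf(x, v, sigma) for v in values) / len(values)
                for values in classes]
        mix = sum(cond) / len(cond)  # marginal density p(x), Z uniform
        for p in cond:
            if p > 0.0:
                total += p * math.log2(p / mix) * step / len(cond)
    return total


def decay_exponent(classes, sigma_low=8.0, sigma_high=16.0):
    """Slope of log I versus log sigma between two noise levels."""
    i_low = mutual_information(classes, sigma_low)
    i_high = mutual_information(classes, sigma_high)
    return (math.log(i_high) - math.log(i_low)) / (math.log(sigma_high) - math.log(sigma_low))


unprotected = [[0], [1]]        # leakage = z, hence HCI = 1
masked = [[0, 2], [1, 1]]       # leakage = (z xor m) + m, hence HCI = 2
print(decay_exponent(unprotected), decay_exponent(masked))
```

The two printed slopes are expected to be close to \(-2\) and \(-4\), respectively; the agreement is only approximate because the theorem is asymptotic in \(\sigma \).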
4 State-of-the-art about masking optimizations when the leakage model is approximately known
4.1 Motivations
The modern description of masking schemes actually puts the emphasis on the way the shares are split and processed. For example, Boolean additive masking [24] gets its security from an information-theoretic argument similar to that employed to prove the unconditional security of the Vernam cipher. However, Boolean and bitwise masking is only suited to operations in \((\mathbb {F}_2^n, \oplus )\), hence the later invention of multiplicative masking [1] (when a product appears in the algorithm and the data are non-zero) and of homographic masking [19] (when inversion is necessary, along with addition and multiplication). Many other sharings exist (affine [22], polynomial [23, 47], threshold implementation [41], etc.), depending on the usage constraints.
Those sharings can be expensive to implement. Thus, several trade-offs are encountered in the literature. One of them consists in reducing the entropy of the masks, while keeping a security against \(d\)th-order attacks. This strategy of using depleted masks is presented in [39]. It is proved that the masking resists first- and second-order attacks if the masks are chosen as a subset \(\mathcal {M} \subseteq \mathbb {F}_2^n\) such that the indicator of \(\mathcal {M}\) is \(2\text {nd}\)-order correlation-immune [11, Chap. 7]. We recall that the indicator of a subset \(\mathcal {M}\subseteq \mathbb {F}_2^n\) is the Boolean function \(\mathsf {1}_\mathcal {M}\), defined on \(\mathbb {F}_2^n\) as
$$\begin{aligned} \mathsf {1}_\mathcal {M}(x) = \begin{cases} 1 &amp; \text {if } x \in \mathcal {M}, \\ 0 &amp; \text {otherwise,} \end{cases} \end{aligned}$$
and that a Boolean function is \(d\)th-order correlation-immune if its output distribution is unchanged by fixing up to \(d\) input bits. A concrete architecture that implements this masking scheme is presented in [40]: every sensitive variable is masked by an element from \(\mathcal {M}\). Incidentally, the results have been extended to arbitrary order in [3].
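The correlation-immunity condition recalled above can be tested by brute force on small mask sets. The sketch below applies the definition literally: for every choice of at most \(d\) input positions and every assignment of those bits, the proportion of elements of \(\mathcal {M}\) must stay equal to \(\#\mathcal {M}/2^n\). The function name and the example sets are illustrative assumptions; they are not the mask sets of [39].

```python
from itertools import combinations, product


def is_correlation_immune(mask_set, n, d):
    """True if the indicator of mask_set (subset of F_2^n) is d-th order CI."""
    size = len(mask_set)
    for k in range(1, d + 1):
        for positions in combinations(range(n), k):
            for bits in product((0, 1), repeat=k):
                # Number of elements of M agreeing with the k fixed bits.
                matches = sum(
                    1 for m in mask_set
                    if all(((m >> p) & 1) == b for p, b in zip(positions, bits))
                )
                # Unchanged output distribution: matches / 2^(n-k) == size / 2^n.
                if matches * (1 << k) != size:
                    return False
    return True


even_weight = {m for m in range(8) if bin(m).count("1") % 2 == 0}
repetition = {0b000, 0b111}
print(is_correlation_immune(even_weight, 3, 2),
      is_correlation_immune(repetition, 3, 2))
```

On 3 bits, the set of even-weight words passes the second-order test, whereas the two-element set \(\{000, 111\}\) passes only at first order: fixing two bits to different values excludes both of its elements.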
Another direction aims at keeping the entropy of the masks full, but attempts to encode the shares to further reduce the leakage. An example (illustrated in Fig. 2) for this operation consists in applying a function \(B_i\) independently to each share \(S_i\). This choice is termed “leakage squeezing” and further developed in Sect. 4.2. Another option is to encode all the shares together. This is implemented by the “wiretap codes” countermeasure, described in Sect. 4.3, and by the “leak-free” countermeasure presented in Sect. 4.4. The only requirement is that the encoding function is invertible, as at the end of the computation (and certainly also during the computations) the unencoded shares must be recovered. At the output, an attacker attempts to extract information on \(Z\) from the leakage. In Fig. 2, the leakage function is depicted as “scalar”. However, generally speaking, the leakage function \(\mathcal {C}_\text {device}\) can be vectorial. The attacker measures this leakage, and applies the following attack strategy: the measured leakage is raised to successive powers \(i\) until it starts to depend on \(Z\). This test yields a distinguisher: the variations between \(\mathcal {C}_\text {total}\) and \(Z\) are non-zero when \(i=\mathsf {HCI}\).
If the sharing is \(d\)th-order secure, then the attacker must use a post-processing function (e.g., a power function) of order at least \(d+1\). If in addition the encoding of the shares is effective, the post-processing function must be of greater order.
Let us consider an example that illustrates the benefits of the encoding stage in Fig. 2. In the case of Boolean masking with one mask \(M\), the sensitive variable is split into \(S_0=Z \oplus M\) and \(S_1=M\). Let us also assume that the leakage function consists in adding all the bits of the processed variable (Hamming weight model). Then, the attacker measures \(\mathsf {HW}( Z \oplus M, M ) = \sum _{i=1}^n \left( (Z \oplus M)_i + M_i \right) \). In this expression, it appears that the mask almost cancels. Indeed, on \(n=1\) bit, the leakage is \((Z \oplus M)_1 + M_1 = 2 \times \left( (Z_1 \oplus M_1) \wedge M_1 \right) + (Z_1 \oplus M_1) \oplus M_1 = 2 \times \left( (Z_1 \oplus M_1) \wedge M_1 \right) + Z_1\), since \(a + b = 2 \times (a \wedge b) + (a \oplus b)\) for any two bits \(a\) and \(b\).
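The "almost cancelling" mask can be checked by enumeration. The sketch below verifies the bit identity used above, then computes the exact mean and variance of \(\mathsf {HW}(Z \oplus M) + \mathsf {HW}(M)\) for a uniform mask and each value \(z\). The function names and the choice \(n=4\) are assumptions made for illustration.

```python
from fractions import Fraction


def hw(x):
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def leakage_mean_and_variance(z, n):
    """Exact mean and variance of HW(z xor M) + HW(M) for a uniform n-bit M."""
    leaks = [hw(z ^ m) + hw(m) for m in range(1 << n)]
    mean = Fraction(sum(leaks), len(leaks))
    variance = Fraction(sum((leak - mean) ** 2 for leak in leaks), len(leaks))
    return mean, variance


# Bit identity: a + b = 2 (a and b) + (a xor b).
assert all(a + b == 2 * (a & b) + (a ^ b) for a in (0, 1) for b in (0, 1))

n = 4
for z in (0b0000, 0b0001, 0b0111, 0b1111):
    mean, variance = leakage_mean_and_variance(z, n)
    print(f"z={z:04b}  mean={mean}  variance={variance}")
```

Each bit position where \(z_i = 1\) contributes exactly \(1\) to the leakage, and each position where \(z_i = 0\) contributes \(2 M_i\). The mean is therefore \(n\) for every \(z\), while the variance equals \(n - \mathsf {HW}(z)\): the first-order moment is balanced but the second-order moment still depends on \(z\).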
We review in the rest of this section the state-of-the-art about encoding methods. They can be seen as pre-processing methods on the shares, aiming at reducing the overall degree of the function \(\mathcal {C}_\text {total}\).
4.2 Leakage squeezing
The leakage squeezing has the objective to increase the \(\mathsf {HCI}\) value of \(\mathcal {C}_\text {device}\). It consists in encoding each share separately. Thus, \(d+1\) functions \(B_i: \mathbb {F}_2^n \rightarrow \mathbb {F}_2^n\) are applied to the \(d+1\) additive shares \(S_i\) of \(Z\). The functions must be bijective to recover the plain shares for the computation (during the algorithm and at the final demasking).
When the device leaks the sum of the shares, then the deterministic part of the leakage function \(\mathcal {C}_\text {total}(\varvec{L})\) is the overall sum of the Hamming weight of the shares: \(\mathsf {HW}( B_0(S_0), \ldots , B_d(S_d) )\). More realistically, on hardware platforms, this deterministic part involves the distances between the values carried by registers, i.e.,
Interestingly enough, when the bijections are linear, then the two cases can be treated in a common framework:
1. In the Hamming weight model, the distinguisher is the correlation coefficient with the first non-zero moment of
$$\begin{aligned} \mathsf {HW}( B_0(S_0), \ldots , B_d(S_d) )|Z; \end{aligned}$$
2. In the Hamming distance model, the distinguisher is the correlation coefficient with the first non-zero moment of
$$\begin{aligned}&\mathsf {HW}( B_0(S'_0 \oplus S_0), \ldots , B_d(S'_d \oplus S_d) )|(Z' \oplus Z) \nonumber \\&\quad =\mathsf {HW}( B_0(S''_0), \ldots , B_d(S''_d) )|Z'' , \end{aligned}$$
where for any random variable \(X\), the random variable \(X''\) is the difference between the two consecutive values, \(X'\) and \(X\), i.e., \(X''=X' \oplus X\).
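The reason why linear bijections unify the two models is that the Hamming distance between two encoded values equals the Hamming weight of the encoded difference. A minimal Python sketch of this fact is given below, under stated assumptions: words of 4 bits, matrices stored as lists of row bitmasks, and the complement-of-identity matrix \(\overline{I_4}\) (the one named ni4 in Sect. 6) as an example of \(B\). The helper names are hypothetical.

```python
N_BITS = 4  # illustrative word size (assumption)


def hw(x: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def mat_vec(rows, x: int) -> int:
    """Multiply a GF(2) matrix (list of row bitmasks) by the bit vector x."""
    return sum((hw(r & x) & 1) << j for j, r in enumerate(rows))


# Example linear map: all-ones matrix with a zero diagonal (complement of I_4).
B_ROWS = [0b1111 ^ (1 << j) for j in range(N_BITS)]


def B(x: int) -> int:
    return mat_vec(B_ROWS, x)


# B must be a bijection so that the plain share can be recovered.
is_bijection = len({B(x) for x in range(1 << N_BITS)}) == 1 << N_BITS

# Hamming distance of the encoded values vs. Hamming weight of the encoded
# difference S'' = S' xor S: they agree because B is linear.
models_agree = all(
    hw(B(s_next) ^ B(s)) == hw(B(s_next ^ s))
    for s in range(1 << N_BITS)
    for s_next in range(1 << N_BITS)
)

print(is_bijection, models_agree)
```

For a non-linear bijection the second check generally fails, which is why the common treatment of both models is restricted to linear \(B_i\).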
Clearly, this encoding has the potential to increase the resistance of the additive Boolean sharing: indeed, the sharing without encoding is a special case of the leakage squeezing, where all the bijections are equal to the identity. A proof of concept (linear \(B_i\)) and some implementation results were the topic of the first publication about leakage squeezing at WISTP 2011 [33]. In subsequent publications, the leakage squeezing has been studied from a mathematical standpoint. For the sake of simplification, they all assume that one share (say the first one) is processed without a bijection (\(B_0\) is the identity), and that the other shares are encoded. The bijections \(B_i\) can be public; nonetheless, if they are not, a correlation attack will be more complex.
The paper [31] analyzes the leakage with two shares, explicitly finds bounds for the resistance against high-order attacks, and provides the corresponding \(B_1\) bijection. The countermeasure is depicted in Fig. 3. The memory \(C\) implements the cryptographic function (e.g., the substitution box) and \(R\) the mask refresh. The optimal solution is a bijection whose graph has the highest possible correlation immunity (see definition in [8]). This problem comes down to the identification of rate-\(1/2\) codes of maximal dual distance. Such codes, called Complementary Information Set (CIS) codes, have been studied in detail in [13]. In Sect. 5, we focus on the solutions that are linear. Indeed, they are easier to compute, and, as already discussed, cover both Hamming weight and Hamming distance leakage models. The paper [31] also analyzes the effects of model imperfections (including cross-coupling): e.g., when the squeezing is done under the Hamming weight or the Hamming distance assumption but the device actually leaks differently.
It is remarkable how resilient the leakage squeezing is to such imperfections: it still reduces the leakage even if the imperfections represent up to \(50\) % of the expected leakage. The performance of leakage squeezing is illustrated in [33], where two FPGA implementations of the countermeasure are studied on DES. The proposed implementations have been tested on a Stratix II FPGA, which is based on the Adaptive Logic Module (ALM) cell. They have been compared with an unprotected DES and with a masked DES without leakage squeezing. Table 1 summarizes the memories needed for each implementation and the estimated throughput. The ALMs are used for the combinational gates that implement linear operations. The memories are used for the non-linear operations. It can be seen that \(32\) BRAMs are necessary, and fully employed (\(32 \times 4~\text {Kbit} = 131072~\text {bit}\)). These results show that the leakage squeezing method on hardware implementations has little impact on complexity and speed.
Finally, the study of the leakage squeezing in the case where the sensitive variable is split into three shares is conducted in [12]. It shows that couples of bijections \((B_1,B_2)\) can be found jointly, that are better than any upgrade from the optimal solution for 1–2 masks. Indeed, one can start to improve on top of a first-order leakage squeezing to get a second-order leakage squeezing scheme; but this approach is proved to be sub-optimal w.r.t. the direct consideration of second-order leakage squeezing.
4.3 Wire-tap codes
The countermeasure using wire-tap codes has been presented in [7]. Its objective is to prevent an attacker from recovering the information on the sensitive variable even if she can recover some bits of the encoded sensitive variable. The attacker model is thus original, because it is assumed that either the attacker is able to probe some wires or she can use very accurate magnetic probes (see [28]).
To achieve this goal, some random bits \(M\) (\(p\) bits) are collated to the sensitive variable \(Z\) (\(n\) bits). The complete wire-tap protected signal has length \(q=n+p\). The protection consists in
- encoding the mask \(M\) with a linear code \(C\) of parameters \([q,p,d]\),
- expanding the sensitive data \(Z\) on \(q\) bits by computing a linear combination (parity matrix), and
- XORing the two parts, which yields the protected data \(z \in \mathbb {F}_2^q\).
Thus, if \(G\) is a generating matrix of \(C\) and \(H\) is the corresponding parity matrix, then the transformation in the masked representation is
Then, it can be proved (Lemma 1 in [7]) that this representation unconditionally resists the leakage of up to \(d^\perp -1\) bits (inclusive), where \(d^\perp \) is the dual distance of \(C\). However, this paper does not link this result to the resistance against high-order side-channel attacks. As with the leakage squeezing, the wire-tap code need not be private; obviously, the scheme will be more secure if it is secret.
4.4 Leak-free masking
The leak-free masking makes it possible to completely cancel any univariate leakage, i.e., to zero the mutual information between the leakage and the sensitive variables (considered individually, i.e., not in distance w.r.t. a previous state). It has been introduced in [34], and an example of implementation is discussed in [35]. Two conditions must be fulfilled for this encoding to apply:
1. The leakage must depend on the previous and on the current values of the manipulated variables, and must be invariant in the exchange of previous and current values.
2. The two shares must not interfere in the leakage, i.e., no product terms between (all or part) of the masked variable and (all or part) of the mask shall exist.
The countermeasure consists in defining a processing for one share (e.g., the mask) that does not leak, whereas the other share is perfectly masked.
The construction demands that
- \(M'=M \oplus \alpha \), for a non-zero constant \(\alpha \): this yields a constant leakage in distance, because \(M' \oplus M=\alpha \) does not depend on the mask (notice that a fresh mask \(M\) is chosen randomly at every new encryption);
- the sensitive variable be XORed with a function \(F\) of the mask, so that its leakage \((Z' \oplus F(M')) \oplus (Z \oplus F(M))\) is independent of \(Z' \oplus Z\). In these equations, \(Z'\) is the next value taken by \(Z\) (refer to Fig. 4). This is achievable provided that the derivative \(D_\alpha (F)\) of \(F\) in \(\alpha \) is balanced. Indeed, \((Z' \oplus F(M')) \oplus (Z \oplus F(M)) = Z'' \oplus D_\alpha F(M)\).
The scheme is sketched in Fig. 4. The function \(F\) is not necessarily invertible. However, the requirement is that the overall encoding be invertible. Here, the function \((Z,M) \mapsto (Z \oplus F(M), M)\) must be invertible.
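To make the two requirements concrete, here is a toy Python sketch. It is not the construction of [34]: the function `F`, the constant `ALPHA`, and the choice of a mask that is one bit wider than \(Z\) are all hypothetical, picked only so that the derivative \(D_\alpha F\) is balanced. Under the ideal assumptions (Hamming distance leakage, no interference between the shares), the sketch checks that the distribution of the leakage over the mask is the same for every transition of \(Z\).

```python
from collections import Counter

N_BITS = 4                  # width of the sensitive variable Z (assumption)
MASK_BITS = N_BITS + 1      # ASSUMPTION: mask one bit wider than Z
ALPHA = 1 << N_BITS         # non-zero constant alpha: flips the top mask bit
LOW_MASK = (1 << N_BITS) - 1


def hw(x: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def F(m: int) -> int:
    """Hypothetical F: low n bits of m when the top bit is set, else 0."""
    return (m & LOW_MASK) if (m >> N_BITS) & 1 else 0


def derivative(m: int) -> int:
    """D_alpha F(m) = F(m) xor F(m xor alpha)."""
    return F(m) ^ F(m ^ ALPHA)


# Balancedness: every n-bit value is reached the same number of times.
counts = Counter(derivative(m) for m in range(1 << MASK_BITS))
derivative_is_balanced = (len(counts) == 1 << N_BITS
                          and len(set(counts.values())) == 1)


def distance_leakage_histogram(z: int, z_next: int) -> Counter:
    """Histogram, over all masks, of the Hamming distance leakage of both shares."""
    hist = Counter()
    for m in range(1 << MASK_BITS):
        m_next = m ^ ALPHA
        masked_transition = (z_next ^ F(m_next)) ^ (z ^ F(m))
        mask_transition = m_next ^ m  # always equal to ALPHA
        hist[hw(masked_transition) + hw(mask_transition)] += 1
    return hist


reference = distance_leakage_histogram(0, 0)
leakage_is_independent = all(
    distance_leakage_histogram(z, z_next) == reference
    for z in range(1 << N_BITS)
    for z_next in range(1 << N_BITS)
)

print(derivative_is_balanced, leakage_is_independent)
```

The map \((Z,M) \mapsto (Z \oplus F(M), M)\) is invertible here although `F` itself is not, which matches the requirement stated above.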
In theory, this construction cancels the mutual information between the leakage model and the sensitive variable. However, it holds only if the leakage is indeed symmetrical with respect to the exchange between the previous and the current values and the two shares do not interfere. A small asymmetry in time or a small coupling between the two shares will create a dependency with the sensitive variable. As this variable is not encoded (with a code, like in the leakage squeezing), the dependency will leak the secret. This is confirmed by the simulations in [31]. Also, real experiments show that if a small part of the leakage obeys the Hamming weight (total asymmetry in the relationship between previous and current value), then the countermeasure can be broken at order \(2\) [38].
5 Concrete example of HCI increase
5.1 Formulation and results in the perfect model
In this section, we discuss a case study that illustrates the leakage squeezing with a linear bijection. The considered example applies to AES, where the data are manipulated by bytes (hence \(n=8\)). We also focus on the usage of a single mask (\(d=1\)).
For this countermeasure, the combination function \(\mathcal {C}_\text {total}\) of \(\mathcal {C}_\text {device}\) with a univariate \(i^\text {th}\)-order CPA adversary applying a processing \(\mathcal {C}_\text {attacker} = (\,\cdot \,)^i\) can be expressed (for hardware devices) as
The defender searches for a good bijection \(B_1\), denoted simply by \(B\). From the attacker standpoint, the univariate HO-CPA succeeds if and only if \(\mathbb {E}[\mathcal {C}_\text {total} | Z=z ]\) depends on \(z\in \mathbb {F}_2^n\). So, conversely, if \(z \mapsto \mathbb {E}[\mathcal {C}_\text {total} | Z=z ]\) is constant, the HO-CPA fails. Now, \(\mathcal {C}_\text {total}\left( L_0, L_1 \right) \) is a polynomial of \(L_0\) and \(L_1\). The terms in the polynomial write \(L_0^p \times L_1^q\), where \((p,q) \in \mathbb {N}^2\) are the exponents of each leakage in each term.
We can write
By definition of HCI, for all exponents that satisfy \(p+q < \mathsf {HCI}\),
The goal for the designer is to choose the bijection \(B\) that maximizes HO-CPA immunity, i.e., that meets Eq. (9) for the largest possible HCI. In [31], it is shown that the best bijection \(B\) is such that its graph \((Z, B(Z))\) is of maximal dual distance. For \(n=8\), such a graph \(C = (Z, B(Z))\) can be deduced from the linear code \([16,8,5]\) that is self-dual. More precisely, the graph is the indicator of this code. By writing the code \(C\) in a systematic form, i.e., the codewords are listed in a matrix \((I_n, G)\), we have that \(B\) is a linear function generated by \(G\). This leakage squeezing protects against high-order CPA of any order up to \(4\).
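The dual distance of the graph of a linear bijection can be computed by brute force on small words. The Python sketch below does so for \(n=4\) rather than \(n=8\), because the generator matrix of the \([16,8,5]\) code is not reproduced in this section. It uses the complement-of-identity matrix \(\overline{I_4}\) (named ni4 in Sect. 6) as the candidate \(B\); the helper names and the bit packing of codewords are assumptions made for illustration.

```python
N_BITS = 4  # illustrative word size (assumption; Sect. 5 targets n = 8)


def hw(x: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def mat_vec(rows, x: int) -> int:
    """Multiply a GF(2) matrix (list of row bitmasks) by the bit vector x."""
    return sum((hw(r & x) & 1) << j for j, r in enumerate(rows))


def graph_code(b_rows, n: int):
    """Codewords (z, B(z)) packed as 2n-bit integers: z low, B(z) high."""
    return [z | (mat_vec(b_rows, z) << n) for z in range(1 << n)]


def dual_distance(code, length: int) -> int:
    """Smallest weight of a non-zero word orthogonal to every codeword.

    Valid for linear codes, where the dual distance is the minimum
    distance of the dual code. Exhaustive, hence only for short codes.
    """
    return min(
        hw(w)
        for w in range(1, 1 << length)
        if all(hw(w & c) % 2 == 0 for c in code)
    )


IDENTITY = [1 << j for j in range(N_BITS)]          # no squeezing
NI4 = [0b1111 ^ (1 << j) for j in range(N_BITS)]    # complement of I_4

print(dual_distance(graph_code(IDENTITY, N_BITS), 2 * N_BITS))  # 2
print(dual_distance(graph_code(NI4, N_BITS), 2 * N_BITS))       # 4
```

The identity gives a dual distance of 2, and \(\overline{I_4}\) raises it to 4, which is consistent with the value \(\mathsf {HCI}=4\) reported for a nibble in the conclusions.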
Besides, in [7], it is shown that the best wire-tap code with a mask of size \(p=n\) can be built from a linear code of characteristics \([q,p,d] = [2n,n,d] = [16,8,d]\) of greatest dual distance. Thus, the same code \([16,8,5]\) can be used. It protects against all attackers that are able to probe up to \(4\) bits. To see the similarity between leakage squeezing and wire-tap masking in the particular case of linear functions and with the use of a single mask, we apply the simplifications suggested in [7]:
- \(L\) is \((I_n, 0)\), and
- \(G\) is written in systematic form as \((\Gamma ^\mathsf {T}, I_p)\), with \(p=q-n\). Let us assume (which is beyond the assumptions made in [7]) that the matrix \(\Gamma \) is invertible.
Then, Eq. (7) rewrites
Thus, by replacing the uniformly distributed mask \(M\) (on \(\mathbb {F}_2^n\)) by \((\Gamma ^\mathsf {T})^{-1} M = B(M)\), the sensitive variable encoded in wire tap is also the couple \((X \oplus M, B(M))\), as in the leakage squeezing.
So, with one mask of the same size as the sensitive data, the linear functions that resist
- attacks of the highest order, and
- probing with the highest number of probes

are the same. Two security objectives are achieved with a single linear code.
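The change of variable on the mask can be replayed on a toy instance. The sketch below is written under explicit assumptions: the wire-tap word is taken to be of the form \((X \oplus \Gamma ^\mathsf {T} M, M)\), which is what the simplifications \(L=(I_n,0)\) and \(G=(\Gamma ^\mathsf {T}, I_p)\) suggest, and \(\Gamma ^\mathsf {T}\) is chosen as the \(4 \times 4\) complement-of-identity matrix, which happens to be its own inverse over \(\mathbb {F}_2\). All names are hypothetical.

```python
N_BITS = 4  # illustrative word size (assumption)


def hw(x: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def mat_vec(rows, x: int) -> int:
    """Multiply a GF(2) matrix (list of row bitmasks) by the bit vector x."""
    return sum((hw(r & x) & 1) << j for j, r in enumerate(rows))


# ASSUMPTION: an invertible Gamma^T, here the complement of the identity.
GAMMA_T_ROWS = [0b1111 ^ (1 << j) for j in range(N_BITS)]
# B = (Gamma^T)^{-1}; for this particular matrix the inverse is the matrix itself.
B_ROWS = GAMMA_T_ROWS

inverse_ok = all(
    mat_vec(B_ROWS, mat_vec(GAMMA_T_ROWS, m)) == m for m in range(1 << N_BITS)
)


def wiretap_words(x: int):
    """All encodings (x xor Gamma^T m, m) of x (assumed wire-tap form)."""
    return {(x ^ mat_vec(GAMMA_T_ROWS, m), m) for m in range(1 << N_BITS)}


def squeezing_words(x: int):
    """All encodings (x xor m, B(m)) of x, as in the leakage squeezing."""
    return {(x ^ m, mat_vec(B_ROWS, m)) for m in range(1 << N_BITS)}


same_encoding = all(
    wiretap_words(x) == squeezing_words(x) for x in range(1 << N_BITS)
)

print(inverse_ok, same_encoding)
```

Since the mask is uniform, the two parametrizations produce the same set of encoded pairs with the same probabilities, so the two countermeasures coincide in this linear, single-mask setting.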
5.2 Resistance in the imperfect model
Remark 2
The fact mentioned in Sect. 4.2 that leakage squeezing resists model imperfections can be proved. If for instance \(L_0=\mathsf {HW}( Z \oplus M )^2+N_0\) instead of \(L_0=\mathsf {HW}( Z \oplus M )+N_0\) (i.e., there is already a mixture of bits owing to the device), then the new resistance order, noted \(\mathsf {HCI}'\), is equal to
It can be proved (see below) that \(\mathsf {HCI}'=\lfloor \frac{d+1}{2} \rfloor \); thus, a security margin still exists.
Proof
Equation (10) can be equivalently reformulated as
\(\square \)
6 Security evaluation of the countermeasure
For our simulations, we still consider the hardware case, where all shares leak simultaneously (i.e., \(\mathcal {C}_\text {device}(\varvec{L}) = \sum _{i=0}^d L_i\)).
6.1 Security analysis
Lemma 2 in [50] proves that, without leakage squeezing, a hardware Boolean masking countermeasure with \(d\) masks has HO-CPA immunity \(\mathsf {HCI}=d+1\), and thus protects against \(d\text {th}\)-order CPA. This is illustrated for \(n=4\) in the first five groups of Table 2, which correspond to \(d \in \{0, \ldots , 4\}\). For these simulations, we consider \(L_i=\mathsf {HW}( S_i )\), without noise. In this table, the number of lines in gray is equal to \(\mathsf {HCI}-1\) (Definition 2).
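Let us define the linear bijection \(B\) by its matrix \(\mathtt{ni4 } \doteq \overline{I_4}\), the bitwise complement of the identity \(I_4\):
$$\begin{aligned} \mathtt{ni4 } = \left( \begin{array}{llll} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{array}\right) . \end{aligned}$$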
Using this linear bijection, we summarize in Table 2 (the last group) the security improvement brought by the leakage squeezing. Now, for these simulations, we take \(L_0=\mathsf {HW}( S_0 )\) and \(L_1=\mathsf {HW}( B(S_1) )\). The results in the table show that the leakage squeezing improves the HO-CPA immunity by 2 units without adding extra masks.
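The noiseless experiment can be reproduced in a few lines. The following Python sketch is an illustration and not the authors' simulation code: it assumes the leakage \(L_0+L_1=\mathsf {HW}(Z \oplus M)+\mathsf {HW}(B(M))\) with one mask on \(n=4\) bits, and searches for the smallest power \(i\) such that the conditional moment of order \(i\) depends on \(z\). Exact integer sums are used instead of expectations, so the comparison is free of rounding errors.

```python
N_BITS = 4  # nibble-sized sensitive variable, as in Table 2


def hw(x: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def mat_vec(rows, x: int) -> int:
    """Multiply a GF(2) matrix (list of row bitmasks) by the bit vector x."""
    return sum((hw(r & x) & 1) << j for j, r in enumerate(rows))


def hci(b_rows, n: int, max_order: int = 8):
    """Smallest i such that sum_m (HW(z^m) + HW(B(m)))**i depends on z.

    Returns None if no dependency is found up to max_order.
    """
    size = 1 << n
    for order in range(1, max_order + 1):
        sums = {
            sum((hw(z ^ m) + hw(mat_vec(b_rows, m))) ** order
                for m in range(size))
            for z in range(size)
        }
        if len(sums) > 1:
            return order
    return None


IDENTITY = [1 << j for j in range(N_BITS)]          # plain Boolean masking
NI4 = [0b1111 ^ (1 << j) for j in range(N_BITS)]    # complement of I_4

print(hci(IDENTITY, N_BITS))  # 2
print(hci(NI4, N_BITS))       # 4
```

The gain of two units is obtained with the same amount of randomness: only the representation of the mask share changes.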
Example 1
The resistance when the leakage model is imperfect (refer to Remark 2) can also be illustrated on an example using \(B: x \mapsto \mathtt{ni4 } \times x\) and a combination function that mixes the two shares by an exclusive or (of degree \(2\)). This can happen in a software implementation where one register would successively contain one share, then the other. In this case, the leakage is a function of \((z \oplus M) \oplus B(M)\), which is thus equal to \(z \oplus \left( \begin{array}{llll} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{array}\right) \times M\). Of course, \(\left( \begin{array}{llll} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{array}\right) \left( \mathbb {F}_2^4\right) = \left\{ \left( \begin{array}{llll} 0 & 0 & 0 & 0 \end{array}\right) ^\mathsf {T}, \left( \begin{array}{llll} 1 & 1 & 1 & 1 \end{array}\right) ^\mathsf {T} \right\} \).
Moreover, when \(M\) ranges over \(\mathbb {F}_2^4\), the image consists of \(8\) vectors \(\left( \begin{array}{llll} 0 & 0 & 0 & 0 \end{array}\right) ^\mathsf {T}\) and \(8\) vectors \(\left( \begin{array}{llll} 1 & 1 & 1 & 1 \end{array}\right) ^\mathsf {T}\), i.e., the two values are balanced. Thus, the expectation (on RV \(M\)) of any affine function of \((z \oplus M) \oplus B(M)\) will not depend on \(z\).
This result is a direct application of Remark 2, since \(\lfloor \frac{d+1}{2} \rfloor = 2\).
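The counting argument of Example 1 can be verified directly. In the Python sketch below, "affine function" is interpreted as a real-weighted sum of the bits plus a constant (for instance the Hamming weight); this interpretation, like the helper names, is an assumption of the sketch. It is enough to check that every bit of \((z \oplus M) \oplus B(M)\) is equal to 1 for exactly half of the masks, whatever \(z\).

```python
from collections import Counter

N_BITS = 4


def hw(x: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def mat_vec(rows, x: int) -> int:
    """Multiply a GF(2) matrix (list of row bitmasks) by the bit vector x."""
    return sum((hw(r & x) & 1) << j for j, r in enumerate(rows))


NI4 = [0b1111 ^ (1 << j) for j in range(N_BITS)]         # complement of I_4
ALL_ONES = [row ^ (1 << j) for j, row in enumerate(NI4)]  # I_4 xor ni4

# Image of the all-ones matrix: 0000 and 1111, each reached by 8 masks.
image_counts = Counter(mat_vec(ALL_ONES, m) for m in range(1 << N_BITS))

# The mixed value (z xor m) xor B(m) equals z xor (all-ones matrix times m).
mixing_matches = all(
    (z ^ m) ^ mat_vec(NI4, m) == z ^ mat_vec(ALL_ONES, m)
    for z in range(1 << N_BITS)
    for m in range(1 << N_BITS)
)

# Each bit of the mixed value is set for exactly 8 of the 16 masks, for all z,
# so any real-weighted sum of the bits has a mean that does not depend on z.
bits_are_balanced = all(
    sum((((z ^ m) ^ mat_vec(NI4, m)) >> j) & 1
        for m in range(1 << N_BITS)) == 8
    for z in range(1 << N_BITS)
    for j in range(N_BITS)
)

print(dict(image_counts), mixing_matches, bits_are_balanced)
```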
6.2 Information-theoretic evaluation of the countermeasure
In this section, our purpose is to quantify the amount of information that the countermeasure reveals about the sensitive variable \(Z\). To achieve this goal, we follow the information-theoretic approach introduced in [52]. Namely, we compute the mutual information between the sensitive variable \(Z\) and the leakage function of Eq. (8), where \(N=N_0+N_1\) is an Additive White Gaussian Noise (AWGN) of standard deviation \(\sigma \). In our simulation, we use the bijection \(\overline{I_4}\) (called ni4). For comparison purposes, we proceed in the same way for high-order Boolean masking. The mutual information of the leakage squeezing hardware implementation is represented in Fig. 5. The curves in this figure have been obtained by computing the mutual information as the integral of distributions (mixtures of Gaussians). The data type is double and the integration software is the GNU “contrib_adaptint” by Steven G. Johnson, with accuracy set to \(1.14 \times 10^{-14}\). This accuracy is a bit less than \(2^{-46}\), hence the vertical scale of Fig. 5.
This first analysis allows us to observe that the gain is high when the leakage squeezing is applied, because the mutual information leaked is less than without the countermeasure whatever the SNR. Typically, our simulations confirm theoretical predictions of Theorem 1. As a corollary, \(\mathsf {I}[\mathcal {C}_\text {total}(\varvec{L});Z ] \doteq \text {MIM} = \mathcal {O}(1/\sigma ^8)\) for first-order masking with leakage squeezing (\(\mathsf {HCI} = 4\)), whereas \(\text {MIM} = \mathcal {O}(1/\sigma ^4)\) for first-order masking without (\(\mathsf {HCI}= 2\)).
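A rough numerical counterpart of this comparison can be sketched as follows. The code assumes that the leakage is \(\mathsf {HW}(Z \oplus M)+\mathsf {HW}(B(M))+N\) with \(Z\) and \(M\) uniform on \(\mathbb {F}_2^4\), and it replaces the adaptive integration used for Fig. 5 by a plain Riemann sum on a fixed grid, so it is far less accurate than the figures quoted above. The function names, the grid step, and the value \(\sigma =3\) are choices made for this illustration only.

```python
import math
from collections import Counter

N_BITS = 4


def hw(x: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def mat_vec(rows, x: int) -> int:
    """Multiply a GF(2) matrix (list of row bitmasks) by the bit vector x."""
    return sum((hw(r & x) & 1) << j for j, r in enumerate(rows))


def conditional_pmf(b_rows, z: int, n: int):
    """Distribution of HW(z^m) + HW(B(m)) over a uniform mask m."""
    counts = Counter(hw(z ^ m) + hw(mat_vec(b_rows, m)) for m in range(1 << n))
    return {v: c / (1 << n) for v, c in counts.items()}


def gaussian(x: float, sigma: float) -> float:
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mutual_information(b_rows, sigma: float, n: int = N_BITS,
                       step: float = 0.05) -> float:
    """I(leakage; Z) in bits, by a Riemann sum over the leakage axis."""
    size = 1 << n
    pmfs = [conditional_pmf(b_rows, z, n) for z in range(size)]
    lo, hi = -8 * sigma, 2 * n + 8 * sigma
    mi = 0.0
    for k in range(int((hi - lo) / step) + 1):
        x = lo + k * step
        # Conditional densities p(x | z): mixtures of Gaussians.
        cond = [sum(p * gaussian(x - v, sigma) for v, p in pmf.items())
                for pmf in pmfs]
        marginal = sum(cond) / size
        for c in cond:
            if c > 0.0:
                mi += (c / size) * math.log2(c / marginal) * step
    return mi


IDENTITY = [1 << j for j in range(N_BITS)]          # first-order masking
NI4 = [0b1111 ^ (1 << j) for j in range(N_BITS)]    # leakage squeezing

mi_plain = mutual_information(IDENTITY, sigma=3.0)
mi_squeezed = mutual_information(NI4, sigma=3.0)
print(mi_plain, mi_squeezed)
```

Running the sketch for increasing values of `sigma` gives a feeling for the two decay rates, keeping in mind that a fixed-step sum becomes unreliable once the mutual information approaches the rounding error of double-precision arithmetic.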
Taking advantage of the leakage squeezing principle, the quantity of information leaked with a single mask is almost the same as that of third-order masking, without the need to add extra masks. It is in this respect that we describe the leakage squeezing as a masking scheme that gets the most out of a single mask.
An illustration of leakage squeezing is given in Figs. 6 and 7. The goal of the designer (in green) is to increase the order of the information-theoretic (IT) attacks and of the HO-CPA. Conversely, the goal of the attacker (in red) is to decrease the order of the attacks. Without leakage squeezing (\(B_0=B_1=\cdots =B_d=I_n\), the identity function, cf. Fig. 6), the \(\mathsf {HCI}\) coincides with the algebraic degree of the attack. However, with the leakage squeezing (cf. Fig. 7), the lowest degree of the working HO-CPA, namely \(\text {d}^\circ (\mathcal {C}_\text {total})\), can be 2 units greater than \(\text {d}_\text {alg}(\mathcal {C}_\text {total})\) (on the example of \(d=1\) and \(n=4\), with \(B_0=I_4\) and \(B_1=\overline{I_4}\)).
7 Conclusions and perspectives
In this paper, we have investigated high-order masking countermeasures against side-channel attacks, in the context of FPGAs where the computation is implemented as table lookups in block RAMs. We have shown that the minimal attack order (the HO-CPA immunity, or \(\mathsf {HCI}\)) relates to the amount of leakage. Then, we presented a method called leakage squeezing, which aims at raising the HO-CPA immunity. This method consists in using bijective encodings which are applied on the masking shares. Our evaluation shows that this technique provides strong robustness against HO-CPA: without leakage squeezing, \(\mathsf {HCI}=d+1\), whereas with leakage squeezing, \(\mathsf {HCI}>d+1\). For instance, we characterize linear bijections that reach \(\mathsf {HCI}=4\) with only \(d=1\) mask when the sensitive variable is a nibble. The robustness is corroborated by an information-theoretic analysis of the leakage. Indeed, at a given cost and performance level, we show that the leakage squeezing with linear bijections is as efficient as adding one or two other masks.
As a perspective, we intend to extend the research for bijections where more than two masks are used. A recent paper by Grosso et al. [25] suggests that leakage squeezing could be also efficient against higher-order attacks when the several shares (two in their article) are leaking at different dates. We intend to formalize the reason why leakage squeezing also increases the \(\mathsf {HCI}\) in this context.
Notes
In the context of polynomials in variables \(L_0, \ldots , L_d\) over the field \(\mathbb {K}\) (e.g., \(\mathbb {K}=\mathbb {R}\)), our definition of multivariate degree coincides with the “usual” degree of polynomials in the algebra \(\mathbb {K}[L_0, \ldots , L_d]/(\prod _{i=0}^d L_i^2-L_i)\), also called sometimes the algebraic degree.
References
Akkar, M.-L., Giraud, C.: An implementation of DES and AES secure against some attacks. In: Proceedings of CHES’01, vol. 2162 of LNCS, pp. 309–318. Springer, Berlin (2001)
Batina, L., Gierlichs, B., Prouff, E., Rivain, M., Standaert, F.-X., Veyrat-Charvillon, N.: Mutual information analysis: a comprehensive study. J. Cryptol. 24(2), 269–291 (2011)
Bhasin, S., Carlet, C., Guilley, S.: Theory of masking with codewords in hardware: low-weight \(d\)th-order correlation-immune Boolean functions. Cryptology ePrint Archive, Report 2013/303, 2013. http://eprint.iacr.org/2013/303/
Bhasin, S., Danger, J.-L., Guilley, S., Najm, Z.: NICV: normalized inter-class variance for detection of side-channel leakage. Cryptology ePrint Archive, Report 2013/717, 2013. http://eprint.iacr.org/2013/717
Bhasin, S., Guilley, S., Heuser, A., Danger, J.-L.: From cryptography to hardware: analyzing and protecting embedded xilinx bram for cryptographic applications. J. Cryptogr. Eng. 3(4), 213–225 (2013)
Brier, E., Clavier, C., Olivier, F.: Correlation power analysis with a leakage model. In: CHES, vol. 3156 of LNCS, pp. 16–29. August 11–13, Cambridge, MA. Springer, Berlin (2004)
Bringer, J., Chabanne, H., Le, T.-H.: Protecting AES against side-channel analysis using wire-tap codes. J. Cryptogr. Eng. 2(2), 129–141 (2012)
Camion, P., Carlet, C., Charpin, P., Sendrier, N.: On correlation-immune functions. In: Feigenbaum, J. (ed) CRYPTO, Lecture Notes in Computer Science, vol. 576, pp. 86–100. Springer, Berlin (1991)
Cardoso, J.-F.: High-order contrasts for independent component analysis. Neural Comput. 11(1), 157–192 (January 1999)
Cardoso, J.-F.: Dependence, correlation and gaussianity in independent component analysis. J. Mach. Learn. Res. 4, 1177–1203 (2003)
Carlet, C.: Boolean functions for cryptography and error correcting codes. In: Crama, Y., Hammer, P. (eds) Chapter of the Monography Boolean Models and Methods in Mathematics, Computer Science, and Engineering, pp. 257–397. Cambridge University Press, Cambridge. Preliminary version available at http://www.math.univ-paris13.fr/carlet/chap-fcts-Bool-corr.pdf (2010)
Carlet, C., Danger, J.-L., Guilley, S., Maghrebi, H.: Leakage squeezing of order two. In: INDOCRYPT, vol. 7668 of LNCS, pp. 120–139. Springer, Berlin (2012)
Carlet, C., Gaborit, P., Kim, J.-L., Solé, P.: A new class of codes for boolean masking of cryptographic computations. IEEE Trans. Inf. Theory 58(9), 6000–6011 (2012)
Carlet, C., Goubin, L., Prouff, E., Quisquater, M., Rivain, M.: Higher-order masking schemes for S-Boxes. In: FSE, Lecture Notes in Computer Science. Springer, Berlin (2012)
Chari, S., Jutla, C.S., Rao, J.R., Rohatgi, P.: Towards sound approaches to counteract power-analysis attacks. In: CRYPTO, vol. 1666 of LNCS. Springer, Berlin (1999). ISBN 3-540-66347-9
Chari, S., Rao, J.R., Rohatgi, P.: Template attacks. In: CHES, vol. 2523 of LNCS, pp. 13–28. Springer, Berlin (2002)
Coron, J.-S.: Higher order masking of look-up tables. Cryptology ePrint Archive, Report 2013/700. 2013. http://eprint.iacr.org/
Coron, J.-S., Prouff, E., Rivain, M.: Side channel cryptanalysis of a higher order masking scheme. In: CHES, vol. 4727 of LNCS, pp. 28–44. Springer, Berlin (2007)
Courtois, N., Goubin, L.: An algebraic masking method to protect AES against power attacks. In: Won, D., Kim, S. (eds) ICISC, vol. 3935 of Lecture Notes in Computer Science, pp. 199–209. Springer, Berlin (2005)
Drimer, S., Güneysu, T., Paar, C.: DSPs, BRAMs, and a pinch of logic: Extended recipes for AES on FPGAs. ACM Trans. Reconfig. Technol. Syst. 3(1), 1–27 (2010). doi:10.1145/1661438.1661441
Fischer, W., Gammel, B.M.: Masking at gate level in the presence of glitches. In: CHES, vol. 3659 of Lecture Notes in Computer Science, pp. 187–200. Springer, Berlin (2005)
Fumaroli, G., Martinelli, A., Prouff, E., Rivain, M: Affine masking against higher-order side channel analysis. In: Biryukov, A., Gong, G., Stinson, D.R. (eds) Selected Areas in Cryptography, vol. 6544 of LNCS, pp. 262–280. Springer, Berlin (2010)
Goubin, L., Martinelli, A.: Protecting AES with Shamir’s Secret Sharing Scheme. In: Preneel and Takagi [42], pp. 79–94
Goubin, L., Patarin, J.: DES and differential power analysis. The “Duplication” method. In: CHES, LNCS, pp. 158–172. Springer, Berlin (1999)
Grosso, V., Standaert, F.-X., Prouff, E.: Leakage squeezing, Revisited. In: CARDIS, Lecture Notes in Computer Science. Springer, Berlin (2013)
Guilley, S., Carlet, C., Maghrebi, H., Danger, J.-L., Prouff, E.: Leakage squeezing–defeating instantaneous \((d+1)\)th-order correlation power analysis with strictly less than \(d\) masks. In: CryptArchi, June 19–22 2012. Château de Goutelas, Marcoux, France; (abstract)
Güneysu, T., Moradi, A.: Generic side-channel countermeasures for reconfigurable devices. In: Preneel and Takagi [42], pp. 33–48
Heyszl, J., Mangard, S., Heinz, B., Stumpf, F., Sigl, G.: Localized electromagnetic analysis of cryptographic implementations. In: Dunkelman, O. (ed) CT-RSA, vol. 7178 of Lecture Notes in Computer Science, pp. 231–244. Springer, Berlin (2012)
Kocher, P.C., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M.J. (ed.) CRYPTO. Lecture Notes in Computer Science, vol. 1666, pp. 388–397. Springer, Berlin (1999)
Le, T.-H., Berthier, M.: Mutual information analysis under the view of higher-order statistics. In: Echizen, I., Kunihiro, N., Sasaki, R. (eds) IWSEC, volume 6434 of LNCS, pp. 285–300. Springer, Berlin (2010)
Maghrebi, H., Carlet, C., Guilley, S., Danger, J.-L.: Optimal first-order masking with linear and non-linear bijections. In: Mitrokotsa, A., Vaudenay, S. (eds) AFRICACRYPT, vol. 7374 of Lecture Notes in Computer Science, pp. 360–377. Springer, Berlin (2012)
Maghrebi, H., Guilley, S., Carlet, C., Danger, J.-L.: Classification of high-order boolean masking schemes and improvements of their efficiency. Cryptology ePrint Archive, Report 2011/520, September 2011. http://eprint.iacr.org/2011/520
Maghrebi, H., Guilley, S., Danger, J.-L.: Leakage squeezing countermeasure against high-order attacks. In: WISTP, vol. 6633 of LNCS, pp. 208–223. Springer, Berlin (2011). doi:10.1007/978-3-642-21040-2_14
Maghrebi, H., Prouff, E., Guilley, S., Danger, J.-L.: A first-order leak-free masking countermeasure. In: CT-RSA, vol. 7178 of LNCS, pp. 156–170. Springer, Berlin (2012). doi:10.1007/978-3-642-27954-6_10
Maghrebi, H., Prouff, E., Guilley, S., Danger, J.-L.: Register leakage masking using gray code. In: HOST, IEEE Computer Society, pp. 37–42 (2012). doi:10.1109/HST.2012.6224316
Mangard, S., Oswald, E., Popp, T.: Power analysis attacks: revealing the secrets of smart cards. Springer, Berlin (2006). ISBN 0-387-30857-1, http://www.dpabook.org/
Mangard, S., Schramm, K.: Pinpointing the side-channel leakage of masked AES hardware implementations. In: CHES, vol. 4249 of LNCS, pp. 76–90. Springer, Berlin (2006)
Moradi, A., Mischke, O.: How far should theory be from practice? Evaluation of a countermeasure. In: CHES, Leuven, Belgium (2012)
Nassar, M., Guilley, S., Danger, J.-L.: Formal analysis of the entropy/security trade-off in first-order masking countermeasures against side-channel attacks. In: INDOCRYPT, vol. 7107 of LNCS, pp. 22–39. Springer, Berlin (2011). doi:10.1007/978-3-642-25578-6_4
Nassar, M., Souissi, Y., Guilley, S., Danger, J.-L.: RSM: a small and fast countermeasure for AES, secure against first- and second-order zero-offset SCAs. In: DATE, pp. 1173–1178. IEEE Computer Society, March 12–16, 2012. Dresden, Germany. (TRACK A: “Application Design”, TOPIC A5: “Secure Systems”)
Nikova, S., Rijmen, V., Schläffer, M.: Secure hardware implementation of nonlinear functions in the presence of glitches. J. Cryptol. 24(2), 292–321 (2011)
Preneel, B., Takagi, T. (eds) Cryptographic hardware and embedded systems-CHES 2011—13th International Workshop, Nara, Japan, September 28-October 1, 2011. Proceedings, vol. 6917 of LNCS. Springer, Berlin (2011)
Prouff, E., McEvoy, R.P.: First-order side-channel attacks on the permutation tables countermeasure. In: CHES, vol. 5747 of Lecture Notes in Computer Science, pp. 81–96. Springer, Berlin (2009)
Prouff, E., Rivain, M.: Masking against side channel attacks: a formal security proof. In: EUROCRYPT, vol. 7881 of LNCS, pp. 142–159. Springer, Berlin (2013)
Prouff, E., Rivain, M., Bevan, R.: Statistical analysis of second order differential power analysis. IEEE Trans. Comput. 58(6), 799–811 (2009)
Prouff, E., Roche, T.: Attack on a higher-order masking of the AES based on homographic functions. In: Gong, G., Chand Gupta, K. (eds) INDOCRYPT, vol. 6498 of Lecture Notes in Computer Science, pp. 262–281. Springer, Berlin (2010)
Prouff, E., Roche, T.: Higher-order glitches free implementation of the AES using secure multi-party computation protocols. In: Preneel and Takagi [42], pp. 63–78
Japanese RCIS-AIST. SASEBO (Side-channel Attack Standard Evaluation Board, Akashi Satoh) development board: 2013. http://www.risec.aist.go.jp/project/sasebo/
Rivain, M., Prouff, E.: Provably secure higher-order masking of AES. In: Mangard, S., Standaert, F.-X. (eds) CHES, vol. 6225 of LNCS, pp. 413–427. Springer, Berlin (2010)
Rivain, M., Prouff, E., Doget, J.: Higher-order masking and shuffling for software implementations of block ciphers. Cryptology ePrint Archive, Report 2009/420, September 2009. http://eprint.iacr.org/2009/420
Schindler, W., Lemke, K., Paar, C.: A stochastic model for differential side channel cryptanalysis. In: CHES, vol. 3659 of LNCS, pp. 30–46. Springer, Berlin (2005)
Standaert, F.-X., Malkin, T., Yung, M.: A unified framework for the analysis of side-channel key recovery attacks. In: EUROCRYPT, vol. 5479 of LNCS, pp. 443–461. Springer, Berlin (2009)
Acknowledgments
The authors are grateful to Shivam Bhasin for providing the estimation of the signal-to-noise ratio on FPGAs. We also thank Thanh-Ha Le and Maël Berthier from Safran-Morpho for interesting discussions regarding the use of cumulants in the development of the mutual information in the presence of strong noise. The interaction with them was a key for the rigorous demonstration of Theorem 1. Besides, this work, originating from IACR Cryptology ePrint Archive 2011/520 [32] and from a presentation at CRYPTARCHI 2012 [26], has greatly improved after the numerous fruitful exchanges with the anonymous reviewers. This work has been partly supported by the French National Research Agency (ANR), under Grant ANR-09-SEGI-013 (ARPEGE project SecReSoC, “Secured Reconfigurable System on Chip”).
Appendix A: Estimation of the noise level in hardware implementations
This appendix presents a method to estimate the signal-to-noise ratio (SNR) from real traces. For the sake of illustration, we use traces gathered from an FPGA (Xilinx Virtex 5) soldered on a SASEBO-GII board [48]. The traces are captured from the electromagnetic field emitted by the FPGA by an oscilloscope with a bandwidth of \(6\) GHz. The FPGA is programmed with an AES, that leaks values \(Y\) that depend on the distance between two state values \(X\). The architecture of the AES is that described in [40] (but with the countermeasure inhibited): one round is computed for every clock cycle. For each of the \(16\)-state bytes (but the first line, invariant through the ShiftRows transform, that has a poor SNR), the SNR is computed at the last round. The definition of the SNR requires two notions:
1. the signal is the inter-class variance, i.e., \(\mathsf {Var}\left[\, \mathbb {E}[Y|X ] \right]\), whereas
2. the noise is the total variance minus the signal, i.e., the intra-class variance \(\mathbb {E}[\mathsf {Var}\left[\, Y|X \right] ]\).
The SNR (in power, i.e., squared) is defined as the ratio between the inter- and the intra-class variances (refer to [4, 36]). These values are plotted over time in Fig. 8 when \(X\) is the transition of the last round. It appears that the value of the “squared” SNR is about \(0.005\), hence \(1/\sigma ^2 \approx 0.005\), which means \(\sigma \approx 14\). This value of \(\sigma \), representative of million-gate parallel devices like FPGAs, is significantly larger than the noise that taints measurements over ASICs such as smart-cards. This definitely shows that the hypothesis of “large values” of \(\sigma \) in FPGAs is supported, all the more so as the designer can decide to further increase the noise variance by activating pseudo-random logic, as explained for instance in [27].
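The SNR defined above can be estimated from labelled samples with a few lines of code. The Python sketch below is a generic illustration and not the processing chain used for Fig. 8: the function name `estimate_snr`, the use of population (biased) variances, and the toy data are assumptions. Each sample is a pair \((x, y)\), where \(x\) is the class (e.g., the transition of a state byte) and \(y\) the measured leakage at one time instant.

```python
import math
from collections import defaultdict


def estimate_snr(samples):
    """Ratio Var[E[Y|X]] / E[Var[Y|X]] from a list of (x, y) pairs."""
    groups = defaultdict(list)
    for x, y in samples:
        groups[x].append(y)

    total = sum(len(ys) for ys in groups.values())
    grand_mean = sum(y for ys in groups.values() for y in ys) / total

    signal = 0.0  # inter-class variance
    noise = 0.0   # intra-class variance
    for ys in groups.values():
        weight = len(ys) / total          # empirical probability of the class
        mean = sum(ys) / len(ys)
        signal += weight * (mean - grand_mean) ** 2
        noise += weight * sum((y - mean) ** 2 for y in ys) / len(ys)
    return signal / noise


# Toy data: two classes with means 1 and 5, and unit variance inside each class.
toy_samples = [(0, 0.0), (0, 2.0), (1, 4.0), (1, 6.0)]
print(estimate_snr(toy_samples))  # 4.0

# Converting the measured "squared" SNR of about 0.005 into a noise level,
# for a signal normalized to unit variance as in the text.
sigma = math.sqrt(1 / 0.005)
print(round(sigma, 2))  # 14.14
```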
Cite this article
Carlet, C., Danger, JL., Guilley, S. et al. Achieving side-channel high-order correlation immunity with leakage squeezing. J Cryptogr Eng 4, 107–121 (2014). https://doi.org/10.1007/s13389-013-0067-1
DOI: https://doi.org/10.1007/s13389-013-0067-1