Improved Side-Channel Analysis of Finite-Field Multiplication
Abstract
A side-channel analysis of multiplication in \(\mathsf {GF}(2^{128})\) has recently been published by Belaïd, Fouque and Gérard at Asiacrypt 2014, with an application to AES-GCM. Using the least significant bit of the Hamming weight of the multiplication result, the authors have shown how to recover the secret multiplier efficiently. However, this least significant bit is very sensitive to measurement noise; this implies that, without averaging, their attack can only work for high signal-to-noise ratios (\( \mathsf {SNR}> 128\)). In this paper we describe a new side-channel attack against the multiplication in \(\mathsf {GF}(2^{128})\) that uses the most significant bits of the Hamming weight. We show that much higher values of noise can then be tolerated. For instance with an \(\mathsf {SNR}\) equal to 8, the key can be recovered using \(2^{20}\) consumption traces with time and memory complexities respectively equal to \(2^{51.68}\) and \(2^{36}\). We moreover show that the new method can be extended to attack the fresh re-keying countermeasure proposed by Medwed, Standaert, Großschädl and Regazzoni at Africacrypt 2010.
Keywords
Side-channel analysis, Galois field multiplication, LPN problem
1 Introduction
Side-Channel Attacks. The cornerstone of side-channel analysis (SCA for short) is that information about some key-dependent variable x leaks through e.g. the power consumption or the electromagnetic emanations of the device manipulating x. A side-channel attack classically follows a divide-and-conquer approach and the secret is recovered by exhaustively testing the likelihood of every possible value for every secret piece. This modus operandi implicitly assumes that x depends on a short portion of the secret (for example only 8 bits if x corresponds to the output of the AES s-box). It is particularly suited to the context of software implementations where the processing is sequentially split into operations on data whose size depends on the device architecture (e.g. 8-bit or even 32-bit for smart cards).
Side-Channel Analysis of Finite-Field Multiplication. At Asiacrypt 2014 [BFG14], Belaïd, Fouque and Gérard consider an attack scenario dedicated to hardware implementations where many operations are performed simultaneously. Following previous works such as [MSGR10, MSJ12], they assume that when performing a multiplication \(\mathbf {\mathrm {a}} \cdot \mathbf {\mathrm {k}}\) over \(\mathsf {GF}(2^n)\) for some known \(\mathbf {\mathrm {a}}\), only the Hamming weight of the result \(\mathbf {\mathrm {a}} \cdot \mathbf {\mathrm {k}} \in \mathsf {GF}(2^n)\) is leaking, with some noise; the goal is to recover the secret multiplier \(\mathbf {\mathrm {k}}\). Formally, after denoting by \(\mathcal{N}(0,\sigma )\) the Gaussian distribution with null mean and standard deviation \(\sigma \) and by \(\mathsf {HW}\) the Hamming weight over \(\mathsf {GF}(2^n)\), for a given basis of \(\mathsf {GF}(2^n)\), the SCA then amounts to solving the following problem:
Definition 1
(Hidden Multiplier Problem). Let \(\mathbf {\mathrm {k}} \leftarrow \mathsf {GF}(2^n)\). Let \(\ell \in {\mathbb N}\). Given a sequence \((\mathbf {\mathrm {a}}_i,\mathcal{L}_i)_{1 \le i\le \ell }\) where \(\mathbf {\mathrm {a}}_i \leftarrow \mathsf {GF}(2^n)\) and \(\mathcal{L}_i=\mathsf {HW}(\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}})+\varepsilon _i\) where \(\varepsilon _i \leftarrow \mathcal{N}(0,\sigma )\), recover \(\mathbf {\mathrm {k}}\).
The Belaïd-Fouque-Gérard Attack and the LPN Problem. As noted in [BFG14], for \(\sigma =0\) (no noise) the above problem is easy to solve. Namely the least significant bit of the Hamming weight of x is the xor of the bits of x. Hence for known \(\mathbf {\mathrm {a}}_i\) the least significant bit of \(\mathsf {HW}(\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}})\) is a linear function of the bits of the secret \(\mathbf {\mathrm {k}}\). Therefore every Hamming weight gives a linear equation over the n bits of \(\mathbf {\mathrm {k}}\) and, if the system of equations has rank n (which happens with good probability), the secret \(\mathbf {\mathrm {k}}\) can be recovered by solving a linear system. However, this least significant bit is very sensitive to the observation noise \(\varepsilon _i\). Even for relatively high signal-to-noise ratios (i.e., low \(\sigma \)), this induces a significant error probability for the linear equations. This is all the more damaging since a device never leaks exactly the Hamming weight of the manipulated data, and a modeling (aka epistemic) error therefore adds to the observation noise. The problem of solving a system of noisy linear equations over \(\mathsf {GF}(2)\) is known as the Learning Parity with Noise (LPN) problem. New algorithms for solving LPN have recently been proposed [GJL14, BTV15]. The previous best method to solve the LPN problem was the Fouque-Levieil algorithm from [LF06], which is a variant of the BKW algorithm proposed by Blum, Kalai and Wasserman in [BKW00]. According to [BFG14] the Fouque-Levieil algorithm can solve the LPN problem for \(n=128\) bits with error probability \(p=0.31\) (corresponding to \(\mathsf {SNR}=128\)) with \(2^{48}\) acquisitions and \(2^{50}\) complexity (it becomes \(2^{334}\) when \(\mathsf {SNR}=8\)).
Therefore the Belaïd-Fouque-Gérard (BFG for short) algorithm for solving the Hidden Multiplier Problem is quite efficient for relatively high signal-to-noise ratios (\(\mathsf {SNR}>128\)); however it becomes prohibitively inefficient for smaller values (i.e., larger values of \(\sigma \)).
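The linearity argument above can be checked on a toy field. The following sketch is only an illustration under stated assumptions: it works in \(\mathsf {GF}(2^8)\) with the AES polynomial instead of \(\mathsf {GF}(2^{128})\), and the helper names (`gf_mul`, `parity_mask`, `lsb_of_hw`) are hypothetical. It shows that the least significant bit of \(\mathsf {HW}(\mathbf {\mathrm {a}} \cdot \mathbf {\mathrm {k}})\) equals a scalar product over \(\mathsf {GF}(2)\) between the bits of \(\mathbf {\mathrm {k}}\) and a mask that depends only on \(\mathbf {\mathrm {a}}\).

```python
# Toy field GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 (assumption:
# the paper works in GF(2^128); the linear structure is the same).
N = 8
POLY = 0x11B


def gf_mul(a, k):
    """Shift-and-add multiplication in GF(2^N) modulo POLY."""
    r = 0
    for _ in range(N):
        if k & 1:
            r ^= a
        k >>= 1
        a <<= 1
        if a >> N:          # degree reached N: reduce
            a ^= POLY
    return r


def parity(x):
    """Xor of the bits of x, i.e. the least significant bit of HW(x)."""
    return bin(x).count("1") & 1


def parity_mask(a):
    """Bit j of the mask is the parity of a * x^j, the coefficient of key bit k_j."""
    return sum(parity(gf_mul(a, 1 << j)) << j for j in range(N))


def lsb_of_hw(a, k):
    """Least significant bit of the Hamming weight of a * k."""
    return bin(gf_mul(a, k)).count("1") & 1


# One noise-free observation gives one linear equation <parity_mask(a), k> = lsb.
a, k = 0x57, 0x83
print(lsb_of_hw(a, k) == parity(parity_mask(a) & k))  # True
```

With n such equations of full rank, the key bits follow from Gaussian elimination over \(\mathsf {GF}(2)\); with noise, each equation is wrong with some probability, which is the LPN setting.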
Our New Attack. In this paper we describe a new algorithm for solving the Hidden Multiplier Problem, in which we use several most significant bits of the Hamming weight instead of the single least significant bit; we show that much smaller values of \(\mathsf {SNR}\) can then be tolerated (\(\mathsf {SNR}\simeq 8\)), which increases the practicability of the attack. Our technique works as follows. We only keep the observations with small Hamming weight or high Hamming weight. Namely if \(\mathsf {HW}(\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}})\) is close to 0, this means that most of the bits of \(\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}}\) are equal to 0. This can be written as a system of n equations over the bits of \(\mathbf {\mathrm {k}}\), all equal to 0, where some of the equations are erroneous. Similarly if the Hamming weight is close to n, we can assume that all n equations are equal to 1, and we obtain again a set of n noisy equations. Hence in both cases we obtain an instance of the LPN problem. For example, if we only keep observations with Hamming weight less than n/4 or greater than 3n/4, we obtain a set of noisy equations with error probability less than 1/4.
To solve the LPN problem we will use BKW-style algorithms [BKW00]. The main drawback of these algorithms is the huge number of samples they require, which makes them impractical for side-channel attacks. In this paper we use some improvements to reduce the query complexity using Shamir-Schroeppel [SS79] or the variant proposed by Howgrave-Graham and Joux in [HGJ10]. We also take advantage of the secret-error switching lemma [Kir11, ACPS09] to further reduce the time complexity.
Since our attack is based on filtering for abnormally low or high Hamming weights, it is much less sensitive to noise in Hamming weight measurement than the BFG attack, which relies on the least significant bit of the Hamming weight. Namely even for small \(\mathsf {SNR}\) (i.e., close to 8), our filtering remains essentially correct, whereas the information from the least significant bit of the Hamming weight is buried in noise and becomes useless. However, for high \(\mathsf {SNR}\), our attack requires a larger amount of observations. Therefore in the latter contexts, the BFG attack stays better.
We also describe an attack when the messages \(\mathbf {\mathrm {a}}_i\) can be chosen. In that case, the attack becomes much more efficient. We also attack a fresh re-keying scheme proposed in [MSGR10] to defeat side-channel cryptanalysis. Whereas the latter scheme is not vulnerable to the technique used in [BFG14], we demonstrate that our attack makes it possible to recover the secret key very efficiently.
Organization of the Paper. In Sect. 2, we recall the field multiplication for AES-GCM, the leakage model, the LPN problem and the BKW algorithm. Then, we present our new attack in Sect. 3 and the new algorithmic techniques to reduce the number of queries. In Sect. 4 we describe a new chosen-message attack and in Sect. 5 our attack on the fresh re-keying scheme. Finally, in Sect. 6 we present the result of our practical experiments.
2 Preliminaries
2.1 Galois Field Multiplication
For any positive integer n, the finite field of \(2^n\) elements is denoted by \(\mathsf {GF}(2^n)\) and the n-dimensional vector space over \(\mathsf {GF}(2)\) is denoted by \(\mathsf {GF}(2)^n\). Choosing a basis of \(\mathsf {GF}(2^n)\) over \(\mathsf {GF}(2)\) makes it possible to represent elements of \(\mathsf {GF}(2^n)\) as elements of \(\mathsf {GF}(2)^n\) and vice versa. In the following, we assume that the same basis is always used to represent elements of \(\mathsf {GF}(2^n)\) over \(\mathsf {GF}(2)\).
2.2 Probabilities
2.3 Leakage Model
In the rest of the paper, the level of noise in the observations is quantified with the signal-to-noise ratio (\(\mathsf {SNR}\) for short), which we define as the ratio between the signal variance and the noise variance. This value, which equals \(n/(4\sigma ^2)\) under Assumption (2), is a useful notion to compare different contexts where the variances of both the signal and the noise are different (e.g. with different devices).
As in [BFG14], the main purpose of our attack is to show that the key \(\mathbf {\mathrm {k}}\) can be recovered with only the observations \(\mathcal {L}(\mathbf {\mathrm {k}} \cdot \mathbf {\mathrm {a_i}})\) for many known \(\mathbf {\mathrm {a_i}}\)’s. Thus, we assume that the attacker has no access to the internal leakage of the field multiplication \(\mathbf {\mathrm {k}} \cdot \mathbf {\mathrm {a_i}}\) and that the n-bit results are stored in n-bit registers, which is the worst case to attack.
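As a small worked example of this definition, the sketch below converts between \(\sigma \) and \(\mathsf {SNR}\) for the Hamming weight of a uniform n-bit value, whose variance is n/4. The function names are hypothetical and only serve the illustration.

```python
import math


def snr(n, sigma):
    """SNR = Var(HW) / Var(noise) = (n / 4) / sigma^2 for a uniform n-bit value."""
    return n / (4.0 * sigma ** 2)


def sigma_for_snr(n, target_snr):
    """Noise standard deviation giving the requested SNR on n-bit data."""
    return math.sqrt(n / (4.0 * target_snr))


print(snr(128, 2.0))            # 8.0
print(sigma_for_snr(128, 128))  # 0.5
```

For 128-bit data, \(\mathsf {SNR}=8\) thus corresponds to \(\sigma =2\), and \(\mathsf {SNR}=128\) to \(\sigma =0.5\).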
2.4 Learning Parities with Noise
As briefly explained in the introduction, the problem of recovering a secret \(\mathbf {\mathrm {k}}\) from noisy observations of \(\mathsf {HW}(\mathbf {\mathrm {a}}\cdot \mathbf {\mathrm {k}})\) relates to the well-known LPN problem.
Definition 2
(Learning Parity with Noise (LPN) Problem). Let \(\mathbf {\mathrm {k}} \in \mathsf {GF}(2)^n\) and \(p\in (0,1/2)\). Given a family of \(\nu \) values \((\mathbf {\mathrm {a}}_i)_{0 \leqslant i< \nu }\) in \(\mathsf {GF}(2)^n\) and the family of corresponding observations \((b_i=\langle \mathbf {\mathrm {a}}_i, \mathbf {\mathrm {k}}\rangle +e_i)_{0 \leqslant i< \nu }\), where \(\langle \cdot ,\cdot \rangle \) denotes the scalar product in \(\mathsf {GF}(2)^n\) and where the \(\mathbf {\mathrm {a}}_i\) are drawn uniformly in \(\mathsf {GF}(2)^n\) and the \(e_i\) are generated according to Bernoulli’s distribution \(\mathsf {Ber}(p)\) with parameter p, recover \(\mathbf {\mathrm {k}}\).
We denote by \(\mathsf {LPN}(n,\nu ,p)\) an instance of the LPN problem with parameters \((n,\nu ,p)\). In this paper, the noisy equations \(\langle \mathbf {\mathrm {a}}_i,\mathbf {\mathrm {k}}\rangle +e_i\) will come from the noisy observations of a device performing field (or ring) multiplications in the form \(\mathbf {\mathrm {z}}=\mathbf {\mathrm {a}}\cdot \mathbf {\mathrm {k}}\) in \(\mathsf {GF}(2^n)\).
2.5 The BKW Algorithm and Its Variants
Blum et al. described in [BKW00] a subexponential algorithm for solving the LPN problem: it performs a clever Gaussian elimination using a small number of linear combinations, which reduces the dimension of the problem. Then, Levieil and Fouque proposed a practical improvement in [LF06] for the second phase of the algorithm and Kirchner [Kir11] proposed to switch secret and error [ACPS09] to further improve the method. Later Arora and Ge [AG11] proposed an algebraic approach for specifically structured noise. Recently Guo et al. proposed to use error-correcting codes [GJL14].
Finding linear combinations. To find linear combinations satisfying (3), we first split the \(\mathbf {\mathrm {a}}_i\)’s into a blocks of b bits, where \(n=a \cdot b\) (e.g. for \(n=128\) we can take \(a=8\) and \(b=16\)). Initially we have \(\nu \) vectors \(\mathbf {\mathrm {a}}_i\). Consider the rightmost b bits of each \(\mathbf {\mathrm {a}}_i\), and sort the \(\mathbf {\mathrm {a}}_i\)’s into \(2^b\) classes according to this value. We xor all elements of each class with a single element of that class, and we discard this element. Hence we get at least \(\nu - 2^b\) new vectors \(\mathbf {\mathrm {a}}^{(1)}_i\), whose rightmost b bits are zero; these \(\mathbf {\mathrm {a}}^{(1)}_i\) are the xor of 2 initial vectors \(\mathbf {\mathrm {a}}_i\). One can then proceed recursively. For the next block of b bits we get at least \(\nu - 2 \cdot 2^b\) vectors \(\mathbf {\mathrm {a}}^{(2)}_i\) whose rightmost 2b bits are zero; they are the xor of 4 initial vectors \(\mathbf {\mathrm {a}}_i\). Stopping at the last-but-one block, we get at least \(\nu - (a-1) \cdot 2^b\) vectors, for which only the first b-bit block is possibly non-zero, and which are the xor of \(2^{a-1}\) initial vectors \(\mathbf {\mathrm {a}}_i\). Among these \(\nu - (a-1) \cdot 2^b\) vectors, we select the ones equal to the basis vectors \(\mathbf {\mathrm {u}}_j\) and we perform a majority vote. With the xor of \(\ell =2^{a-1}\) vectors, the bias is \((1-2p)^{2^{a-1}}\). Therefore for the majority vote we need roughly \(c / (1-2p)^{2^{a-1}}\) such vectors, for some logarithmic factor c [BKW00]. A variant of the BKW algorithm is described by Levieil and Fouque in [LF06]: it finds linear combinations similarly; however, at the end, it uses a Walsh transform to recover the last b bits of \(\mathbf {\mathrm {k}}\) at once.
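The block-wise reduction described above can be sketched as follows. This is a minimal illustration on toy parameters (\(n=16\), \(a=4\), \(b=4\)), not the implementation used for the attack; vectors are encoded as Python integers, the function names are hypothetical, and the samples are generated noise-free so that the result can be checked exactly.

```python
import random


def parity(x):
    return bin(x).count("1") & 1


def lpn_samples(k, n, nu, p, rng):
    """nu samples (a, <a,k> + e) with a uniform n-bit vector and e ~ Ber(p)."""
    out = []
    for _ in range(nu):
        a = rng.getrandbits(n)
        e = 1 if rng.random() < p else 0
        out.append((a, parity(a & k) ^ e))
    return out


def bkw_reduce(samples, b, block):
    """One reduction step: zero the b-bit block number `block` (0 = rightmost).

    The first vector met in each class is the pivot: it is xored into every
    later vector of the class and is itself discarded.
    """
    shift, mask = b * block, (1 << b) - 1
    pivots, out = {}, []
    for a, bit in samples:
        key = (a >> shift) & mask
        if key in pivots:
            pa, pbit = pivots[key]
            out.append((a ^ pa, bit ^ pbit))
        else:
            pivots[key] = (a, bit)
    return out


rng = random.Random(1)
n, b = 16, 4
k = rng.getrandbits(n)
samples = lpn_samples(k, n, 2000, 0.0, rng)   # p = 0 for an exact check
for block in range(n // b - 1):               # stop at the last-but-one block
    samples = bkw_reduce(samples, b, block)
print(len(samples) >= 2000 - 3 * 2 ** b)      # True
```

Each step loses at most \(2^b\) vectors and doubles the number of initial vectors that each remaining vector combines, which is what drives the bias down in the noisy case.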
3 Our New Attack
In this section, we describe our new side-channel attack on the result of the multiplication in \(\mathsf {GF}(2^{n})\), which benefits from being weakly impacted by the observation noise. As in [BFG14], we aim at recovering the n-bit secret key \(\mathbf {\mathrm {k}}\) from a sequence of \(t\) queries \((\mathbf {\mathrm {a}}_i,\mathsf {HW}(\mathbf {\mathrm {k}}\cdot \mathbf {\mathrm {a}}_i)+\varepsilon _i)_{0 \leqslant i < t}\) where the \(\mathbf {\mathrm {a}}_i\) are drawn uniformly in \(\mathsf {GF}(2^{n})\) and the \(\varepsilon _i\) are drawn from the Gaussian distribution \(\mathcal{N}(0,\sigma )\).
3.1 Overview
The cornerstone of the attack is to filter the collected measurements to keep only the lowest and the highest Hamming weights. Then we assume that for each low (resp. high) Hamming weight, the multiplication result is exactly n bits of zeros (resp. ones). As a consequence, each filtered observation of \(\mathbf {\mathrm {z}}_i=\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}}\) gives n equations each with some error probability p. In our context, the equations correspond to the row-by-column scalar products in (1) and the binary error associated to the i-th equation is denoted by \(e_i\), with \(\Pr [e_i=1]=p\). Therefore given \(t\) messages and corresponding measurements, we get an instance of the \(\mathsf {LPN}(n,n \cdot t, p)\) problem that we can solve using techniques described in Sect. 3.3. To correctly scale the latter techniques, we need to know the error probability p with good precision. In the next section we show how to compute p from the filtering threshold and the measurement noise \(\sigma \) in (2).
3.2 Filtering
We describe here how we filter the lowest and highest leakages and how we compute the error probabilities of our final set of equations. In order to catch the extreme Hamming weight values of the multiplication results, we choose a real threshold value \(\lambda \) and we keep only the observations below \(n/2 - \lambda s\) or above \(n/2+ \lambda s\), with \(s=\sqrt{n}/2\) the standard deviation of the deterministic part of the leakage (here the Hamming weight). In the first case, we assume that all the bits of the multiplication result are zeros and in the second case we assume that they are all set to one. In both cases, we get n linear equations on the key bits, each having the same error probability p.
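The filtering rule can be illustrated by a short simulation. The sketch below is a Monte Carlo estimate under simplifying assumptions: the multiplication result is modelled as a uniform n-bit value, the leakage is its Hamming weight plus Gaussian noise, and the function name is hypothetical. It returns the proportion of kept observations and the empirical error probability of the resulting equations.

```python
import math
import random


def filtered_error_rate(n, sigma, lam, trials, rng):
    """Estimate (kept proportion, error probability p) for threshold lam."""
    s = math.sqrt(n) / 2                      # standard deviation of HW
    low, high = n / 2 - lam * s, n / 2 + lam * s
    kept = wrong = 0
    for _ in range(trials):
        hw = bin(rng.getrandbits(n)).count("1")
        leak = hw + rng.gauss(0.0, sigma)
        if leak < low:                        # guess "all bits are 0"
            kept += 1
            wrong += hw                       # every 1 bit is an error
        elif leak > high:                     # guess "all bits are 1"
            kept += 1
            wrong += n - hw                   # every 0 bit is an error
    if kept == 0:
        return 0.0, None
    return kept / trials, wrong / (kept * n)


rng = random.Random(0)
proportion, p = filtered_error_rate(128, 2.0, 1.0, 20000, rng)
print(round(proportion, 2), round(p, 3))
```

Increasing `lam` in this sketch lowers the estimated p at the cost of keeping fewer observations, which is the trade-off discussed in Sect. 3.3.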
Table 1. Error probability p and \(\lambda \) w.r.t. the filtering proportion \(F(\lambda )\) and the \(\mathsf {SNR}\)

3.3 Solving the LPN Problem
Numerous algorithms for solving LPN are known in the literature; a good survey is given by Pietrzak in [Pie12]. They generally require a huge number of LPN equations. However in our context, these equations come from side-channel acquisitions and thus remain rather scarce. A well-known result of Lyubashevsky reduces the sample complexity, but its limitations on the noise render it inapplicable to our problem [Lyu05]. In this section we summarize the ideas we set up for solving the LPN problem with a reduced number of samples and under reasonable levels of noise.
We take the point of view of an attacker: she has a limited quantity of sidechannel information, thus a limited number of initial LPN samples. She also has a limited computing power and (most importantly) memory. She has two goals: firstly she wants to make sure that the attack will indeed be feasible in theory (this depends on the final number of reduced equations), thus she must compute it as exactly as possible (she cannot afford to miss one bit of complexity in the computations). Secondly, she has reasonable but limited resources and wants to make the attack as efficient as possible.
Algorithm Sketch. The main parameter of the algorithm is the initial bias: it determines the number of linear combination steps we will be able to do before the final bias becomes too small to exploit. We fix it to 3 reductions (8 linear combinations). We look for small-weight linear combinations of initial equations that have their MSB cancelled. There are not enough initial LPN equations to use the BKW or LF1 (cf. Sect. 2.5) algorithms directly (they do not remove enough bits per iteration).
We thus first (rather artificially) square the number \(\nu \) of LPN samples: for all elements \(\mathbf {\mathrm {a}}_i\) in the initial set, with error probability p (bias \(\delta = 1 - 2p\)), we build the set \((\mathbf {\mathrm {a}}_{i,j})_{i \not = j}\doteq (\mathbf {\mathrm {a}}_i \oplus \mathbf {\mathrm {a}}_j)_{i,j}\). We then can do only 2 reductions. However, on the one hand, BKW-like algorithms will still not find enough reduced equations. On the other hand, exhaustively looking for reduced equations among all linear combinations of at most 4 (corresponding to 2 reductions) amplified equations would not be very efficient. Consequently, we apply two steps of a generalized birthday paradox-like algorithm [Wag02].
Then assume that we obtain w-bit reduced equations. Once enough equations are found (this depends on the final bias of the equations, which is \(\delta ^8\)), we can directly apply a Walsh-Hadamard transform (WHT) to recover the w LSB of the secret if the attacker memory is greater than \(2^w\) w-bit words. If we can only obtain equations reduced to \(w' > w\) bits, we can simply guess the \(w' - w\) bits of the secret and do a WHT on the last w bits. In this case, the search space can be reduced using the error/secret switching idea at the very beginning of the algorithm.
The algorithm steps as well as its time and space complexities are analyzed in detail in [BCF+15]. From a practical perspective, the optimal choice depends on several parameters: number of traces, filtering ratio, level of noise, available memory, computing power. Several trade-offs are thus available to the attacker. The most obvious one is to trade side-channel measurements against computing needs. Using more traces either makes it possible to reduce the bias of the selected equations, or increases their number, reducing the reduction time (birthday paradox phase). In a nutshell, the more traces are available, the better. Given a fixed number of traces (order of magnitude \(2^{20}\) to \(2^{24}\)), the attacker fixes the filtering threshold \(\lambda \). Increasing \(\lambda \) improves the bias of the selected equations. Thus fewer reduced equations are required for the WHT to correctly find w bits of the secret. Nonetheless, increasing \(\lambda \) also reduces the number of initial equations and thus makes the birthday paradox part of the algorithm slower. Concerning the reduction phase, it is well known that balancing the two phases of the generalized birthday paradox is the best way to reduce its complexity. Finally, doubling the memory makes it possible to recover one more bit with the WHT, while slightly more than doubling its time complexity: we fill the table with equations that are 1 bit less reduced, halving the time needed by the birthday paradox phase.
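The count of reductions is governed by how the bias shrinks when equations are summed: the xor of k independent \(\mathsf {Ber}(p)\) errors has bias \(\delta ^k\) with \(\delta = 1-2p\) (the piling-up lemma). The sketch below, with hypothetical function names, computes the resulting error probability in closed form and checks it against a brute-force enumeration.

```python
from itertools import product


def xor_error_probability(p, k):
    """Error probability of the xor of k independent Ber(p) errors."""
    return (1 - (1 - 2 * p) ** k) / 2


def xor_error_probability_bruteforce(p, k):
    """Same quantity by summing over all error patterns of odd weight."""
    total = 0.0
    for errs in product((0, 1), repeat=k):
        if sum(errs) % 2 == 1:
            pr = 1.0
            for e in errs:
                pr *= p if e else 1 - p
            total += pr
    return total


p = 0.25
for k in (2, 4, 8):
    bias = 1 - 2 * xor_error_probability(p, k)
    print(k, bias)   # bias equals (1 - 2p)^k: 0.25, 0.0625, 0.00390625
```

Since squaring the sample set already sums two equations, two further reductions bring the total to 8 initial equations per reduced one, in line with the figure of 8 linear combinations fixed above.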
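The final Walsh-Hadamard step can be sketched on a small instance. The code below is an illustration under assumptions (toy size \(w=8\), error probability 0.25, hypothetical function names), not the optimized implementation of [BCF+15]: it accumulates the reduced equations in a table of \(2^w\) counters, applies a fast WHT, and returns the candidate with the highest score.

```python
import random


def parity(x):
    return bin(x).count("1") & 1


def fwht(v):
    """In-place fast Walsh-Hadamard transform of a list of length 2^w."""
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                x, y = v[j], v[j + h]
                v[j], v[j + h] = x + y, x - y
        h *= 2
    return v


def recover_secret(samples, w):
    """Return the w-bit candidate s maximizing sum over samples of (-1)^(bit + <a,s>)."""
    table = [0] * (1 << w)
    for a, bit in samples:
        table[a] += 1 - 2 * bit               # +1 for bit 0, -1 for bit 1
    spectrum = fwht(table)
    return max(range(1 << w), key=lambda s: spectrum[s])


rng = random.Random(2)
w, p = 8, 0.25
secret = rng.getrandbits(w)
samples = []
for _ in range(2000):
    a = rng.getrandbits(w)
    e = 1 if rng.random() < p else 0
    samples.append((a, parity(a & secret) ^ e))
print(recover_secret(samples, w) == secret)   # True
```

The table holds \(2^w\) counters, which is why the available memory bounds the number of bits recovered at once.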
3.4 Comparison with State-of-the-Art Attacks
Compared to [BFG14], our new attack performs better except in one scenario when \(\mathsf {SNR}=128\) and the number of available queries is very limited by the context. Indeed, for \(\mathsf {SNR}=128\) the attack in [BFG14] requires only 128 observations to get 128 equations with error probability 0.31 whereas our attack requires \(2^{15}\) observations to achieve the same error probability. In the other contexts (i.e., for higher levels of noise) the attack in [BFG14] faces strong limitations. Concretely, recovering the secret key becomes very hard if the inputs are not chosen. On the contrary, since our attack benefits from being quite insensitive to noise, it stays successful even for higher noise levels.
4 Extension to Chosen Inputs
In this section, we present a key-recovery technique which can be applied when the attacker is able to control the public multiplication operands \(\mathbf {\mathrm {a}}_i\). It is based on comparing the leakage for related inputs.
4.1 Comparing Leaks
In the so-called chosen-message model, the attacker chooses \(\nu \) messages \((\mathbf {\mathrm {a}}_i)_{0 \leqslant i < \nu }\) in \(\mathsf {GF}(2^n)\) and gets the corresponding leakages \(\mathcal {L}(\mathbf {\mathrm {k}}\cdot \mathbf {\mathrm {a}}_i)\) as defined by Equation (2).
From the underlying associative property of the field \(\mathsf {GF}(2^{n})\), we remark^{1} that the relation \((2 \cdot \mathbf {\mathrm {a}}_i) \cdot \mathbf {\mathrm {k}}= 2 \cdot (\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}})\) holds for every query \(\mathbf {\mathrm {a}}_i\). If the most significant bit of \(\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}}\) is zero, then the latter relation implies that the bits of \(\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}}\) are simply shifted when computing \(2 \cdot (\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}})\), which results in \(\mathsf {HW}((2 \cdot \mathbf {\mathrm {a}}_i) \cdot \mathbf {\mathrm {k}})=\mathsf {HW}(\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}})\). However, if the most significant bit of \(\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}}\) is one, then the bits are also shifted but the result is summed with the constant value 135 (binary 10000111), which corresponds to the decimal representation of the binary coefficients of the non-leading monomials of the polynomial \(x^{128} + x^7 + x^2 + x + 1\) involved in the representation of the field \(\mathsf {GF}(2^{128})\) in \(\mathsf {AES}\text {-}\mathsf {GCM}\). In this case, the Hamming weight values \(\mathsf {HW}((2 \cdot \mathbf {\mathrm {a}}_i)\cdot \mathbf {\mathrm {k}})\) and \(\mathsf {HW}(\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}})\) are necessarily different. Indeed, the bits are shifted, the least significant bit is set to one and the bits of (\(\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}}\)) at positions 0, 1 and 6 are flipped. Thus, the absolute value of the difference between both Hamming weight values is equal to 3 with probability 1 / 4 or to 1 with probability 3 / 4.
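The effect of the doubling on the Hamming weight can be checked directly. The sketch below is an illustration under one stated assumption: elements are encoded as integers with bit i holding the coefficient of \(x^i\) (the GCM specification uses a reflected bit order, which does not change the argument). The non-leading monomials \(x^7 + x^2 + x + 1\) then correspond to the integer 0x87 = 135, and the function names are hypothetical.

```python
import random

N = 128
REDUCTION = 0x87   # x^7 + x^2 + x + 1, non-leading part of x^128 + x^7 + x^2 + x + 1


def hw(x):
    return bin(x).count("1")


def double(z):
    """Multiply z by x (the element written 2) in GF(2^128)."""
    z <<= 1
    if z >> N:                       # the most significant bit of z was 1
        z = (z ^ (1 << N)) ^ REDUCTION
    return z


rng = random.Random(3)
for _ in range(5):
    z = rng.getrandbits(N)
    msb = z >> (N - 1)
    # msb = 0: difference 0; msb = 1: difference of absolute value 1 or 3
    print(msb, hw(double(z)) - hw(z))
```

Comparing the leakages of \(\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}}\) and \((2 \cdot \mathbf {\mathrm {a}}_i) \cdot \mathbf {\mathrm {k}}\) therefore reveals the most significant bit of \(\mathbf {\mathrm {a}}_i \cdot \mathbf {\mathrm {k}}\) when the noise is small enough.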
Table 2. Optimal threshold and probability of deciding correctly w.r.t. the \(\mathsf {SNR}\)

Compared to Table 1, the error probabilities in Table 2 are much more advantageous and only 129 queries are required. If the number of queries is not limiting, the traces can be averaged to decrease the noise and thus improve the success rate. Another improvement is to correlate not only two consecutive powers of 2 but also non-consecutive ones (e.g., \(2^j\) and \(2^{j+2}\)). Without noise, we do not get more information but in the presence of noise we can improve the probability of deciding correctly.
4.2 Key Recovery
With the method described above, we only get 128 different linear equations in the key bits. Thus, we cannot use an LPN solving algorithm to recover the secret key in the presence of errors. However, since we can average the measurements, we can significantly reduce the level of noise and remove the errors almost completely. For instance, with an \(\mathsf {SNR}\) of 128 (which can also be achieved from an \(\mathsf {SNR}\) of 2 and 64 repetitions), we get an average of \(128\times 0.003=0.384\) errors. Solving the system without error is straightforward when we use the powers of two since we directly have the key bits. Thus, flipping the right-hand sides of the equations one by one to remove a single error leads to a global complexity of \(2^7\) key verifications. This complexity is easily achievable and remains reasonable to recover a 128-bit key.
5 Adaptation to Fresh Re-Keying
The core idea of the fresh re-keying countermeasure originally proposed in [MSGR10] for block cipher algorithms is to create a new session key from a public nonce for each new processing of the encryption algorithm. It guarantees that the secret (master) key is never used directly. To allow for the decryption of the ciphertext, the latter is sent together with the nonce. For soundness, the fresh re-keying must satisfy two properties. First, it must be easy to protect against side-channel attacks. Secondly, it must have a good diffusion so that each bit of the new session key depends on a large number of bits of the master key, rendering attacks based on key-hypotheses testing inefficient. To satisfy the first property, [MSGR10] proposes to base the re-keying on linear functions. Efficient techniques are indeed known to secure the latter functions against SCA (e.g. higher-order masking has linear complexity for linear functions [ISW03, CGP+12]). To additionally satisfy the second property, [MSGR10] proposes to define the linear functions from circulant matrices deduced from the random nonce.
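The single-error correction mentioned above can be sketched as a loop over the 128 possible bit flips. The code below is an illustration only: `verify` stands for a hypothetical key-verification oracle (for instance a check against a known plaintext/ciphertext pair), which is an assumption of this sketch and not something specified in the text.

```python
import random


def correct_single_error(bits, verify):
    """Return a candidate accepted by `verify`, trying `bits` and then each
    single-bit flip; at most len(bits) + 1 verifications are performed."""
    if verify(bits):
        return list(bits)
    for j in range(len(bits)):
        candidate = list(bits)
        candidate[j] ^= 1
        if verify(candidate):
            return candidate
    return None


# Toy usage: a 128-bit key recovered with exactly one wrong bit.
rng = random.Random(4)
true_key = [rng.getrandbits(1) for _ in range(128)]
noisy_key = list(true_key)
noisy_key[rng.randrange(128)] ^= 1
recovered = correct_single_error(noisy_key, lambda cand: cand == true_key)
print(recovered == true_key)   # True
```

For 128 key bits this amounts to at most 129 verifications, i.e. about \(2^7\). Regarding the averaging step, averaging N traces divides the noise variance by N, so an \(\mathsf {SNR}\) of 2 with 64 repetitions yields \(2 \times 64 = 128\).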
Error probability p according to the proportion of filtered acquisitions \(F(\lambda )\).

This confirms on different parameters that with far fewer observations, we have smaller error probabilities. Therefore, even for \(F(\lambda )=0.5\) (i.e., we keep one observation out of two), the system can be solved to recover the 128-bit key. Furthermore, it is worth noting that this new attack on an \(\mathsf {AES}\) using a one-time key makes it possible to recover the master key without observing any leakage in the fresh re-keying algorithm.
By using this trick, which consists in observing the leakage of 8-bit session keys in the first round of the \(\mathsf {AES}\), we can also mount an attack along the lines of the approach proposed in [BFG14] against the \(\mathsf {AES}\text {-}\mathsf {GCM}\) multiplication. Since in this case only the first matrix row is involved in the computation, the coefficients of the key bits are different and each observation gives a useful linear equation. Moreover, since we observe the leakage on 8-bit data, the impact of the noise on the least significant bit of the Hamming weight is reduced, which improves the system solving. However, the resulting attack remains much less efficient than our new attack, even in the number of required observations.
6 Practical Experiments
We showed in previous sections how to mount efficient side-channel attacks on finite-field multiplication over 128-bit data in different scenarios according to the attacker capabilities. In order to verify the soundness of our leakage assumptions, we have mounted a few of these attacks in practice and made some simulations. In particular, we implemented AES-GCM and the fresh re-keying protocol on an ATMega328p and measured the leakage using the ChipWhisperer kit [OC14]. We also obtained the 100,000 traces of AES-GCM multiplication from [BFG14] corresponding to EM radiations of an FPGA implementation on the Virtex 5 of a SASEBO board.
We first illustrate the leakage behavior we obtained on the ATMega328p. Then we present experimental confirmations that the attack on AES-GCM with known inputs can actually be mounted. Afterwards, we show how efficient the attack on fresh re-keying is when the attacker can exploit 8-bit leakages of the first round of AES. Finally, the reader may find in the extended version of this paper [BCF+15] an experiment corresponding to the chosen-message attack presented in Sect. 4 for a 128-bit multiplication implemented on the ATMega328p.
6.1 ATMega328p Leakage Behaviour
Since we are in software on an 8-bit implementation, we simulate a 128-bit leakage by summing the intermediate leakage on 8-bit parts of the result^{4}. We randomly generated 100,000 vectors \(\mathbf {\mathrm {a}}\in \mathsf {GF}(2)^{128}\) and, for a fixed key \(\mathbf {\mathrm {k}}\), we measured the leakage during the processing of \(\mathbf {\mathrm {z}}= \mathbf {\mathrm {a}}\cdot \mathbf {\mathrm {k}}\) as specified in \(\mathsf {AES}\text {-}\mathsf {GCM}\) (see (1)). Each measurement was composed of 4,992 points among which we detected 16 points of interest by following a T-test approach as e.g. described in [GJJR11]. We afterwards verified that these points corresponded to the manipulation of the byte-coordinates \(\mathbf {\mathrm {z}}[i]\) of \(\mathbf {\mathrm {z}}\) after the multiplication processing.
6.2 Attacks on AES-GCM with Known Inputs
The aforementioned attack on AES-GCM with known inputs was almost completely performed for 96-bit keys (simulations for more leakage traces) and partially performed for 128-bit keys (the error probabilities were confirmed in practice).
Experiments on Filtering.
Experimental and theoretical parameters corresponding to filtering proportion \(F(\lambda )\) on the ATmega for 128-bit \(\mathsf {AES}\text {-}\mathsf {GCM}\).

Error probabilities obtained from real traces.
\(\lambda \)  0.906  1.270  1.645  2.022  2.409  2.794  3.165  3.847 
\(p_\mathrm{the}\)  0.442  0.431  0.419  0.407  0.395  0.382  0.369  0.357 
\(p_\mathrm{exp}\)  0.441  0.430  0.418  0.405  0.392  0.379  0.370  0.361 
Experimental and theoretical parameters corresponding to filtering proportion \(F(\lambda )\) on the ATmega for 96-bit \(\mathsf {AES}\text {-}\mathsf {GCM}\)

LPN Experiments.
Attack on Simulated Traces (96bit). We successfully performed our new attack on \(\mathsf {AES}\text {}\mathsf {GCM}\) for a blocksize reduced to 96 bits. We generated a 96bit key \(\mathbf {\mathrm {k}}\), then generated \(2^{20}\) uniform random \(\mathbf {\mathrm {a}}_i\). We simulated a leakage corresponding to the one obtained on the ATMega328p (i.e., with the same statistics) and chose \(\lambda \) equal to 3.80 (filtering with probability \(2^{10}\), error probability 0.387). This kept 916 relations, the less noisy one having weight 25 (error rate 0.260). We used this relation for secret/error switch. All in all, we got \(87840 \approx 2^{16,42}\) LPN equations. After 6 hours of parallelized generalized birthday computation (32 cores, 200 GB of RAM), we got \(\approx 2^{39}\) equations reduced down to 36 bits. After a 36bit Walsh transform (\(\approx 2000\) seconds, same machine), we recovered the 36 least significant bits of the error that we converted in 36 bits of the secret. This heavy computation corresponds to the most complex part of the attack and validates its success. We can afterwards find the remaining bits by iterating the attack with the knowledge of the recovered bits. This is a matter of minutes: it corresponds to an attack on a 60bit key, which is much less expensive than the 96bit case.
6.3 Attack on Fresh Re-Keying
We detail here the attack that aims at recovering the master key from the leakages corresponding to the first round of the \(\mathsf {AES}\) when the secret key is generated by the fresh re-keying primitive described in Sect. 5. We present the known-input version of the attack; the chosen-input attack is described in [BCF+15].
Leakage Acquisition. We randomly generated 15,000 vectors \(\mathbf {\mathrm {a}}\in \mathsf {GF}(2)^{128}\) and 15,000 vectors \(\mathbf {\mathrm {b}} \in \mathsf {GF}(2)^{8}\). We then measured the 8-bit leakage during the processing of \(\mathsf {Sbox}(\mathbf {\mathrm {z}}[0] \oplus \mathbf {\mathrm {b}})\), with \(\mathbf {\mathrm {z}}[0]\) the first byte of the product of \(\mathbf {\mathrm {a}}\) and \(\mathbf {\mathrm {k}}\).
Error probability p according to the proportion of filtered acquisitions \(F(\lambda )\) on the ATMega328p for the fresh re-keying with known inputs.

Key Recovery. With sufficient (but still reasonable) filtering, we can directly recover the key by inverting the linear system of equations. For instance, in our experiments, keeping one observation out of \(2^9\) gives \(33 \times 8=264\) linear equations on the bits of \(\mathbf {\mathrm {k}}\) without a single error. Thus, inverting the system directly gives us the correct key.
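Inverting an error-free system of this size is a plain Gaussian elimination over \(\mathsf {GF}(2)\). The sketch below only illustrates that step: it assumes 264 uniformly random equations on a 128-bit key rather than equations derived from real leakage, and the function name `solve_gf2` is hypothetical. Each equation is stored as an integer whose low 128 bits are the coefficients and whose bit 128 is the right-hand side.

```python
import random


def parity(x):
    """Parity of the number of set bits in the integer x."""
    return bin(x).count("1") & 1


def solve_gf2(rows, rhs, nbits):
    """Solve A.k = b over GF(2) by Gauss-Jordan elimination.

    rows[i] holds the coefficients of equation i (bit j multiplies k_j),
    rhs[i] is its right-hand side. Returns k as an integer, or None when
    the system is rank-deficient or inconsistent.
    """
    # Augmented rows: the right-hand side sits in bit position nbits.
    aug = [r | (b << nbits) for r, b in zip(rows, rhs)]
    rank = 0
    for col in range(nbits):
        pivot = next(
            (i for i in range(rank, len(aug)) if (aug[i] >> col) & 1), None
        )
        if pivot is None:
            return None  # column without pivot: key not uniquely determined
        aug[rank], aug[pivot] = aug[pivot], aug[rank]
        for i in range(len(aug)):
            if i != rank and (aug[i] >> col) & 1:
                aug[i] ^= aug[rank]  # clear this column in every other row
        rank += 1
    if any(aug[i] for i in range(rank, len(aug))):
        return None  # a leftover row reads 0 = 1: some equation is wrong
    key = 0
    for col in range(nbits):
        key |= ((aug[col] >> nbits) & 1) << col
    return key


# Assumed toy instance: 264 random, error-free equations on a 128-bit key.
rng = random.Random(2015)
NBITS = 128
key = rng.getrandbits(NBITS)
rows = [rng.getrandbits(NBITS) for _ in range(264)]
rhs = [parity(r & key) for r in rows]
print(solve_gf2(rows, rhs, NBITS) == key)
```

A single erroneous equation would typically make the overdetermined system inconsistent, which is why the filtering has to be strong enough to leave no error among the retained equations.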
Footnotes
 1.
We can simply choose \(\mathbf {\mathrm {a}}_i\) equal to 1.
 2.
Note that we have not so far considered the bias induced by the recovery of the less significant bits (whose values have been altered by previous squarings), since it is negligible in practice.
 3.
 4.
Our purpose was to test the practical soundness of our theoretical analyses; we hence chose to artificially build a 128-bit leakage. The application of our attack to 8-bit chunks is the purpose of Sect. 6.3 where it is shown that this situation is much more favourable to the attacker.
 5.
Note that an \(\mathsf {SNR}\) equal to 8.21 in our experiments (with a noise standard deviation 0.0206) corresponds to a noise with standard deviation \(\sigma =\sqrt{32/8.21}=1.97\) in the theoretical Hamming weight model over 128-bit data.
 6.
It must be noticed that, surprisingly, we also obtained an \(\mathsf {SNR}\) equal to 8.21 in FPGA experiments but corresponding to a noise standard deviation of 7.11.
 7.
An \(\mathsf {SNR}\) equal to 8.7073 in our experiments (with a noise standard deviation 0.0173) corresponds to a noise with standard deviation \(\sqrt{24/8.7073}=1.66\) in the theoretical Hamming weight model over 96-bit data.
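The conversions in footnotes 5 and 7 rest on the same step, sketched here under the assumption that the \(n\) leaking bits are independent and uniform and that the \(\mathsf {SNR}\) is the ratio of the signal variance to the noise variance. The Hamming weight of such an \(n\)-bit value follows a binomial distribution with variance \(n/4\), so that

\[ \mathsf {SNR} = \frac{n/4}{\sigma ^2} \quad \Longleftrightarrow \quad \sigma = \sqrt{\frac{n}{4\,\mathsf {SNR}}}, \qquad \sqrt{\frac{128/4}{8.21}} = \sqrt{\frac{32}{8.21}} \approx 1.97, \qquad \sqrt{\frac{96/4}{8.7073}} = \sqrt{\frac{24}{8.7073}} \approx 1.66. \]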
References
 Applebaum, B., Cash, D., Peikert, C., Sahai, A.: Fast cryptographic primitives and circular-secure encryption based on hard learning problems. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 595–618. Springer, Heidelberg (2009)
 Arora, S., Ge, R.: New algorithms for learning in presence of errors. In: Aceto, L., Henzinger, M., Sgall, J. (eds.) ICALP 2011, Part I. LNCS, vol. 6755, pp. 403–415. Springer, Heidelberg (2011)
 Belaïd, S., Coron, J.-S., Fouque, P.-A., Gérard, B., Kammerer, J.-G., Prouff, E.: Improved side-channel analysis of finite-field multiplication. Cryptology ePrint Archive, Report 2015/542 (2015). http://eprint.iacr.org/
 Belaïd, S., Fouque, P.-A., Gérard, B.: Side-channel analysis of multiplications in GF(\(2^{128}\)). In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014, Part II. LNCS, vol. 8874, pp. 306–325. Springer, Heidelberg (2014)
 Blum, A., Kalai, A., Wasserman, H.: Noise-tolerant learning, the parity problem, and the statistical query model. In: 32nd ACM STOC, pp. 435–440. ACM Press, May 2000
 Bogos, S., Tramer, F., Vaudenay, S.: On solving LPN using BKW and variants. Cryptology ePrint Archive, Report 2015/049 (2015). http://eprint.iacr.org/2015/049
 Carlet, C., Goubin, L., Prouff, E., Quisquater, M., Rivain, M.: Higher-order masking schemes for S-boxes. In: Canteaut, A. (ed.) FSE 2012. LNCS, vol. 7549, pp. 366–384. Springer, Heidelberg (2012)
 Chekuri, C., Jansen, K., Rolim, J.D.P., Trevisan, L. (eds.): Approximation, randomization and combinatorial optimization, algorithms and techniques. In: 8th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2005 and 9th International Workshop on Randomization and Computation, RANDOM 2005, Berkeley, CA, USA, August 22–24, 2005, Proceedings, vol. 3624 of Lecture Notes in Computer Science. Springer, Heidelberg (2005)
 Dabosville, G., Doget, J., Prouff, E.: A new second-order side channel attack based on linear regression. IEEE Trans. Comput. 62(8), 1629–1640 (2013)
 Goodwill, G., Jun, B., Jaffe, J., Rohatgi, P.: A testing methodology for side-channel resistance validation. In: Workshop NIAT (2011)
 Guo, Q., Johansson, T., Löndahl, C.: Solving LPN using covering codes. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol. 8873, pp. 1–20. Springer, Heidelberg (2014)
 Howgrave-Graham, N., Joux, A.: New generic algorithms for hard knapsacks. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 235–256. Springer, Heidelberg (2010)
 Ishai, Y., Sahai, A., Wagner, D.: Private circuits: securing hardware against probing attacks. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 463–481. Springer, Heidelberg (2003)
 Kirchner, P.: Improved generalized birthday attack. Cryptology ePrint Archive, Report 2011/377 (2011). http://eprint.iacr.org/2011/377
 Levieil, É., Fouque, P.-A.: An improved LPN algorithm. In: De Prisco, R., Yung, M. (eds.) SCN 2006. LNCS, vol. 4116, pp. 348–359. Springer, Heidelberg (2006)
 Lyubashevsky, V.: The parity problem in the presence of noise, decoding random linear codes, and the subset sum problem. In: Chekuri et al. (eds.) [CJRT05], pp. 378–389 (2005)
 Medwed, M., Standaert, F.-X., Großschädl, J., Regazzoni, F.: Fresh re-keying: security against side-channel and fault attacks for low-cost devices. In: Bernstein, D.J., Lange, T. (eds.) AFRICACRYPT 2010. LNCS, vol. 6055, pp. 279–296. Springer, Heidelberg (2010)
 Medwed, M., Standaert, F.-X., Joux, A.: Towards super-exponential side-channel security with efficient leakage-resilient PRFs. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, pp. 193–212. Springer, Heidelberg (2012)
 O’Flynn, C., Chen, Z.: Chipwhisperer: an open-source platform for hardware embedded security research. Cryptology ePrint Archive, Report 2014/204 (2014). http://eprint.iacr.org/
 Pietrzak, K.: Cryptography from learning parity with noise. In: Bieliková, M., Friedrich, G., Gottlob, G., Katzenbeisser, S., Turán, G. (eds.) SOFSEM 2012. LNCS, vol. 7147, pp. 99–114. Springer, Heidelberg (2012)
 Renauld, M., Kamel, D., Standaert, F.-X., Flandre, D.: Information theoretic and security analysis of a 65-nanometer DDSLL AES S-box. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 223–239. Springer, Heidelberg (2011)
 Schindler, W., Lemke, K., Paar, C.: A stochastic model for differential side channel cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 30–46. Springer, Heidelberg (2005)
 Schroeppel, R., Shamir, A.: A \(T \cdot S^{2} = O(2^{n})\) time/space tradeoff for certain NP-complete problems. In: 20th Annual Symposium on Foundations of Computer Science, pp. 328–336. IEEE Computer Society, San Juan, Puerto Rico, 29–31 October (1979)
 Wagner, D.: A generalized birthday problem. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 288–303. Springer, Heidelberg (2002)