Horizontal Collision Correlation Attack on Elliptic Curves
Abstract
Elliptic-curve-based algorithms are nowadays widely spread among embedded systems. They indeed have the double advantage of providing efficient implementations with short certificates and of being relatively easy to secure against side-channel attacks. As a matter of fact, when an algorithm with constant execution flow is implemented together with randomization techniques, the obtained design usually thwarts classical side-channel attacks while keeping good performance. Recently, a new technique that makes some randomizations ineffective has been successfully applied in the context of RSA implementations. This method, related to a so-called horizontal modus operandi introduced by Walter in 2001, turns out to be very powerful since it only requires leakages from a single algorithm execution. In this paper, we combine such techniques with the collision correlation analysis introduced at CHES 2010 by Moradi et al. to propose a new attack on atomic elliptic curve implementations (or unified formulas) with input randomization. We show how it may be applied against several state-of-the-art implementations, including those of Chevallier-Mames et al., of Longa and of Giraud-Verneuil, and also against Bernstein and Lange's unified Edwards formulas. Finally, we provide simulation results for several sizes of elliptic curves on different hardware architectures. These results, which turn out to be the very first horizontal attacks on elliptic curves, open new perspectives in securing such implementations. Indeed, this paper shows that two of the main existing countermeasures for elliptic curve implementations become irrelevant when going from vertical to horizontal analysis.
1 Introduction
Elliptic Curve Cryptosystems (ECC), introduced by N. Koblitz [21] and V. Miller [29], are based on the well-known discrete logarithm problem, which has been thoroughly studied in the literature and is believed to be a hard mathematical problem. The main benefit of elliptic-curve-based algorithms is the size of the keys. Indeed, for the same level of security, these schemes require keys that are far smaller than those involved in classical public-key cryptosystems. The success of ECC has led to a wide variety of applications in our daily life, and such schemes are now implemented on many embedded devices: smartcards, microcontrollers, and so on. Such devices are small, widespread and in the hands of end-users. Thus the range of threats they are confronted with is considerably wider than in the classical situation. In particular, physical attacks are taken into account when assessing the security of the application implementation (e.g. the PACE protocol in e-passports [20]) and countermeasures are implemented alongside the algorithms.
A physical attack may belong to one of the two following families: perturbation analysis or observation analysis. The first aims at modifying the cryptosystem processing with laser beams, clock jitter or voltage perturbation. Such attacks can be thwarted by monitoring the device environment with sensors and by verifying the computations before returning the output. The second kind of attacks consists in measuring physical information, such as the power consumption or the electromagnetic emanation, during sensitive computations. Within this latter family, we can distinguish what we call simple attacks, which directly deduce the value of the secret from one or a small number of observations (e.g. Simple Power Analysis [23]), and advanced attacks, which involve a large number of observations and exploit them through statistics (e.g. Differential Power Analysis [24] or Correlation Power Analysis [9]). Such attacks require the use of a statistical tool, also known as a distinguisher, together with a leakage model to compare hypotheses with real traces (each one related to known or chosen inputs). The latter constraint may however be relaxed thanks to the so-called collision attacks [32], which aim at detecting the occurrences of colliding values during a computation that can be linked to the secret [8, 14, 30, 31]. In order to counteract all those attacks, randomization techniques can be implemented (e.g. scalar/message blinding for ECC [16]). The recent introduction of the so-called horizontal side-channel technique by Clavier et al. in [13] seems to have changed the situation. This method, which is inspired by Walter's work [33], has the advantage of requiring a single power trace, thus making classical randomization techniques ineffective. Up to now, it has been applied successfully to RSA implementations, and we show in this paper that it can be combined with collision correlation analysis to provide efficient attacks on protected elliptic curve implementations.
Core idea. In the context of embedded security, most ECC protocols (e.g. ECDSA [1] or ECDH [2]) use a short-term secret that changes at each protocol iteration. In this particular setting, advanced side-channel attacks, which require several executions of the algorithm with the same secret, are ineffective. As a consequence, usually only protection against SPA is needed, which can be achieved thanks to the popular atomicity principle [11, 18, 26]. Up to now, this technique has been considered as achieving the best security/efficiency trade-off to protect against side-channel analysis. In this paper, we provide a new side-channel attack, called horizontal collision correlation analysis, that defeats such protected ECC implementations. In particular, implementations combining point/scalar randomization with atomicity are not secure, contrary to what was thought up to now. Moreover, to complete our study, we also investigate the case of unified formulas. Indeed, we show that our horizontal collision correlation attack makes it possible to distinguish, with a single leakage trace, a doubling operation from an addition one. This technique, which eventually allows the secret scalar to be recovered, is applied to three different atomic formulae on elliptic curves, namely those proposed by Chevallier-Mames et al. in [11], by Longa in [26], and by Giraud and Verneuil in [18].
The paper is organized as follows. First, Sect. 2 recalls some basics about ECC in a side-channel attack context. Then, under the assumption that one can distinguish common operands in modular multiplications, the outline of our new horizontal collision correlation attack is presented in Sect. 3. After a theoretical analysis explaining how to deal in practice with the distinguishability assumption, we provide in Sect. 4 experimental results for \(160\)-, \(256\)- and \(384\)-bit curves working with \(8\)-, \(16\)- or \(32\)-bit registers. These results show that the attack success rate stays high even when significant noise is added to the leakage.
2 Preliminaries
2.1 Notations and Basics on Side-Channel Attacks
A side-channel analysis aims at describing a strategy to deduce information on the secret parameter of an algorithm from the leakages \(L_i\) emitted by its elementary calculations \(\mathsf {C}_i\). Let us denote by \(\varvec{s}\) this secret parameter. In this paper, we pay particular attention to two subclasses of attacks. The first are called simple and try to exploit a dependency between the sequence of operations \(\mathsf {C}_i\) and \({\varvec{s}}\) (independently of the inputs and outputs of the \(\mathsf {C}_i\)). A well-known example of such an attack is the simple power analysis (SPA) [16]. In this attack, the algorithm input is kept constant and the unprotected sequence of \(\mathsf {C}_i\) is usually composed of two distinct operations (for instance a doubling and an addition in the case of ECC). It can easily be checked that the order of those operations in the sequence is a one-to-one function of the secret scalar \({\varvec{s}}\). Hence, if the leakages \(L_i\) make it possible to clearly differentiate the operations, the adversary may recover their order, and thus the secret.
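To make the one-to-one correspondence concrete, here is a minimal sketch assuming a left-to-right double-and-add in which the leading 1 bit of \(\varvec{s}\) initializes the accumulator to \(P\), and assuming the adversary has already labeled each leakage segment as a doubling ('D') or an addition ('A'). The helper names `double_and_add_ops` and `recover_scalar` are hypothetical and only illustrate the SPA reasoning.

```python
def double_and_add_ops(s):
    """Operation sequence of a left-to-right double-and-add computing [s]P
    ('D' = doubling, 'A' = addition); the leading 1 bit initializes R = P."""
    if s < 1:
        raise ValueError("s must be positive")
    ops = []
    for bit in bin(s)[3:]:         # skip '0b' and the leading 1 bit
        ops.append("D")
        if bit == "1":
            ops.append("A")        # an addition is executed only for 1 bits
    return "".join(ops)


def recover_scalar(ops):
    """SPA step: read the bits of s back from the observed sequence.
    Each 'D' starts a new bit, which is 1 iff an 'A' immediately follows."""
    bits = ["1"]                   # leading bit of a positive scalar
    i = 0
    while i < len(ops):
        if ops[i] != "D":
            raise ValueError("malformed operation sequence")
        if i + 1 < len(ops) and ops[i + 1] == "A":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


print(double_and_add_ops(13), recover_scalar(double_and_add_ops(13)))
```

For instance, \(13 = (1101)_2\) yields the sequence "DADDA", from which the scalar is read back unambiguously.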
Following the framework presented in [4], we call advanced the attacks belonging to the second class of side-channel analyses. Among them are the well-known differential power analysis (DPA) [24] and correlation power analysis (CPA) [9]. Unlike simple attacks, advanced ones focus not only on the operations but also on the operands. They usually target a small subset \(I\) of the calculations \(\mathsf {C}_i\) and try to exploit a statistical dependency between the results \(O_i\) of those calculations and the secret \({\varvec{s}}\). To this end, the adversary must get a sufficiently large number \(N\) of observations \((\ell _j^i)_{j} \hookleftarrow L_i(O_i)\), where \(i \in I\) and \(1 \le j \le N\).
In the literature, two strategies have been specified to get the observation samples \((\ell ^i_j)_j\) for a given elementary computation \(O_i = \mathsf {C}_i(X_i)\). The first method, called vertical, simply consists in executing the implementation several times and defining \(\ell ^i_j\) as the observation related to the result \(O_i\) during the \(j^{\text {th}}\) execution of the algorithm. Most attacks [3, 9, 24] fall into this category, and the number of different indices \(i\) may for instance correspond to the attack order [27]. The second method, called horizontal [13, 33], applies to a single algorithm execution. It starts by finding the sequence of elementary calculations \((\mathsf {C}_{i_j})_j\) that process the same mathematical operation as \(\mathsf {C}_i\) (e.g. a field multiplication) and depend on the same secret subpart. By construction, all the outputs \(O_{i_j}\) of the \(\mathsf {C}_{i_j}\) can be viewed as realizations of \(O_i=\mathsf {C}_i(X_i)\), and the \(\ell _j^i\) are here defined as the observations of the \(O_{i_j}\). Finally, note that the vertical and horizontal strategies are perfectly analogous to each other and that both can be applied to simple and advanced attacks.
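The analogy between the two strategies can be seen by storing leakages as a matrix whose rows are executions and whose columns are elementary calculations: vertical sampling reads one column across many rows, horizontal sampling reads several columns of one row. The sketch below assumes this hypothetical layout (`traces[j][k]` is the leakage of the \(k\)-th calculation during execution \(j\)); the leakage values are made up for illustration.

```python
def vertical_samples(traces, i):
    """Vertical strategy: observation of calculation C_i in each of N executions.
    traces[j][k] is the leakage of the k-th calculation during execution j."""
    return [trace[i] for trace in traces]


def horizontal_samples(trace, indices):
    """Horizontal strategy: observations taken from a single execution, at the
    positions i_j of calculations performing the same operation as C_i and
    depending on the same secret subpart."""
    return [trace[k] for k in indices]


# Toy leakage values (made up): 3 executions x 4 elementary calculations.
traces = [
    [0.1, 0.5, 0.9, 0.4],
    [0.2, 0.6, 1.0, 0.3],
    [0.3, 0.7, 1.1, 0.2],
]
print(vertical_samples(traces, 1))            # one sample per execution
print(horizontal_samples(traces[0], [1, 3]))  # several samples from one execution
```

Since randomization changes the inputs from one execution to the next, only the horizontal strategy keeps all its samples tied to the same secret when the secret itself is refreshed at every execution.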
2.2 Background on Elliptic Curves
As this paper focuses on side-channel attacks on ECC, let us now recall some basics on elliptic curves, and especially on the various ways of representing points on such objects (the reader may refer to [15, 19] for more details).
Throughout this paper, we are interested in elliptic curve implementations running on platforms (ASIC, FPGA, microcontroller) embedding a hardware modular multiplier (e.g. a \(16\)-bit, \(32\)-bit or \(64\)-bit multiplier). On such implementations, the considered elliptic curves are usually defined over a prime finite field \(\mathbb {F}_p\). In the rest of this paper, we will assume that all curves are defined over \(\mathbb {F}_p\) with \(p \notin \{2,3\}\). The algorithm used for the hardware modular multiplication is assumed to be known to the attacker. Moreover, to simplify the attack descriptions, we assume hereafter that this multiplication is performed in a very simple way: a schoolbook long integer multiplication followed by a reduction. Most current devices do not implement modular multiplication that way, but the attacks described hereafter can always be adapted by changing the definition of the elementary operations of Sect. 3.3 (see the full version of the paper for a complete discussion on that point).
In the rest of the paper, the points will be represented using their projective coordinates. Namely, a point \(P=(x,y)\) is expressed as a triplet \((X:Y:Z)\) such that \(X=xZ\) and \( Y=yZ\).
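A minimal sketch of the conversion between affine and projective coordinates, assuming \(p\) is prime so that any nonzero \(Z\) is invertible modulo \(p\). The function names are illustrative only, and `pow(Z, -1, p)` requires Python 3.8 or later.

```python
def to_projective(x, y, z, p):
    """Map the affine point (x, y) to projective coordinates (X : Y : Z)
    with X = xZ and Y = yZ (mod p), for any nonzero Z."""
    if z % p == 0:
        raise ValueError("Z must be nonzero modulo p")
    return (x * z % p, y * z % p, z % p)


def to_affine(X, Y, Z, p):
    """Recover (x, y) = (X/Z, Y/Z) mod p; p is assumed to be prime."""
    z_inv = pow(Z, -1, p)          # modular inverse of Z
    return (X * z_inv % p, Y * z_inv % p)


p = 23
print(to_projective(3, 10, 5, p))               # one representation of (3, 10)
print(to_affine(*to_projective(3, 10, 5, p), p))
```

Every nonzero \(Z\) gives another triplet representing the same affine point, which is why drawing \(Z\) at random yields a randomized representation of the point.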
2.3 Point Operations in the Presence of SCA
This paper focuses on elliptic curve cryptosystems which involve the scalar multiplication \([\varvec{s}]P\), implemented with the well-known double-and-add algorithm.
In a non-protected implementation, the sequence of point doublings and point additions can reveal the value of \(\varvec{s}\) from a single leakage trace. Thus, to protect the scheme against SPA, the sequence of point operations must be independent of the secret value. This can be achieved in several ways. The double-and-add-always algorithm [16] is the simplest solution. It consists in inserting dummy point additions each time the considered bit of \(\varvec{s}\) is equal to \(0\). On average, this solution adds an overhead of \(\frac{\log _2(\varvec{s})}{2}\) point additions. Another technique consists in using unified formulae for both addition and doubling [6, 7, 25]. Finally, the scheme usually adopted in constrained devices such as smart cards, since it achieves the best time/memory trade-off, remains atomicity [11, 18, 26]. This principle is a refinement of the double-and-add-always technique. It consists in writing both addition and doubling as repetitions of a unique pattern, which is itself a sequence of operations over \(\mathbb {F}_p\). Since the pattern is unique, the same sequence of field operations is repeated for the addition and the doubling, the only difference being the number of times the pattern is applied for each operation. It thus becomes impossible to distinguish one operation from the other, or even to identify where these operations start and end.
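The double-and-add-always principle can be sketched as follows, assuming an abstract group given by hypothetical `add`/`dbl` callables. Integers modulo \(N\) under addition serve as a toy stand-in for the curve group (an illustration only, not an elliptic curve). The point of the sketch is that the executed sequence is "DA" repeated once per bit, whatever the value of \(\varvec{s}\).

```python
def double_and_add_always(s, P, add, dbl, zero):
    """Left-to-right double-and-add-always computing [s]P for s >= 1.

    One doubling and one addition are executed for every bit of s; when the
    bit is 0, the addition result is discarded (dummy addition). Returns the
    result and the executed operation sequence ('D' = doubling, 'A' = addition).
    """
    R0 = zero
    ops = []
    for bit in bin(s)[2:]:              # most significant bit first
        R0 = dbl(R0)
        ops.append("D")
        R1 = add(R0, P)                 # always computed
        ops.append("A")
        # Keep the sum only for 1 bits. This Python selection is itself a
        # branch; a real implementation must make this step regular too.
        R0 = R1 if bit == "1" else R0
    return R0, "".join(ops)


# Toy stand-in for the curve group (hypothetical): integers modulo N.
N = 1009


def toy_add(a, b):
    return (a + b) % N


def toy_dbl(a):
    return (2 * a) % N


print(double_and_add_always(13, 7, toy_add, toy_dbl, 0))
```

The sequence no longer depends on the bits of \(\varvec{s}\), but the operands of the additions still do, which is what advanced attacks exploit.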
To defeat an atomic implementation, the adversary needs to use advanced side-channel attacks (see Sect. 2.1), such as DPA or CPA. These attacks focus on the operands of the operations instead of only on the kind of operations. They usually require more observations than SPA since they rely on statistical analyses. In the ECC literature, such attacks have only been investigated in the vertical setting, where they can be efficiently prevented by input randomization.
3 Horizontal Collision Correlation Attack on ECC
We show hereafter that implementations combining atomicity and randomization techniques are in fact vulnerable to collision attacks in the horizontal setting. This raises the need for new dedicated countermeasures.
This section starts by recalling some basics on collision attacks. Then, assuming that the adversary is able to distinguish when two field multiplications have a common (possibly unknown) operand, we show how to exhibit flaws in the atomic algorithms proposed in [11, 18, 26] and also in implementations using the unified formulas for Edwards curves [5]. Finally, we apply the collision attack presented in the first subsection to show how to efficiently deal with the previous assumption.
3.1 Collision Power Analysis in the Horizontal Setting
Table 1. Collision power analysis
1.  Identify two elementary calculations \(\mathsf {C}_1(\cdot )\) and \(\mathsf {C}_2(\cdot )\) which are processed several times, say \(N\), with input(s) drawn from the same distribution(s). The correlation between the random variables \(O_1\) and \(O_2\) corresponding to the outputs of \(\mathsf {C}_1\) and \(\mathsf {C}_2\) must depend on the same secret subpart \(s\). 
2.  For each of the \(N\) processings of \(\mathsf {C}_1\) (resp. \(\mathsf {C}_2\)) get an observation \(\ell _{j}^1\) (resp. \(\ell _{j}^2\)) with \(j\in [1;N]\). 
3.  Compute the quantity:\( \hat{\rho }= \hat{\rho }\Big ( (\ell _{j}^1)_{j}, (\ell _{j}^2)_j\Big )\) 
4.  Deduce information on \(s\) from \(\hat{\rho }\). 
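Steps 1 to 3 of Table 1 can be simulated as follows. This is a sketch under assumed conditions: \(\mathsf {C}_1\) and \(\mathsf {C}_2\) each output an 8-bit value, the outputs are equal when the hypothetical flag `collide` is set (playing the role of the secret subpart \(s\)) and independent otherwise, and each observation is the Hamming weight of the output plus Gaussian noise. Step 4 is left to a distinguisher (see Remark 2).

```python
import math
import random


def pearson(u, v):
    """Step 3: Pearson correlation coefficient of two equally long samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")


def collision_correlation(n_obs, collide, sigma, rng):
    """Steps 1-3 on simulated data: N processings of C_1 and C_2 whose
    8-bit outputs collide iff `collide`; leakage = HW(output) + noise."""
    l1, l2 = [], []
    for _ in range(n_obs):                     # step 2: N observations each
        o1 = rng.randrange(256)
        o2 = o1 if collide else rng.randrange(256)
        l1.append(hw(o1) + rng.gauss(0.0, sigma))
        l2.append(hw(o2) + rng.gauss(0.0, sigma))
    return pearson(l1, l2)                     # step 3


rng = random.Random(1)
rho_collision = collision_correlation(2000, True, 1.0, rng)
rho_no_collision = collision_correlation(2000, False, 1.0, rng)
print(rho_collision, rho_no_collision)
```

With colliding outputs, the coefficient should be close to \(\mathrm{Var}(\mathrm{HW})/(\mathrm{Var}(\mathrm{HW})+\sigma^2) = 2/3\) for uniform bytes and \(\sigma = 1\); without collision it should be close to \(0\).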
Remark 1
In Table 1, we use Pearson’s coefficient to compare the two samples of observations but other choices are possible (e.g. mutual information).
Remark 2
In order to deduce information on \(s\) from the knowledge of \(\hat{\rho }\), one may use for instance a Maximum Likelihood distinguisher (see a discussion on that point in Sect. 4).
In the next section, the attack in Table 1 is used as an oracle to detect whether two field multiplications share a common operand.
Assumption 1
The adversary can detect when two field multiplications have at least one operand in common.
In Sect. 3.3, we will come back to the latter hypothesis and will detail how it can indeed be satisfied in the particular context of ECC implementations on constrained systems.
3.2 Attacks on ECC Implementations: Core Idea
We start by presenting the principle of the attack on atomic implementations, and then on an implementation based on unified (addition and doubling) formulas over Edwards curves.
We now show that the same idea can successfully be applied to attack the other atomic implementations proposed in the literature, namely those of Longa [26] and Giraud and Verneuil [18].
Remark 3
A second version of the patterns in Fig. 3 has been proposed in [18], which saves more field additions and negations without adding dummy operations. This proposal shares the same weakness as the previous ones and our attack still applies.
Remark 4
This technique still applies in the case of other unified formulas (e.g. those introduced in [10]). Indeed, the sequence of operations in [10] presents the same weakness as in the Edwards case: the multiplication \(X_1Z_1\) is performed twice if the current operation is a doubling (see the first and third multiplications in [10, Sect. 3, Fig. 1]).
3.3 Distinguishing Common Operands in Multiplications
In this section we apply the collision attack principle presented in Sect. 3.1 to show how an adversary may deal with Assumption 1. This will conclude our attack description. As mentioned before, we assume that the field multiplications are implemented in an arithmetic coprocessor with a Long Integer Multiplication (LIM) followed by a reduction. Many other multiplication methods exist but our attack can always be slightly adapted to also efficiently apply to those methods (see the full version of the paper).
Let \(\omega \) denote the architecture word size (e.g. \(\omega \) equals \(8\), \(16\) or \(32\)) and let us denote by \((x[t], \cdots , x[1])_{2^{\omega }}\) the base-\(2^{\omega }\) representation of an integer \(X\). We recall hereafter the main steps of the LIM when applied to two integers \(X\) and \(Y\).
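A minimal sketch of a schoolbook LIM in base \(2^\omega\), assuming both operands fit in \(t\) words and omitting the subsequent reduction. The function names are illustrative. Besides the product, the sketch returns the \(t^2\) elementary word products \(x[a]\cdot y[b]\), which are the intermediate values targeted by the attack.

```python
def to_words(x, w, t):
    """Base-2^w digits of x, least significant first (index a-1 holds x[a])."""
    mask = (1 << w) - 1
    return [(x >> (w * a)) & mask for a in range(t)]


def lim(x, y, w, t):
    """Schoolbook Long Integer Multiplication of two t-word integers.

    Returns the full 2t-word product (the reduction step is omitted) and
    the list of elementary word products ((a, b), x[a] * y[b]), 1-indexed,
    in the order they are computed.
    """
    xs, ys = to_words(x, w, t), to_words(y, w, t)
    mask = (1 << w) - 1
    r = [0] * (2 * t)                    # result words, least significant first
    products = []
    for a in range(t):
        carry = 0
        for b in range(t):
            prod = xs[a] * ys[b]         # w-bit x w-bit -> 2w-bit product
            products.append(((a + 1, b + 1), prod))
            acc = r[a + b] + prod + carry
            r[a + b] = acc & mask        # keep the low word
            carry = acc >> w             # propagate the high word
        r[a + t] = carry
    value = sum(word << (w * k) for k, word in enumerate(r))
    return value, products


value, products = lim(0xDEADBEEF, 0x12345678, 8, 4)
print(hex(value), len(products))
```

The accumulator never exceeds \(2^{2\omega}-1\), so each carry fits in one word, as on a \(\omega\)-bit multiplier.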

The adversary's goal is to distinguish between the two following cases:

Case (1), where the device processes LIM(X, W) and LIM(Y, Z) (all the operands are independent);

Case (2), where the device processes LIM(X, Z) and LIM(Y, Z) (the two LIM processings share an operand).
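The two cases can be compared in simulation. The sketch below assumes a Hamming weight leakage with Gaussian noise for every elementary product \(x[a]\cdot y[b]\), averages the observations over \(a\) for each column \(b\), and correlates the two resulting vectors. The parameters (8-bit words, \(t = 20\), i.e. 160-bit operands, noise standard deviation 1.0) and all function names are illustrative choices, not the paper's experimental setup.

```python
import math
import random


def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")


def pearson(u, v):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def lim_leakage(xs, ys, sigma, rng):
    """Simulated leakage l[a][b] = HW(x[a] * y[b]) + Gaussian noise for every
    elementary product of one LIM(X, Y) (hypothetical leakage model)."""
    return [[hw(xa * yb) + rng.gauss(0.0, sigma) for yb in ys] for xa in xs]


def column_means(l):
    """(1/t) * sum over a of l[a][b], for each column b."""
    t = len(l)
    return [sum(row[b] for row in l) / t for b in range(len(l[0]))]


def collision_score(shared, w, t, sigma, rng):
    """Correlation between the column means of two LIM processings:
    Case (2) LIM(X, Z), LIM(Y, Z) if `shared`, else Case (1) LIM(X, W), LIM(Y, Z)."""
    def random_words():
        return [rng.randrange(1 << w) for _ in range(t)]

    x, y, z = random_words(), random_words(), random_words()
    first_op = z if shared else random_words()
    l1 = lim_leakage(x, first_op, sigma, rng)
    l2 = lim_leakage(y, z, sigma, rng)
    return pearson(column_means(l1), column_means(l2))


rng = random.Random(2013)
shared = [collision_score(True, 8, 20, 1.0, rng) for _ in range(30)]
independent = [collision_score(False, 8, 20, 1.0, rng) for _ in range(30)]
print(sum(shared) / 30, sum(independent) / 30)
```

In Case (2) both column means depend on the same word \(z[b]\), so the coefficient is clearly positive; in Case (1) it fluctuates around \(0\).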
3.4 Study of the Attack Soundness
If \(O_1\) and \(O_2\) correspond to the internal multiplications during the processings of LIM(X, W) and LIM(Y, Z) respectively, then, for every \((a,b)\in [1;t]^2\), we have:$$\begin{aligned}&\ell _{a,b}^1 = \mathrm {HW}(x[a]\cdot w[b]) + b_{1,a,b} \end{aligned}$$(9)$$\begin{aligned}&\ell _{a,b}^2 = \mathrm {HW}(y[a]\cdot z[b]) + b_{2,a,b} \ , \end{aligned}$$(10)where \(b_{1,a,b}\) and \(b_{2,a,b}\) denote independent noise terms of variance \(\sigma ^2\). Since \(W\), \(X\), \(Y\) and \(Z\) are independent, the correlation coefficients in (7) and (8) tend towards \(0\) when \(t\) tends towards infinity.
If \(O_1\) and \(O_2\) correspond to the internal multiplications during the processings of LIM(X, Z) and LIM(Y, Z) respectively, then we have:$$\begin{aligned}&\ell _{a,b}^1 = \mathrm {HW}(x[a]\cdot z[b]) + b_{1,a,b} \end{aligned}$$(11)$$\begin{aligned}&\ell _{a,b}^2 = \mathrm {HW}(y[a]\cdot z[b]) + b_{2,a,b} \ . \end{aligned}$$(12)Since the two multiplications share an operand, their results are dependent. In this case indeed, it can be proved that the correlation coefficients (7) and (8) satisfy:$$ \hat{\rho }\Big ( (\ell _{a,b}^1)_{a,b} , (\ell _{a,b}^2)_{a,b} \Big ) \simeq \frac{1}{1+\frac{2^{2\omega +2}\sigma ^2 + (\omega -1)2^{2\omega }+2^{\omega }}{ 2\cdot 2^{2\omega } -(2\omega +1) 2^\omega - 1}} $$and$$ \hat{\rho }\left( \Big (\frac{1}{t}\sum _a \ell _{a,b}^1\Big )_{b} , \Big (\frac{1}{t}\sum _a \ell _{a,b}^2\Big )_{b}\right) \simeq \frac{1}{1+\frac{1}{t}\cdot \frac{2^{2\omega +2}\sigma ^2 + (\omega -1)2^{2\omega }+2^{\omega }}{ 2\cdot 2^{2\omega } -(2\omega +1) 2^\omega - 1}}. $$When \(t\) tends towards infinity, it may be noticed that the second correlation coefficient tends towards \(1\) (which is optimal).
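The two approximations can be evaluated numerically. The sketch assumes the ratio \(\big(2^{2\omega +2}\sigma ^2 + (\omega -1)2^{2\omega }+2^{\omega }\big)/\big(2\cdot 2^{2\omega } -(2\omega +1) 2^\omega - 1\big)\), i.e. it reads the operators missing from the extracted formula as minus signs. A sanity check supports this reading for \(\omega = 1\): then \(\mathrm{HW}(x\cdot z) = xz\), whose covariance with \(yz\) is \(1/16\) and whose variance is \(3/16\), giving exactly \(1/3\) when \(\sigma = 0\). Function names are hypothetical.

```python
def noise_to_signal(w, sigma):
    """Ratio shared by both approximations (w = word size, sigma = noise std)."""
    num = 2 ** (2 * w + 2) * sigma ** 2 + (w - 1) * 2 ** (2 * w) + 2 ** w
    den = 2 * 2 ** (2 * w) - (2 * w + 1) * 2 ** w - 1
    return num / den


def rho_all_pairs(w, sigma):
    """Approximate correlation over all t^2 pairs (a, b) in Case (2)."""
    return 1 / (1 + noise_to_signal(w, sigma))


def rho_column_means(w, sigma, t):
    """Approximate correlation between the column means in Case (2):
    averaging over t rows divides the ratio by t."""
    return 1 / (1 + noise_to_signal(w, sigma) / t)


for w, t in ((8, 48), (16, 24), (32, 12)):      # 384-bit operands
    print(w, rho_all_pairs(w, 1.0), rho_column_means(w, 1.0, t))
```

The first coefficient does not depend on \(t\), whereas the second one grows with the number of words, which is why the experiments below rely on column averaging.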
4 Experiments
Figures 5, 6, 7 and 8 illustrate the spread of the obtained Pearson coefficients around their mean value. This variance indicates how much trust we can put in the mean values. It also shows whether the right hypothesis can easily be distinguished from the wrong one. For each SNR value (denoted by \(\tau \)) and each sample size \(t\), let us denote by \(\hat{\rho }_{0,t}(\tau )\) (resp. \(\hat{\rho }_{1,t}(\tau )\)) the random variable associated with the computation of (7) for \(s=0\) (resp. for \(s=1\)). In Figs. 5, 6, 7 and 8, we plot estimations of the mean and variance of \(\hat{\rho }_{0,t}(\tau )\) and \(\hat{\rho }_{1,t}(\tau )\) for several pairs \((\tau ,t)\). Clearly, the efficiency of the attack described in Sect. 3 depends on the ability of the adversary to distinguish, for a fixed pair \((t,\tau )\), the distribution of \(\hat{\rho }_{0,t}(\tau )\) from that of \(\hat{\rho }_{1,t}(\tau )\). In other words, once the adversary has computed a Pearson coefficient \(\hat{\rho }\), he must decide between the two following hypotheses: \(\mathrm {H}_0: \hat{\rho } \hookleftarrow \hat{\rho }_{0,t}(\tau )\) or \(\mathrm {H}_1: \hat{\rho } \hookleftarrow \hat{\rho }_{1,t}(\tau )\). To this end, we apply a maximum likelihood strategy and choose the hypothesis with the highest probability of occurrence. This led us to approximate the distributions of the coefficients \(\hat{\rho }_{0,t}(\tau )\) and \(\hat{\rho }_{1,t}(\tau )\) by Gaussian distributions whose mean and variance are estimated in the Hamming weight model (as given in Figs. 5, 6, 7, 8). The attacks reported in Figs. 9 and 10 were performed with this strategy.
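The maximum likelihood decision can be sketched as follows, assuming each hypothesis is summarized by the (mean, variance) of its Gaussian approximation, for instance fitted on simulated coefficients. The function names and the numbers in the example are hypothetical placeholders, not values taken from Figs. 5 to 8.

```python
import math
import statistics


def fit_gaussian(samples):
    """Estimate (mean, variance) of a coefficient distribution from simulated values."""
    return statistics.mean(samples), statistics.variance(samples)


def gaussian_loglik(x, mean, var):
    """Log-density of N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def ml_decide(rho_hat, h0, h1):
    """Maximum likelihood choice between H0 and H1, each given as the
    (mean, variance) of the Gaussian approximation of rho_hat_{0,t}(tau)
    and rho_hat_{1,t}(tau). Returns 0 or 1."""
    l0 = gaussian_loglik(rho_hat, *h0)
    l1 = gaussian_loglik(rho_hat, *h1)
    return 0 if l0 >= l1 else 1


h0 = (0.0, 0.01)   # placeholder parameters for H0
h1 = (0.8, 0.01)   # placeholder parameters for H1
print(ml_decide(0.05, h0, h1), ml_decide(0.7, h0, h1))
```

With equal variances, the rule reduces to comparing \(\hat{\rho }\) with the midpoint of the two means; with unequal variances, the decision regions are no longer symmetric.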
Remark 5
Since the adversary is not assumed to know the exact leakage SNR, the maximum likelihood can be computed for several SNR values \(\tau \), ranging from \(\infty \) down to some predefined threshold. This issue arises whenever the principle of collision attacks is applied.
Remark 6
For a curve of size \(n=\lceil \log _2(p) \rceil \) and an \(\omega \)-bit architecture, the adversary can get a sample of \(t=\lceil \frac{n}{\omega }\rceil \) observations if he averages over the columns, and of \(\lceil \frac{n}{\omega }\rceil ^2\) observations without averaging. All experiments provided in this section have been performed using the "average" strategy.
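The sample sizes for the curve and register sizes considered in this section can be tabulated as below, counting \(\lceil n/\omega \rceil \) words per operand and hence \(\lceil n/\omega \rceil ^2\) elementary products per LIM. The function name is illustrative.

```python
def sample_sizes(n, w):
    """Observations available per LIM for an n-bit modulus and w-bit words:
    (t, t**2) with t = ceil(n / w) words per operand."""
    t = (n + w - 1) // w        # integer ceiling of n / w
    return t, t * t


for n in (160, 256, 384):
    for w in (8, 16, 32):
        averaged, full = sample_sizes(n, w)
        print(f"n={n:3d}, w={w:2d}: {averaged:2d} column means, {full:4d} products")
```

For a fixed register size, both counts grow with \(n\), which is consistent with the higher success rates observed on larger curves.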
The attack works for all the architectures considered, even for a \(32\)-bit one (see Fig. 10), which is the most common case in today's implementations. In the presence of noise, the success rate decreases significantly but the attack remains quite successful for curves of size \(160\), \(256\) and \(384\) bits. In all experiments (Figs. 9, 10), we also observe that the success rate of our attack increases with the size of the curve. This behaviour can be explained by the larger number of observations available in this case. Paradoxically, this means that when the theoretical level of security becomes stronger (i.e. \(p\) is larger), resistance against side-channel attacks becomes weaker. This fact holds in general for horizontal attacks and has already been noticed in [12, 33].
