Online template attacks
Abstract
Template attacks are a special kind of side-channel attacks that work in two stages. In a first stage, the attacker builds up a database of template traces collected from a device which is identical to the attacked device, but under the attacker’s control. In the second stage, traces from the target device are compared to the template traces to recover the secret key. In the context of attacking elliptic curve scalar multiplication with template attacks, one can interleave template generation and template matching and reduce the amount of template traces. This paper enhances the power of this technique by defining and applying the concept of online template attacks, a general attack technique with minimal assumptions for an attacker, who has very limited control over the template device. We show that online template attacks need only one power consumption trace of a scalar multiplication on the target device; they are thus suitable not only against ECDSA and static elliptic curve Diffie–Hellman (ECDH), but also against elliptic curve scalar multiplication in ephemeral ECDH. In addition, online template attacks need only one template trace per scalar bit and they can be applied to a broad variety of scalar multiplication algorithms. To demonstrate the power of online template attacks, we recover scalar bits of a scalar multiplication using the double-and-add-always algorithm on a twisted Edwards curve running on a smartcard with an ATmega163 CPU.
Keywords
Side-channel analysis, Template attacks, Scalar multiplication, Elliptic curves
1 Introduction
Side-channel attacks exploit various physical leakages of secret information or instructions from cryptographic devices, and they constitute a constant threat for cryptographic implementations. We focus on power analysis attacks that exploit the power consumption leakage from a device running some cryptographic algorithm. Attacking elliptic curve cryptosystems (ECC) with natural protection against side-channel attacks, e.g., implementations using Edwards curves, is quite challenging. This form of elliptic curves, proposed by Edwards in 2007 [17] and promoted for cryptographic applications by Bernstein and Lange [5], has several advantages compared to elliptic curves in Weierstrass form. For instance, the fast and complete formulas for addition and doubling make these types of curves more appealing for memory-constrained devices and at the same time resistant to classical simple power analysis (SPA) techniques. Although considered a very serious threat against ECC implementations, differential power analysis (DPA), as proposed in [14, 31], cannot be applied directly to ECDSA or ephemeral elliptic curve Diffie–Hellman (ECDH) because the secret scalar is used only once. This is incompatible with the requirement of DPA to see a large number of power traces of computations on the same secret data. In order to attack various asymmetric cryptosystems, new techniques that reside between SPA and DPA were developed, most notably collision [3, 18, 23, 40, 41, 44] and template attacks [15, 35, 39]. The efficiency of most of those collision-based attacks is shown only on simulated traces; no practical experiments on real ECC implementations have verified these results. To the best of our knowledge, only two practical collision-based attacks on exponentiation algorithms have been published, each of which relies on very specific assumptions and deals with very special cases. Hanley et al. exploit collisions between input and output operations of the same trace [19]. Wenger et al. in [42] performed a hardware-specific attack on consecutive rounds of a Montgomery ladder implementation. However, both attacks are very restrictive in terms of applicability to various ECC implementations as they imply some special implementation options, such as the use of López-Dahab coordinates, where field multiplications use the same key-dependent coordinate as input to two consecutive rounds. In contrast, our attack is much more generic as it applies to arbitrary choices of curves and coordinates, and many scalar multiplication algorithms.
Related work Collision attacks exploit leakages by comparing two portions of the same or different traces to discover when values are reused. The Big Mac attack [41] is the first theoretical attack on public key cryptosystems, in which only a single trace is required to observe key dependencies and collisions during an RSA exponentiation. Witteman et al. in [43] performed a similar attack on the RSA modular exponentiation in the presence of blinded messages. Clavier et al. introduced in [13] horizontal correlation analysis, as a type of attack where a single power trace is enough to recover the private key. They also extended the Big Mac attack by using different distinguishers. Horizontal correlation analysis was performed on RSA using the Pearson correlation coefficient in [13] and triangular trace analysis of the exponent in [12].
The first horizontal technique relevant to ECC is the doubling attack, presented by Fouque and Valette in [18]. Homma et al. in [23] proposed a generalization of this attack to binary right-to-left, m-ary and sliding-window methods. An attack proposed by Bauer et al. in [3] is a type of horizontal collision correlation attack on ECC, which combines atomicity and randomization techniques. A recent attack on ECC is horizontal cross-correlation [20]; the approach is similar to [43] but uses only a single trace. The most recent horizontal attacks on ECC are single-trace attacks on a software implementation of scalar multiplication with precomputed points (for example, an m-ary implementation) [25]. The presented attacks target a single trace and employ clustering and correlation to recover scalar bits.
Template attacks are a combination of statistical modeling and power analysis attacks consisting of two phases, as follows. The first phase is the profiling or template-building phase, where the attacker builds templates to characterize the device by executing a sequence of instructions on fixed data. The second phase is the matching phase, in which the attacker matches the templates to actual traces of the device. The attacker is assumed to possess a device which behaves the same as the target device, in order to build template traces while running the same implementation as the target. Medwed and Oswald demonstrated in [35] a practical template attack on ECDSA. However, their attack required an offline DPA attack on the EC scalar multiplication operation during the template-building phase, in order to select the points of interest. They also need 33 template traces per key bit. Furthermore, attacks against ECDSA and other elliptic curve signature algorithms only need to recover a few bits of the ephemeral scalar for multiple scalar multiplications with different ephemeral scalars and can then employ lattice techniques to recover the long-term secret key [4, 15, 39]. This is not possible in the context of ephemeral ECDH: An attacker gets only a single trace and needs to recover sufficiently many bits of this ephemeral scalar from side-channel information to be able to compute the remaining bits through, for example, Kangaroo techniques.
Another template attack on ECC is presented in [21]. This attack exploits register location-based leakage using a high-resolution inductive EM probe; therefore, the attack is considerably expensive to execute. A template attack on a wNAF ECC algorithm is presented in [45]. However, this attack is applied to an implementation that is protected with neither scalar randomization nor base-point randomization. Furthermore, contrary to our approach, all of the above attacks require multiple traces to construct a template.
This paper is an extended version of the original paper on online template attacks [2]. Between the publication of the original paper on OTA and the writing of this extended version, two related works were published that verify the applicability of OTA on different curves. The first work, from Dugardin et al. [16], performed OTA on Weierstrass (Brainpool and NIST) curves using EM emanations. A follow-up work from Özgen et al. [38] verified that OTA can successfully give the correct prediction on the scalar bit by using distinguishers from machine learning (classification methods).
Our contribution In this paper, we introduce an adaptive template attack technique, which we call online template attacks (OTA). This technique is able to recover a complete scalar from only one power trace of a scalar multiplication using this scalar. The attack is characterized as online, because we create the templates after the acquisition of the target trace. While we use the same terminology, our attack is not a typical template attack; i.e., no preprocessing template-building phase is necessary. Our attack functions by acquiring one target trace from the device under attack and comparing patterns of certain operations from this trace with templates obtained from the attacker’s device that runs the same implementation. Pattern matching is performed, using an automated module based on the Pearson correlation coefficient, at suitable points in the algorithm where key-bit-related assignments take place.
The attacker needs only very limited control over the device used to generate the online template traces. The main assumption is that the attacker can choose the input point to a scalar multiplication, an assumption that trivially holds even without any modification to the template device in the context of ephemeral ECDH. It also holds in the context of ECDSA, if the attacker can modify the implementation on the template device or can modify internal values of the computation. This is no different than for previous template attacks against ECDSA.

It does not require any cumbersome preprocessing template-building phase, but a rather simple post-processing phase.

It does not assume any previous knowledge of the leakage model.

It does not require full control of the device under attack.

It works against SPA-protected and to some extent DPA-protected implementations with unified formulas for addition and doubling.

Countermeasures such as scalar randomization and changing point representation from affine to (deterministic) projective representation inside the implementation do not prevent our attack.

It is applicable to the Montgomery ladder and to constant-time (left-to-right and right-to-left) exponentiation algorithms.

It is experimentally confirmed on an implementation of double-and-add-always scalar multiplication on the twisted Edwards curve used in the Ed25519 signature scheme.

Our attack is a chosen-input attack: the adversary needs to control the input of a scalar multiplication (but not the scalar). Most ECC implementations use inputs in affine (or compressed affine) coordinates and internally convert to projective representation. In this paper, we show how to apply the attack if an attacker controls either the projective coordinates input or the affine input (even if it is compressed).

Our attack works when the target trace and the template traces are acquired from the same device and also when the target trace is acquired from a different device than the template traces, as shown in our experimental results. Note that using the same device for both template and target traces would actually make the attack easier, because there would be no vertical or horizontal misalignment in the traces.
As mentioned in the previous subsection, this paper is an extended version of the original paper on online template attacks [2]. We present the theoretical primitives of the attack and verify our theory with new experiments on different types of input, namely 256-bit projective input, reduced 255-bit projective coordinates and, finally, affine coordinates. These experiments verify the applicability of OTA and provide a complete practical setting for this type of side-channel attack.
Organization of the paper This paper is organized as follows. We introduce and explain OTA in Sect. 2. Section 3 gives specific examples of how the attack applies to different scalar multiplication algorithms. Section 4 presents our practical OTA on double-and-add-always scalar multiplication. A discussion of how the proposed attack can be applied to implementations that include countermeasures that randomize the algorithm or operands is given in Sect. 5. Finally, Sect. 6 summarizes our contribution and concludes the paper.
2 Online template attacks
An online template attack is a side-channel attack that satisfies the following conditions:
1. The attacker obtains only one power trace of the cryptographic algorithm involving the targeted secret data. This trace is called the target trace. We call the device from which the target trace is obtained the target device. This property makes it possible to attack scalar multiplication algorithms with ephemeral scalar and with randomized scalar.
2. The attacker generates template traces after having obtained the target trace. These traces are called (online) template traces.
3. The attacker obtains the template traces on the target device or a similar device^{1} with very limited control over it, i.e., access to the device to run several executions with chosen public inputs. The attacker does not rely on the assumption that the secret data are the same for all template traces.
4. At least one assignment in the exponentiation algorithm is made depending on the value of particular scalar bit(s), but there are no branches with key-dependent computations. Since we are attacking the doubling operation, this key-dependent assignment should be during doubling. As a counterexample, we note that the binary right-to-left add-always algorithm for Lucas recurrences [29] is resistant to the proposed attack, because the result of the doubling is stored in a non-key-dependent variable.
2.1 Attack description
Template attacks consist of two phases, template building for characterizing the device and template matching, where the characterization of the device together with a power trace from the device under attack is used to determine the secret [34]. Therefore, the first condition of our proposed attack is typically fulfilled by all attacks of this kind.
It is well known that template attacks against scalar multiplication can generate templates “on-the-fly,” i.e., interleaving the template-building and matching phases. See, for example, [35, Sec. 5.3]. We take this idea further by building templates after the target trace has been obtained (condition 2). The attacker, being able to do things in this order, needs only limited control over the target device. Moreover, the attacker is not affected by randomization of the secret data during different executions of the algorithm, since he always has to compare his template traces with the same target trace.
The basic idea consists of comparing the target trace and an online template trace while executing scalar multiplication and then finding similar patterns between them, based on a hypothesis about a bit for a given operation. The target trace is obtained only once with input \(\varvec{P}\). For every bit of the scalar, we need to obtain an online template trace with input \(k \varvec{P}, k\in \mathbb {Z}\), where k is chosen as a function of our hypothesis on this bit. The attack requires sending a different point to the device for each hypothesis, thus generating a template trace for each of these points. Each template trace should then be compared with the part of the target trace that corresponds to the manipulated bit. However, that part may not be easy to locate, for example because of jitter. Therefore, we compare the template trace with the target trace at each sample offset.
We performed pattern matching for our traces using an automated module based on the Pearson correlation coefficient, \(\rho (X,Y)\), which measures the linear relationship between two variables X and Y. For power traces, the correlation coefficient shows the relationship between two points of the trace, which indicates the leakage of key-dependent assignments during the execution of a cryptographic algorithm. The leakage can be due to differences in Hamming weight or Hamming distance of the variables, but the exact leakage model does not affect online template attacks in any way. Extensions to other leakage models and distinguishers are straightforward. Our pattern matching produces a list of correlation coefficients, one per sample offset, each relating all samples of the template trace to the same number of consecutive samples in the target trace. If our hypothesis on the given key bit is correct, then the pattern match between our traces at the targeted operation will be high (in our experiments it reached \(99\%\)).
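The matching step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the module used in the experiments: the function names `pearson` and `best_match`, the trace lengths and the noise level are all assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant window carries no pattern to match
    return cov / math.sqrt(var_x * var_y)


def best_match(target, template):
    """Slide the template over the target; return (best offset, best rho)."""
    width = len(template)
    scores = [pearson(target[off:off + width], template)
              for off in range(len(target) - width + 1)]
    best_offset = max(range(len(scores)), key=scores.__getitem__)
    return best_offset, scores[best_offset]


# Synthetic demonstration (hypothetical data): a noisy copy of the
# template is hidden in the target trace at sample offset 300.
rng = random.Random(1)
template = [rng.gauss(0, 1) for _ in range(100)]
target = [rng.gauss(0, 1) for _ in range(1000)]
for i, sample in enumerate(template):
    target[300 + i] = sample + rng.gauss(0, 0.1)

offset, rho = best_match(target, template)
```

Computing the coefficient at every offset is what makes the comparison tolerant to not knowing exactly where the targeted operation starts; on real traces, a faster correlation routine would normally replace this quadratic loop.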
In this way, we can recover the first i bits of the key. Knowledge of the first i bits provides us with complete knowledge of the internal state of the algorithm just before the \((i+1)\)th bit is processed. Since at least one operation in the loop depends on this bit, we can make a hypothesis about the \((i+1)\)th bit, compute an online template trace based on this hypothesis and correlate this trace with the target trace at the relevant predetermined point of the algorithm.
A separate question is how many templates need to be created per attacked bit. In this paper, we show that only a single template trace per key bit is sufficient if a correct template can be safely recognized from any incorrect template, see Sect. 4 for examples. Essentially, for each experiment we establish a correlation threshold to recognize correct templates from incorrect ones; then, we can create one template (for bit 0, for example), and depending on the template correlation and the threshold, we can recover the scalar bit. If the difference between the correlation for correct and incorrect templates is sufficiently large, then the threshold can be learned either through profiling or by computing two templates (for 0 and 1) in the first few iterations.
Furthermore, note that if a bit is incorrectly identified then all subsequent templates would not match with the target trace. In case this happens, then it is possible to backtrack to the last successful matching and restart the attack.
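The bit-by-bit recovery with a single template per bit and a correlation threshold can be illustrated with a toy simulation. Everything below is an assumption made for the sake of the example: the leakage model in `doubling_trace` (a pseudo-random pattern determined by the doubled point), the noise level, the threshold of 0.8, and the fact that the target trace is already cut into one segment per doubling. Integers stand in for multiples of \(\varvec{P}\), and backtracking is omitted.

```python
import math
import random

SAMPLES = 200      # samples per simulated doubling (assumption)
NOISE = 0.3        # standard deviation of measurement noise (assumption)
THRESHOLD = 0.8    # correlation threshold; must be learned per setup


def doubling_trace(multiple, rng):
    """Toy leakage: doubling (multiple)P gives a fixed pattern plus noise."""
    pattern = random.Random(multiple)
    return [pattern.gauss(0, 1) + rng.gauss(0, NOISE) for _ in range(SAMPLES)]


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def acquire_target(bits, rng):
    """Single target acquisition: one doubling segment per scalar bit."""
    state, doublings = 1, []          # leading scalar bit is 1, so R0 = P
    for bit in bits:
        doublings.append(doubling_trace(state, rng))
        state = 2 * state + bit       # key-dependent assignment
    return doublings


def recover(doublings, rng):
    """Recover each bit from the doubling of the following iteration."""
    state, recovered = 1, []
    for segment in doublings[1:]:
        # One online template: hypothesis "bit = 0" predicts that the next
        # doubling has input (2 * state)P, so that point is sent as input.
        template = doubling_trace(2 * state, rng)
        bit = 0 if pearson(segment, template) > THRESHOLD else 1
        recovered.append(bit)
        state = 2 * state + bit
    return recovered


rng = random.Random(7)
secret = [0, 0, 1, 1, 0, 1]           # scalar bits after the leading 1
target = acquire_target(secret, rng)
recovered = recover(target, rng)      # all bits except the last one
```

The last bit has no following doubling inside the recorded window, so this sketch recovers all bits except that one. A wrong decision would make every later template mismatch, which is the situation in which backtracking to the last successful match becomes necessary.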
3 Applying the attack to scalar multiplication algorithms
3.1 Attacking the lefttoright doubleandaddalways algorithm
Assuming that the first \((i-1)\) bits of k are known, we can derive the ith bit by computing the two possible states of \(\varvec{R}_0\) after this bit has been treated and recover the key iteratively. Note that only the assignment in the ith iteration depends on the key bit \(k_{i}\), but none of the computations do, so we need to compare the trace of the doubling operation in the \((i+1)\)th iteration with our original target trace. To decide whether the ith bit of k is zero or one, we compare the trace that the doubling operation in the \((i+1)\)th iteration would give for \(k_{i+1} = 0\) with the target trace. For completeness, we can compare the target trace with a trace obtained for \(k_{i+1} = 1\) and verify that it has lower pattern match percentage; in this case, the performed attack needs two online template traces per key bit. However, if during the acquisition phase the noise level is low and the signal is of good quality, we can perform an efficient attack with only our target trace and a single trace for the hypothetical value of \(\varvec{R}_{k_{i+1}}\).
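To make the choice of template inputs concrete, the sketch below runs a textbook left-to-right double-and-add-always on integers, where the integer m stands for the point \(m\varvec{P}\). The algorithm listing itself is not reproduced in this section, so the loop structure is an assumption based on the standard algorithm, and the function names are hypothetical.

```python
def double_and_add_always(bits):
    """bits: scalar bits, most significant first, with a leading 1.

    Returns the final multiple and the input of each doubling.
    """
    r0 = 1                              # R0 <- P
    doubling_inputs = []
    for bit in bits[1:]:
        doubling_inputs.append(r0)      # this value is doubled now
        r = [2 * r0, 2 * r0 + 1]        # R0 <- 2*R0 ; R1 <- R0 + P
        r0 = r[bit]                     # key-dependent assignment only
    return r0, doubling_inputs


def template_multiples(known_bits):
    """Input multiples for the two hypotheses on the next unknown bit."""
    state = int("".join(str(b) for b in known_bits), 2)
    return 2 * state, 2 * state + 1     # hypothesis 0, hypothesis 1


final, inputs = double_and_add_always([1, 0, 0, 1, 1, 0])
first_pair = template_multiples([1])    # templates with inputs 2P and 3P
```

With only the leading 1 known, the two candidate template inputs are \(2\varvec{P}\) and \(3\varvec{P}\); the first doubling of the corresponding template run is compared with the second doubling of the target trace.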
Note that the method above assumes that one template trace is acquired to recover a single bit. It is possible to acquire multiple template traces to recover multiple bits at the same time, for example, three template traces can be produced to recover 2 bits at once. However, attacking single bits is more efficient in terms of storage of template traces and offline precomputation than attacking a group of bits. This is an advantage of building online templates compared to the usual template attacks. In particular, sequentially attacking 2 bits requires 2 template traces; if a template is similar to the target trace, then the bit is guessed correctly; otherwise, the bit is incorrect. Attacking both bits at once requires 3 template traces (the fourth choice can be implied if none of the 3 templates matches the attacked trace). In general, attacking n bits simultaneously requires an offline computation of \((2^{n}-1)\) template traces.
3.2 Attacking the right-to-left double-and-add-always algorithm
Attacking the right-to-left double-and-add-always algorithm of [27] is a type of key-dependent assignment OTA. We target the doubling operation and note that the input point will be doubled either in the first (if \(k_0 = 0\)) or in the second iteration of the loop (if \(k_0 = 1\)). If k is fixed, we can easily decide between the two by inputting different points, since if \(k_0 = 1\) we will see the common operation \(2 \, \varvec{O}\). If k is not fixed, we simply measure the first two iterations and again use the operation \(2 \, \varvec{O}\) to determine whether the template generator should use the first or the second iteration. Once we are able to obtain clear traces, the attack itself follows the general description of Sect. 2. If we assume that the first i bits of k are known and we wish to derive the \((i+1)\)th bit, this means that we know the values of \(\varvec{R}_0\) and \(\varvec{R}_1\) at the start of the \((i+1)\)th iteration. By making a hypothesis on the value of the \((i+1)\)th key bit, we can decide according to the matching percentage if \(\varvec{R}_0\) or \(\varvec{R}_1\) was used.
3.3 Attacking the Montgomery ladder
\(k = 100\)  \(k = 110\) 

\(\varvec{R}_{\varvec{0}} = \varvec{P}, \varvec{R}_{\varvec{1}} = 2 \varvec{P}\)  \(\varvec{R}_{\varvec{0}} = \varvec{P}, \varvec{R}_{\varvec{1}} = 2 \varvec{P}\) 
\(b=1: \varvec{R}_{\varvec{1}} = 3 \varvec{P}, \varvec{R}_{\varvec{0}} = 2 \varvec{P}\)  \(b=0: \varvec{R}_{\varvec{0}} = 3 \varvec{P}, \varvec{R}_{\varvec{1}} = 4 \varvec{P}\) 
\(b=1: \varvec{R}_{\varvec{1}} = 5 \varvec{P}, \varvec{R}_{\varvec{0}} = 4 \varvec{P}\)  \(b=1: \varvec{R}_{\varvec{1}} = 7 \varvec{P}, \varvec{R}_{\varvec{0}} = 6 \varvec{P}\) 
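The register contents in the table above can be reproduced with a short simulation. The sketch assumes the textbook Montgomery ladder update, \(\varvec{R}_b \leftarrow \varvec{R}_0 + \varvec{R}_1\) followed by \(\varvec{R}_{k_i} \leftarrow 2\varvec{R}_{k_i}\) with \(b = 1 - k_i\), and uses integers in place of multiples of \(\varvec{P}\); the function name is hypothetical.

```python
def montgomery_ladder_states(bits):
    """bits: scalar bits, most significant first, with a leading 1.

    Returns the pair (R0, R1) after initialization and after each bit.
    """
    r = [1, 2]                        # R0 <- P, R1 <- 2P
    states = [tuple(r)]
    for bit in bits[1:]:
        b = 1 - bit
        r[b] = r[0] + r[1]            # R_b <- R0 + R1
        r[bit] = 2 * r[bit]           # R_{k_i} <- 2 * R_{k_i}
        states.append(tuple(r))
    return states


states_100 = montgomery_ladder_states([1, 0, 0])
states_110 = montgomery_ladder_states([1, 1, 0])
```

The two runs share the initial state but diverge after the first processed bit, and the doubled register differs between the two hypotheses; this is the value an online template has to reproduce.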
3.4 Attacking Side-Channel Atomicity
There are certain choices of coordinates and curves where this approach can be deployed by using unified or complete addition formulas for the group operations. For example, the Jacobi form [33] and Hessian [30] curves come with a unified group law and Edwards curves [8, 9] even have a complete group law. For Weierstrass curves, Brier and Joye suggest an approach for unified addition in [10].
Simple atomic algorithms do not offer any protection against online template attacks, because the regularity of point operations does not prevent mounting this sort of attack. The point \(2\varvec{P}\), as output of the third iteration of Algorithm 4, will produce a power trace with a pattern very similar to the trace that would have the point \(2\varvec{P}\) as input. Therefore, the attack will be similar to the one described in Sect. 3.1; the only difference is that instead of the output of the second iteration of the algorithm, we have to focus on the pattern of the third iteration. In general, when an attacker forms a hypothesis about a certain number of bits of k, the hypothesis will include the point in time where \(\varvec{R}_0\) will contain the predicted value. This means that an attacker would have to acquire a larger target trace to allow all hypotheses to be tested.
4 Experimental results
This section presents our experimental results. Firstly, in Sect. 4.1 we describe the attacked implementation and the measurement setup that we use to perform attacks. Then, we present experimental results of an OTA with 256-bit extended projective coordinates in Sect. 4.2; this is the usual input value for our smart card. In Sect. 4.3, we present OTA on extended projective coordinates with reduced 255-bit input. Finally, Sect. 4.4 presents an OTA applied to input points with affine compressed coordinates. All the attacks are performed iteratively bit by bit, and they target the five most significant bits of the scalar.
4.1 Target implementation and experimental setup
To validate feasibility and efficiency of our proposed method, we attack an elliptic curve scalar multiplication implementation running on an “ATmega card,” i.e., an ATmega163 microcontroller [1] in a smart card. To illustrate that our attack also works if the template device is not the same as the target device, we used two different smart cards: one to obtain the target trace and one to obtain the online template traces.
Our measurement setup uses a Picoscope 5203^{2} with a sampling rate of 125M samples per second for both the target trace and the online template traces. This oscilloscope has an acquisition memory buffer limited to 32M samples. Since 5 iterations of the scalar multiplication algorithm take around 235 ms, a sampling rate of 125M samples per second yields a trace of approximately 29.4M samples, which fits in this buffer.
We modified the software to perform a double-and-add-always scalar multiplication (see Algorithm 1). The whole underlying field and curve arithmetic is the same as in [24]. This means in particular that points are internally represented in extended coordinates as proposed in [22]. In this coordinate system, a point \(\varvec{P} = (x,y)\) is represented as (X : Y : Z : T) with \(x = X/Z, y = Y/Z\) and \(x\cdot y = T/Z\).
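The relations defining extended coordinates can be checked with a few lines of modular arithmetic. This is an illustrative sketch only: the helper names are hypothetical, the prime \(2^{255}-19\) is the Ed25519 field prime, and the sample values of x, y and Z are arbitrary numbers rather than a point on the curve, since the relations are purely algebraic.

```python
P_FIELD = 2**255 - 19


def to_extended(x, y, z=1):
    """Affine (x, y) -> extended (X : Y : Z : T) for a chosen nonzero Z."""
    return (x * z % P_FIELD, y * z % P_FIELD, z % P_FIELD,
            x * y * z % P_FIELD)


def to_affine(X, Y, Z, T):
    """Extended (X : Y : Z : T) -> affine (x, y) by dividing by Z."""
    z_inv = pow(Z, P_FIELD - 2, P_FIELD)   # Fermat inversion modulo a prime
    return (X * z_inv % P_FIELD, Y * z_inv % P_FIELD)


x, y = 3, 5                                 # arbitrary sample values
X, Y, Z, T = to_extended(x, y, z=7)
round_trip = to_affine(X, Y, Z, T)
t_consistent = (X * Y - T * Z) % P_FIELD == 0   # T is fixed by X, Y, Z
```

The last check reflects that T carries no independent information: \(T = XY/Z\), so an attacker who chooses X, Y and Z has already determined T.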
4.2 Online template attack with 256-bit projective input
In this subsection, we describe how to apply an OTA if the input supplied to the scalar multiplication is in extended projective coordinates, i.e., if the attacker has full control over all coordinates of the starting point. This is a realistic assumption if a protocol avoids inversions entirely and protects against leakage of projective coordinates by randomization as proposed in [37, Sec. 6]. Recall that for extended coordinates, T is fully determined by X, Y and Z; they are an extension of standard projective coordinates.
The attack targets the output of the doubling operation. We performed pattern matching for our traces as described in Sect. 2.1. In this way, we could determine the leakage of key-dependent assignments during the execution of the algorithm.
We first demonstrate how to attack a single bit, and then, we present our results from recovering the five most significant unknown bits of the scalar (recall that the highest bit is always set to one; see Algorithm 1). The remaining bits can be attacked iteratively in the same way as described in Sect. 2.1; as stated above, we were not able to do so due to technical limitations of our measurement setup.
In fact, we will see that the correlation between the correct template trace and the target trace is so much higher than between the wrong template trace and the target trace that just one of the two template traces is sufficient to determine the second bit of k. This is depicted in Figs. 2 and 3; all figures are taken with the six most significant bits of k set to 100110. Figure 2 shows power traces of the second iteration of the target trace (brown) and the first iteration of the \(2\varvec{P}\) template trace, i.e., the matching template trace. Figure 3 shows power traces of the second iteration of the target trace (brown) and the first iteration of the \(3\varvec{P}\) template trace, i.e., the non-matching template trace.
The results presented so far are obtained while attacking one single bit of the exponent. When we attack five bits with one acquisition, we observe lower numbers for pattern matching for both the correct and the wrong scalar guess. The correlation results for pattern matching are not so high, mainly due to the noise that is occurring in our setup during longer acquisitions. This follows from the fact that our power supply is not perfectly stable during acquisitions that are longer than 200 ms. However, the difference between correct and wrong assumptions is still remarkable as depicted in Figs. 5, 6, 7, 8 and 9, showing the OTA on five scalar bits \(k = 100110\) at once.^{3} The templates are always acquired during the first iteration and the target trace contains all five iterations; note that the input points for the templates depend on the already recovered exponent bits.
Correct bit assumptions have 84–88% matching patterns, while the correlation for the wrong assumptions drops to 50–72%. Therefore, we can set a threshold for recognizing a bit to be at \(80\%\).
Note that the attack with projective inputs does not make any assumptions on formulas used for elliptic curve addition and doubling. In fact, we carried out the attack both for specialized doubling and for doubling that uses the same unified formulas as addition. The results were similar, and all traces shown above are from the experiments that used unified addition formulas for both addition and doubling.
4.3 Online template attack with 255-bit projective input
In the previous section, for simplicity, we deliberately ignored the case of coordinate reduction in the field, in order to make the concept of the attack clear. The implementation that we attack, for the sake of efficiency, operates on 256-bit coordinates and not 255-bit coordinates from the field \(\mathbb {F}_{2^{255}-19}\). The 256-bit coordinates correspond to the coordinates from \(\mathbb {F}_{2^{255}-19}\) by applying the modulo \(2^{255}-19\) operation; by using 256 bits, the implementation can save time by not performing some modulo operations.
So far, we assumed that we can send to the card the optimized 256-bit coordinates. It is interesting to examine a more complex attack scenario in which we can only input the 255-bit coordinates. In this section, we show that OTA is successful in this scenario too, a fact that makes OTA a powerful attack technique independent of the prime p of the field used. Fast modular reduction is implemented in [24] by using simple shifts and additions, which are relatively cheap on AVRs.

\(\hbox {MSB} = 0\) for the Z projective coordinate (the remaining coordinates can have the most significant bit equal to 1); therefore, the Z coordinate after reduction remains the same. In this case, OTA can be applied in a similar way as in Sect. 4.2. We take the old template coordinates and perform a reduction in all the coordinates modulo our prime number \(2^{255}-19\); then, we send those coordinates as input to the card to obtain the new templates.

The Z coordinate has \(\hbox {MSB} = 1\), and therefore, there is a difference of 9 bits from the corresponding 256-bit coordinate. In this case, the reduced point differs from its 256-bit equivalent in the MSB and in the least significant byte due to the pseudo-Mersenne prime that we use (i.e., \(2^{255}-19\)). This case is the most interesting, and we analyze it in the remaining part of this section.
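The effect of the reduction on a coordinate can be inspected directly. The sketch below is a hedged illustration, not part of the attacked implementation: `differing_bits` is a hypothetical helper that lists the bit positions in which a 256-bit value and its reduction modulo \(2^{255}-19\) differ, and the sample values are arbitrary.

```python
P_FIELD = 2**255 - 19


def differing_bits(value256):
    """Bit positions where a 256-bit value differs from its reduction."""
    reduced = value256 % P_FIELD
    diff = value256 ^ reduced
    return [i for i in range(256) if (diff >> i) & 1]


# MSB = 0 and value below the prime: the reduction changes nothing.
unchanged = differing_bits(5)

# MSB = 1: the reduction clears bit 255 and adds 19 to the remainder,
# because 2^255 is congruent to 19 modulo 2^255 - 19.
low_part = (0x1234 << 100) | 0x20       # arbitrary value, low byte 0x20
changed = differing_bits(2**255 + low_part)
```

For this sample value the differences are bit 255 and three bits of the least significant byte; the exact low-order positions depend on how adding 19 interacts with the low byte of the particular coordinate.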
Figure 10 presents the pattern match, during the computation of \(D \leftarrow Z^2\), between the target trace (iteration 2) and three templates (iteration 1): a template with 1 bit difference, a template with 9 bits difference and a wrong template. As expected, the highest peak corresponds to the template with only 1 bit difference, the slightly smaller peak corresponds to the 9 bits difference and the lowest peak corresponds to a wrong template. The results obtained from this attack are similar to the previous section, and therefore, we do not present the correlation figures for all 5 bits.
The results above suggest that the templates with 9 bits difference are sufficient for a successful attack although the correlation values are slightly affected. However, for a more noisy setup, 9 incorrect bits may lower the correlation too much. Therefore, we concentrate on an attack that allows only a single bit to be incorrect.
Successful key guesses (for the templates that have at most 1 bit difference) give correlation values between 81 and \(86\%\), while unsuccessful ones are below \(76\%\). The correlation values for successful and unsuccessful guesses differ from those in Sect. 4.2 because now we concentrate on a single squaring and not the whole doubling. Furthermore, we used different cards for this attack, because one of the cards used for experiments reported in Sect. 4.2 got broken.
4.4 Online template attack with affine input
The attack as explained in the previous sections assumes that the attacker has full control over the input in projective coordinates. Most implementations of ECC take inputs in affine (or compressed affine) coordinates and convert them internally to a projective representation. The input is now given as (x, y) and converted to (x : y : 1 : xy) at the beginning of the computation. We observe that the points \(\varvec{P}, 2\varvec{P}\) and \(3\varvec{P}\) do not have any coordinates in common with the projective representations used internally. Already after the first iteration of the double-and-add-always loop, \(Z=1\) no longer holds. The attack is therefore more elaborate, since the internal point representation changes at every step of the algorithm.
The main idea of the attack is to focus on the first multiplication \((Y_1 - X_1)(Y_2 - X_2)\), which in the case of a doubling is \((Y-X)^2\). We now give a detailed description of how to generate the necessary templates for \((Y-X)^2\).
Let us assume that the target trace with the point \(\varvec{P}\) is already acquired and that we attack bit \(b_i\), where \(0 \le i < x\). First, from the already recovered bits of the scalar \(b_x, \dots, b_{i+1}\) (at the beginning we only know that the most significant bit \(b_x\) is 1), the coordinates of \(\varvec{P}\) and the bit guess \(b_i \in \{0,1\}\), we compute the intermediate value \(\lambda = Y - X\) that is squared in Step 1 (Fig. 11) during acquisition of the target trace. Second, we search for a new point \(\varvec{P}^{\varvec{i}}=(x',y')\) such that \(Y' - X'=\lambda \).^{6} Such a point \(\varvec{P}^{\varvec{i}}\) cannot always be found on the curve, but we can flip the least significant bit of \(\lambda \) and check whether a point with this modified difference lies on the curve. If this fails, we flip the bit back and then flip the second least significant bit of \(\lambda \); we continue this way with subsequent least significant bits until we find a point on the curve. In our experiments, we found a point on the curve after at most five trials. For such a point, \(Y' - X'\) differs from \(\lambda \) in at most 1 bit (in the least significant byte of the coordinate).
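The search for a template point can be illustrated on a toy curve. The sketch below is not the procedure used in the experiments: the prime `P_TOY`, the constant `D_TOY` and all function names are assumptions chosen so that a brute-force search is feasible, and the curve equation is the standard twisted Edwards form \(ax^2 + y^2 = 1 + dx^2y^2\) with \(a=-1\). Over the real 255-bit field, one would need a root-finding step instead of brute force, which is not shown here.

```python
# Toy parameters (assumptions for illustration, NOT the attacked curve).
P_TOY = 8191        # small prime 2^13 - 1
A_TOY = P_TOY - 1   # a = -1 in a*x^2 + y^2 = 1 + d*x^2*y^2
D_TOY = 2           # arbitrary nonzero d for the toy curve


def on_curve(x, y):
    """Check the twisted Edwards equation modulo P_TOY."""
    lhs = (A_TOY * x * x + y * y) % P_TOY
    rhs = (1 + D_TOY * x * x * y * y) % P_TOY
    return lhs == rhs


def point_with_difference(lam):
    """Brute-force an affine point (x, y) with y - x = lam (mod p), or None."""
    for x in range(P_TOY):
        y = (x + lam) % P_TOY
        if on_curve(x, y):
            return x, y
    return None


def find_template_point(lam, max_bits=13):
    """Try lam itself, then lam with exactly one bit flipped, LSB first."""
    candidates = [lam] + [lam ^ (1 << i) for i in range(max_bits)]
    for cand in candidates:
        if cand >= P_TOY:
            continue  # flipped value is not a reduced field element
        point = point_with_difference(cand)
        if point is not None:
            return cand, point
    return None


result = find_template_point(1)  # the neutral element (0, 1) has y - x = 1
```

Each candidate differs from the requested \(\lambda\) in at most one bit, because a flipped bit is restored before the next one is tried.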
Figure 13 shows two pattern matches for the computation of \(A \leftarrow (Y-X)^2\): the template trace for input point \(\varvec{P}[\varvec{k}_{\varvec{x}-\varvec{1}} = \varvec{0}]\) (iteration 1) against the target trace for \(\varvec{P}\) (iteration 2), and the template trace for \(\varvec{P}[\varvec{k}_{\varvec{x}-\varvec{1}} = \varvec{1}]\) (iteration 1) against the same target trace (iteration 2).
We notice that the trace obtained from the point \(\varvec{P}[\varvec{k}_{\varvec{x}-\varvec{2}} = \varvec{0}]\) is almost identical to the pattern obtained from the target trace; as expected, the correlation is 86% for the correct key guess and under 73% for the incorrect one.
Since we now know the two most significant bits, we can continue the attack on the next bits. We repeat the attack for the 5 most significant unknown bits (in total, we then know the 6 most significant bits, since the most significant bit is always 1).
The correlation is 84–87% for the correct key guesses. For the non-matching template point, the correlation value of the matching patterns is at most 73%.
5 Countermeasures and future work
Coron’s first and second DPA countermeasures result in the scalar or the point being blinded to counteract the statistical analysis of DPA attacks [14]. Given that an attacker needs to predict the intermediate state of an algorithm at a given point in time, we can assume that the countermeasures that are used to prevent DPA will also have an effect on OTA. All proposed countermeasures rely on some kind of randomization, which can be applied to the scalar, to the point or to the algorithm itself. However, if we assume that the attacker has no technical limitations, i.e., has an oscilloscope with enough memory to acquire the power consumption during an entire scalar multiplication, it would be possible to derive the entire scalar being used from just one acquisition. Therefore, scalar blinding [14, 32] on its own provides no protection against our attack, as the attacker could derive a value equivalent to the secret scalar.
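To see why a blinded scalar is still useful to the attacker, recall that scalar blinding replaces the secret \(k\) by \(k' = k + r\cdot \#E\), where \(r\) is random and \(\#E\) is the number of points on the curve (the symbols \(k\), \(k'\), \(r\) and \(\#E\) are introduced here for illustration). Since \([\#E]\varvec{P}\) is the neutral element for every point \(\varvec{P}\),

\[ [k']\varvec{P} = [k + r\cdot \#E]\varvec{P} = [k]\varvec{P} + [r]\big([\#E]\varvec{P}\big) = [k]\varvec{P}. \]

A value \(k'\) recovered from a single trace therefore yields the same scalar multiples as \(k\) itself.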
There are methods for changing the representation of a point that can prevent OTA, because they make the intermediate results unpredictable to the attacker. The most notable of these countermeasures are randomizing the projective coordinates, as proposed in [37, Sec. 6], and randomizing the coordinates through a random field isomorphism, as described in [28]. However, inserting a point in affine coordinates and changing to (deterministic) projective coordinates during the execution of the scalar multiplication (compressing and decompressing of a point) does not affect our attack.
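The difference between deterministic and randomized coordinates can be sketched as follows. The helper names and the sample values are assumptions for illustration (the sample values are arbitrary field elements, not necessarily a curve point); the sketch only shows that multiplying all extended coordinates by a random nonzero \(r\) leaves the represented affine point unchanged while changing every internal value, including the difference \(Y-X\) targeted in Sect. 4.4.

```python
import secrets

P = 2**255 - 19


def to_extended(x, y):
    """Deterministic conversion (x, y) -> (x : y : 1 : xy)."""
    return (x % P, y % P, 1, (x * y) % P)


def randomize(point):
    """Scale all extended coordinates by a random nonzero field element."""
    r = secrets.randbelow(P - 1) + 1  # r in [1, P - 1]
    X, Y, Z, T = point
    return (r * X % P, r * Y % P, r * Z % P, r * T % P)


def to_affine(point):
    """Recover (x, y) = (X/Z, Y/Z) using Fermat inversion of Z."""
    X, Y, Z, _ = point
    z_inv = pow(Z, P - 2, P)
    return (X * z_inv % P, Y * z_inv % P)


fixed = to_extended(5, 7)   # always (5, 7, 1, 35): predictable
blinded = randomize(fixed)  # Y - X becomes r * (Y - X): unpredictable
```

With the deterministic conversion, the attacker can compute every intermediate value for a given bit guess; after randomization, the value that is squared depends on the unknown \(r\).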
We aim exclusively at the doubling operation in the execution of each algorithm. Since most of the blinding techniques are based on the cyclic property of the elliptic curve groups, attacking the addition operation in practice would be an interesting future research topic.
6 Conclusions
In this paper, we presented a new side-channel attack technique, which can be used to recover the private key during a scalar multiplication on ECC with only one target trace and one online template trace per bit. Our attack succeeds against a protected target implementation with unified formulas for doubling and adding and against implementations where the point is given in affine coordinates and converted to a projective representation. By performing our attack on two physically different devices, we showed that key-dependent assignments leak, even when there are no branches in the cryptographic algorithm. This fact enhances the feasibility of OTA and validates our initial claim that one target trace is enough to recover the secret scalar.
Footnotes
 1.
By similar device, we mean the same type of microcontroller running the same algorithm. Observe that the template device may be the same as the target one.
 2.
 3.
Observe that the correlation for the incorrect template in Fig. 6 is slightly lower than in the other figures. This can be explained not only by noise, but also by different degrees of similarity between the incorrect inputs for the template traces and the intermediate points for the target trace. Nonetheless, the important fact is that all correlations for the incorrect templates are much lower than the correlations for the correct templates.
 4.
This property follows from the fact that the Z coordinate of \(2\varvec{P}\) during the conversion to extended coordinates is always set to 0x01 while the Z coordinate of the point P after being squared in the first iteration of the exponentiation loop does not equal 0x01 with overwhelming probability.
 5.
 6.
Note that we cannot use \(\varvec{P}(X,Y,Z)\) “freely,” because now \(Z\ne 1\).
References
 1.Batina, L., Chmielewski, L., Papachristodoulou, L., Schwabe, P., Tunstall, M.: Online template attacks. In: Progress in Cryptology – INDOCRYPT 2014 – 15th International Conference on Cryptology in India, New Delhi, India, December 14–17, 2014, Proceedings, pp. 21–36 (2014)
 2.Bauer, A., Jaulmes, E., Prouff, E., Wild, J.: Horizontal collision correlation attack on elliptic curves. In: Lange, T., Lauter, K., Lisonek, P. (eds.) Selected Areas in Cryptography – SAC 2013, volume 8282 of LNCS, pp. 553–570. Springer, Berlin (2014)
 3.Bauer, A., Jaulmes, E., Prouff, E., Wild, J.: Horizontal collision correlation attack on elliptic curves. In: Lange, T., Lauter, K., Lisonek, P. (eds.) Selected Areas in Cryptography – SAC 2013, volume 8282 of LNCS, pp. 553–570. Springer, Berlin (2014)
 4.Bernstein, D.J., Birkner, P., Joye, M., Lange, T., Peters, C.: Twisted Edwards curves. In: Vaudenay, S. (ed.) Progress in Cryptology—AFRICACRYPT 2008, volume 5023 of LNCS, pp. 389–405. Springer, Berlin (2008). http://cr.yp.to/papers.html#twisted
 5.Bernstein, D.J., Duif, N., Lange, T., Schwabe, P., Yang, B.Y.: High-speed high-security signatures. In: Preneel, B., Takagi, T. (eds.) Cryptographic Hardware and Embedded Systems – CHES 2011, volume 6917 of LNCS, pp. 124–142. Springer, Berlin (2011). See also full version [6]
 6.Bernstein, D.J., Duif, N., Lange, T., Schwabe, P., Yang, B.Y.: High-speed high-security signatures. J. Cryptogr. Eng. 2(2):77–89 (2012). http://cryptojedi.org/papers/#ed25519, see also short version [5]
 7.Bernstein, D.J., Lange, T.: Faster addition and doubling on elliptic curves. In: Kurosawa, K. (ed.) Advances in Cryptology—ASIACRYPT 2007, volume 4833 of LNCS, pp. 29–50. Springer, Berlin (2007). http://cr.yp.to/papers.html#newelliptic
 8.Bernstein, D.J., Lange, T., Farashahi, R.R.: Binary Edwards curves. In: Oswald, E., Rohatgi, P. (eds.) Cryptographic Hardware and Embedded Systems—CHES 2008, volume 5154 of LNCS, pp. 244–265. Springer, Berlin (2008). http://cr.yp.to/papers.html#edwards2
 9.Brier, É., Joye, M.: Weierstraß elliptic curves and side-channel attacks. In: Naccache, D., Paillier, P. (eds.) Public Key Cryptography, volume 2274 of LNCS, pp. 335–345. Springer, Berlin (2002). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.8.5691&rep=rep1&type=pdf
 10.Chevallier-Mames, B., Ciet, M., Joye, M.: Low-cost solutions for preventing simple side-channel analysis: side-channel atomicity. IEEE Trans. Comput. 53(6), 760–768 (2004)
 11.Chevallier-Mames, B., Ciet, M., Joye, M.: Low-cost solutions for preventing simple side-channel analysis: side-channel atomicity. IEEE Trans. Comput. 53(6), 760–768 (2004)
 12.Clavier, C., Feix, B., Gagnerot, G., Roussellet, M., Verneuil, V.: Horizontal correlation analysis on exponentiation. In: Soriano, M., Qing, S., López, J. (eds.) Information and Communications Security, volume 6476 of LNCS, pp. 46–61. Springer, Berlin (2010). http://eprint.iacr.org/2003/237
 13.Coron, J.S.: Resistance against differential power analysis for elliptic curve cryptosystems. In: Koç, Ç.K., Paar, C. (eds.) Cryptographic Hardware and Embedded Systems—CHES’99, volume 1717 of LNCS, pp. 292–302. Springer, Berlin (1999). http://saluc.engr.uconn.edu/refs/sidechannel/coron99resistance.pdf
 14.Atmel Corporation. ATMEL AVR32UC Technical Reference Manual. ARM Doc Rev.32002F (2010). http://www.atmel.com/images/doc32002.pdf
 15.Dugardin, M., Papachristodoulou, L., Najm, Z., Batina, L., Danger, J.L., Guilley, S.: Dismantling real-world ECC with horizontal and vertical template attacks. In: Constructive Side-Channel Analysis and Secure Design – 7th International Workshop, COSADE 2016, Graz, Austria, April 14–15, 2016 (2016)
 16.Edwards, H.M.: A normal form for elliptic curves. In: Koç, Ç.K., Paar, C. (eds.) Bulletin of the American Mathematical Society, vol. 44, pp. 393–422 (2007). http://www.ams.org/journals/bull/20074403/S0273097907011536/home.html
 17.Fouque, P.A., Valette, F.: The doubling attack—why upwards is better than downwards. In: Walter, C.D., Koç, Ç.K., Paar, C. (eds.) Cryptographic Hardware and Embedded Systems—CHES 2003, volume 2779 of LNCS, pp. 269–280. Springer, Berlin (2003). www.ssi.gouv.fr/archive/fr/sciences/fichiers/lcr/fova03.pdf
 18.Hanley, N., Kim, H., Tunstall, M.: Exploiting collisions in addition chain-based exponentiation algorithms using a single trace. Cryptology ePrint Archive, Report 2012/485 (2012). http://eprint.iacr.org/2012/485/
 19.Hanley, N., Kim, H., Tunstall, M.: Exploiting collisions in addition chain-based exponentiation algorithms using a single trace. In: Nyberg, K. (ed.) Topics in Cryptology – CT-RSA 2015, volume 9048 of LNCS, pp. 431–448. Springer, Berlin (2015)
 20.Hanley, N., Kim, H., Tunstall, M.: Exploiting collisions in addition chain-based exponentiation algorithms using a single trace. In: Nyberg, K. (ed.) Topics in Cryptology – CT-RSA 2015, volume 9048 of LNCS, pp. 431–448. Springer, Berlin (2015)
 21.Heyszl, J., Mangard, S., Heinz, B., Stumpf, F., Sigl, G.: Localized electromagnetic analysis of cryptographic implementations. In: Dunkelman, O. (ed.) Topics in Cryptology – CT-RSA 2012, volume 7178 of LNCS, pp. 231–244. Springer, Berlin (2012)
 22.Homma, N., Miyamoto, A., Aoki, T., Satoh, A., Shamir, A.: Collision-based power analysis of modular exponentiation using chosen-message pairs. In: Oswald, E., Rohatgi, P. (eds.) Cryptographic Hardware and Embedded Systems – CHES 2008, volume 5154 of LNCS, pp. 15–29. Springer, Berlin (2008). http://www.aoki.ecei.tohoku.ac.jp/crypto/pdf/CHES2008_homma.pdf
 23.Hutter, M., Schwabe, P.: NaCl on 8-bit AVR microcontrollers. In: Youssef, A., Nitaj, A. (eds.) Progress in Cryptology – AFRICACRYPT 2013, volume 7918 of LNCS, pp. 156–172. Springer, Berlin (2013). http://cryptojedi.org/papers/#avrnacl
 24.Järvinen, K., Balasch, J.: Single-trace side-channel attacks on scalar multiplications with precomputations. In: Smart Card Research and Advanced Applications – CARDIS 2016 (2016)
 25.Joye, M.: Smart-card implementation of elliptic curve cryptography and DPA-type attacks. In: Quisquater, J.J., Paradinas, P., Deswarte, Y., El Kalam, A.A. (eds.) Smart Card Research and Advanced Applications VI, volume 135 of IFIP International Federation for Information Processing, pp. 115–125. Springer, Berlin (2004)
 26.Joye, M.: Highly regular right-to-left algorithms for scalar multiplication. In: Paillier, P., Verbauwhede, I. (eds.) Cryptographic Hardware and Embedded Systems – CHES 2007, volume 4727 of LNCS, pp. 135–147. Springer, Berlin (2007)
 27.Joye, M.: Highly regular right-to-left algorithms for scalar multiplication. In: Paillier, P., Verbauwhede, I. (eds.) Cryptographic Hardware and Embedded Systems – CHES 2007, volume 4727 of LNCS, pp. 135–147. Springer, Berlin (2007). https://www.iacr.org/archive/ches2007/47270135/47270135.pdf
 28.Joye, M.: Smart-card implementation of elliptic curve cryptography and DPA-type attacks. In: Quisquater, J.J., Paradinas, P., Deswarte, Y., El Kalam, A.A. (eds.) Smart Card Research and Advanced Applications VI, volume 135 of IFIP International Federation for Information Processing, pp. 115–125. Springer, Berlin (2004)
 29.Joye, M.: Highly regular right-to-left algorithms for scalar multiplication. In: Paillier, P., Verbauwhede, I. (eds.) Cryptographic Hardware and Embedded Systems – CHES 2007, volume 4727 of LNCS, pp. 135–147. Springer, Berlin (2007)
 30.Joye, M., Quisquater, J.J.: Hessian elliptic curves and side-channel attacks. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) Cryptographic Hardware and Embedded Systems – CHES 2001, volume 2162 of LNCS, pp. 402–410. Springer, Berlin (2001)
 31.Kocher, P.C.: Timing attacks on implementations of Diffie–Hellman, RSA, DSS, and other systems. In: Koblitz, N. (ed.) Advances in Cryptology—CRYPTO ’96, volume 1109 of LNCS, pp. 104–113. Springer, Berlin (1996). http://www.cryptography.com/public/pdf/TimingAttacks.pdf
 32.Liardet, P.Y., Smart, N.P.: Preventing SPA/DPA in ECC systems using the Jacobi form. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) Cryptographic Hardware and Embedded Systems – CHES 2001, volume 2162 of LNCS, pp. 391–401. Springer, Berlin (2001)
 33.Liardet, P.Y., Smart, N.P.: Preventing SPA/DPA in ECC systems using the Jacobi form. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) Cryptographic Hardware and Embedded Systems – CHES 2001, volume 2162 of LNCS, pp. 391–401. Springer, Berlin (2001)
 34.Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets of Smart Cards (Advances in Information Security). Springer, New York (2007)
 35.Montgomery, P.L.: Speeding the Pollard and elliptic curve methods of factorization. Math. Comput. 48(177):243–264 (1987). http://www.ams.org/journals/mcom/198748177/S00255718198708661137/S00255718198708661137.pdf
 36.De Mulder, E., Hutter, M., Marson, M.E., Pearson, P.: Using Bleichenbacher’s solution to the hidden number problem to attack nonce leaks in 384-bit ECDSA. In: Bertoni, G., Coron, J.S. (eds.) Cryptographic Hardware and Embedded Systems – CHES 2013, volume 8086 of LNCS, pp. 435–452. Springer, Berlin (2013). https://online.tugraz.at/tug_online/voe_main2.getvolltext?pCurrPk=71281
 37.Naccache, D., Smart, N.P., Stern, J.: Projective coordinates leak. In: Cachin, C., Camenisch, J. (eds.) Advances in Cryptology—EUROCRYPT 2004, volume 3027 of LNCS, pp. 257–267. Springer, Berlin (2004) https://www.iacr.org/archive/eurocrypt2004/30270258/projective.pdf
 38.Özgen, E., Papachristodoulou, L., Batina, L.: Classification algorithms for template matching. In: IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2016, McLean, VA, USA (2016, to appear)
 39.Römer, T., Seifert, J.P.: Information leakage attacks against smart card implementations of the elliptic curve digital signature algorithm. In: Attali, I., Jensen, T. (eds.) Smart Card Programming and Security, volume 2140 of LNCS, pp. 211–219. Springer, Berlin (2001)
 40.Schramm, K., Wollinger, T., Paar, C.: A new class of collision attacks and its application to DES. In: Johansson, T. (ed.) Fast Software Encryption, volume 2887 of LNCS, pp. 206–222. Springer, Berlin (2003). https://www.emsec.rub.de/research/publications/newclasscollisionattacksanditsapplicationde/
 41.Walter, C.D.: Sliding windows succumbs to Big Mac attack. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) Cryptographic Hardware and Embedded Systems—CHES 2001, volume 2162 of LNCS, pp. 286–299. Springer, Berlin (2001). https://www.comodo.com/resources/research/cryptography/CDW_Ches2001.ps
 42.Wenger, E., Korak, T., Kirschbaum, M.: Analyzing side-channel leakage of RFID-suitable lightweight ECC hardware. In: Hutter, M., Schmidt, J.M. (eds.) Radio Frequency Identification, volume 8262 of LNCS, pp. 128–144. Springer, Berlin (2013). https://online.tugraz.at/tug_online/voe_main2.getvolltext?pCurrPk=71289
 43.Witteman, M.F., van Woudenberg, J.G.J., Menarini, F.: Defeating RSA multiply-always and message blinding countermeasures. In: Kiayias, A. (ed.) Topics in Cryptology – CT-RSA 2011, volume 6558 of LNCS, pp. 77–88. Springer, Berlin (2011). https://www.riscure.com/benzine/documents/rsacc_ctrsa_final.pdf
 44.Yen, S.M., Ko, L.C., Moon, S., Ha, J.: Relative doubling attack against montgomery ladder. In: Won, D.H., Kim, S. (eds.) Information Security and Cryptology—ICISC 2005, volume 3935 of LNCS, pp. 117–128. Springer, Berlin (2005). http://islab.hoseo.ac.kr/jcha/paper/ICISC2005.pdf
 45.Zhang, Z., Wu, L., Mu, Z., Zhang, X.: A novel template attack on wnaf algorithm of ECC. In: 2014 Tenth International Conference on Computational Intelligence and Security (CIS), pp. 671–675. IEEE (2014)
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.