Improving recent side-channel attacks against the DES key schedule

Recent publications consider side-channel attacks against the key schedule of the Data Encryption Standard (DES). These publications identify a leakage model depending on the XOR of register values in the DES key schedule. Building on this leakage model, we first revisit a discrete model which assumes that the Hamming distances between subsequent round keys leak without error. We analyze this model formally and provide theoretical explanations for observations made in previous works. Next we examine a continuous model which considers more points of interest and also takes noise into account. The model gives rise to an evaluation function for key candidates and an associated notion of key ranking. We develop an algorithm for enumerating key candidates up to a desired rank which is based on the Fincke–Pohst lattice point enumeration algorithm. We derive information-theoretic bounds and estimates for the remaining entropy and compare them with our experimental results. We apply our attack to side-channel measurements of a security controller. Using our enumeration algorithm we are able to significantly improve the results reported previously for the same measurement data.


Introduction
Publications by Wagner et al. [8,14-17] present side-channel attacks against the key schedule of the Data Encryption Standard (DES), which are further investigated in [7]. They conduct template attacks against several microcontrollers and demonstrate that the entropy of the 56-bit DES keys can be reduced to 48 bits on average in their experimental setting.
In this article we consider the leakage model identified in aforementioned works. The model assumes that information about the XOR of register values in the DES key schedule leaks.
First we revisit a discrete model examined in [14], which assumes that the Hamming distances between subsequent round keys leak without error. We analyze this model formally and provide theoretical explanations for observations made in previous works. Next we examine a continuous model which considers more points of interest (POIs) and also takes noise into account. The parameters of this model can be learned in a profiling phase using linear regression. The model gives rise to an evaluation function for key candidates and an associated notion of key ranking. We develop an algorithm for enumerating key candidates up to a desired rank which is based on the Fincke-Pohst lattice point enumeration algorithm [4].
We apply our attack to side-channel measurements provided by the authors of [7]. The measurements are obtained from an implementation without countermeasures. In the profiling phase we use nearly 900,000 measurements to learn the parameters of our model and in the attack phase we use averages of several hundred measurements. Using our enumeration algorithm we are able to explicitly compute the ranks of the correct keys and find that the entropy of the DES keys is reduced to 15 bits on average and below 21 bits in 75% of the considered cases. Furthermore, we conduct a series of experiments on simulated measurements in different noise regimes.
We derive information-theoretic bounds and estimates for the remaining entropy and compare them with our experimental results. Our bounds and heuristics may be used by evaluators as theoretical tools for assessing side-channel leakage of DES implementations. The considered leakage model is quite general and should be adaptable to typical cryptographic implementations on security controllers. Moreover, the information-theoretic bounds are independent of the DES algorithm and depend only on the leakage model. Therefore, we believe that our theoretical results could be useful for assessing side-channel leakage of other cryptographic implementations.
Fortunately, our attack becomes infeasible in the presence of large noise. Therefore it is possible to design effective countermeasures against this attack based on randomization (e.g. masking) and/or limited key usage.
The original DES cipher can already be broken by exhaustive key search, but the use of Triple DES is still considered practically secure for some applications. Although more secure block ciphers such as AES are recommended in general, the use of Triple DES remains quite widespread (e.g. in electronic payment systems). Therefore, many security controllers still offer hardware support for (Triple) DES today and assessing the security of these implementations remains an important topic. We note that our attack extends to Triple DES by applying it to the three DES invocations individually and by using the classic meet-in-the-middle approach.
We denote by 0_n = (0, …, 0) ∈ R^n the all-zero vector, by 1_n = (1, …, 1) ∈ R^n the all-one vector, by I_n ∈ R^{n×n} the identity matrix, and by 0_{m,n} ∈ R^{m×n} the zero matrix. The Euclidean norm of a vector v ∈ R^n is denoted by ‖v‖.
Let a = a_1 ⋯ a_n ∈ {0, 1}^n be a bit-string. Depending on the context, we identify a with the (column) vector (a_1, …, a_n) ∈ R^n or the (big-endian represented) integer Σ_{i=1}^n a_i 2^{n−i} ∈ {0, 1, …, 2^n − 1}. The bit-wise XOR of a, b ∈ {0, 1}^n is denoted by a ⊕ b. The bit-wise complement of a ∈ {0, 1}^n is denoted by ā := a ⊕ 1_n, the cyclic left-shift (rotation) of a by k ∈ Z positions is denoted by a ≪ k := a_{1+k} ⋯ a_{n+k} (where indices are to be interpreted modulo n with representatives in [n]), and the Hamming weight of a is denoted by wt(a) := Σ_{i=1}^n a_i.
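These conventions can be mirrored in a small Python sketch (helper names are ours; bit-strings are represented as integers):

```python
# Helpers mirroring the notation above for n-bit strings represented as
# Python integers (helper names are ours, not the paper's).

def rotl(a: int, k: int, n: int = 28) -> int:
    """Cyclic left-shift a <<< k of an n-bit value."""
    k %= n
    if k == 0:
        return a
    return ((a << k) | (a >> (n - k))) & ((1 << n) - 1)

def wt(a: int) -> int:
    """Hamming weight wt(a)."""
    return bin(a).count("1")

def complement(a: int, n: int = 28) -> int:
    """Bit-wise complement, i.e. a XOR 1_n."""
    return a ^ ((1 << n) - 1)
```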

DES key schedule
The Data Encryption Standard (DES) is defined in [11]. In this article we are only concerned with the DES key schedule, which we describe below.
For simplicity and without loss of generality, we assume that DES keys are represented by k = (c, d) ∈ {0, 1}^56, where c, d ∈ {0, 1}^28 denote the contents of the C- and D-registers after the map PC-1 (permuted choice 1) has been applied to the actual DES master key KEY (i.e. c, d correspond to C_0, D_0 in the notation of [11]).
The DES round keys k_1, …, k_16 ∈ {0, 1}^48 are derived from k = (c, d) as follows. We write c = c_1 ⋯ c_28 and d = d_1 ⋯ d_28. In each round i ∈ [16], the values of the C- and D-registers are cyclically shifted (i.e. rotated) by 1 or 2 positions to the left. The number δ(i) of shifts in round i is given by

δ(i) = 1 for i ∈ {1, 2, 9, 16} and δ(i) = 2 otherwise.    (1)

The accumulated number ρ(i) of shifts (modulo 28) up to round i is given by ρ(i) = Σ_{j=1}^i δ(j) mod 28, i.e. we have ρ(1) = δ(1) and ρ(i) = ρ(i−1) + δ(i) mod 28 for i ≥ 2. The values of the C- and D-registers in round i are therefore given by

c_i = c_{1+ρ(i)} ⋯ c_{28+ρ(i)} = c ≪ ρ(i)  and  d_i = d_{1+ρ(i)} ⋯ d_{28+ρ(i)} = d ≪ ρ(i),    (2)

where the indices are to be interpreted modulo 28 (with representatives in [28]).
We denote by M_σ := {9, 18, 22, 25} and M_τ := {7, 10, 15, 26} the sets of elements in [28] which are "missing" from the images of σ and τ, respectively. The round keys are finally defined as k_i := PC-2(c_i, d_i) for i ∈ [16]. Written in terms of the original key k = (c, d), we have

k_i = PC-2(c ≪ ρ(i), d ≪ ρ(i)),    (4)

where the indices are again to be interpreted modulo 28.
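The shift bookkeeping above can be sketched in Python (a simplified model of the register contents only; PC-2 is omitted and all names are ours):

```python
# Sketch of the key-schedule bookkeeping: per-round shifts delta(i),
# accumulated rotations rho(i), and register contents as rotations of the
# original halves. PC-2 is omitted; function names are ours.

DELTA = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]  # delta(1), ..., delta(16)

def rho(i: int) -> int:
    """Accumulated number of left-shifts up to round i, modulo 28."""
    return sum(DELTA[:i]) % 28

def registers(c: int, d: int, i: int):
    """Contents (c_i, d_i) of the C- and D-registers in round i (28-bit ints)."""
    r = rho(i)
    mask = (1 << 28) - 1
    rot = lambda a: ((a << r) | (a >> (28 - r))) & mask if r else a
    return rot(c), rot(d)
```

Since the accumulated shifts over all 16 rounds total 28, we have rho(16) = 0, i.e. the registers return to their initial state after the last round.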

Leakage models
We consider variations of the leakage models identified in previous works [7,8,14-17]. The models assume that the key-dependent leakage originates from updates (c_{i+1}, d_{i+1}) ← (c_i, d_i) of the C- and D-registers and/or updates k_{i+1} ← k_i of the round-key register for i ∈ [15]. Moreover, it is assumed that the leakage stemming from a bit transition b ← a in those register updates depends only on a ⊕ b (XOR leakage) for a, b ∈ {0, 1}. Let a ∈ {0, 1}^28 be one half of a DES key (c, d) ∈ {0, 1}^56 in the C- or D-register. By (2), (3), and (4), the bit transitions occurring in the DES key schedule are of the form a_{i+1} ← a_i (shift-1 transitions) or a_{i+2} ← a_i (shift-2 transitions) for some i ∈ [28] and with indices interpreted modulo 28. Hence we may assume that the leakage depends only on a_i ⊕ a_{i+1} and a_i ⊕ a_{i+2} or, equivalently, only on (−1)^{a_i ⊕ a_{i+1}} and (−1)^{a_i ⊕ a_{i+2}} for all i ∈ [28]. Since shift-1 transitions appear in 3 rounds and shift-2 transitions in 12 rounds of the key schedule (cf. (1)), it is conceivable that shift-2 transitions will have a higher impact on the total leakage.
Based on this discussion, we introduce explanatory variables for the leakage models as follows. For a shift k ∈ {1, 2} and a key half a ∈ {0, 1}^28, we define the vector

ϕ_k(a) := ((−1)^{a_1 ⊕ a_{1+k}}, …, (−1)^{a_28 ⊕ a_{28+k}}) ∈ {±1}^28,

with indices interpreted modulo 28. Furthermore, we define the stacked vectors

ϕ(a) := (ϕ_1(a), ϕ_2(a)) ∈ {±1}^56  and  ϕ(c, d) := (ϕ(c), ϕ(d)) ∈ {±1}^112

for all key halves a ∈ {0, 1}^28 and full keys (c, d) ∈ {0, 1}^56. The components of ϕ(a) are illustrated in Fig. 1. The vector ϕ(c, d) captures all possible bit transitions in the key schedule of (c, d) and will serve as explanatory variable for the leakage models.
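A minimal Python sketch of this construction (function names are ours; key halves are passed as lists of 28 bits):

```python
# Sketch of the explanatory vector: component i of phi_k(a) is
# (-1)^(a_i XOR a_{i+k}) with indices modulo 28, and phi(c, d) stacks the
# shift-1 and shift-2 vectors of both halves into {+1, -1}^112.

def phi_k(a, k):
    """Shift-k transition vector of a key half a (a list of 28 bits)."""
    n = len(a)
    return [(-1) ** (a[i] ^ a[(i + k) % n]) for i in range(n)]

def phi(c, d):
    """Stacked explanatory vector in {+1, -1}^112 for a full key (c, d)."""
    return phi_k(c, 1) + phi_k(c, 2) + phi_k(d, 1) + phi_k(d, 2)
```

Note that phi_k(a) is unchanged if a is replaced by its complement; this is the reason why key candidates always come in groups of four.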
Note that ϕ_k(a ⊕ b) = ϕ_k(a) ∘ ϕ_k(b) for all a, b ∈ {0, 1}^28, where ∘ denotes componentwise multiplication. A vector x ∈ {±1}^28 is in the image of ϕ_k iff x has an even number of positive components iff x has an even number of negative components iff Σ_{i=1}^{28} x_i ≡ 0 (mod 4). The kernel of ϕ_k is the cyclic group generated by 1_28. In particular, the image of ϕ_k has exactly 2^27 elements. Now we can define the general form of the leakage models under consideration. We restrict ourselves to one of the simplest conceivable settings in which the leakage for a key (c, d) is given by an R-linear function of ϕ(c, d) and a key-independent error term.
Leakage Model 1 (General model) Let m ≥ 1, let W ∈ R^{m×112} be a fixed weight matrix, and let K = (C, D) be a uniformly distributed random variable on {0, 1}^56. We define the random variable Y on R^m by

Y := W ϕ(C, D) + ε,

where ε is a zero-mean random variable on R^m which is independent of K.
We refer to realizations y ∈ R m of Y as observations, to realizations of ε in R m as errors or noise, and to m as the number of points of interest (POIs). The following lemma collects some general properties of the random variables in Leakage Model 1.

Hamming weight model
In this section we consider a discrete leakage model, whose observations consist of the (centered) Hamming distances between subsequent round keys and are error-free. This model was already examined in [14,Section 5].
Leakage Model 2 (Hamming weight model) Let K = (C, D) be a uniformly distributed random variable on {0, 1}^56 and let K_1, …, K_16 be the random variables on {0, 1}^48 derived from K as defined by Eq. (4). We define the random variable Y on Z^15 by

Y_i := wt(K_i ⊕ K_{i+1}) − 24  for i ∈ [15].

The components Y_i take values in [−24, 24] ∩ Z.
This leakage model is a special instance of Leakage Model 1 with m = 15, a weight matrix W ∈ {−1/2, 0}^{15×112}, and error ε = 0_15. The weight matrix W is completely determined by the model assumptions and will be derived in Sect. 3.1.

Remark 2
In the case of noisy measurements, the error can be reduced by averaging repeated measurements for a fixed key. If the maximum norm of the error vector is less than 1/2, an exact observation as in (8) can be recovered from the noisy version by rounding each component to the nearest integer.
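A small numeric illustration of this remark (the observation and the noise level are made up):

```python
# Averaging repeated noisy measurements of a fixed key shrinks the error;
# once the maximum error is below 1/2, componentwise rounding recovers the
# exact integer observation. Values and noise level are made up.

import random

random.seed(0)
y_exact = [-3, 0, 5]  # hypothetical error-free observation
samples = [[v + random.gauss(0, 0.1) for v in y_exact] for _ in range(200)]
y_avg = [sum(col) / len(col) for col in zip(*samples)]
assert [round(v) for v in y_avg] == y_exact
```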

Determination of the weight and covariance matrix
Let K = (C, D) and Y be the random variables as defined in Leakage Model 2. We want to determine a weight matrix W with Y = W ϕ(C, D). Writing each component Y_i = wt(K_i ⊕ K_{i+1}) − 24 as a sum of terms −(1/2)(−1)^{a_j ⊕ a_{j+δ(i+1)}} over the key bits of a ∈ {c, d} that are selected by PC-2, one obtains an expression (9) in which the elements in the shifted sets ρ(i) + M_σ and ρ(i) + M_τ are to be interpreted modulo 28 (with representatives in [28]). From (9) the weight matrix W ∈ {−1/2, 0}^{15×112} can be easily read off, see Fig. 2.

Key ranking and key enumeration
Let y ∈ Z^15 be an observation under Leakage Model 2 corresponding to an unknown key k* = (c*, d*) ∈ {0, 1}^56. We denote by

C(y) := {(c, d) ∈ {0, 1}^56 : W ϕ(c, d) = y}

the set of key candidates for observation y. The rank of k* is defined as R(k*) := |C(y)|. Note that R(k*) is a multiple of 4 (cf. Remark 1). We call log_2 R(k*) the logarithmic key rank of k*. At first glance, enumerating the set C(y) looks like a 56-bit (or 54-bit) problem. However, we can apply a meet-in-the-middle approach (cf. [14, Section 5]). By Lemma 1(a), we have the decomposition

W ϕ(c, d) = W_1 ϕ(c) + W_2 ϕ(d),

where W_1, W_2 ∈ R^{15×56} denote the left and right halves of W. This leads to the following simple enumeration procedure.

1. Compute the lists

L_1 := {(c, W_1 ϕ(c)) : c ∈ {0, 1}^28}  and  L_2 := {(d, y − W_2 ϕ(d)) : d ∈ {0, 1}^28},

and sort them by the second component of their elements (e.g. using the lexicographical order on Z^15).

2. Set

C := {(c, d) : (c, u) ∈ L_1 and (d, u) ∈ L_2 for some u},

i.e. collect all pairs of first components whose second components collide. (Since L_1 and L_2 are sorted by the second component of their elements, the lists only have to be traversed once in order to find all collisions in the second component.) Return C and stop.
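At toy scale, the sort-and-collide step can be sketched in Python as follows; the integer value maps below are made-up stand-ins for W_1 ϕ(·) and y − W_2 ϕ(·), and the point is the single linear pass over the two sorted lists:

```python
# Toy sketch of the meet-in-the-middle collision step: sort L1 and L2 by
# their second component and traverse both once to find all pairs whose
# second components agree. The value maps below are made-up stand-ins.

def collide(L1, L2):
    """All (c, d) with matching second components, via one linear merge."""
    L1 = sorted(L1, key=lambda e: e[1])
    L2 = sorted(L2, key=lambda e: e[1])
    out, i, j = [], 0, 0
    while i < len(L1) and j < len(L2):
        if L1[i][1] < L2[j][1]:
            i += 1
        elif L1[i][1] > L2[j][1]:
            j += 1
        else:
            v = L1[i][1]
            jj = j
            while i < len(L1) and L1[i][1] == v:
                jj = j
                while jj < len(L2) and L2[jj][1] == v:
                    out.append((L1[i][0], L2[jj][0]))
                    jj += 1
                i += 1
            j = jj
    return out

# Collisions correspond exactly to solutions of c % 5 == 4 - d % 5 here.
L1 = [(c, c % 5) for c in range(8)]
L2 = [(d, 4 - d % 5) for d in range(8)]
pairs = collide(L1, L2)
```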

Experiments
In order to estimate the expected logarithmic key rank, we implemented Algorithm 1 in the Julia programming language [2] and conducted 1000 trials. For each trial we chose a random DES key and calculated the associated observation vector y. Then we enumerated all candidates (c, d) such that y = W ϕ(c, d). Each trial took approximately 140 seconds of single-core computing time on a standard computer. The results of the experiments are given in Table 1.
We observe that in one half of all cases the logarithmic key rank is less than 16. With such low logarithmic key ranks we note that the classic meet-in-the-middle approach against 3-key Triple DES has very moderate running time. For average keys we can expect a running time of roughly 2^32 DES encryptions/decryptions.

Theoretical estimation of the remaining entropy
The conditional entropy H(C, D | Y) is an information-theoretic measure for the expected logarithmic key rank, which we call remaining entropy. Since Y is a deterministic function of (C, D) in this model, we have

H(C, D | Y) = H(C, D) − H(Y) = 56 − H(Y).

A lower bound for the remaining entropy
The following lemma provides an upper bound for H(Y ).
where λ > 0 is the smallest eigenvalue of the covariance matrix of Y.
The proof uses the "trace trick" and the Poisson summation formula (cf. [1]). Since y^T Σ y ≥ λ ‖y‖² for all y ∈ R^m, where Σ denotes the covariance matrix of Y, the resulting sum can be bounded from above, finishing the proof.

Remark 3
We note that [1, Lemma (1.5)(i)] implies a better bound for the term m log_2(…) of Eq. (13) in general. However, the bound of Lemma 2 is sufficient for our purposes.
Applying Lemma 2 to Leakage Model 2, we obtain λ ≈ 0.65 and H(Y) ≤ 41.73 by numerical methods. We also note that the term (14) in (13) is negligible for this random variable. We obtain the lower bound H(C, D | Y) = 56 − H(Y) ≥ 14.27.

A heuristic for the remaining entropy
Based on the experiments reported in Sect. 3.3, we propose a heuristic formula for the remaining entropy. Each component Y_i of Y is a sum of independent random variables, so we can certainly approximate the distribution of each Y_i by a continuous normal distribution. But is it still a valid approximation if we replace the distribution of Y by a 15-dimensional normal distribution?
Let m ≤ n, let X be a uniformly distributed random variable on {±1}^n, and let A ∈ R^{m×n} be a matrix of full row rank such that AX takes values in Z^m. Consider a singular value decomposition A = U D V^T, where U ∈ R^{m×m} and V ∈ R^{n×n} are orthogonal matrices and D ∈ R^{m×n} is a rectangular diagonal matrix with non-negative elements d_1, …, d_m on the diagonal. This representation of A easily implies that AX takes values in an m-dimensional ellipsoid with semiaxes equal to the non-zero elements of D times √n. The volume of this ellipsoid is

V_m(1) · n^{m/2} · Π_{i=1}^m d_i,

where V_m(1) denotes the volume of the m-dimensional ball with radius 1.
• The heuristic argumentation based on the singular value decomposition: (i) If the volume of this ellipsoid is smaller than 2^n, then we can expect that all integer points of this ellipsoid occur in the support of AX.
(ii) The components of V^T X have expectation 0 and variance 1. Therefore, the bulk of the support of AX lies in a smaller ellipsoid with semiaxes equal to the non-zero elements of D times √m. The volume of this smaller ellipsoid is

V_m(1) · m^{m/2} · Π_{i=1}^m d_i.

(iii) Furthermore, if (i) is valid, we expect that the discrete distribution of AX is "similar" to the continuous distribution of AZ, where Z is normally distributed with covariance matrix I_n. AZ is therefore normally distributed with covariance matrix A A^T. The entropy of AZ is well known and given by the formula

h(AZ) = (1/2) log_2((2πe)^m det(A A^T)).

Note that the approaches (ii) and (iii) lead to very similar approximations, since det(A A^T) = Π_{i=1}^m d_i² and V_m(1) · m^{m/2} ≈ (2πe)^{m/2} by Stirling's formula.
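The entropy formula from step (iii) is easy to evaluate numerically; a numpy sketch (the function name is ours):

```python
# Differential entropy (in bits) of A Z for Z standard normal on R^n:
# h(A Z) = 1/2 * log2((2*pi*e)^m * det(A A^T)), assuming A has full row rank.

import numpy as np

def gaussian_entropy_bits(A):
    m = A.shape[0]
    sign, logdet = np.linalg.slogdet(A @ A.T)  # logdet in natural log
    assert sign > 0, "A must have full row rank"
    return 0.5 * (m * np.log2(2 * np.pi * np.e) + logdet / np.log(2))
```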

Distribution of the remaining entropy
Why does the remaining entropy in the experiments in Sect. 3.3 have such a large variation? Each y_i is a realization of a (centered) binomially distributed random variable. If y_i takes on extreme values near ±24, we have a large amount of information about the key (c, d). On the other hand, for y_i = 0 there are many candidates for (c, d).
As argued in Sect. 3.4.2, we expect that in our case the distribution of Y is close to a 15-dimensional normal distribution with covariance matrix Σ. This leads to an approximation of the remaining entropy H(C, D | Y = y) for fixed y whose key-dependent part is a decreasing function of ‖Σ^{−1/2} y‖. In this approximation the remaining entropy depends only on ‖Σ^{−1/2} y‖. If ‖Σ^{−1/2} y‖ is small, the remaining entropy H(C, D | Y = y) is large ("strong keys"). If ‖Σ^{−1/2} y‖ is large, the remaining entropy H(C, D | Y = y) is small ("weak keys"). We expect that the largest remaining entropy should occur near y = 0_15, and the largest number of candidates we found experimentally was indeed attained near y = 0_15.

Isolated consideration of the C-and D-register
As a natural approach one might try to find the values of the C- and D-registers separately. In the publication [7], for instance, the authors discuss a template attack on even smaller parts of the C- and D-registers. Here we want to clarify that this reduces the amount of mutual information significantly. We consider the following general strategy: 1. Define an appropriate evaluation function that depends only on the key part c in the C-register. Find a set C of likely candidates for the C-register.

2. Define an appropriate evaluation function that depends only on the key part d in the D-register. Find a set D of likely candidates for the D-register.

3. Check all combinations (c, d) with c ∈ C and d ∈ D.
The work load of this approach is bounded by 2^27 in steps 1 and 2, but in step 3 we have to check all combinations. We can now give an easy-to-compute indication of how successful such an approach could be. The random variable W_2 ϕ(D) takes on certain values in a 15-dimensional space, and its entropy H(W_2 ϕ(D)) is clearly bounded by 27. Now we use the following heuristic. The success of an evaluation function that depends only on the key part c of the C-register should be restricted to the mutual information of C and the random variable Y_1 := W_1 ϕ(C) + W_2 ϕ(D), in which the unknown contribution of the D-register acts as additional noise. Applying Lemma 2 to Y_1 and using this approximation, we obtain a bound on the remaining entropy H(C | Y_1).

Linear regression model
In this section we consider a continuous leakage model, whose observations cover more points of interest but may contain errors. The weight matrix of this model is not derived by theoretical considerations, but must be learned in a profiling phase using linear regression.

Leakage Model 3 (Linear regression model)
Let m ≥ 112, let W ∈ R^{m×112} be a fixed weight matrix of full column rank, and let K = (C, D) be a uniformly distributed random variable on {0, 1}^56. We define the random variable Y on R^m by

Y := W ϕ(C, D) + ε,

where ε is a zero-mean normal random variable on R^m with covariance matrix σ² I_m which is independent of K.

Remark 5
The leakage of a real implementation might not strictly follow this leakage model. For instance, there might be a non-linear key-dependent influence on the leakage, and the error might follow a different distribution or be key-dependent as well. One may attempt to fit real measurements better to this model by recentering and decorrelating the measurements. Moreover, the error can be reduced by averaging over several observations for the same key, see Sect. 4.3.1.

Remark 6
Leakage Model 3 is expressed in the setting of the DES algorithm. However, in principle it should be adaptable to other algorithms on security controllers. As a protection against side-channel attacks, an implementation should use many operations in parallel if key-dependent information is being processed. In our leakage model each observation is a large, weighted sum of certain key-dependent bits that is disturbed by noise.

Key ranking and key enumeration
Let y ∈ R^m be an observation under Leakage Model 3 corresponding to an unknown key k* = (c*, d*) ∈ {0, 1}^56, i.e. we have y = W ϕ(c*, d*) + ε for some unknown noise vector ε ∈ R^m. We also assume for the moment that we know the weight matrix W ∈ R^{m×112} (in Sect. 4.2 we describe how W can be estimated). We define the evaluation function

η_{W,y}(c, d) := ‖y − W ϕ(c, d)‖².

We denote by

C_W(y, B) := {(c, d) ∈ {0, 1}^56 : η_{W,y}(c, d) ≤ B}

the set of key candidates for observation y with error bound B ∈ R_{≥0}. The rank of the correct key k* with respect to W and y is defined as

R_{W,y}(k*) := |{(c, d) ∈ {0, 1}^56 : η_{W,y}(c, d) ≤ η_{W,y}(c*, d*)}|.

Note that R_{W,y}(k*) is a multiple of 4 (cf. Remark 1). We call log_2 R_{W,y}(k*) the logarithmic key rank of k*. A quick check for assessing the quality of η_{W,y} can be obtained by testing the condition η_{W,y}(c_j, d_j) ≤ η_{W,y}(c*, d*) for several random key candidates (c_j, d_j) ∈ {0, 1}^56 with j = 1, …, N. If N is large and R_{W,y}(k*) is not too small, we can expect

R_{W,y}(k*) ≈ 2^56 · |{j : η_{W,y}(c_j, d_j) ≤ η_{W,y}(c*, d*)}| / N.    (22)

Next we develop an algorithm to enumerate the set C_W(y, B) based on the Fincke-Pohst lattice point enumeration algorithm [4]. In principle, this algorithm explores the whole key space {0, 1}^56 (or {0, 1}^54), but in many instances the search tree can be pruned considerably.
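The evaluation function and the Monte-Carlo quality check can be sketched as follows (a simplification: we sample random vectors in {±1}^n directly instead of explanatory vectors of random keys; all names are ours):

```python
# Sketch of eta_{W,y}(x) = ||y - W x||^2 for an explanatory vector x and of
# the Monte-Carlo rank estimate: the fraction of random candidates scoring
# at least as well as the correct key, scaled to the full key space.
# Simplification: we draw random vectors in {+1,-1}^n directly instead of
# explanatory vectors of random keys.

import numpy as np

def eta(W, y, x):
    r = y - W @ x
    return float(r @ r)

def mc_rank_estimate(W, y, x_star, rng, N=100, keyspace_bits=56):
    score_star = eta(W, y, x_star)
    hits = sum(
        eta(W, y, rng.choice([-1.0, 1.0], size=W.shape[1])) <= score_star
        for _ in range(N)
    )
    return (hits + 1) / N * 2.0 ** keyspace_bits
```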
The following preparatory lemma shows that the weight matrix W ∈ R^{m×112} can be replaced by an upper triangular matrix R ∈ R^{112×112}. At the same time, the observation vector y is projected onto the range of W: there exist an upper triangular matrix R ∈ R^{112×112}, a vector t ∈ R^{112}, and a constant γ ≥ 0 such that ‖y − W x‖² = ‖t − R x‖² + γ for all x ∈ R^{112}.

Remark 7
Consider the situation of Lemma 3. By the thin/reduced QR factorization of W, there exist a unique matrix Q ∈ R^{m×112} with orthonormal columns and a unique upper triangular matrix R ∈ R^{112×112} with positive diagonal elements such that W = Q R. In order to traverse the key bits in a chosen order during the enumeration, the components of ϕ(c, d) and the columns of W have to be reordered accordingly (before applying Lemma 3). Let k = k_1 ⋯ k_56 = (c, d) = c_1 ⋯ c_28 d_1 ⋯ d_28 ∈ {0, 1}^56. First we choose a permutation π : [56] → [56] that determines the order k_{π(1)}, k_{π(2)}, …, k_{π(56)} in which we want to traverse the bits of k in the enumeration procedure. By Remark 1 we can keep one bit of c and one bit of d fixed, so we move those bits to the front and do not change them during the enumeration. For example, we may choose π(1) = 1 and π(2) = 29, hence k_{π(1)} = c_1 and k_{π(2)} = d_1.
We proceed by choosing bits so as to maximize the number of components in ϕ(k) that are determined by the current choices. In other words, we pick nodes in the graph of Fig. 1 such that the number of edges between the chosen nodes is maximized. For example, we may choose the permutation given in two-line notation by

π = ( 1   2  3  4  ⋯  28  29  30  31  ⋯  55  56
      1  29  2  3  ⋯  27  28  30  31  ⋯  55  56 ).
Next we determine a permutation matrix P ∈ R^{112×112} and integers s_1 = s_2 = 113 > s_3 > ⋯ > s_56 = 1 such that for all ℓ ∈ [56] the components of x := P ϕ(k) ∈ {±1}^{112} which are determined by k_{π(1)}, …, k_{π(ℓ)} are the trailing components x_{s_ℓ}, …, x_{112} of x. In our example, the integers s_ℓ can be read off from the graph in Fig. 1. Applying Lemma 3 to the column-permuted matrix W P and the observation y ∈ R^m, we obtain an upper triangular matrix R ∈ R^{112×112} and a vector t ∈ R^{112} such that ‖t − R x‖² differs from ‖y − W ϕ(k)‖² only by a constant independent of k. After adjusting the bound B by this constant, we therefore have k ∈ C_W(y, B) if and only if ρ_56 ≤ B, where for ℓ ∈ [56] we define the partial squared norm

ρ_ℓ := Σ_{i=s_ℓ}^{112} (t_i − Σ_{h=i}^{112} R_{i,h} x_h)².

Note that ρ_ℓ depends only on the components x_{s_ℓ}, …, x_{112} of x = P ϕ(k), which are determined completely by k_{π(1)}, …, k_{π(ℓ)}. Furthermore, ρ_ℓ can be computed recursively, since ρ_1 = ρ_2 = 0 and

ρ_ℓ = ρ_{ℓ−1} + Σ_{i=s_ℓ}^{s_{ℓ−1}−1} (t_i − Σ_{h=i}^{112} R_{i,h} x_h)²  for 3 ≤ ℓ ≤ 56.

Using the backtracking scheme described in [9, Section 7.2.2, Algorithm B], we obtain the following algorithm.

Algorithm 2
Input: A matrix W ∈ R m×112 of full column rank, a vector y ∈ R m , and a bound B ∈ R ≥0 . Output: The set of key candidates C W ( y, B).
Otherwise return C and stop.
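At toy scale, the pruned depth-first search underlying Algorithm 2 can be sketched as follows (a simplified variant over {±1}^n, without the optimizations of Remark 8; all names are ours):

```python
# Toy sketch of the pruned enumeration: with an upper triangular R and
# projected observation t, partial squared norms accumulate from the last
# coordinate upward, so any branch whose partial norm exceeds B can be cut.

import numpy as np

def enumerate_pm1(R, t, B):
    """All x in {+1,-1}^n with ||t - R x||^2 <= B (R upper triangular)."""
    n = R.shape[0]
    out = []

    def dfs(i, x, rho):
        if rho > B:
            return  # prune: the partial norm can only grow
        if i < 0:
            out.append(x.copy())
            return
        for v in (1.0, -1.0):
            x[i] = v
            term = t[i] - R[i, i:] @ x[i:]  # row i is fixed once x_i..x_n are set
            dfs(i - 1, x, rho + term * term)

    dfs(n - 1, np.zeros(n), 0.0)
    return out
```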

Remark 8
We note some possible variations and optimizations of Algorithm 2.
(a) The partial squared norms ρ_ℓ can be updated incrementally: when the enumeration moves from level ℓ−1 to level ℓ, only the terms for rows i with s_ℓ ≤ i < s_{ℓ−1} have to be added, and the inner sums σ_{i,h} only change in the coordinates that were modified since they were last computed. By using these recurrence relations and by reusing values of σ_{i,h} that are still valid during the enumeration, the partial squared norms ρ_ℓ can be computed with fewer operations. For details, see Algorithm 3 in the appendix. (b) Using pruning [12,13], we can heuristically reject partial assignments k_{π(1)} ⋯ k_{π(ℓ)} during the enumeration if the partial squared norm ρ_ℓ is already so large that ρ_56 ≤ B becomes unlikely for any choice of k_{π(ℓ+1)} ⋯ k_{π(56)}. This can be done by replacing the if-condition "ρ_ℓ ≤ B" in step 4 by "ρ_ℓ ≤ B_ℓ" for suitable bounds 0 ≤ B_3 ≤ ⋯ ≤ B_56 = B. In our experiments we used increasing bounds B_ℓ with B_ℓ = B for 37 ≤ ℓ ≤ 56.
Note that we cannot use extreme pruning [5], since we want to find all (or almost all) key candidates in C_W(y, B). (c) For a given weight matrix W, the running time of Algorithm 2 may be optimized by choosing different permutations π and P (in compliance with the conditions outlined above). Preprocessing W with general unimodular transformations (e.g. lattice basis reduction, cf. [4, (2.12)]) does not seem to be possible in our setting. (d) To find the N best key candidates for the evaluation function η_{W,y}, Algorithm 2 can be modified as follows. We start the algorithm with B := ∞. The set C is replaced by a list that is ordered by η_{W,y} and keeps only the N best key candidates. Each time a key candidate gets evicted from C, the bound B can be updated according to the currently worst key candidate in C.

Estimation of the weight matrix
The weight matrix W ∈ R^{m×112} of Leakage Model 3 can be estimated in a profiling phase using linear regression if observations for several known keys are available. Let N_prf ≫ 112. Assume we are given observations y_{prf,j} ∈ R^m of Leakage Model 3 for known and randomly chosen keys k_{prf,j} = (c_{prf,j}, d_{prf,j}) ∈ {0, 1}^56 for j ∈ [N_prf]. We denote by X_prf ∈ {±1}^{112×N_prf} the matrix with columns x_{prf,j} := ϕ(c_{prf,j}, d_{prf,j}) and by Y_prf ∈ R^{m×N_prf} the matrix with columns y_{prf,j} for j ∈ [N_prf].
We want to find an approximation Ŵ ∈ R^{m×112} of W such that Y_prf ≈ Ŵ X_prf. Since the error vector in Leakage Model 3 has independent components, we may estimate the rows of W independently. Let i ∈ [m] and let y_{prf,i} ∈ R^{1×N_prf} denote the i-th row of Y_prf. We approximate the i-th row of W by a least squares estimate, i.e. by a vector w ∈ R^{1×112} minimizing

‖y_{prf,i} − w X_prf‖².    (24)

Since N_prf ≫ 112, we may assume that X_prf has full row rank and X_prf X_prf^T ∈ R^{112×112} is non-singular. This implies that (24) is minimized by the (unique) vector

w_i := y_{prf,i} X_prf^T (X_prf X_prf^T)^{−1} ∈ R^{1×112}

(cf. [6, Section 5.3.1]). Combining the estimated rows of the weight matrix, we obtain the matrix Ŵ ∈ R^{m×112} with rows w_1, …, w_m.    (25)

Since the "true", unknown weight matrix W is assumed to have full column rank, we may hope that the same holds for the estimated weight matrix Ŵ. We just mention that this was indeed the case in our experiments with real measurements reported in Sect. 4.3.1.
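At toy dimensions, this row-wise least-squares estimate can be computed with numpy's lstsq, which solves all rows at once (shapes here are illustrative, not the paper's m × 112):

```python
# Sketch of the profiling regression: find W_hat minimizing the Frobenius
# norm of Y_prf - W_hat X_prf, i.e. each row solves a least-squares problem
# as in (24). numpy solves X^T W^T ~ Y^T for all rows simultaneously.

import numpy as np

def estimate_weight_matrix(X_prf, Y_prf):
    W_hat_T, *_ = np.linalg.lstsq(X_prf.T, Y_prf.T, rcond=None)
    return W_hat_T.T
```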

Experiments
We performed experiments using real and simulated measurements.

Real measurements
The authors of [7] provided us with their measurement data. The provided data set consists of a profiling set and an attack set. The measurements are already aligned and trimmed to m = 460 points of interest.
The profiling set comprises N_prf = 882,547 measurements of DES operations with random keys k_{prf,j} = (c_{prf,j}, d_{prf,j}) ∈ {0, 1}^56 for j ∈ [N_prf]. We denote by Y_prf ∈ R^{m×N_prf} the matrix of measurements (arranged in columns) and by X_prf ∈ {±1}^{112×N_prf} the matrix with columns ϕ(c_{prf,j}, d_{prf,j}).
The attack set comprises N_att = 247,088 measurements of DES operations with random keys k_{att,j} ∈ {0, 1}^56 for j ∈ [288], where N_{att,j} ∈ {761, …, 927} measurements have been performed with key k_{att,j}. (The authors of [7] also carried out measurements for so-called weak keys, but we do not consider them in this article.) We denote by Y_{att,j} ∈ R^{m×N_{att,j}} the matrix of measurements with key k_{att,j} (arranged in columns) for j ∈ [288].
In order to fit the measurement data to Leakage Model 3, we preprocessed the data sets as follows. Using the mean of the measurements Y_prf of the profiling set, we centered Y_prf and Y_{att,j} by subtracting this mean from each measurement. Using the empirical covariance matrix of the (centered) measurements Y_prf of the profiling set, we decorrelated Y_prf and Y_{att,j} (Mahalanobis whitening) by multiplying each measurement with the inverse matrix square root of the covariance matrix. Finally, we computed the averaged measurements of the attack sets Y_{att,j}. Due to a slight shift in the averaged measurements, we also recentered them amongst each other. Using the preprocessed data of the profiling set, we computed an estimate Ŵ ∈ R^{m×112} of the weight matrix according to Leakage Model 3 as in (25). A matrix plot of Ŵ is shown in Fig. 4. The plot illustrates the locations where updates of the C- and D-registers (rotation by 1 resp. 2 positions) take place. In particular, it is visible that the measurement covers a DES encryption followed by a full DES decryption, presumably as a countermeasure against fault attacks. The upper half and the mirrored lower half of the plot bear some resemblance to the weight matrix of Leakage Model 2 (cf. Fig. 2). This visual structure of Ŵ is a first indication that Leakage Model 3 is adequate for the measurements.
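The preprocessing chain (centering with the profiling mean, Mahalanobis whitening with the profiling covariance, then averaging per key) can be sketched as follows (names and shapes are ours; traces are stored in columns):

```python
# Sketch of the preprocessing: center with the profiling mean, whiten with
# the inverse square root of the profiling covariance, then average the
# attack measurements per key.

import numpy as np

def preprocess(Y_prf, Y_att_list):
    mu = Y_prf.mean(axis=1, keepdims=True)
    Yc = Y_prf - mu
    cov = (Yc @ Yc.T) / Yc.shape[1]
    vals, vecs = np.linalg.eigh(cov)            # cov = V diag(vals) V^T
    Wh = vecs @ np.diag(vals ** -0.5) @ vecs.T  # inverse matrix square root
    y_avg = [(Wh @ (Y - mu)).mean(axis=1) for Y in Y_att_list]
    return Wh @ Yc, y_avg
```

By construction, the whitened profiling set has identity empirical covariance.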
We implemented Algorithm 2 in the Julia programming language [2] with the optimizations of Remark 8(a) and (b). We computed the key ranks for 287 of the 288 averaged measurements y_i on a standard computer by explicit enumeration. The distribution of the computed ranks R_i and the single-core running times is described in Table 2.
One half of the computed ranks are below 2^15 and 75% of them are below 2^21. The key enumerations finished in under 7 minutes in one half of the cases using a single CPU core. If the number of measurements in the profiling phase is insufficient, the estimated weight matrix may differ significantly from the "true" weight matrix. If the number of measurements in the attack phase is insufficient, the noise of the averaged measurements may be too large.

Simulated measurements
In order to investigate the influence of the error distribution on the key rank, we performed a series of experiments with simulated measurements in different noise regimes. For the simulated measurements, we used the weight matrix W ∈ R m×112 with m = 460 estimated from the real measurements as described in Sect. 4.3.1 (cf. Fig. 4) and generated the observations as samples from Leakage Model 3, where the keys were drawn uniformly from {0, 1} 56 and the errors were drawn from a centered normal distribution on R m with covariance matrix σ 2 I m . In contrast to Sect. 4.3.1, we used the observations directly without averaging over several observations.
For each σ ∈ {0.02, 0.03, . . . , 0.07}, we generated 100 observations and computed the corresponding key ranks explicitly using our implementation of Algorithm 2 with the optimizations of Remark 8(a) and (b). The distributions of the computed ranks and the single-core running times are described in Table 3.
Comparing the distributions of Tables 2 and 3, we recognize that the averaged observations in Sect. 4.3.1 behave similarly to observations of Leakage Model 3 with σ ≈ 0.07.
For larger values of σ, we have resorted to the Monte-Carlo heuristic (22). For each σ ∈ {0.2, 0.3, …, 0.7}, we generated 100 observations and estimated the corresponding key ranks using the Monte-Carlo heuristic (22) with an appropriate number N of random keys. The distribution of the estimated ranks is described in Table 4.

Theoretical estimation of the remaining entropy
Similar to the discrete case, we use mutual information as a measure for the uncertainty about the key if an observation is given. However, mutual information of continuous random variables should be treated with care, since some of its properties are different compared to the discrete case. Since W^T W and W W^T have identical non-zero eigenvalues, the assertion follows. Based on the experiments in Sect. 4.3.2, we propose the heuristic formula (26) for the remaining entropy. Figure 7 shows the graph of the remaining entropy estimate (26) as a function of σ together with the logarithmic key ranks estimated in the experiments reported in Table 4. Figures 6 and 7 compare this heuristic with the results of the experiments with simulated measurements reported in Tables 3 and 4, respectively.

Isolated consideration of the C-and D-register
In Sect. 3.5 we looked at an approach that considers the C- and D-registers separately. We argued that the mutual information is much lower in this setting. However, in the continuous case with error the situation is different. On the one hand, we have an additional error, so that each observation gives less information compared to Leakage Model 2. On the other hand, we have many more POIs in Leakage Model 3. The general strategy is again as follows: 1. Define an appropriate evaluation function that depends only on the key part c in the C-register. Find a set C of likely candidates for the C-register. 2. Proceed analogously for the key part d in the D-register to find a set D of likely candidates. 3. Check all combinations. The work load of this approach is again bounded by 2^27 in steps 1 and 2, but in step 3 we have to check all combinations. Here we consider the following heuristic. In Y we replace the random variable ϕ(D) by a normally distributed random variable N_2 with mean 0_m and covariance matrix I_m. As an indication of the success of an evaluation function that depends only on the key part c of the C-register, we compute the mutual information of C and the resulting observation Y_1, assuming that the approximation (26) applies to Y_1 as well. We expect that the work load of step 3 is roughly of size 2^{H(C|Y_1)+H(D|Y_2)} in the algorithm above, where Y_2 is defined analogously for the D-register. Applying heuristic (28) to the weight matrix Ŵ estimated in Sect. 4.3.1 for the real measurements and σ = 0.07 as estimated in Sect. 4.3.2, we obtain H(C | Y_1) ≈ 6.55 and H(D | Y_2) ≈ 16.64. We computed the ranks of the key parts in the C- and D-registers with respect to η_{Ŵ,y_i,1} and the analogously defined evaluation function η_{Ŵ,y_i,2}, respectively. The distribution of the computed ranks is described in Table 5. The average logarithmic key ranks were 6.55 and 17.26 for the C- and D-registers, in good agreement with heuristic (28).

Remark 10
The key ranks in the experiments vary a lot. Therefore, in practice, the sets C and D of likely candidates have to be chosen larger than 2^{H(C|Y_1)} and 2^{H(D|Y_2)} in steps 1 and 2, respectively, or one has to accept that the algorithm finds the correct combined key only with a certain probability. Algorithm 2 in Sect. 4.1 does not have this drawback.
Acknowledgements We thank the authors of [7] for providing us with their measurement data.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/.

Appendix: An optimized version of Algorithm 2
The following algorithm is a variation of Algorithm 2 with the optimization described in Remark 8(a).

Algorithm 3
Input: A matrix W ∈ R^{m×112} of full column rank, a vector y ∈ R^m, and a bound B ∈ R_{≥0}. Output: The set of key candidates C_W(y, B). Upon reaching a complete assignment with ρ_56 ≤ B, set C ← C ∪ {(c, d), (c̄, d), (c, d̄), (c̄, d̄)} and go to step 6. Otherwise set k_{π(ℓ)} ← 0 and v_ℓ ← max{v_{ℓ−1}, v_ℓ}.