1 Introduction

Publications by Wagner et al. [8, 14,15,16,17] attempt side-channel attacks against the key schedule of the Data Encryption Standard (DES), which are further investigated in [7]. They conduct template attacks against several microcontrollers and demonstrate that the entropy of the 56-bit DES keys can be reduced to 48 bits on average in their experimental setting.

In this article we consider the leakage model identified in aforementioned works. The model assumes that information about the XOR of register values in the DES key schedule leaks.

First we revisit a discrete model examined in [14], which assumes that the Hamming distances between subsequent round keys leak without error. We analyze this model formally and provide theoretical explanations for observations made in previous works. Next we examine a continuous model which considers more points of interest (POIs) and also takes noise into account. The parameters of this model can be learned in a profiling phase using linear regression. The model gives rise to an evaluation function for key candidates and an associated notion of key ranking. We develop an algorithm for enumerating key candidates up to a desired rank which is based on the Fincke–Pohst lattice point enumeration algorithm [4].

We apply our attack to side-channel measurements provided by the authors of [7]. The measurements are obtained from an implementation without countermeasures. In the profiling phase we use nearly 900,000 measurements to learn the parameters of our model and in the attack phase we use averages of several hundred measurements. Using our enumeration algorithm we are able to explicitly compute the ranks of the correct keys and find that the entropy of the DES keys is reduced to 15 bits on average and below 21 bits in 75% of the considered cases. Furthermore, we conduct a series of experiments on simulated measurements in different noise regimes.

We derive information-theoretic bounds and estimates for the remaining entropy and compare them with our experimental results. Our bounds and heuristics may be used by evaluators as theoretical tools for assessing side-channel leakage of DES implementations. The considered leakage model is quite general and should be adaptable to typical cryptographic implementations on security controllers. Moreover, the information-theoretic bounds are independent of the DES algorithm and depend only on the leakage model. Therefore, we believe that our theoretical results could be useful for assessing side-channel leakage of other cryptographic implementations.

Fortunately, our attack becomes infeasible in the presence of large noise. Therefore it is possible to design effective countermeasures against this attack based on randomization (e.g. masking) and/or limited key usage.

The original DES cipher can already be broken by exhaustive key search, but the use of Triple DES is still considered practically secure for some applications. Although more secure block ciphers such as AES are recommended in general, the use of Triple DES remains quite widespread (e.g. in electronic payment systems). Therefore, many security controllers still offer hardware support for (Triple) DES today and assessing the security of these implementations remains an important topic. We note that our attack extends to Triple DES by applying it to the three DES invocations individually and by using the classic meet-in-the-middle approach.

2 Preliminaries

2.1 Notation

Let \([n] := \{1, \dotsc , n\}\).

We denote by \(\varvec{0}_n = (0, \dotsc , 0)^\top \in \mathbb {R}^n\) the all-zero-vector, by \(\varvec{1}_n = (1, \dotsc , 1)^\top \in \mathbb {R}^n\) the all-one-vector, by \(\varvec{I}_n \in \mathbb {R}^{n \times n}\) the identity matrix, and by \(\varvec{0}_{m,n} \in \mathbb {R}^{m \times n}\) the zero matrix. The Euclidean norm of a vector \(\varvec{v} \in \mathbb {R}^n\) is denoted by \(\Vert \varvec{v} \Vert \).

Let \(\varvec{a} = a_1 \cdots a_n \in \{0,1\}^n\) be a bit-string. Depending on the context, we identify \(\varvec{a}\) with the (column) vector \((a_1, \dotsc , a_n)^\top \in \mathbb {R}^n\) or the (big-endian represented) integer \(\sum _{i=1}^n a_i 2^{n-i} \in \{0, 1, \dotsc , 2^n-1\}\). The bit-wise XOR of \(\varvec{a}, \varvec{b} \in \{0,1\}^n\) is denoted by \(\varvec{a} \oplus \varvec{b}\). The bit-wise complement of \(\varvec{a} \in \{0,1\}^n\) is denoted by \(\overline{\varvec{a}} := \varvec{a} \oplus \varvec{1}_n\), the cyclic left-shift (rotation) of \(\varvec{a}\) by \(k \in \mathbb {Z}\) positions is denoted by \(\varvec{a} \lll k := a_{1+k} \cdots a_{n+k}\) (where indices are to be interpreted modulo n with representatives in [n]), and the Hamming weight of \(\varvec{a}\) is denoted by \({{\,\mathrm{wt}\,}}(\varvec{a}) := \sum _{i=1}^n a_i\).

2.2 DES key schedule

The Data Encryption Standard (DES) is defined in [11]. In this article we are only concerned with the DES key schedule, which we describe below.

For simplicity and without loss of generality, we assume that DES keys are represented by \(\varvec{k} = (\varvec{c}, \varvec{d}) \in \{0,1\}^{56}\), where \(\varvec{c}, \varvec{d} \in \{0,1\}^{28}\) denote the contents of the C- and D-register after the map PC-1 (permuted choice 1) has been applied to the actual DES master key KEY (i.e. \(\varvec{c}, \varvec{d}\) correspond to \(\mathsf {C}_0, \mathsf {D}_0\) in the notation of [11]).

The DES round keys \(\varvec{k}_1, \dotsc , \varvec{k}_{16} \in \{0,1\}^{48}\) are derived from \(\varvec{k} = (\varvec{c}, \varvec{d})\) as follows. We write \(\varvec{c} = c_1 \cdots c_{28}\) and \(\varvec{d} = d_1 \cdots d_{28}\). In each round \(i \in [16]\), the values of the C- and D-registers are cyclically shifted (i.e. rotated) by 1 or 2 positions to the left. The number \(\delta (i)\) of shifts in round i is given by

$$\begin{aligned} \delta (i) := {\left\{ \begin{array}{ll} 1 , &{} \text {if}\quad i \in \{1, 2, 9, 16\} , \\ 2 , &{} \text {otherwise} . \end{array}\right. } \end{aligned}$$
(1)

The accumulated number \(\rho (i)\) of shifts (modulo 28) up to round i is given by

$$\begin{aligned} \rho (i) := \sum _{j=1}^{i} \delta (j) \bmod 28 , \end{aligned}$$
(2)

i.e. we have \(\rho (1) = \delta (1)\) and \(\rho (i) = \rho (i-1) + \delta (i) \pmod {28}\) for \(2 \le i \le 16\). The values of the C- and D-registers in round i are therefore given by

$$\begin{aligned} (\varvec{c}_i, \varvec{d}_i) := \bigl ( c_{\rho (i)+1} \cdots c_{\rho (i)+28},\; d_{\rho (i)+1} \cdots d_{\rho (i)+28} \bigr ) \quad \text {for}\,\; i \in [16] , \end{aligned}$$
(3)

where the indices are to be interpreted modulo 28 (with representatives in [28]).

In each round \(i \in [16]\), the map PC-2 (permuted choice 2) is applied to \((\varvec{c}_i, \varvec{d}_i)\) to obtain the round key \(\varvec{k}_i\). The map PC-2 is defined as

$$\begin{aligned} {{\,\mathrm{PC-2}\,}} :\{0,1\}^{56}&\rightarrow \{0,1\}^{48},\quad (c_1 \cdots c_{28}, d_1 \cdots d_{28}) \\&\mapsto (c_{\sigma (1)} \cdots c_{\sigma (24)}, d_{\tau (1)} \cdots d_{\tau (24)}), \end{aligned}$$

where \(\sigma :[24] \rightarrow [28]{\setminus }\{9,18,22,25\}\) and \(\tau :[24] \rightarrow [28]{\setminus }\{7,10,15,26\}\) are the bijections defined by the PC-2 table of [11], i.e.

$$\begin{aligned} \bigl ( \sigma (1), \dotsc , \sigma (24) \bigr )&= (14, 17, 11, 24, 1, 5, 3, 28, 15, 6, 21, 10,\\&\qquad 23, 19, 12, 4, 26, 8, 16, 7, 27, 20, 13, 2) , \\ \bigl ( \tau (1), \dotsc , \tau (24) \bigr )&= (13, 24, 3, 9, 19, 27, 2, 12, 23, 17, 5, 20,\\&\qquad 16, 21, 11, 28, 6, 25, 18, 14, 22, 8, 1, 4) . \end{aligned}$$

We denote by \(M_\sigma := \{9, 18, 22, 25\}\) and \(M_\tau := \{7, 10, 15, 26\}\) the sets of elements in [28] which are “missing” from the images of \(\sigma \) and \(\tau \), respectively. The round keys are finally defined as \(\varvec{k}_i := {{\,\mathrm{PC-2}\,}}(\varvec{c}_i, \varvec{d}_i)\) for \(i \in [16]\). Written in terms of the original key \(\varvec{k} = (\varvec{c}, \varvec{d})\), we have

$$\begin{aligned} \varvec{k}_i = \bigl ( c_{\rho (i)+\sigma (1)} \cdots c_{\rho (i)+\sigma (24)},\; d_{\rho (i)+\tau (1)} \cdots d_{\rho (i)+\tau (24)} \bigr ) \quad \text {for}\,\; i \in [16] , \end{aligned}$$
(4)

where the indices are again to be interpreted modulo 28.

2.3 Leakage models

We consider variations of the leakage models identified in previous works [7, 8, 14,15,16,17]. The models assume that the key-dependent leakage originates from updates \((\varvec{c}_{i+1}, \varvec{d}_{i+1}) \leftarrow (\varvec{c}_i, \varvec{d}_i)\) of the C- and D-registers and/or updates \(\varvec{k}_{i+1} \leftarrow \varvec{k}_i\) of the round-key register for \(i \in [15]\). Moreover, it is assumed that the leakage stemming from a bit transition \(b \leftarrow a\) in those register updates depends only on \(a \oplus b\) (XOR leakage) for \(a, b \in \{0,1\}\).

Let \(\varvec{a} \in \{0,1\}^{28}\) be one half of a DES key \((\varvec{c}, \varvec{d}) \in \{0,1\}^{56}\) in the C- or D-register. By (2), (3), and (4), the bit transitions occurring in the DES key schedule are of the form \(a_{i+1} \leftarrow a_i\) (shift-1 transitions) or \(a_{i+2} \leftarrow a_i\) (shift-2 transitions) for some \(i \in [28]\) and with indices interpreted modulo 28. Hence we may assume that the leakage depends only on \(a_i \oplus a_{i+1}\) and \(a_i \oplus a_{i+2}\) or, equivalently, only on \((-1)^{a_i \oplus a_{i+1}}\) and \((-1)^{a_i \oplus a_{i+2}}\) for all \(i \in [28]\). Since shift-1 transitions appear in 3 rounds and shift-2 transitions in 12 rounds of the key schedule (cf. (1)), it is conceivable that shift-2 transitions will have a higher impact on the total leakage.

Based on this discussion, we introduce explanatory variables for the leakage models as follows. For a shift \(k \in \{1,2\}\) and a key half \(\varvec{a} \in \{0,1\}^{28}\), we define the vector

$$\begin{aligned} \Delta _k(\varvec{a}) := \varvec{1}_{28} - 2 \cdot \bigl ( \varvec{a} \oplus (\varvec{a} \lll k) \bigr ) \in \{\pm 1\}^{28}. \end{aligned}$$
(5)

Written differently, we have

$$\begin{aligned} \Delta _1(\varvec{a})&= \bigl ( (-1)^{a_1 \oplus a_2}, (-1)^{a_2 \oplus a_3}, \dotsc , (-1)^{a_{27} \oplus a_{28}}, (-1)^{a_{28} \oplus a_1} \bigr )^\top \quad \text {and} \\ \Delta _2(\varvec{a})&= \bigl ( (-1)^{a_1 \oplus a_3}, (-1)^{a_2 \oplus a_4}, \dotsc , (-1)^{a_{27} \oplus a_1},\; (-1)^{a_{28} \oplus a_2} \bigr )^\top . \end{aligned}$$

Furthermore, we define the stacked vectors

$$\begin{aligned} \Delta (\varvec{a})&:= \begin{pmatrix} \Delta _1(\varvec{a}) \\ \Delta _2(\varvec{a}) \end{pmatrix} \in \{\pm 1\}^{56} \quad \text {and}\quad \nonumber \\ \Delta (\varvec{c},\varvec{d})&:= \begin{pmatrix} \Delta (\varvec{c}) \\ \Delta (\varvec{d}) \end{pmatrix} \in \{\pm 1\}^{112} \end{aligned}$$
(6)

for all key halves \(\varvec{a} \in \{0,1\}^{28}\) and full keys \((\varvec{c}, \varvec{d}) \in \{0,1\}^{56}\). The components of \(\Delta (\varvec{a})\) are illustrated in Fig. 1. The vector \(\Delta (\varvec{c}, \varvec{d})\) captures all possible bit transitions in the key schedule of \((\varvec{c}, \varvec{d})\) and will serve as explanatory variable for the leakage models.
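For concreteness, the maps (5) and (6) can be written down directly in code. The following minimal Julia sketch (key halves are assumed to be given as 0/1 vectors of length 28) is reused by the later code sketches:

```julia
# Minimal Julia sketch of Eqs. (5) and (6); key halves are 0/1 vectors of length 28.
Δk(a, k) = [1 - 2 * (a[i] ⊻ a[mod1(i + k, 28)]) for i in 1:28]  # Δ_k(a) ∈ {±1}^28
Δ(a) = vcat(Δk(a, 1), Δk(a, 2))                                 # Δ(a) ∈ {±1}^56
Δ(c, d) = vcat(Δ(c), Δ(d))                                      # Δ(c,d) ∈ {±1}^112
```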

Fig. 1

The nodes in this graph represent the bits of a key half \(\varvec{a} \in \{0,1\}^{28}\) in the C- or D-register. The dashed edges \(\{a_i, a_{i+1}\}\) correspond to the components \((-1)^{a_i \oplus a_{i+1}}\) of \(\Delta _1(\varvec{a})\) and the solid edges \(\{a_i, a_{i+2}\}\) to the components \((-1)^{a_i \oplus a_{i+2}}\) of \(\Delta _2(\varvec{a})\). The union of all edges corresponds to the components of \(\Delta (\varvec{a})\). For a full key \((\varvec{c}, \varvec{d}) \in \{0,1\}^{56}\), the components of \(\Delta (\varvec{c}, \varvec{d})\) can be described by two disjoint copies of this graph with nodes labelled by the bits of \(\varvec{c}\) and \(\varvec{d}\), respectively

Remark 1

Let \(k \in \{1,2\}\). The map \(\Delta _k\) is a group homomorphism from \((\{0,1\}^{28}, \oplus )\) to \((\{\pm 1\}^{28}, \odot )\), where \(\odot \) denotes componentwise multiplication. For \(k = 1\), a vector \(\varvec{x} \in \{\pm 1\}^{28}\) is in the image of \(\Delta _1\) iff \(\varvec{x}\) has an even number of positive components iff \(\varvec{x}\) has an even number of negative components iff \(\sum _{i=1}^{28} x_i = 0 \pmod {4}\), and the kernel of \(\Delta _1\) is the cyclic group generated by \(\varvec{1}_{28}\). For \(k = 2\), the shift decomposes the index set [28] into two cycles of length 14 (the odd and the even positions), so \(\varvec{x}\) is in the image of \(\Delta _2\) iff the number of negative components with odd index and the number of negative components with even index are both even; the kernel of \(\Delta _2\) consists of the four bit-strings with \(a_i = a_{i+2}\) for all i, namely the two constant and the two alternating bit-strings. The kernel of the stacked map \(\Delta \) is again the cyclic group generated by \(\varvec{1}_{28}\). In particular, we have \(\Delta _k(\varvec{a}) = \Delta _k(\overline{\varvec{a}})\) and \(\Delta (\varvec{a}) = \Delta (\overline{\varvec{a}})\) for all \(\varvec{a} \in \{0,1\}^{28}\), as well as \(\Delta (\varvec{c}, \varvec{d}) = \Delta (\overline{\varvec{c}}, \varvec{d}) = \Delta (\varvec{c}, \overline{\varvec{d}}) = \Delta (\overline{\varvec{c}}, \overline{\varvec{d}})\) for all \(\varvec{c}, \varvec{d} \in \{0,1\}^{28}\).

Now we can define the general form of the leakage models under consideration. We restrict ourselves to one of the simplest conceivable settings in which the leakage for a key \((\varvec{c}, \varvec{d})\) is given by an \(\mathbb {R}\)-linear function of \(\Delta (\varvec{c}, \varvec{d})\) and a key-independent error term.

Leakage Model 1

(General model) Let \(m \ge 1\), let \(\varvec{W} \in \mathbb {R}^{m \times 112}\) be a fixed weight matrix, and let \(\varvec{K} = (\varvec{C}, \varvec{D})\) be a uniformly distributed random variable on \(\{0,1\}^{56}\). We define the random variable \(\varvec{Y}\) on \(\mathbb {R}^m\) by

$$\begin{aligned} \varvec{Y} = \varvec{W} \Delta (\varvec{C}, \varvec{D}) + \varvec{\varepsilon }, \end{aligned}$$
(7)

where \(\varvec{\varepsilon }\) is a zero-mean random variable on \(\mathbb {R}^m\) which is independent of \(\varvec{K}\).

We refer to realizations \(\varvec{y} \in \mathbb {R}^m\) of \(\varvec{Y}\) as observations, to realizations of \(\varvec{\varepsilon }\) in \(\mathbb {R}^m\) as errors or noise, and to m as the number of points of interest (POIs). The following lemma collects some general properties of the random variables in Leakage Model 1.

Lemma 1

Consider the situation of Leakage Model 1 and let \(\varvec{W}_1, \varvec{W}_2 \in \mathbb {R}^{m \times 56}\) such that \(\varvec{W} = (\varvec{W}_1, \varvec{W}_2)\).

  (a)

    We have \(\varvec{W} \Delta (\varvec{C}, \varvec{D}) = \varvec{W}_1 \Delta (\varvec{C}) + \varvec{W}_2 \Delta (\varvec{D})\).

  (b)

    We have \({{\,\mathrm{E}\,}}(\Delta (\varvec{C})) = {{\,\mathrm{E}\,}}(\Delta (\varvec{D})) = \varvec{0}_{56}\) and \({{\,\mathrm{Cov}\,}}(\Delta (\varvec{C})) = {{\,\mathrm{Cov}\,}}(\Delta (\varvec{D})) = \varvec{I}_{56}\).

  (c)

    We have \({{\,\mathrm{E}\,}}(\Delta (\varvec{C}, \varvec{D})) = \varvec{0}_{112}\) and \({{\,\mathrm{Cov}\,}}(\Delta (\varvec{C}, \varvec{D})) = \varvec{I}_{112}\).

  (d)

    We have \({{\,\mathrm{E}\,}}(\varvec{Y}) = \varvec{0}_m\) and \({{\,\mathrm{Cov}\,}}(\varvec{Y}) = \varvec{W} \varvec{W}^\top + {{\,\mathrm{Cov}\,}}(\varvec{\varepsilon }) = \varvec{W}_1 \varvec{W}_1^\top + \varvec{W}_2 \varvec{W}_2^\top + {{\,\mathrm{Cov}\,}}(\varvec{\varepsilon })\).

Proof

Assertion (a) is obvious. To show (b), denote \(\varvec{X} := \Delta (\varvec{C})\). Clearly \({{\,\mathrm{E}\,}}(\varvec{X}) = \varvec{0}_{56}\), hence \({{\,\mathrm{Cov}\,}}(\varvec{X}) = {{\,\mathrm{E}\,}}(\varvec{X} \varvec{X}^\top ) = \bigl ( {{\,\mathrm{E}\,}}(X_i X_j) \bigr ){}_{i,j \in [56]}\). Let \(i, j \in [56]\). We distinguish two cases:

  • If \(i=j\), then \({{\,\mathrm{E}\,}}(X_i X_j) = {{\,\mathrm{E}\,}}(X_i^2) = 1\).

  • If \(i \ne j\), then there are \(p, q, r, s \in [28]\) such that \(\{p,q\} \ne \{r,s\}\), \(q-p \bmod {28} \in \{1,2\}\), \(s-r \bmod {28} \in \{1,2\}\), and

    $$\begin{aligned} {{\,\mathrm{E}\,}}(X_i X_j)&= {{\,\mathrm{E}\,}}\bigl ( (-1)^{C_p \oplus C_q} (-1)^{C_r \oplus C_s} \bigr ) \\&= {{\,\mathrm{E}\,}}\bigl ( (-1)^{C_p} (-1)^{C_q} (-1)^{C_r} (-1)^{C_s} \bigr ). \end{aligned}$$

    Since \(p \ne q\), \(r \ne s\), and \(\{p,q\} \ne \{r,s\}\), we have \(p \notin \{q,r,s\}\) or \(q \notin \{p,r,s\}\). Let us assume \(p \notin \{q,r,s\}\) (the case \(q \notin \{p,r,s\}\) can be handled analogously). Then \((-1)^{C_p}\) and \((-1)^{C_q} (-1)^{C_r} (-1)^{C_s}\) are independent, therefore \({{\,\mathrm{E}\,}}(X_i X_j) = {{\,\mathrm{E}\,}}\bigl ( (-1)^{C_p} \bigr ) {{\,\mathrm{E}\,}}\bigl ( (-1)^{C_q} (-1)^{C_r} (-1)^{C_s} \bigr ) = 0\).

We have shown that \({{\,\mathrm{Cov}\,}}(\varvec{X}) = \varvec{I}_{56}\). The remaining assertions of (b) follow analogously. Since \(\Delta (\varvec{C})\) and \(\Delta (\varvec{D})\) are independent, (b) implies (c). Assertion (d) follows from (a), (b), and (c) by linearity of expectation and independence of \(\varvec{C}, \varvec{D}\), and \(\varvec{\varepsilon }\). \(\square \)
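The statements of Lemma 1(b) can also be checked empirically. A quick Monte Carlo sketch in Julia, reusing \(\Delta \) from the sketch in Sect. 2.3:

```julia
# Quick Monte Carlo check of Lemma 1(b), reusing Δ from the Sect. 2.3 sketch.
using LinearAlgebra, Statistics
X = reduce(hcat, [Δ(rand(0:1, 28)) for _ in 1:100_000])  # 56 × 100000 samples
maximum(abs.(mean(X, dims = 2)))     # ≈ 0, consistent with E(Δ(C)) = 0
maximum(abs.(X * X' / 100_000 - I))  # ≈ 0, consistent with Cov(Δ(C)) = I₅₆
```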

3 Hamming weight model

In this section we consider a discrete leakage model, whose observations consist of the (centered) Hamming distances between subsequent round keys and are error-free. This model was already examined in [14, Section 5].

Leakage Model 2

(Hamming weight model) Let \(\varvec{K} = (\varvec{C}, \varvec{D})\) be a uniformly distributed random variable on \(\{0,1\}^{56}\) and let \(\varvec{K}_1, \dotsc , \varvec{K}_{16}\) be the random variables on \(\{0,1\}^{48}\) derived from \(\varvec{K}\) as defined by Eq. (4). We define the random variable \(\varvec{Y}\) on \(\mathbb {Z}^{15}\) by

$$\begin{aligned} Y_i := {{\,\mathrm{wt}\,}}(\varvec{K}_i \oplus \varvec{K}_{i+1}) - 24, \quad i \in [15]. \end{aligned}$$
(8)

The components \(Y_i\) take values in \([-24,24] \cap \mathbb {Z}\).

This leakage model is a special instance of Leakage Model 1 with \(m=15\), a weight matrix \(\varvec{W} \in \{-\frac{1}{2}, 0\}^{15 \times 112}\), and error \(\varvec{\varepsilon } = \varvec{0}_{15}\). The weight matrix \(\varvec{W}\) is completely determined by the model assumptions and will be derived in Sect. 3.1.

Remark 2

In the case of noisy measurements, the error can be reduced by averaging repeated measurements for a fixed key. If the maximum norm of the error vector is less than \(\tfrac{1}{2}\), an exact observation as in (8) can be recovered from the noisy version by rounding each component to the nearest integer.

3.1 Determination of the weight and covariance matrix

Let \(\varvec{K} = (\varvec{C}, \varvec{D})\) and \(\varvec{Y}\) be the random variables as defined in Leakage Model 2. We want to determine a weight matrix \(\varvec{W} \in \{-\frac{1}{2}, 0\}^{15 \times 112}\) such that \(\varvec{Y} = \varvec{W} \Delta (\varvec{C}, \varvec{D})\). Let \(i \in [15]\). Then

$$\begin{aligned} Y_i&= {{\,\mathrm{wt}\,}}(\varvec{K}_i \oplus \varvec{K}_{i+1}) - 24 \\&= \sum _{j=1}^{24} (C_{\rho (i)+\sigma (j)} \oplus C_{\rho (i+1)+\sigma (j)}) \\&\quad + \sum _{j=1}^{24} (D_{\rho (i)+\tau (j)} \oplus D_{\rho (i+1)+\tau (j)}) - 24 \\&= -\frac{1}{2} \sum _{j=1}^{24} (-1)^{C_{\rho (i)+\sigma (j)} \oplus C_{\rho (i+1)+\sigma (j)}} \\&\quad - \frac{1}{2} \sum _{j=1}^{24} (-1)^{D_{\rho (i)+\tau (j)} \oplus D_{\rho (i+1)+\tau (j)}}. \end{aligned}$$

The images of \(\sigma \) and \(\tau \) are \([28] {\setminus } M_\sigma \) and \([28] {\setminus } M_\tau \), respectively, where \(M_\sigma = \{9, 18, 22, 25\}\) and \(M_\tau = \{7, 10, 15, 26\}\) (cf. Sect. 2.2). Changing the summation order, we obtain the representation

$$\begin{aligned} Y_i&= -\frac{1}{2} \sum _{j \in [28] {\setminus } (\rho (i)+M_\sigma )} (-1)^{C_{j} \oplus C_{j+\delta (i+1)}} \nonumber \\&\quad - \frac{1}{2} \sum _{j \in [28] {\setminus } (\rho (i)+M_\tau )} (-1)^{D_{j} \oplus D_{j+\delta (i+1)}}\,, \end{aligned}$$
(9)

where the elements in the shifted sets \(\rho (i)+M_\sigma \) and \(\rho (i)+M_\tau \) are to be interpreted modulo 28 (with representatives in [28]). From (9) the weight matrix \(\varvec{W} \in \{-\frac{1}{2}, 0\}^{15 \times 112}\) can be easily read off, see Fig. 2.

Fig. 2

Matrix plot of the weight matrix \(\varvec{W}\) with element values \(-\frac{1}{2}\) and 0 depicted in black and white, respectively

Next we want to determine the covariance matrix \(\varvec{\Sigma } = (\sigma _{i,j})_{i,j \in [15]} := {{\,\mathrm{Cov}\,}}(\varvec{Y})\). By Lemma 1(d), we have \(\varvec{\Sigma } = \varvec{W} \varvec{W}^\top \). Let \(i,j \in [15]\). If \(\delta (i+1) \ne \delta (j+1)\) (rounds \(i+1\) and \(j+1\) have different shifts), then \(\sigma _{i,j} = 0\). If \(\delta (i+1) = \delta (j+1)\) (rounds \(i+1\) and \(j+1\) have the same shift), then

$$\begin{aligned} \sigma _{i,j}&= \frac{1}{4} \Bigl ( 56 - \#\bigl ((\rho (i)+M_\sigma ) \cup (\rho (j)+M_\sigma )\bigr ) \\&\quad - \#\bigl ((\rho (i)+M_\tau ) \cup (\rho (j)+M_\tau )\bigr ) \Bigr ). \end{aligned}$$

More concretely, the diagonal elements are \(\sigma _{i,i} = \frac{1}{4}(56 - 4 - 4) = 12\), and the off-diagonal elements \(\sigma _{i,j}\) are determined by the sizes of the unions \((\rho (i)+M_\sigma ) \cup (\rho (j)+M_\sigma )\) and \((\rho (i)+M_\tau ) \cup (\rho (j)+M_\tau )\).

We note that \(\det (\varvec{\Sigma }) = 4^{-15} \cdot 4650233960271024\).
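For reference, the following Julia sketch assembles \(\varvec{W}\) from Eq. (9) and the quantities \(\delta \), \(\rho \), \(M_\sigma \), \(M_\tau \) defined above; if the matrix is assembled correctly, the last two lines reproduce the diagonal value \(\sigma _{i,i} = 12\) and the determinant stated above.

```julia
# Construction of the weight matrix W of Leakage Model 2 from Eq. (9)
# (sketch; the column order of Δ(c,d) is Δ1(c), Δ2(c), Δ1(d), Δ2(d) as in (6)).
using LinearAlgebra

δ(i) = i in (1, 2, 9, 16) ? 1 : 2                  # number of shifts, Eq. (1)
ρ(i) = mod(sum(δ(j) for j in 1:i), 28)             # accumulated shifts, Eq. (2)
Mσ, Mτ = [9, 18, 22, 25], [7, 10, 15, 26]          # "missing" indices of PC-2

W = zeros(15, 112)
for i in 1:15
    blk = δ(i + 1) == 1 ? 0 : 28                   # Δ1 block or Δ2 block
    for j in setdiff(1:28, mod1.(ρ(i) .+ Mσ, 28))  # C-register terms of (9)
        W[i, blk + j] = -1/2
    end
    for j in setdiff(1:28, mod1.(ρ(i) .+ Mτ, 28))  # D-register terms of (9)
        W[i, 56 + blk + j] = -1/2
    end
end

Σ = W * W'           # = Cov(Y) by Lemma 1(d)
Σ[1, 1]              # diagonal element: 12.0
det(Σ) * 4^15        # ≈ 4650233960271024
```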

3.2 Key ranking and key enumeration

Let \(\varvec{y} \in \mathbb {Z}^{15}\) be an observation under Leakage Model 2 corresponding to an unknown key \(\varvec{k}^* = (\varvec{c}^*, \varvec{d}^*) \in \{0,1\}^{56}\), i.e. we have \(\varvec{y} = \varvec{W} \Delta (\varvec{c}^*, \varvec{d}^*)\). We denote by

$$\begin{aligned} \mathcal {C}(\varvec{y}) := \bigl \{ (\varvec{c}, \varvec{d}) \in \{0,1\}^{56} \,\vert \; \varvec{y} = \varvec{W} \Delta (\varvec{c}, \varvec{d}) \bigr \} \end{aligned}$$
(10)

the set of key candidates for observation \(\varvec{y}\). The rank of \(\varvec{k}^*\) is defined as

$$\begin{aligned} \mathcal {R}(\varvec{k}^*) := \# \mathcal {C}(\varvec{y}) . \end{aligned}$$
(11)

Note that \(\mathcal {R}(\varvec{k}^*)\) is a multiple of 4 (cf. Remark 1). We call \(\log _2 \mathcal {R}(\varvec{k}^*)\) the logarithmic key rank of \(\varvec{k}^*\).

At first glance, enumerating the set \(\mathcal {C}(\varvec{y})\) looks like a 56-bit (or 54-bit) problem. However, we can apply a meet-in-the-middle approach (cf. [14, Section 5]). By Lemma 1(a), we have the decomposition

$$\begin{aligned} \varvec{W} \Delta (\varvec{c}, \varvec{d}) = \varvec{W}_1 \Delta (\varvec{c}) + \varvec{W}_2 \Delta (\varvec{d}) \quad \text {for all}\;(\varvec{c}, \varvec{d}) \in \{0,1\}^{56}, \end{aligned}$$
(12)

where \(\varvec{W}_1, \varvec{W}_2 \in \mathbb {R}^{15 \times 56}\) such that \(\varvec{W} = (\varvec{W}_1, \varvec{W}_2)\). This leads to the following simple enumeration procedure.

Algorithm 1

Input: A vector \(\varvec{y} \in ([-24,24] \cap \mathbb {Z})^{15}\).

Output: The set of key candidates \(\mathcal {C}(\varvec{y})\).

1.:

Compute the lists

$$\begin{aligned} \mathcal {L}_1&\leftarrow \bigl \{ (\varvec{c}, \varvec{y} - \varvec{W}_1 \Delta (\varvec{c})) \,\vert \; \varvec{c} = 0, \dotsc , 2^{27}-1 \bigr \} \,, \\ \mathcal {L}_2&\leftarrow \bigl \{ (\varvec{d}, \varvec{W}_2 \Delta (\varvec{d})) \,\vert \; \varvec{d} = 0, \dotsc , 2^{27}-1 \bigr \} \end{aligned}$$

and sort them by the second component of their elements (e.g. using the lexicographical order on \(\mathbb {Z}^{15}\)).

2.:

Set \(\mathcal {C} \leftarrow \varnothing \). For all \((\varvec{c}, \varvec{y}_1) \in \mathcal {L}_1\) and \((\varvec{d}, \varvec{y}_2) \in \mathcal {L}_2\) with \(\varvec{y}_1 = \varvec{y}_2\), set

$$\begin{aligned} \mathcal {C} \leftarrow \mathcal {C} \cup \{(\varvec{c}, \varvec{d}), (\overline{\varvec{c}}, \varvec{d}), (\varvec{c}, \overline{\varvec{d}}), (\overline{\varvec{c}}, \overline{\varvec{d}}) \}. \end{aligned}$$

(Since \(\mathcal {L}_1\) and \(\mathcal {L}_2\) are sorted by the second component of their elements, the lists only have to be traversed once in order to find all collisions in the second component.) Return \(\mathcal {C}\) and stop.
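For illustration, a Julia sketch of this procedure, reusing \(\Delta \) from Sect. 2.3; here \(\varvec{W}_1 = \) `W[:, 1:56]` and \(\varvec{W}_2 = \) `W[:, 57:112]` are the two blocks of the weight matrix from the Sect. 3.1 sketch, and a dictionary stands in for the sort-and-merge step:

```julia
# Sketch of Algorithm 1 (meet-in-the-middle), reusing Δ from Sect. 2.3.
# Illustrative only: the dictionary holds 2^27 entries; the sort-and-merge
# variant described above is considerably more memory-friendly.
bits28(v) = [(v >> (28 - i)) & 1 for i in 1:28]     # big-endian 28-bit vector

function candidates(y, W1, W2)
    L1 = Dict{Vector{Int}, Vector{Int}}()           # y − W1·Δ(c) ↦ all such c
    for c in 0:(2^27 - 1)                           # leading bit of c fixed (Remark 1)
        push!(get!(L1, round.(Int, y .- W1 * Δ(bits28(c))), Int[]), c)
    end
    C = Tuple{Int, Int}[]
    for d in 0:(2^27 - 1)                           # leading bit of d fixed, too
        for c in get(L1, round.(Int, W2 * Δ(bits28(d))), Int[])
            push!(C, (c, d))  # (c̄,d), (c,d̄), (c̄,d̄) are then candidates as well
        end
    end
    return C
end
```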

3.3 Experiments

In order to estimate the expected logarithmic key rank, we implemented Algorithm 1 in the Julia programming language [2] and conducted 1000 trials. For each trial we chose a random DES key and calculated the associated observation vector \(\varvec{y}\). Then we enumerated all candidates \((\varvec{c}, \varvec{d})\) such that \(\varvec{y} = \varvec{W} \Delta (\varvec{c}, \varvec{d})\). Each trial took approximately 140 seconds of single-core computing time on a standard computer. The results of the experiments are given in Table 1.

Table 1 Empirical distribution of the logarithmic key rank based on 1000 trials with random keys

We observe that in one half of all cases the logarithmic key rank is less than 16. Given such low logarithmic key ranks, the classic meet-in-the-middle approach against 3-key Triple DES has a very moderate running time: for average keys we can expect roughly \(2^{32}\) DES encryptions/decryptions.

3.4 Theoretical estimation of the remaining entropy

The conditional entropy \({{\,\mathrm{H}\,}}(\varvec{C}, \varvec{D} \mid \varvec{Y})\) is an information-theoretic measure for the expected logarithmic key rank, which we call remaining entropy. We have

$$\begin{aligned} {{\,\mathrm{H}\,}}(\varvec{C}, \varvec{D} \mid \varvec{Y}) = {{\,\mathrm{H}\,}}(\varvec{C}, \varvec{D}) - {{\,\mathrm{I}\,}}(\varvec{Y}; \varvec{C}, \varvec{D}) = 56 - {{\,\mathrm{I}\,}}(\varvec{Y}; \varvec{C}, \varvec{D}), \end{aligned}$$

where \({{\,\mathrm{I}\,}}(\varvec{Y}; \varvec{C}, \varvec{D})\) is the mutual information of \(\varvec{Y}\) and \((\varvec{C}, \varvec{D})\). Since \(\varvec{Y}\) is a deterministic function of \((\varvec{C}, \varvec{D})\), we have \({{\,\mathrm{I}\,}}(\varvec{Y}; \varvec{C}, \varvec{D}) = {{\,\mathrm{H}\,}}(\varvec{Y})\), hence \({{\,\mathrm{H}\,}}(\varvec{C}, \varvec{D} \mid \varvec{Y}) = 56 - {{\,\mathrm{H}\,}}(\varvec{Y})\).

3.4.1 A lower bound for the remaining entropy

The following lemma provides an upper bound for \({{\,\mathrm{H}\,}}(\varvec{Y})\).

Lemma 2

Let \(\varvec{Y}\) be a random variable on \(\mathbb {Z}^m\) with \({{\,\mathrm{E}\,}}(\varvec{Y}) = \varvec{0}_m\) and positive-definite covariance matrix \(\varvec{\Sigma } := {{\,\mathrm{Cov}\,}}(\varvec{Y}) \in \mathbb {R}^{m \times m}\). Then

$$\begin{aligned} {{\,\mathrm{H}\,}}(\varvec{Y}) \le \frac{1}{2} \log _2 \bigl ( \det ( 2 \pi e \varvec{\Sigma }) \bigr ) + m \log _2 \left( \frac{1+e^{- 2 \pi ^2 \lambda }}{1-e^{- 2 \pi ^2 \lambda }} \right) , \end{aligned}$$
(13)

where \(\lambda > 0\) is the smallest eigenvalue of \(\varvec{\Sigma }\).

Proof

Let \(\mathcal {Y} \subseteq \mathbb {Z}^m\) be the support of \(\varvec{Y}\). By Gibbs’ inequality (cf. [3, Theorem 2.6.3]), we have

$$\begin{aligned} {{\,\mathrm{H}\,}}(\varvec{Y})&= - \sum _{\varvec{y} \in \mathcal {Y}} \Pr (\varvec{Y}=\varvec{y}) \log _2\bigl (\Pr (\varvec{Y}=\varvec{y})\bigr )\\&\le - \sum _{\varvec{y} \in \mathcal {Y}} \Pr (\varvec{Y} = \varvec{y}) \, \log _2\bigl ( p(\varvec{y}) \bigr ) \end{aligned}$$

for any probability distribution \(p :\mathcal {Y} \rightarrow [0,1]\) with support \(\mathcal {Y}\). Setting

$$\begin{aligned} p(\varvec{y}) := \mu \, e^{-\frac{1}{2} \varvec{y}^\top \varvec{\Sigma }^{-1} \varvec{y}} \quad \text {with}\quad \mu := \left( \sum _{\varvec{y} \in \mathcal {Y}} e^{-\frac{1}{2} \varvec{y}^\top \varvec{\Sigma }^{-1} \varvec{y}} \right) ^{-1}, \end{aligned}$$

we obtain

$$\begin{aligned} {{\,\mathrm{H}\,}}(\varvec{Y})&\le -\log _2(\mu ) + \frac{\log _2(e)}{2} \sum _{\varvec{y} \in \mathcal {Y}} \Pr (\varvec{Y} = \varvec{y}) \varvec{y}^\top \varvec{\Sigma }^{-1} \varvec{y} \\&= -\log _2(\mu ) + \frac{\log _2(e)}{2} {{\,\mathrm{E}\,}}(\varvec{Y}^\top \varvec{\Sigma }^{-1} \varvec{Y}) \\&= -\log _2(\mu ) + \frac{\log _2(e)}{2} {{\,\mathrm{tr}\,}}\bigl ( \varvec{\Sigma }^{-1} {{\,\mathrm{E}\,}}(\varvec{Y} \varvec{Y}^\top ) \bigr )\\&\quad \text {(by the ``trace trick'')} \\&= -\log _2(\mu ) + \frac{m \log _2(e)}{2}. \end{aligned}$$

Using the Poisson summation formula (cf. [1, Lemma (1.1) (i)]), we get

$$\begin{aligned} {{\,\mathrm{H}\,}}(\varvec{Y})&\le \frac{m \log _2(e)}{2} + \log _2\left( \sum _{\varvec{y} \in \mathcal {Y}} e^{-\frac{1}{2} \varvec{y}^\top \varvec{\Sigma }^{-1} \varvec{y}} \right) \\&\le \frac{m \log _2(e)}{2} + \log _2\left( \sum _{\varvec{y} \in \mathbb {Z}^m} e^{-\frac{1}{2} \varvec{y}^\top \varvec{\Sigma }^{-1} \varvec{y}} \right) \\&=\frac{m \log _2(e)}{2} + \log _2\Bigl ( \sqrt{(2\pi )^m \det (\varvec{\Sigma })} \Bigr )\\&\quad + \log _2\left( \sum _{\varvec{y} \in \mathbb {Z}^m} e^{-2\pi ^2 \varvec{y}^\top \varvec{\Sigma } \varvec{y}} \right) \\&= \frac{1}{2} \log _2 \bigl ( \det ( 2 \pi e \varvec{\Sigma }) \bigr ) + \log _2\left( \sum _{\varvec{y} \in \mathbb {Z}^m} e^{-2\pi ^2 \varvec{y}^\top \varvec{\Sigma } \varvec{y}} \right) . \end{aligned}$$

Since \(\varvec{y}^\top \varvec{\Sigma } \varvec{y} \ge \lambda \Vert \varvec{y} \Vert ^2\) for all \(\varvec{y} \in \mathbb {R}^m\), we have

$$\begin{aligned} \log _2\left( \sum _{\varvec{y} \in \mathbb {Z}^m} e^{-2\pi ^2 \varvec{y}^\top \varvec{\Sigma } \varvec{y}} \right)&\le \log _2\left( \sum _{\varvec{y} \in \mathbb {Z}^m} e^{-2\pi ^2 \lambda \Vert \varvec{y} \Vert ^2} \right) \\&= m \log _2\left( \sum _{z \in \mathbb {Z}} e^{-2\pi ^2 \lambda z^2} \right) \\&\le m \log _2\left( -1 + 2\sum _{n \ge 0} e^{-2\pi ^2 \lambda n} \right) \\&= m \log _2 \left( \frac{1+e^{- 2 \pi ^2 \lambda }}{1-e^{- 2 \pi ^2 \lambda }} \right) , \end{aligned}$$

finishing the proof. \(\square \)

Remark 3

We note that [1, Lemma (1.5) (i)] implies a better bound for the term \(m \log _2(\ldots )\) of Eq. (13) in general. However, the bound of Lemma 2 is sufficient for our purposes.

Applying Lemma 2 to Leakage Model 2, we obtain \(\lambda \approx 0.65\) and \({{\,\mathrm{H}\,}}(\varvec{Y}) \le 41.73\) by numerical methods. We also note that the term

$$\begin{aligned} 15 \log _2 \left( \frac{1+e^{- 2 \pi ^2 \lambda }}{1-e^{- 2 \pi ^2 \lambda }} \right) \approx 0.0001 \end{aligned}$$
(14)

in (13) is negligible for this random variable. We obtain the lower bound

$$\begin{aligned} {{\,\mathrm{H}\,}}(\varvec{C}, \varvec{D} \mid \varvec{Y}) = 56 - {{\,\mathrm{H}\,}}(\varvec{Y}) \ge 14.27 \end{aligned}$$
(15)

for the remaining entropy. The experiments reported in Sect. 3.3 (cf. Table 1) suggest that the remaining entropy is close to this lower bound. In Sect. 3.4.2 we support this hypothesis by geometric considerations.
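Numerically, these quantities can be reproduced as follows (a sketch, reusing \(\varvec{\Sigma }\) from the Sect. 3.1 sketch):

```julia
# Numerical evaluation of the bound (13) for Leakage Model 2 (sketch,
# reusing Σ from the Sect. 3.1 sketch).
using LinearAlgebra
λ = minimum(eigvals(Symmetric(Σ)))                  # smallest eigenvalue, ≈ 0.65
tail = 15 * log2((1 + exp(-2π^2 * λ)) / (1 - exp(-2π^2 * λ)))  # Eq. (14), ≈ 0.0001
H_upper = 0.5 * log2(det(2π * ℯ * Σ)) + tail        # H(Y) ≤ H_upper ≈ 41.73
H_remaining = 56 - H_upper                          # lower bound (15), ≈ 14.27
```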

3.4.2 A heuristic for the remaining entropy

Based on the experiments reported in Sect. 3.3, we propose the heuristic formula

$$\begin{aligned} {{\,\mathrm{H}\,}}(\varvec{C}, \varvec{D} \mid \varvec{Y}) \approx 56 - \frac{1}{2} \log _2 \bigl (\det (2 \pi e \varvec{\Sigma }) \bigr ) \approx 14.27 \end{aligned}$$
(16)

for the remaining entropy.

Each component \(Y_i\) of \(\varvec{Y}\) is a sum of independent random variables, so we can certainly approximate the distribution of each \(Y_i\) by a continuous normal distribution. But is it still a valid approximation if we replace the distribution of the vector \(\varvec{Y}\) by a 15-dimensional normal distribution?

Let \(m \le n\), let \(\varvec{X}\) be a uniformly distributed random variable on \(\{\pm 1\}^n\), and let \(\varvec{A} \in \mathbb {R}^{m \times n}\) be a matrix of full row rank such that \(\varvec{A} \varvec{X}\) takes values in \(\mathbb {Z}^m\). We have the following general properties:

  • If the support of \(\varvec{A} \varvec{X}\) is contained in a subset \(S \subseteq \mathbb {R}^m\), then clearly \({{\,\mathrm{H}\,}}(\varvec{A} \varvec{X}) \le \log _2\bigl ( \#(S \cap \mathbb {Z}^m)\bigr )\). In addition, we can expect that \(\log _2\bigl ( \#(S \cap \mathbb {Z}^m)\bigr ) \approx \log _2\bigl ({{\,\mathrm{vol}\,}}(S)\bigr )\) for “natural” sets S.

  • Let \(\varvec{A} = \varvec{U} \varvec{D} \varvec{V}^\top \) be the singular value decomposition of \(\varvec{A}\), where \(\varvec{U} \in \mathbb {R}^{m \times m}\) and \(\varvec{V} \in \mathbb {R}^{n \times n}\) are orthogonal matrices and \(\varvec{D} \in \mathbb {R}^{m \times n}\) is a rectangular diagonal matrix with non-negative elements on the diagonal. This representation of \(\varvec{A}\) easily implies that \(\varvec{A} \varvec{X}\) takes values in an m-dimensional ellipsoid with semiaxes equal to the non-zero elements of \(\varvec{D}\) times \(\sqrt{n}\). The volume of this ellipsoid is

    $$\begin{aligned} V_m(1) n^{m/2} \sqrt{\det \bigl ( \varvec{D} \varvec{D}^\top \bigr )} = V_m(1) n^{m/2} \sqrt{\det \bigl ( \varvec{A} \varvec{A}^\top \bigr )} \,, \end{aligned}$$

    where \(V_m(1)\) denotes the volume of the m-dimensional ball with radius 1.

  • The heuristic argument based on the singular value decomposition runs as follows:

    (i)

      If the volume of this ellipsoid is smaller than \(2^n\), then we can expect that all integer points of this ellipsoid occur in the support of \(\varvec{A} \varvec{X}\).

    (ii)

      The components of \(\varvec{V}^\top \varvec{X}\) have expectation 0 and variance 1. Therefore, the bulk of the support of \(\varvec{A} \varvec{X}\) lies in a smaller ellipsoid with semiaxes equal to the non-zero elements of \(\varvec{D}\) times \(\sqrt{m}\). The volume of this smaller ellipsoid is

      $$\begin{aligned} V_m(1) m^{m/2} \sqrt{\det \bigl ( \varvec{A} \varvec{A}^\top \bigr )}. \end{aligned}$$
    (iii)

      Furthermore, if (i) is valid, we expect that the discrete distribution of \(\varvec{A} \varvec{X}\) is “similar” to the continuous distribution of \(\varvec{A} \varvec{Z}\), where \(\varvec{Z}\) is normally distributed with covariance matrix \(\varvec{I}_n\). Then \(\varvec{A} \varvec{Z}\) is normally distributed with covariance matrix \(\varvec{A} \varvec{A}^\top \), and its entropy is given by the well-known formula

      $$\begin{aligned} \frac{1}{2} \log _2 \bigl (\det (2 \pi e \varvec{A} \varvec{A}^\top ) \bigr ). \end{aligned}$$

      Note that the approaches (ii) and (iii) lead to very similar approximations, since

      $$\begin{aligned} \log _2\bigl ( V_m(1) m^{m/2} \bigr ) \approx \frac{m}{2} \log _2(2 \pi e). \end{aligned}$$

3.4.3 Distribution of the remaining entropy

Why does the remaining entropy in the experiments of Sect. 3.3 have such a large variation?

Each \(y_i\) is a realization of a binomially distributed random variable. If \(y_i\) takes on extreme values near \(\pm 24\), we have a large amount of information about the key \((\varvec{c}, \varvec{d})\). On the other hand, for \(y_i=0\) there are many candidates for \((\varvec{c}, \varvec{d})\). As argued in Sect. 3.4.2, we expect that in our case

$$\begin{aligned} \Pr (\varvec{Y} = \varvec{y})&= \frac{\# \bigl \{ (\varvec{c}, \varvec{d}) \in \{0, 1\}^{56} \,\vert \; \varvec{W} \Delta (\varvec{c}, \varvec{d}) = \varvec{y} \bigr \} }{ 2^{56} } \\&\approx \frac{1}{\sqrt{ (2\pi )^{15} \det (\varvec{\Sigma }) } } \exp \left( -\tfrac{1}{2} \varvec{y}^\top \varvec{\Sigma }^{-1} \varvec{y} \right) . \end{aligned}$$

This leads to the following approximation of the remaining entropy \({{\,\mathrm{H}\,}}(\varvec{C}, \varvec{D} \mid \varvec{Y} = \varvec{y})\) for fixed \(\varvec{y}\):

$$\begin{aligned} \begin{aligned} {{\,\mathrm{H}\,}}(\varvec{C}, \varvec{D} \mid \varvec{Y}=\varvec{y})&= \log _2 \bigl (\# \bigl \{ (\varvec{c}, \varvec{d}) \in \{0, 1\}^{56}\\&\,\vert \; \varvec{W} \Delta (\varvec{c}, \varvec{d}) = \varvec{y} \bigr \} \bigr ) \\&= 56 + \log _2 \bigl ( \Pr (\varvec{Y} = \varvec{y}) \bigr ) \\&\approx \max \Bigl \{ 2 ,\; 56 - \tfrac{1}{2} \log _2 \bigl ( \det (2 \pi \varvec{\Sigma })\bigr )\\&\quad - \tfrac{1}{2} \log _2(e) \Vert \varvec{\Sigma }^{-1/2} \varvec{y} \Vert ^2 \Bigr \} \\&\approx \max \Bigl \{ 2 ,\; 25.09 - 0.72 \cdot \Vert \varvec{\Sigma }^{-1/2} \varvec{y} \Vert ^2 \Bigr \}. \end{aligned} \end{aligned}$$
(17)

In this approximation, the remaining entropy depends only on \(\Vert \varvec{\Sigma }^{-1/2} \varvec{y} \Vert \): if \(\Vert \varvec{\Sigma }^{-1/2} \varvec{y} \Vert \) is small, the remaining entropy \({{\,\mathrm{H}\,}}(\varvec{C}, \varvec{D} \mid \varvec{Y}=\varvec{y})\) is large (“strong keys”); if \(\Vert \varvec{\Sigma }^{-1/2} \varvec{y} \Vert \) is large, the remaining entropy is small (“weak keys”).
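The estimate (17) is easy to evaluate; a minimal Julia sketch, with \(\varvec{\Sigma }\) as computed in Sect. 3.1:

```julia
# The estimate (17) as a function of the observation y (sketch; Σ as above).
using LinearAlgebra
entropy_estimate(y, Σ) = max(2.0, 25.09 - 0.5 * log2(ℯ) * dot(y, Σ \ y))
```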

We expect that the largest remaining entropy should occur near \(\varvec{y} = \varvec{0}_{15}\). The largest number of candidates we found experimentally was indeed

$$\begin{aligned} \# \mathcal {C}(\varvec{0}_{15}) = 34296072, \end{aligned}$$

which corresponds exactly to the observation \(\varvec{y} = \varvec{0}_{15}\). Since \(\log _2(34296072) \approx 25.03\), this fits the heuristic argument above well. At the other extreme, for the key \(\varvec{k} = \varvec{0}_{56}\) we have

$$\begin{aligned} \mathcal {R}(\varvec{0}_{56}) = \# \mathcal {C}(-24 \cdot \varvec{1}_{15}) = 4. \end{aligned}$$

In order to test the heuristic (17), we conducted some further experiments. For every \(a \in \{0.6, 0.8, \dotsc , 7.4\}\), we generated 10 random observations \(\varvec{y} = \varvec{W} \Delta (\varvec{k})\) such that \(\Vert \varvec{\Sigma }^{-1/2} \varvec{y} \Vert \in [a, a+0.2]\) by rejection sampling and computed \(\#\mathcal {C}(\varvec{y})\) using Algorithm 1. Figure 3 shows a plot of the remaining entropy estimate (17) as a function of \(\Vert \varvec{\Sigma }^{-1/2} \varvec{y} \Vert \) together with the logarithmic key ranks computed in the experiments. Note that for unconditionally chosen random keys the observations would be concentrated around the middle region of the graph (cf. Sect. 3.3). The experiments confirm that the approximation (17) is good.

Fig. 3

Graph of the remaining entropy estimate (17) as a function of \(\Vert \varvec{\Sigma }^{-1/2} \varvec{y} \Vert \) together with the logarithmic key ranks computed in the experiments

In the end, we can expect that the distribution of \({{\,\mathrm{H}\,}}(\varvec{C}, \varvec{D} \mid \varvec{Y} = \varvec{y})\) is close to the continuous distribution of the random variable \(25.09 - \frac{1}{2} \varvec{y}^\top \varvec{\Sigma }^{-1} \varvec{y} \log _2(e)\), where \(\varvec{y}\) is normally distributed with expectation \(\varvec{0}_{15}\) and covariance matrix \(\varvec{\Sigma }\). Note that the probability density function of this continuous distribution is identical to that of the random variable \(25.09 - \frac{1}{2} \varvec{u}^\top \varvec{u} \log _2(e)\), where \(\varvec{u}\) is a standard normal random vector on \(\mathbb {R}^{15}\). Since \(\varvec{u}^\top \varvec{u}\) is \(\chi ^2\)-distributed with 15 degrees of freedom, we know that

$$\begin{aligned}&{{\,\mathrm{E}\,}}\left( 25.09 - \tfrac{1}{2} \varvec{u}^\top \varvec{u} \log _2(e)\right) = 25.09 - \frac{1}{2} 15 \log _2(e) \approx 14.27 , \\&{{\,\mathrm{Var}\,}}\left( 25.09 - \tfrac{1}{2} \varvec{u}^\top \varvec{u} \log _2(e)\right) = 30 \cdot \Bigl (\frac{1}{2} \log _2(e) \Bigr )^2 \approx 15.61. \end{aligned}$$

Note that the expectation fits the value \({{\,\mathrm{H}\,}}(\varvec{C}, \varvec{D} \mid \varvec{Y}) = 56 - {{\,\mathrm{H}\,}}(\varvec{Y}) \approx 56 - 41.7 \approx 14.3\) estimated above. In comparison, we obtained a mean value of 15.08 and an empirical variance of 12.02 for the logarithmic key ranks in the experiment reported in Sect. 3.3.

3.5 Isolated consideration of the C- and D-register

As a natural approach, one might try to recover the values of the C- and D-register separately. In the publication [7], for instance, the authors discuss a template attack on even smaller parts of the C- and D-registers. Here we want to clarify that such an approach reduces the exploitable mutual information significantly. We consider the following general strategy:

  1.

    Define an appropriate evaluation function that depends only on the key part \(\varvec{c}\) in the C-register. Find a set \(\mathcal {C}\) of likely candidates for the C-register.

  2.

    Define an appropriate evaluation function that depends only on the key part \(\varvec{d}\) in the D-register. Find a set \(\mathcal {D}\) of likely candidates for the D-register.

  3.

    Check all combinations \((\varvec{c}, \varvec{d}) \in \mathcal {C} \times \mathcal {D}\).

The workload of this approach is bounded by \(2^{27}\) in steps 1 and 2, but in step 3 we have to check all combinations. We can now give an easily computable indication of how successful such an approach could be. The random variable \(\varvec{W}_2 \Delta (\varvec{D})\) takes on certain values in a 15-dimensional space and its entropy \({{\,\mathrm{H}\,}}(\varvec{W}_2 \Delta (\varvec{D}))\) is clearly bounded by 27. By Lemma 2, we have

$$\begin{aligned} {{\,\mathrm{H}\,}}(\varvec{W}_2 \Delta (\varvec{D})) \lesssim \frac{1}{2} \log _2 \bigl (\det (2\pi e \varvec{W}_2 \varvec{W}_2^\top ) \bigr ) \approx 34.05 , \end{aligned}$$

where we have neglected the second term in (13). Let \(\varvec{U}_2\) be a uniformly distributed random variable on \(\{\pm 1\}^{56}\). We assume that in our case the upper bound of Lemma 2 is in fact a good approximation:

$$\begin{aligned} {{\,\mathrm{H}\,}}(\varvec{W}_2 \varvec{U}_2) \approx 34.05 . \end{aligned}$$

Now we use the following heuristic. The success of an evaluation function that depends only on the key part \(\varvec{c}\) in the C-register should be limited by the mutual information between \(\varvec{C}\) and the random variable

$$\begin{aligned} \varvec{Y}_1 := \varvec{W}_1 \Delta (\varvec{C}) + \varvec{W}_2 \varvec{U}_2 . \end{aligned}$$

Applying Lemma 2 to \(\varvec{Y}_1\), we get

$$\begin{aligned} {{\,\mathrm{I}\,}}(\varvec{Y}_1; \varvec{C})&= {{\,\mathrm{I}\,}}(\varvec{W}_1 \Delta (\varvec{C}) + \varvec{W}_2 \varvec{U}_2; \varvec{C}) ={{\,\mathrm{H}\,}}(\varvec{Y}_1) - {{\,\mathrm{H}\,}}(\varvec{W}_2 \varvec{U}_2) \\&\approx \frac{1}{2} \log _2 \bigl (\det (2\pi e \varvec{W} \varvec{W}^\top ) \bigr )\\&\quad - \frac{1}{2} \log _2 \bigl (\det ( 2 \pi e \varvec{W}_2 \varvec{W}_2^\top ) \bigr ) \\&= \frac{1}{2} \log _2 \bigl (\det ( \varvec{W} \varvec{W}^\top ) \bigr ) - \frac{1}{2} \log _2 \bigl (\det ( \varvec{W}_2 \varvec{W}_2^\top ) \bigr ) . \end{aligned}$$

Using this approximation, we obtain

$$\begin{aligned} {{\,\mathrm{I}\,}}(\varvec{Y}_1; \varvec{C}) \approx 7.68 \quad \text {and} \quad {{\,\mathrm{I}\,}}(\varvec{Y}_2; \varvec{D}) \approx 8.68, \end{aligned}$$

where \(\varvec{Y}_2 := \varvec{W}_1 \varvec{U}_1 + \varvec{W}_2 \Delta (\varvec{D})\) is defined analogously to \(\varvec{Y}_1\), with \(\varvec{U}_1\) uniformly distributed on \(\{\pm 1\}^{56}\).
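These values can be reproduced from the blocks of the weight matrix; a sketch reusing \(\varvec{W}\) from the Sect. 3.1 sketch:

```julia
# Numerical evaluation of the mutual-information estimates (sketch, reusing
# W from the Sect. 3.1 snippet; W1 and W2 are its two 56-column blocks).
using LinearAlgebra
W1, W2 = W[:, 1:56], W[:, 57:112]
I_C = 0.5 * (log2(det(W * W')) - log2(det(W2 * W2')))   # I(Y1; C) ≈ 7.68
I_D = 0.5 * (log2(det(W * W')) - log2(det(W1 * W1')))   # I(Y2; D) ≈ 8.68
```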

Remark 4

These small concrete values do not come as a complete surprise. By construction of \(\varvec{W}\) we know that \(\varvec{W}_1 \varvec{W}_1^\top \approx \varvec{W}_2 \varvec{W}_2^\top \), so that we can expect roughly

$$\begin{aligned} {{\,\mathrm{I}\,}}(\varvec{Y}_1; \varvec{C})&\approx \frac{1}{2} \log _2 \bigl (\det (\varvec{W} \varvec{W}^\top )\bigr ) - \frac{1}{2} \log _2 \bigl (\det (\varvec{W}_2 \varvec{W}_2^\top )\bigr ) \\&\approx \frac{1}{2} \log _2 \bigl (\det (2\varvec{W}_2 \varvec{W}_2^\top )\bigr ) - \frac{1}{2} \log _2 \bigl (\det (\varvec{W}_2 \varvec{W}_2^\top )\bigr )\\&= \frac{1}{2}\log _2 (2^{15}) = \frac{15}{2} . \end{aligned}$$

4 Linear regression model

In this section we consider a continuous leakage model, whose observations cover more points of interest but may contain errors. The weight matrix of this model is not derived by theoretical considerations, but must be learned in a profiling phase using linear regression.

Leakage Model 3

(Linear regression model) Let \(m \ge 112\), let \(\varvec{W} \in \mathbb {R}^{m \times 112}\) be a fixed weight matrix of full column rank, and let \(\varvec{K} = (\varvec{C}, \varvec{D})\) be a uniformly distributed random variable on \(\{0,1\}^{56}\). We define the random variable \(\varvec{Y}\) on \(\mathbb {R}^m\) by

$$\begin{aligned} \varvec{Y} = \varvec{W} \Delta (\varvec{C}, \varvec{D}) + \varvec{\varepsilon } , \end{aligned}$$
(18)

where \(\varvec{\varepsilon }\) is a zero-mean normal random variable on \(\mathbb {R}^m\) with covariance matrix \(\sigma ^2 \varvec{I}_m\) which is independent of \(\varvec{K}\).

Remark 5

The leakage of a real implementation might not strictly follow this leakage model. For instance, there might be a non-linear key-dependent influence on the leakage and the error might follow a different distribution or be key-dependent as well. One may attempt to fit real measurements better to this model by recentering and decorrelating the measurements. Moreover, the error can be reduced by averaging over several observations for the same key, see Sect. 4.3.1.

Remark 6

Leakage Model 3 is expressed in the setting of the DES algorithm. However, in principle it should be adaptable to other algorithms on security controllers. As a protection against side-channel attacks, an implementation should use many operations in parallel if key-dependent information is being processed. In our leakage model each observation is a large, weighted sum of certain key-dependent bits that is disturbed by noise.

4.1 Key ranking and key enumeration

Let \(\varvec{y} \in \mathbb {R}^m\) be an observation under Leakage Model 3 corresponding to an unknown key \(\varvec{k}^* = (\varvec{c}^*, \varvec{d}^*) \in \{0,1\}^{56}\), i.e. we have \(\varvec{y} = \varvec{W} \Delta (\varvec{c}^*, \varvec{d}^*) + \varvec{\varepsilon }\) for some unknown noise vector \(\varvec{\varepsilon } \in \mathbb {R}^m\). We also assume for the moment that we know the weight matrix \(\varvec{W} \in \mathbb {R}^{m \times 112}\) (in Sect. 4.2 we describe how \(\varvec{W}\) can be estimated).

We define the evaluation function

$$\begin{aligned} \eta _{\varvec{W}, \varvec{y}} :\{0,1\}^{56} \rightarrow \mathbb {R}_{\ge 0} , \quad (\varvec{c}, \varvec{d}) \mapsto \bigl \Vert \varvec{y} - \varvec{W} \Delta (\varvec{c}, \varvec{d}) \bigr \Vert ^2 . \end{aligned}$$
(19)

We denote by

$$\begin{aligned} \mathcal {C}_{\varvec{W}}(\varvec{y}, B) := \bigl \{ (\varvec{c}, \varvec{d}) \in \{0,1\}^{56} \,\vert \; \eta _{\varvec{W}, \varvec{y}}(\varvec{c}, \varvec{d}) \le B \bigr \} \end{aligned}$$
(20)

the set of key candidates for observation \(\varvec{y}\) with error bound \(B \in \mathbb {R}_{\ge 0}\). The rank of the correct key \(\varvec{k}^*\) with respect to \(\varvec{W}\) and \(\varvec{y}\) is defined as

$$\begin{aligned} \mathcal {R}_{\varvec{W}, \varvec{y}}(\varvec{k}^*) := \# \mathcal {C}_{\varvec{W}}\bigl (\varvec{y}, \eta _{\varvec{W}, \varvec{y}}(\varvec{k}^*)\bigr ) . \end{aligned}$$
(21)

Note that \(\mathcal {R}_{\varvec{W}, \varvec{y}}(\varvec{k}^*)\) is a multiple of 4 (cf. Remark 1). We call \(\log _2 \mathcal {R}_{\varvec{W}, \varvec{y}}(\varvec{k}^*)\) the logarithmic key rank of \(\varvec{k}^*\).

A quick check for assessing the quality of \(\eta _{\varvec{W}, \varvec{y}}\) can be obtained by testing the condition \(\eta _{\varvec{W}, \varvec{y}}(\varvec{c}_j, \varvec{d}_j) \le \eta _{\varvec{W}, \varvec{y}}(\varvec{c}^*, \varvec{d}^*)\) for several random key candidates \((\varvec{c}_j, \varvec{d}_j) \in \{0,1\}^{56}\) with \(j = 1, \dotsc , N\). If N is large and \(\mathcal {R}_{\varvec{W}, \varvec{y}}(\varvec{k}^*)\) is not too small, we can expect

$$\begin{aligned} \mathcal {R}_{\varvec{W}, \varvec{y}}(\varvec{k}^*) \approx \frac{2^{56}}{N} \cdot \#\bigl \{ j \in [N] \,\vert \; \eta _{\varvec{W}, \varvec{y}}(\varvec{c}_j, \varvec{d}_j) \le \eta _{\varvec{W}, \varvec{y}}(\varvec{c}^*, \varvec{d}^*) \bigr \} . \end{aligned}$$
(22)
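In code, this quick check amounts to a simple Monte Carlo loop. A sketch, where `eta` (our \(\eta _{\varvec{W}, \varvec{y}}\), evaluated on keys given as 0/1 vectors of length 56) and the correct key `kstar` are assumed to be given:

```julia
# Monte Carlo rank estimate (22) (sketch). `eta` evaluates η_{W,y} on a key
# given as a 0/1 vector of length 56; `kstar` is the correct key.
function rank_estimate(eta, kstar; N = 10^6)
    t = eta(kstar)
    hits = count(_ -> eta(rand(0:1, 56)) ≤ t, 1:N)
    return 2.0^56 * hits / N
end
```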

Next we develop an algorithm to enumerate the set \(\mathcal {C}_{\varvec{W}}(\varvec{y}, B)\) based on the Fincke–Pohst lattice point enumeration algorithm [4]. In principle, this algorithm explores the whole key space \(\{0,1\}^{56}\) (or \(\{0,1\}^{54}\)), but in many instances the search tree can be pruned considerably.

The following preparatory lemma shows that the weight matrix \(\varvec{W} \in \mathbb {R}^{m \times 112}\) can be replaced by an upper triangular matrix \(\varvec{R} \in \mathbb {R}^{112 \times 112}\). At the same time, the observation vector \(\varvec{y}\) is projected onto the range of \(\varvec{W}\).

Lemma 3

Let \(m \ge 112\), let \(\varvec{W} \in \mathbb {R}^{m \times 112}\) be a matrix of full column rank, and let \(\varvec{y} \in \mathbb {R}^m\). Then there exists a unique upper triangular matrix \(\varvec{R} \in \mathbb {R}^{112 \times 112}\) with positive diagonal elements such that \(\varvec{W}^\top \varvec{W} = \varvec{R}^\top \varvec{R}\). We have

$$\begin{aligned} \bigl \Vert \varvec{y} - \varvec{W} \varvec{x} \bigr \Vert ^2 = \bigl \Vert \varvec{R}(\varvec{t}-\varvec{x}) \bigr \Vert ^2 + \bigl \Vert \varvec{y} - \varvec{W} \varvec{t} \bigr \Vert ^2 \quad \mathrm{for\,all}\;\varvec{x} \in \mathbb {R}^{112}, \end{aligned}$$

where \(\varvec{t} := (\varvec{W}^\top \varvec{W})^{-1} \varvec{W}^\top \varvec{y} \in \mathbb {R}^{112}\).

Proof

Since \(\varvec{W}\) has full column rank, the matrix \(\varvec{W}^\top \varvec{W}\) is symmetric positive-definite and, in particular, non-singular. The existence and uniqueness of \(\varvec{R}\) follow from the Cholesky factorization of \(\varvec{W}^\top \varvec{W}\) (cf. [6, Theorem 4.2.5]). Since \(\varvec{W} (\varvec{W}^\top \varvec{W})^{-1} \varvec{W}^\top \) is the orthogonal projection onto the range of \(\varvec{W}\), we have \(\langle \varvec{W} \varvec{x}, \varvec{y} - \varvec{W} \varvec{t} \rangle = 0\) for all \(\varvec{x} \in \mathbb {R}^{112}\), hence \(\Vert \varvec{y} - \varvec{W} \varvec{x} \Vert ^2 = \Vert \varvec{W}(\varvec{t}-\varvec{x}) + \varvec{y} - \varvec{W} \varvec{t} \Vert ^2 = \Vert \varvec{W}(\varvec{t}-\varvec{x}) \Vert ^2 + \Vert \varvec{y} - \varvec{W} \varvec{t} \Vert ^2\) for all \(\varvec{x} \in \mathbb {R}^{112}\). Finally, we have \(\Vert \varvec{W}(\varvec{t}-\varvec{x}) \Vert ^2 = (\varvec{t}-\varvec{x})^\top \varvec{W}^\top \varvec{W}(\varvec{t}-\varvec{x}) = (\varvec{t}-\varvec{x})^\top \varvec{R}^\top \varvec{R}(\varvec{t}-\varvec{x}) = \Vert \varvec{R}(\varvec{t}-\varvec{x}) \Vert ^2\) for all \(\varvec{x} \in \mathbb {R}^{112}\). \(\square \)

Remark 7

Consider the situation of Lemma 3. By the thin/reduced QR factorization of \(\varvec{W}\), there exists a unique matrix \(\varvec{Q} \in \mathbb {R}^{m \times 112}\) with orthonormal columns and a unique upper triangular matrix \(\varvec{R} \in \mathbb {R}^{112 \times 112}\) with positive diagonal elements such that \(\varvec{W} = \varvec{Q} \varvec{R}\) (cf. [6, Theorem 5.2.2]). We have \(\varvec{W}^\top \varvec{W} = \varvec{R}^\top \varvec{R}\) (in particular, \(\varvec{R}\) is the Cholesky factor of \(\varvec{W}^\top \varvec{W}\)) and \(\varvec{t} = (\varvec{W}^\top \varvec{W})^{-1} \varvec{W}^\top \varvec{y} = \varvec{R}^{-1} \varvec{Q}^\top \varvec{y}\).
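In Julia, this preprocessing is a one-liner per quantity. A sketch, assuming a full-column-rank matrix `W` and an observation `y` are given:

```julia
# Preprocessing of Lemma 3 / Remark 7 (sketch; W is m×112 of full column
# rank and y a length-m observation, both assumed given).
using LinearAlgebra
R = cholesky(Symmetric(W' * W)).U          # upper triangular with W'W = R'R
t = (W' * W) \ (W' * y)                    # t = (W'W)⁻¹ W'y
x = rand([-1.0, 1.0], 112)                 # any test vector in {±1}^112
@assert norm(y - W * x)^2 ≈ norm(R * (t - x))^2 + norm(y - W * t)^2
```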

By Lemma 3, we have \(\mathcal {C}_{\varvec{W}}(\varvec{y}, B) = \mathcal {C}_{\varvec{R}}(\varvec{R}\varvec{t}, B - \Vert \varvec{y} - \varvec{W} \varvec{t} \Vert ^2)\). Let \((\varvec{c}, \varvec{d}) \in \{0,1\}^{56}\) and \(\varvec{x} := \Delta (\varvec{c}, \varvec{d}) \in \{\pm 1\}^{112}\). Then \((\varvec{c}, \varvec{d}) \in \mathcal {C}_{\varvec{W}}(\varvec{y}, B)\) if and only if

$$\begin{aligned} \bigl \Vert \varvec{R}(\varvec{t}-\varvec{x}) \bigr \Vert ^2 = \sum _{i=1}^{112} \left( \sum _{j=i}^{112} r_{i,j} (t_j - x_j) \right) ^2 \le B - \bigl \Vert \varvec{y} - \varvec{W} \varvec{t} \bigr \Vert ^2 . \end{aligned}$$
(23)

In principle, we could enumerate all vectors \(\varvec{x} \in \{\pm 1\}^{112}\) satisfying (23) using backtracking (cf. [9, Section 7.2.2, Algorithm B]). If the elements of \(\varvec{x}\) are traversed in the order \(x_{112}, x_{111}, \dotsc , x_1\), then partial assignments \(x_s \cdots x_{112}\) violating the condition

$$\begin{aligned} \sum _{i=s}^{112} \left( \sum _{j=i}^{112} r_{i,j} (t_j - x_j) \right) ^2 \le B - \bigl \Vert \varvec{y} - \varvec{W} \varvec{t} \bigr \Vert ^2 \end{aligned}$$

can be rejected immediately (without trying further values for \(x_1 \cdots x_{s-1}\)). For each vector \(\varvec{x} \in \{\pm 1\}^{112}\) satisfying (23) we could then compute the preimage under \(\Delta \) to obtain the corresponding key candidates (cf. Remark 1). However, this approach is impractical due to the size of \(\{\pm 1\}^{112}\). Therefore, we enumerate the keys \(\varvec{k} = (\varvec{c}, \varvec{d}) \in \{0,1\}^{56}\) directly. In order to make this approach work, we have to reorder the components of \(\varvec{x} = \Delta (\varvec{k}) = \Delta (\varvec{c}, \varvec{d})\) and the columns of \(\varvec{W}\) have to be reordered accordingly (before applying Lemma 3).

Let \(\varvec{k} = k_1 \cdots k_{56} = (\varvec{c}, \varvec{d}) = c_1 \cdots c_{28} \, d_1 \cdots d_{28} \in \{0,1\}^{56}\). First we choose a permutation \(\pi :[56] \rightarrow [56]\) that determines the order \(k_{\pi (1)}, k_{\pi (2)}, \dotsc , k_{\pi (56)}\) in which we want to traverse the bits of \(\varvec{k}\) in the enumeration procedure. By Remark 1 we can keep one bit of \(\varvec{c}\) and one bit of \(\varvec{d}\) fixed, so we move those bits to the front and do not change them during the enumeration. For example, we may choose \(\pi (1) = 1\) and \(\pi (2) = 29\), hence \(k_{\pi (1)} = c_1\) and \(k_{\pi (2)} = d_1\). We proceed by choosing bits so as to maximize the number of components in \(\Delta (\varvec{k})\) that are determined by the current choices. In other words, we pick nodes in the graph of Fig. 1 such that the number of edges between the chosen nodes is maximized. For example, we may choose

$$\begin{aligned} \pi = \left( \begin{array}{llllllllllll} 1 &{} 2 &{} 3 &{} 4 &{} \cdots &{} 28 &{} 29 &{} 30 &{} 31 &{} \cdots &{} 55 &{} 56 \\ 1 &{} 29 &{} 2 &{} 3 &{} \cdots &{} 27 &{} 28 &{} 30 &{} 31 &{} \cdots &{} 55 &{} 56\\ \end{array}\right) . \end{aligned}$$

Next we determine a permutation matrix \(\varvec{P} \in \mathbb {R}^{112 \times 112}\) and integers \(s_1=s_2=113> s_3> \cdots > s_{56} = 1\) such that for all \(\ell \in [56]\) the components of \(\varvec{x} := \varvec{P}^\top \Delta (\varvec{k}) \in \{\pm 1\}^{112}\), which are determined by \(k_{\pi (1)}, \dotsc , k_{\pi (\ell )}\), are the trailing components \(x_{s_\ell }, \dotsc , x_{112}\) of \(\varvec{x}\). In our example, we have

$$\begin{aligned} s_3&= 112 ,&s_{3+i}&= 112-2i&\quad \text {for}\; i \in [24] ,\\ s_{28}&= 61 ,&s_{29}&= 57 , \\ s_{30}&= 56 ,&s_{30+i}&= 56-2i&\quad \text {for}\; i \in [24] ,\\ s_{55}&= 5 ,&s_{56}&= 1 , \end{aligned}$$

and we may choose \(\varvec{P}\) such that

$$\begin{aligned} x_1, x_2, x_3, x_4&= (-1)^{d_1 \oplus d_{28}}, (-1)^{d_2 \oplus d_{28}},(-1)^{d_{26}\oplus d_{28}},\\&\quad (-1)^{d_{27} \oplus d_{28}} , \\ x_5, x_6, x_7&= (-1)^{d_1 \oplus d_{27}}, (-1)^{d_{25} \oplus d_{27}}, (-1)^{d_{26} \oplus d_{27}} , \\ x_{56-2i}, x_{57-2i}&= (-1)^{d_i \oplus d_{i+2}}, (-1)^{d_{i+1} \oplus d_{i+2}} \quad \text {for}\, i \in [24] , \\ x_{56}&= (-1)^{d_1 \oplus d_2} , \\ x_{57}, x_{58}, x_{59}, x_{60}&= (-1)^{c_1 \oplus c_{28}}, (-1)^{c_2 \oplus c_{28}}, (-1)^{c_{26} \oplus c_{28}},\\&\quad (-1)^{c_{27} \oplus c_{28}} , \\ x_{61}, x_{62}, x_{63}&= (-1)^{c_1 \oplus c_{27}}, (-1)^{c_{25} \oplus c_{27}}, (-1)^{c_{26} \oplus c_{27}} , \\ x_{112-2i}, x_{113-2i}&= (-1)^{c_i \oplus c_{i+2}}, (-1)^{c_{i+1} \oplus c_{i+2}} \quad \text {for}\, i \in [24], \\ x_{112}&= (-1)^{c_1 \oplus c_2} . \end{aligned}$$

Applying Lemma 3 to the column-permuted matrix \(\varvec{W} \varvec{P}\) and the observation \(\varvec{y} \in \mathbb {R}^m\), we obtain an upper triangular matrix \(\varvec{R} \in \mathbb {R}^{112 \times 112}\) and \(\varvec{t} \in \mathbb {R}^{112}\) such that

$$\begin{aligned} \bigl \Vert \varvec{y} - \varvec{W} \Delta (\varvec{k}) \bigr \Vert ^2&= \bigl \Vert \varvec{y} - \varvec{W} \varvec{P} \varvec{P}^\top \Delta (\varvec{k}) \bigr \Vert ^2 \\&= \bigl \Vert \varvec{R}(\varvec{t} - \varvec{x}) \bigr \Vert ^2 + \bigl \Vert \varvec{y} - \varvec{W} \varvec{P} \varvec{t} \bigr \Vert ^2 . \end{aligned}$$

Therefore, we have \(\varvec{k} \in \mathcal {C}_{\varvec{W}}(\varvec{y}, B)\) if and only if

$$\begin{aligned} \rho _\ell&:= \sum _{i=s_\ell }^{112} \left( \sum _{j=i}^{112} r_{i,j} (t_j - x_j) \right) ^2 \\&\le B - \bigl \Vert \varvec{y} - \varvec{W} \varvec{P} \varvec{t} \bigr \Vert ^2 \quad \text {for all}\; \ell \in [56]. \end{aligned}$$

Note that \(\rho _\ell \) depends on the components \(x_{s_\ell }, \dotsc , x_{112}\) of \(\varvec{x} = \varvec{P}^\top \Delta (\varvec{k})\) which are determined completely by \(k_{\pi (1)}, \dotsc , k_{\pi (\ell )}\). Furthermore, \(\rho _\ell \) can be computed recursively, since \(\rho _1 = \rho _2 = 0\) and

$$\begin{aligned} \rho _\ell = \rho _{\ell -1} + \sum _{i=s_\ell }^{s_{\ell -1}-1} \left( \sum _{j=i}^{112} r_{i,j} (t_j - x_j) \right) ^2 \quad \text {for}\; 3 \le \ell \le 56. \end{aligned}$$

Using the backtracking scheme described in [9, Section 7.2.2, Algorithm B], we obtain the following algorithm.

Algorithm 2

Input: A matrix \(\varvec{W} \in \mathbb {R}^{m \times 112}\) of full column rank, a vector \(\varvec{y} \in \mathbb {R}^m\), and a bound \(B \in \mathbb {R}_{\ge 0}\).

Output: The set of key candidates \(\mathcal {C}_{\varvec{W}}(\varvec{y}, B)\).

1.:

[Initialize.] Set \(\mathcal {C} \leftarrow \varnothing \), \(\ell \leftarrow 3\), \(\varvec{k} \leftarrow \varvec{0}_{56}\), \(\varvec{x} \leftarrow \varvec{1}_{112}\), and \(\varvec{\rho } \leftarrow \varvec{0}_{56}\).

2.:

[Preprocess.] Set \(\varvec{W} \leftarrow \varvec{W} \varvec{P}\). Compute an upper triangular matrix \(\varvec{R} \in \mathbb {R}^{112 \times 112}\) such that \(\varvec{W}^\top \varvec{W} = \varvec{R}^\top \varvec{R}\). Set \(\varvec{t} \leftarrow (\varvec{W}^\top \varvec{W})^{-1} \varvec{W}^\top \varvec{y}\) and \(B \leftarrow B - \Vert \varvec{y} - \varvec{W} \varvec{t} \Vert ^2\). If \(B < 0\), return \(\mathcal {C}\) and stop.

3. [Enter level \(\ell \).] If \(\ell = 57\), set \((\varvec{c}, \varvec{d}) \leftarrow \varvec{k}\), set \(\mathcal {C} \leftarrow \mathcal {C} \cup \{(\varvec{c}, \varvec{d}), (\overline{\varvec{c}}, \varvec{d}), (\varvec{c}, \overline{\varvec{d}}), (\overline{\varvec{c}}, \overline{\varvec{d}}) \}\), and go to step 6. Otherwise set \(k_{\pi (\ell )} \leftarrow 0\).

4. [Try \(k_{\pi (\ell )}\).] Set \(x_i \leftarrow (\varvec{P}^\top \Delta (\varvec{k}))_i\) for \(i = s_\ell , \dotsc , s_{\ell -1}-1\) and set

$$\begin{aligned} \rho _\ell \leftarrow \rho _{\ell -1} + \sum _{i=s_\ell }^{s_{\ell -1}-1} \biggl ( \sum _{j=i}^{112} r_{i,j} (t_j - x_j) \biggr )^2 . \end{aligned}$$

If \(\rho _\ell \le B\), set \(\ell \leftarrow \ell + 1\) and go to step 3.

5. [Try again.] If \(k_{\pi (\ell )} = 0\), set \(k_{\pi (\ell )} \leftarrow 1\) and go to step 4.

6. [Backtrack.] Set \(\ell \leftarrow \ell - 1\). If \(\ell \ge 3\), go to step 5. Otherwise return \(\mathcal {C}\) and stop.

Remark 8

We note some possible variations and optimizations of Algorithm 2.

  (a) To avoid repeated computations, Algorithm 2 can be modified as follows (cf. [5, Appendix B] and [10, Appendix A]). Let \(\sigma _{i, h} := \sum _{j=h}^{112} r_{i,j} (t_j - x_j)\) for \(i \in [112]\) and \(h \in [113]\). The value \(\sigma _{i, h}\) can be computed recursively, since \(\sigma _{i, 113} = 0\) and \(\sigma _{i, h} = \sigma _{i, h+1} + r_{i, h} (t_h - x_h)\) for all \(i, h \in [112]\). Further, we have \(\rho _\ell = \rho _{\ell -1} + \sum _{i=s_\ell }^{s_{\ell -1}-1} \sigma _{i,i}^2\) for \(3 \le \ell \le 56\). By using these recurrence relations and by reusing values of \(\sigma _{i,h}\) that are still valid during the enumeration, the partial squared norms \(\rho _{\ell }\) can be computed with fewer operations. For details, see Algorithm 3 in the appendix.

  (b) Using pruning [12, 13], we can heuristically reject partial assignments \(k_{\pi (1)} \cdots k_{\pi (\ell )}\) during the enumeration if the partial squared norm \(\rho _\ell \) is already so large that \(\rho _{56} \le B\) becomes unlikely for any choice of \(k_{\pi (\ell +1)} \cdots k_{\pi (56)}\). This can be done by replacing the if-condition “\(\rho _\ell \le B\)” in step 4 by “\(\rho _\ell \le B_{\ell }\)” for suitable bounds \(0 \le B_3 \le \cdots \le B_{56} = B\). In our experiments we used the bounds

    $$\begin{aligned} B_\ell := {\left\{ \begin{array}{ll} \frac{\ell +17}{54} B , &{} \text {if}\quad 3 \le \ell \le 36 , \\ B , &{} \text {if}\quad 37 \le \ell \le 56 . \end{array}\right. } \end{aligned}$$

    Note that we cannot use extreme pruning [5], since we want to find all (or almost all) key candidates in \(\mathcal {C}_{\varvec{W}}(\varvec{y}, B)\). (A direct transcription of these bounds is sketched after this remark.)

  (c) For a given weight matrix \(\varvec{W}\), the running time of Algorithm 2 may be optimized by choosing different permutations \(\pi \) and \(\varvec{P}\) (in compliance with the conditions outlined above). Preprocessing \(\varvec{W}\) with general unimodular transformations (e.g. lattice basis reduction, cf. [4, (2.12)]) does not seem possible in our setting.

  (d) To find the N best key candidates for the evaluation function \(\eta _{\varvec{W}, \varvec{y}}\), Algorithm 2 can be modified as follows. We start the algorithm with \(B := \infty \). The set \(\mathcal {C}\) is replaced by a list that is ordered by \(\eta _{\varvec{W}, \varvec{y}}\) and keeps only the N best key candidates. Each time a key candidate is evicted from \(\mathcal {C}\), the bound B can be updated to the score of the currently worst key candidate remaining in \(\mathcal {C}\).
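For illustration, the pruning schedule of Remark 8(b) is a one-line function; this is a direct transcription of the bounds above, and in Algorithm 2 the test in step 4 then becomes \(\rho _\ell \le B_\ell \):

```julia
# Pruning bounds B_ℓ of Remark 8(b): a linear ramp up to level 36, then B.
pruning_bound(ℓ::Int, B::Float64) = ℓ <= 36 ? (ℓ + 17) / 54 * B : B
```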

4.2 Estimation of the weight matrix

The weight matrix \(\varvec{W} \in \mathbb {R}^{m \times 112}\) of Leakage Model 3 can be estimated in a profiling phase using linear regression if observations for several known keys are available.

Let \(N_{\text {prf}} \gg 112\). Assume we are given observations \(\varvec{y}_{\text {prf}, j} \in \mathbb {R}^m\) of Leakage Model 3 for known and randomly chosen keys \(\varvec{k}_{\text {prf}, j} = (\varvec{c}_{\text {prf}, j}, \varvec{d}_{\text {prf}, j}) \in \{0,1\}^{56}\) for \(j \in [N_\text {prf}]\). We denote by \(\varvec{X}_{\text {prf}} \in \{\pm 1\}^{112 \times N_\text {prf}}\) the matrix with columns \(\varvec{x}_{\text {prf}, j} := \Delta (\varvec{c}_{\text {prf}, j}, \varvec{d}_{\text {prf}, j})\) and by \(\varvec{Y}_{\text {prf}} \in \mathbb {R}^{m \times N_\text {prf}}\) the matrix with columns \(\varvec{y}_{\text {prf}, j}\) for \(j \in [N_\text {prf}]\).

We want to find an approximation \(\widetilde{\varvec{W}} \in \mathbb {R}^{m \times 112}\) of \(\varvec{W}\) such that \(\varvec{Y}_{\text {prf}} \approx \widetilde{\varvec{W}} \varvec{X}_{\text {prf}}\). Since the error vector in Leakage Model 3 has independent components, we may estimate the rows of \(\varvec{W}\) independently. Let \(i \in [m]\) and let \(\varvec{y}_{\text {prf}, i} \in \mathbb {R}^{1 \times N_\text {prf}}\) denote the i-th row of \(\varvec{Y}_{\text {prf}}\). We approximate the i-th row of \(\varvec{W}\) by a least squares estimate, i.e. by a vector \(\varvec{w} \in \mathbb {R}^{1 \times 112}\) minimizing

$$\begin{aligned} \bigl \Vert \varvec{y}_{\text {prf}, i} - \varvec{w} \varvec{X}_{\text {prf}} \bigr \Vert ^2 . \end{aligned}$$
(24)

Since \(N_{\text {prf}} \gg 112\), we may assume that \(\varvec{X}_\text {prf}\) has full row rank and \(\varvec{X}_\text {prf} \varvec{X}_\text {prf}^\top \in \mathbb {R}^{112 \times 112}\) is non-singular. This implies that (24) is minimized by the (unique) vector \(\widetilde{\varvec{w}}_i := \varvec{y}_{\text {prf}, i} \varvec{X}_\text {prf}^\top (\varvec{X}_\text {prf} \varvec{X}_\text {prf}^\top )^{-1} \in \mathbb {R}^{1 \times 112}\) (cf. [6, Section 5.3.1]). Combining the estimated rows of the weight matrix, we obtain the matrix

$$\begin{aligned} \widetilde{\varvec{W}} := \varvec{Y}_\text {prf} \varvec{X}_\text {prf}^\top (\varvec{X}_\text {prf} \varvec{X}_\text {prf}^\top )^{-1} \in \mathbb {R}^{m \times 112} . \end{aligned}$$
(25)

Since the “true”, unknown weight matrix \(\varvec{W}\) is assumed to have full column rank, we may hope that the same holds for the estimated weight matrix \(\widetilde{\varvec{W}}\). We note that this was indeed the case in our experiments with real measurements reported in Sect. 4.3.1.
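In code, the estimator (25) is a single right division. A minimal sketch in Julia, assuming \(\varvec{Y}_\text {prf}\) and \(\varvec{X}_\text {prf}\) are given as floating-point matrices (names illustrative):

```julia
using LinearAlgebra

# Least squares estimate (25) of the weight matrix:
# W̃ = Y_prf X_prf' (X_prf X_prf')⁻¹, computed as a right division.
estimate_weight_matrix(Yprf::AbstractMatrix, Xprf::AbstractMatrix) =
    (Yprf * Xprf') / (Xprf * Xprf')
```

In practice one should verify that \(\varvec{X}_\text {prf} \varvec{X}_\text {prf}^\top \) is well conditioned and that the result has full column rank, as discussed above.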

4.3 Experiments

We performed experiments using real and simulated measurements.

4.3.1 Real measurements

The authors of [7] provided us with their measurement data. The provided data set consists of a profiling set and an attack set. The measurements are already aligned and trimmed to \(m = 460\) points of interest.

The profiling set comprises \(N_\text {prf} = 882547\) measurements of DES operations with random keys \(\varvec{k}_{\text {prf}, j} = (\varvec{c}_{\text {prf}, j}, \varvec{d}_{\text {prf}, j}) \in \{0,1\}^{56}\) for \(j \in [N_\text {prf}]\). We denote by \(\varvec{Y}_{\text {prf}} \in \mathbb {R}^{m \times N_\text {prf}}\) the matrix of measurements (arranged in columns) and by \(\varvec{X}_{\text {prf}} \in \{\pm 1\}^{112 \times N_\text {prf}}\) the matrix with columns \(\Delta (\varvec{c}_{\text {prf}, j}, \varvec{d}_{\text {prf}, j})\).

The attack set comprises \(N_\text {att} = 247088\) measurements of DES operations with 288 random keys \(\varvec{k}_{\text {att}, j} \in \{0,1\}^{56}\), \(j \in [288]\), where \(N_{\text {att}, j} \in \{761, \dotsc , 927\}\) denotes the number of measurements performed with key \(\varvec{k}_{\text {att}, j}\). (The authors of [7] also carried out measurements for so-called weak keys, but we do not consider them in this article.) We denote by \(\varvec{Y}_{\text {att}, j} \in \mathbb {R}^{m \times N_{\text {att}, j}}\) the matrix of measurements with key \(\varvec{k}_{\text {att}, j}\) (arranged in columns) for \(j \in [288]\).

In order to fit the measurement data to Leakage Model 3, we preprocessed the data sets as follows. Using the mean

$$\begin{aligned} \widetilde{\varvec{\mu }}_{\text {prf}} := N_{\text {prf}}^{-1} \varvec{Y}_{\text {prf}} \varvec{1}_{N_{\text {prf}}} \in \mathbb {R}^m \end{aligned}$$

of the measurements \(\varvec{Y}_{\text {prf}}\) of the profiling set, we centered \(\varvec{Y}_{\text {prf}}\) and \(\varvec{Y}_{\text {att}, j}\) by replacing

$$\begin{aligned} \varvec{Y}_\text {prf}&\leftarrow \varvec{Y}_\text {prf} - \widetilde{\varvec{\mu }}_\text {prf} \, \varvec{1}_{N_\text {prf}}^\top \quad \text {and}\\ \varvec{Y}_{\text {att}, j}&\leftarrow \varvec{Y}_{\text {att}, j} - \widetilde{\varvec{\mu }}_\text {prf} \, \varvec{1}_{N_{\text {att}, j}}^\top \quad \text {for}\;\, j \in [288]. \end{aligned}$$

Using the empirical covariance matrix

$$\begin{aligned} \widetilde{\varvec{\Sigma }}_{\text {prf}} := N_{\text {prf}}^{-1} \varvec{Y}_{\text {prf}} \varvec{Y}_{\text {prf}}^\top \in \mathbb {R}^{m \times m} \end{aligned}$$

of the (centered) measurements \(\varvec{Y}_{\text {prf}}\) of the profiling set, we decorrelated \(\varvec{Y}_{\text {prf}}\) and \(\varvec{Y}_{\text {att}, j}\) (Mahalanobis whitening) by replacing

$$\begin{aligned} \varvec{Y}_\text {prf}&\leftarrow \widetilde{\varvec{\Sigma }}{}_{\text {prf}}^{-1/2} \varvec{Y}_\text {prf} \quad \text {and}\\ \varvec{Y}_{\text {att}, j}&\leftarrow \widetilde{\varvec{\Sigma }}{}_{\text {prf}}^{-1/2} \varvec{Y}_{\text {att}, j} \quad \text {for}\,\; j \in [288]. \end{aligned}$$

Finally, we computed the averaged measurements

$$\begin{aligned} \overline{\varvec{y}}_j := N_{\text {att}, j}^{-1} \, \varvec{Y}_{\text {att}, j} \, \varvec{1}_{N_{\text {att}, j}} \quad \text {for}\;\, j \in [288]. \end{aligned}$$

Due to a slight shift in the averaged measurements, we also recentered them relative to one another by replacing

$$\begin{aligned} (\overline{\varvec{y}}_1, \dotsc , \overline{\varvec{y}}_{288}) \leftarrow (\overline{\varvec{y}}_1, \dotsc , \overline{\varvec{y}}_{288}) (\varvec{I}_{288} - 288^{-1} \, \varvec{1}_{288} \, \varvec{1}_{288}^\top ) . \end{aligned}$$
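The preprocessing steps above amount to a few lines of Julia. The following sketch assumes the attack measurements are given as a vector of per-key matrices; all names are illustrative:

```julia
using LinearAlgebra, Statistics

# Centering, Mahalanobis whitening, per-key averaging, and joint recentering
# of the averaged measurements, as described above.
function preprocess(Yprf::Matrix{Float64}, Yatt::Vector{Matrix{Float64}})
    μ = vec(mean(Yprf, dims = 2))                 # profiling mean
    Yprf = Yprf .- μ
    Yatt = [Y .- μ for Y in Yatt]
    Σ = Symmetric(Yprf * Yprf' / size(Yprf, 2))   # empirical covariance
    T = inv(sqrt(Σ))                              # whitening transform Σ^(-1/2)
    Yprf = T * Yprf
    Yatt = [T * Y for Y in Yatt]
    ybar = [vec(mean(Y, dims = 2)) for Y in Yatt] # averaged measurements
    mbar = mean(ybar)                             # mean of the averages
    ybar = [y - mbar for y in ybar]               # recenter the averages
    return Yprf, ybar
end
```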

Using the preprocessed data of the profiling set, we computed an estimate \(\widetilde{\varvec{W}} \in \mathbb {R}^{m \times 112}\) of the weight matrix according to Leakage Model 3 as in (25). A matrix plot of \(\widetilde{\varvec{W}}\) is shown in Fig. 4. The plot illustrates the locations where updates of the C- and D-register (rotations by 1 or 2 positions) take place. In particular, it is visible that the measurement covers a DES encryption followed by a full DES decryption, presumably as a countermeasure against fault attacks. The upper half and the mirrored lower half of the plot bear some resemblance to the weight matrix of Leakage Model 2 (cf. Fig. 2). This visual structure of \(\widetilde{\varvec{W}}\) is a first indication that Leakage Model 3 is adequate for the measurements.

Fig. 4 Matrix plot of the estimated weight matrix \(\widetilde{\varvec{W}}\) with element values depicted according to the color bar

We implemented Algorithm 2 in the Julia programming language [2] with the optimizations of Remark 8(a) and (b). We computed the key ranks

$$\begin{aligned} \mathcal {R}_i := \mathcal {R}_{\widetilde{\varvec{W}}, \overline{\varvec{y}}_i}(\varvec{k}_{\text {att}, i}) \end{aligned}$$

for 287 of the 288 averaged measurements \(\overline{\varvec{y}}_i\) on a standard computer by explicit enumeration. The distributions of the computed ranks \(\mathcal {R}_i\) and the single-core running times are described in Table 2.

Table 2 Empirical distribution of the logarithmic key ranks and key enumeration running time for 287/288 averaged, real measurements

One half of the computed ranks are below \(2^{15}\) and 75% of them are below \(2^{21}\). In one half of the cases the key enumeration finished in under 7 minutes on a single CPU core. A log-log plot of the running times and key ranks is shown in Fig. 5. Given such low key ranks, the classic meet-in-the-middle attack against 3-key Triple DES has a very moderate running time: for average keys we can expect roughly \(2^{30}\) DES encryptions/decryptions, since the meet-in-the-middle step enumerates about \(2^{15} \cdot 2^{15} = 2^{30}\) candidate pairs for two of the three DES keys.

Fig. 5 Log-log plot of the key enumeration running times and key ranks for the real measurements

The experiments demonstrate that Leakage Model 3 is adequate for the measurement data. Although the leakage model might only approximate the real leakage, the attack is successful. We note that, apart from model errors, there may be further obstacles to a successful attack. If the number of measurements in the profiling phase is insufficient, the estimated weight matrix may differ significantly from the “true” weight matrix. If the number of measurements in the attack phase is insufficient, the noise of the averaged measurements may be too large.

4.3.2 Simulated measurements

In order to investigate the influence of the error distribution on the key rank, we performed a series of experiments with simulated measurements in different noise regimes.

For the simulated measurements, we used the weight matrix \(\widetilde{\varvec{W}} \in \mathbb {R}^{m \times 112}\) with \(m=460\) estimated from the real measurements as described in Sect. 4.3.1 (cf. Fig. 4) and generated the observations as samples from Leakage Model 3, where the keys were drawn uniformly from \(\{0,1\}^{56}\) and the errors were drawn from a centered normal distribution on \(\mathbb {R}^m\) with covariance matrix \(\sigma ^2 \varvec{I}_m\). In contrast to Sect. 4.3.1, we used the observations directly without averaging over several observations.
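A single simulated observation can then be generated along the following lines (a sketch; \(\Delta \) denotes the map from a 56-bit key to its \(\pm 1\) vector, which we assume to be given and do not reimplement here):

```julia
using Random

# One sample of Leakage Model 3: y = W Δ(k) + ε with ε ~ N(0, σ²·I_m),
# for a key k drawn uniformly from {0,1}^56.
function simulate_observation(W::Matrix{Float64}, σ::Float64, Δ)
    k = bitrand(56)                       # uniform random key bits
    y = W * Δ(k) + σ * randn(size(W, 1))  # signal plus i.i.d. Gaussian noise
    return k, y
end
```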

For each \(\sigma \in \{ 0.02, 0.03, \dotsc , 0.07\}\), we generated 100 observations and computed the corresponding key ranks explicitly using our implementation of Algorithm 2 with the optimizations of Remark 8(a) and (b). The distributions of the computed ranks and the single-core running times are described in Table 3.

Table 3 Empirical distribution of the logarithmic key ranks and key enumeration running times for 100 simulated measurements per value of \(\sigma \)

Comparing the distributions in Tables 2 and 3, we see that the averaged observations in Sect. 4.3.1 behave similarly to observations of Leakage Model 3 with \(\sigma \approx 0.07\).

For larger values of \(\sigma \), we resorted to the Monte-Carlo heuristic (22). For each \(\sigma \in \{ 0.2, 0.3, \dotsc , 0.7\}\), we generated 100 observations and estimated the corresponding key ranks using (22) with an appropriate number N of random keys. The distribution of the estimated ranks is described in Table 4.

Table 4 Empirical distribution of the estimated logarithmic key ranks for 100 simulated measurements per value of \(\sigma \)

4.4 Theoretical estimation of the remaining entropy

As in the discrete case, we use mutual information as a measure of the uncertainty about the key given an observation. However, the mutual information of continuous random variables should be treated with care, since some of its properties differ from the discrete case (cf. [3]). The following lemma provides an upper bound for the mutual information of \(\varvec{Y}\) and \((\varvec{C}, \varvec{D})\) in Leakage Model 3.

Lemma 4

Let \(\varvec{K} = (\varvec{C}, \varvec{D})\) and \(\varvec{Y} = \varvec{W} \Delta (\varvec{C}, \varvec{D}) + \varvec{\varepsilon }\) with \({{\,\mathrm{Cov}\,}}(\varvec{\varepsilon }) = \sigma ^2 \varvec{I}_m\) as in Leakage Model 3. Then

$$\begin{aligned} {{\,\mathrm{I}\,}}(\varvec{Y}; \varvec{C}, \varvec{D}) \le \frac{1}{2} \log _2\bigl ( \det (\sigma ^{-2} \varvec{W}^\top \varvec{W} + \varvec{I}_{112}) \bigr ) . \end{aligned}$$

Proof

We have \({{\,\mathrm{I}\,}}(\varvec{Y}; \varvec{C}, \varvec{D}) = {{\,\mathrm{H}\,}}(\varvec{Y}) - {{\,\mathrm{H}\,}}(\varvec{Y} \mid \varvec{C}, \varvec{D}) = {{\,\mathrm{H}\,}}(\varvec{Y}) - {{\,\mathrm{H}\,}}(\varvec{\varepsilon })\), since given \((\varvec{C}, \varvec{D})\) the observation \(\varvec{Y}\) is \(\varvec{\varepsilon }\) shifted by the constant \(\varvec{W} \Delta (\varvec{C}, \varvec{D})\) and differential entropy is translation invariant. Let \(\varvec{\Sigma } = {{\,\mathrm{Cov}\,}}(\varvec{Y})\). By Lemma 1(d), we have \(\varvec{\Sigma } = \varvec{W} \varvec{W}^\top + \sigma ^2 \varvec{I}_m\). By [3, Theorem 8.6.5], we obtain \({{\,\mathrm{H}\,}}(\varvec{Y}) \le \frac{1}{2} \log _2\bigl ( \det (2 \pi e \varvec{\Sigma }) \bigr )\) and \({{\,\mathrm{H}\,}}(\varvec{\varepsilon }) = \frac{1}{2} \log _2\bigl ( \det (2 \pi e \sigma ^2 \varvec{I}_m) \bigr )\). This implies

$$\begin{aligned} {{\,\mathrm{I}\,}}(\varvec{Y}; \varvec{C}, \varvec{D})&\le \frac{1}{2} \log _2\bigl ( \det (2 \pi e \varvec{\Sigma }) / \det (2 \pi e \sigma ^2 \varvec{I}_m) \bigr ) \\&= \frac{1}{2} \log _2\bigl ( \det (\sigma ^{-2} \varvec{W} \varvec{W}^\top + \varvec{I}_m) \bigr ) . \end{aligned}$$

Since \(\varvec{W} \varvec{W}^\top \) and \(\varvec{W}^\top \varvec{W}\) have identical non-zero eigenvalues, we get

$$\begin{aligned} \det (\sigma ^{-2} \varvec{W} \varvec{W}^\top + \varvec{I}_m) = \det (\sigma ^{-2} \varvec{W}^\top \varvec{W} + \varvec{I}_{112}) \end{aligned}$$

and the assertion follows. \(\square \)

Remark 9

The validity of the upper bound in Lemma 4 only depends on the leakage model and the properties of the random vector \(\Delta (\varvec{C}, \varvec{D})\) in Lemma 1. Although \(\Delta (\varvec{C}, \varvec{D})\) is specific to the DES setting, a similar upper bound could be derived for other cryptographic implementations if a leakage model with the respective properties is applicable.

Based on the experiments in Sect. 4.3.2, we propose the heuristic formula

$$\begin{aligned}&{{\,\mathrm{H}\,}}(\varvec{C}, \varvec{D} \mid \varvec{Y}) \nonumber \\&\approx \max \Bigl \{ 2 ,\; 56 - \tfrac{1}{2} \log _2\bigl ( \det (\sigma ^{-2} \varvec{W}^\top \varvec{W} + \varvec{I}_{112}) \bigr ) \Bigr \} \end{aligned}$$
(26)

for the remaining entropy. The lower cutoff of 2 bits accounts for the fact that the four key candidates \((\varvec{c}, \varvec{d})\), \((\overline{\varvec{c}}, \varvec{d})\), \((\varvec{c}, \overline{\varvec{d}})\), and \((\overline{\varvec{c}}, \overline{\varvec{d}})\) yield the same vector \(\Delta (\varvec{c}, \varvec{d})\) and are therefore indistinguishable in Leakage Model 3 (cf. step 3 of Algorithm 2). Figures 6 and 7 compare this heuristic with the results of the experiments with simulated measurements reported in Tables 3 and 4, respectively.
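Both the bound of Lemma 4 and the heuristic (26) are cheap to evaluate. A sketch in Julia (names illustrative):

```julia
using LinearAlgebra

# Upper bound of Lemma 4 on I(Y; C, D), in bits.
mi_bound(W::Matrix{Float64}, σ::Float64) =
    logdet(Symmetric(W' * W) / σ^2 + I) / (2 * log(2))

# Heuristic (26) for the remaining entropy H(C, D | Y), in bits.
remaining_entropy(W::Matrix{Float64}, σ::Float64) =
    max(2.0, 56.0 - mi_bound(W, σ))
```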

Fig. 6 Graph of the remaining entropy estimate (26) as a function of \(\sigma \) together with the logarithmic key ranks computed in the experiments reported in Table 3

Fig. 7 Graph of the remaining entropy estimate (26) as a function of \(\sigma \) together with the logarithmic key ranks estimated in the experiments reported in Table 4

4.5 Isolated consideration of the C- and D-register

In Sect. 3.5 we looked at an approach that considers the C- and D-register separately. We argued that the mutual information is much lower in this setting. However, in the continuous case with error the situation is different. On the one hand, we have an additional error, so each observation gives less information compared to Leakage Model 2. On the other hand, we have many more POIs in Leakage Model 3. The general strategy is again as follows:

1. Define an appropriate evaluation function that depends only on the key part \(\varvec{c}\) in the C-register. Find a set \(\mathcal {C}\) of likely candidates for the C-register.

2. Define an appropriate evaluation function that depends only on the key part \(\varvec{d}\) in the D-register. Find a set \(\mathcal {D}\) of likely candidates for the D-register.

3. Check all combinations \((\varvec{c}, \varvec{d}) \in \mathcal {C} \times \mathcal {D}\).

The work load of this approach is again bounded by \(2^{27}\) in steps 1 and 2, but in step 3 we have to check all combinations. Here we consider the following heuristic. In \(\varvec{Y}\), we replace the random variable \(\Delta (\varvec{D})\) by a normally distributed random variable \(\varvec{N}_2\) with mean \(\varvec{0}_{56}\) and covariance matrix \(\varvec{I}_{56}\) (matching the mean and covariance of \(\Delta (\varvec{D})\)), i.e. we set

$$\begin{aligned} \varvec{Y}_1 := \varvec{W}_1 \Delta (\varvec{C}) + \varvec{W}_2 \varvec{N}_2 + \varvec{\varepsilon } . \end{aligned}$$

As an indication of the success of an evaluation function that depends only on the key part \(\varvec{c}\) of the C-register, we compute the mutual information of \(\varvec{C}\) and the observation \(\varvec{Y}_1\). First we normalize the resulting error term by setting

$$\begin{aligned} \varvec{Y}_1' := \varvec{\Sigma }_2^{-1/2} \varvec{Y}_1 , \end{aligned}$$

where \(\varvec{\Sigma }_2 := {{\,\mathrm{Cov}\,}}(\varvec{W}_2 \varvec{N}_2 + \varvec{\varepsilon }) = \varvec{W}_2 \varvec{W}_2^\top + \sigma ^2 \varvec{I}_m\). Analogously to Sect. 4.1, we define the evaluation function for the key part \(\varvec{c}\) in the C-register as

$$\begin{aligned}&\eta _{\varvec{W}, \varvec{y}, 1} :\{0,1\}^{28} \rightarrow \mathbb {R}_{\ge 0} ,\nonumber \\&\quad \varvec{c} \mapsto \bigl \Vert \varvec{\Sigma }_2^{-1/2} \varvec{y} - \varvec{\Sigma }_2^{-1/2} \varvec{W}_1 \Delta (\varvec{c}) \bigr \Vert ^2 . \end{aligned}$$
(27)
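For illustration, (27) can be computed as follows, with the whitening factor \(\varvec{\Sigma }_2^{-1/2}\) precomputed once (a sketch; \(\Delta (\varvec{c})\) is assumed to be given as a \(\pm 1\) vector):

```julia
using LinearAlgebra

# Whitening factor Σ₂^(-1/2) with Σ₂ = W₂W₂' + σ²·I_m; compute this once.
whitener(W2::Matrix{Float64}, σ::Float64) =
    inv(sqrt(Symmetric(W2 * W2' + σ^2 * I)))

# Evaluation function (27) for a C-register candidate with ±1 vector Δc.
eta1(T2::AbstractMatrix, W1::Matrix{Float64}, y::Vector{Float64},
     Δc::Vector{Float64}) = norm(T2 * (y - W1 * Δc))^2
```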

We assume that (26) can be applied analogously and get

$$\begin{aligned} \begin{aligned}&{{\,\mathrm{H}\,}}(\varvec{C} \mid \varvec{Y}_1') \\&\approx \max \Bigl \{ 1 ,\; 28- \tfrac{1}{2} \log _2\bigl ( \det (\varvec{\Sigma }_2^{-1/2} \varvec{W}_1\varvec{W}_1^\top \varvec{\Sigma }_2^{-1/2} + \varvec{I}_m) \bigr ) \Bigr \} \\&= \max \Bigl \{ 1 ,\; 28 - \tfrac{1}{2} \log _2\bigl ( \det (\sigma ^{-2} \varvec{W} \varvec{W}^\top + \varvec{I}_m) \bigl / \det (\sigma ^{-2} \varvec{W}_2 \varvec{W}_2^\top + \varvec{I}_m) \bigr ) \Bigr \} . \end{aligned} \end{aligned}$$
(28)

We expect that the work load of step 3 is roughly of size \(2^{{{\,\mathrm{H}\,}}(\varvec{C} \mid \varvec{Y}_1') + {{\,\mathrm{H}\,}}(\varvec{D} \mid \varvec{Y}_2')}\) in the algorithm above, where \(\varvec{Y}_2'\) is defined analogously for the D-register.
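Heuristic (28) and its analogue for the D-register can be evaluated as follows (a sketch; \(\varvec{W}_1\) and \(\varvec{W}_2\) are the blocks of the weight matrix acting on \(\Delta (\varvec{C})\) and \(\Delta (\varvec{D})\), and the D-register value is obtained by swapping the two arguments):

```julia
using LinearAlgebra

# Heuristic (28): remaining entropy of the C-register (in bits) when the
# D-register contribution is modelled as Gaussian noise.
function isolated_entropy(W1::Matrix{Float64}, W2::Matrix{Float64}, σ::Float64)
    num = logdet(Symmetric(W1 * W1' + W2 * W2') / σ^2 + I)
    den = logdet(Symmetric(W2 * W2') / σ^2 + I)
    return max(1.0, 28.0 - (num - den) / (2 * log(2)))
end
```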

Applying heuristic (28) to the weight matrix \(\widetilde{\varvec{W}}\) estimated in Sect. 4.3.1 for the real measurements and \(\sigma = 0.07\) as estimated in Sect. 4.3.2, we obtain \({{\,\mathrm{H}\,}}(\varvec{C} \mid \varvec{Y}_1') \approx 6.55\) and \({{\,\mathrm{H}\,}}(\varvec{D} \mid \varvec{Y}_2') \approx 16.64\). We computed the ranks of the key parts in the C- and D-register with respect to \(\eta _{\widetilde{\varvec{W}}, \overline{\varvec{y}}_i, 1}\) and the analogously defined evaluation function \(\eta _{\widetilde{\varvec{W}}, \overline{\varvec{y}}_i, 2}\), respectively. The distribution of the computed ranks is described in Table 5. The average logarithmic key ranks were 6.55 and 17.26 for the C- and D-register, respectively, in good agreement with heuristic (28).

Table 5 Empirical distribution of the logarithmic key ranks of the key parts in the C- and D-register for 288 averaged, real measurements (cf. Sect. 4.3.1)

Remark 10

The key ranks in the experiments vary considerably. Therefore, in practice, the sets \(\mathcal {C}\) and \(\mathcal {D}\) of likely candidates have to be chosen larger than \(2^{{{\,\mathrm{H}\,}}(\varvec{C} \mid \varvec{Y}_1')}\) and \(2^{{{\,\mathrm{H}\,}}(\varvec{D} \mid \varvec{Y}_2')}\) in steps 1 and 2, respectively, or one has to accept that the algorithm finds the correct combined key only with a certain probability. Algorithm 2 in Sect. 4.1 does not have this drawback.