1 Introduction

In [1], a general statistical model for side-channel attack analysis is proposed. Based on this model, one can calculate the success rate of an attack by numerical simulation. The success rate is the most common evaluation metric for measuring the performance of a particular attack scenario. In [5], it is stated:

“Closed-form expressions of success rate are desirable because they provide an explicit functional dependence on relevant parameters such as number of measurements and signal-to-noise ratio which help to understand the effectiveness of a given attack and how one can mitigate its threat by countermeasures. However, such closed-form expressions involve high-dimensional complex statistical functions that are hard to estimate”.

In the following, we derive an analytic formula for the success rate. Simulation experiments confirm that this formula is a good approximation for a wide class of leakage functions.

2 Leakage model

We consider the case of a side-channel attack against a typical block cipher. We assume that this block cipher consists of several rounds for encryption and decryption. In each round, the block cipher evaluates substitution boxes of small input size n (e.g., \(n=6\) for DES or \(n=8\) for AES), where the key is mixed with intermediate values.

We further restrict ourselves to the simplest setting:

  • The attacker tries to find an n-bit subkey \(k_c\) of the S-Box computation in the first round of the block cipher. The input of this S-Box computation is of the form \(p_{w}\oplus k_c\) with plaintext inputs \(p_w\).

  • We have m measurements. m is a multiple of \(N={2^n}\), and all plaintext inputs \(p_w\) of this S-Box are equally distributed over these m measurements.

  • The side-channel measurement is a trace consisting of a certain number of points. We assume that the key-dependent leakage occurs at a single point in time, which is known to the attacker.

  • The measurement at this point in time is the sum of a deterministic signal and Gaussian noise. It can be written in the form

    $$\begin{aligned} \tilde{b}_{w}=\tilde{h}(p_{w}\oplus k_c)+\tilde{\tau }_{w}.\end{aligned}$$

    \(\tilde{h}\) is a deterministic function that only depends on the input \(p_{w}\oplus k_c\) of the S-Box computation. \(\tilde{h}\) is completely known to the attacker. \(\tilde{\tau }_w\) describes the noise of the measurement. We assume that the \(\tilde{\tau }_w\) are realizations of m independent random variables \(\tilde{T}_w\), each normally distributed with known expectation and variance. For ease of notation, we identify the sets \(\{0,1\}^n\) and \(\{0,1,\ldots ,N-1\}\) via the binary representation of integers. We further assume

    $$\begin{aligned} E(\tilde{T}_w)=0,\quad V(\tilde{T}_w)=\sigma ^2,\quad \sum _{z=0}^{N-1} \tilde{h}(z)=0,\quad \sum _{z=0}^{N-1} \tilde{h}(z)^2=N \tilde{\delta }^2.\end{aligned}$$
  • We can calculate the mean value of all \(\tilde{b}_w\) with the same \(p_w\). In the representation of \(\tilde{b}_w\), this just reduces the variance of \(\tilde{T}_w\). Additionally, by applying a constant factor to each \(\tilde{b}_w\), we can normalize the representation. In this way, we obtain a representation of the form

    $$\begin{aligned}b_{w}=h(w\oplus k_c)+\tau _{w},\quad w=0,\ldots ,N-1\end{aligned}$$

    with

    $$\begin{aligned} E(T_w)=0,\quad V(T_w)=1,\quad \sum _{z=0}^{N-1} h(z)=0,\quad \sum _{z=0}^{N-1} h(z)^2=N \delta ^2.\end{aligned}$$

    If we start with the representation of \(\tilde{b}_w\), the normalized representation \(b_w\) has parameter \(\delta \) with

    $$\begin{aligned}\delta ^2=\frac{m}{N} \frac{\tilde{\delta }^2}{\sigma ^2}.\end{aligned}$$
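To make the normalization concrete, here is a minimal simulation sketch (Python/NumPy; the function name and interface are our own choice, not part of the model): it averages the \(\frac{m}{N}\) raw traces per plaintext value and rescales so that the noise has unit variance.

```python
import numpy as np

def normalized_observations(h_tilde, k_c, m, sigma, rng):
    """Average the m/N raw traces per plaintext and rescale to unit noise variance.

    h_tilde: length-N NumPy array of the raw signal (zero mean over all inputs).
    Returns b_w = h(w xor k_c) + tau_w with V(tau_w) = 1.
    """
    h_tilde = np.asarray(h_tilde, dtype=float)
    N = len(h_tilde)
    reps = m // N                         # number of traces per plaintext value
    w = np.arange(N)
    raw = h_tilde[w ^ k_c][:, None] + sigma * rng.standard_normal((N, reps))
    means = raw.mean(axis=1)              # noise variance shrinks to sigma^2/reps
    return means * np.sqrt(reps) / sigma  # rescale so the noise has variance 1
```

Squaring the rescaled signal reproduces \(\delta ^2=\frac{m}{N} \frac{\tilde{\delta }^2}{\sigma ^2}\) as stated above.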

As in [1], we now apply the maximum likelihood attack: we compute the conditional probability density function of the observations \(b_w\) under each key hypothesis k and choose as key candidate the k that maximizes this density. An easy calculation shows that this is the k minimizing

$$\begin{aligned} \sum _{w=0}^{N-1} (b_w-h(w\oplus k))^2. \end{aligned}$$

Equivalently, we can maximize the values

$$\begin{aligned} \sum _{w=0}^{N-1} h(w\oplus k) b_w \end{aligned}$$

since \(\sum _{w=0}^{N-1}(b_w-h(w\oplus k))^2=\sum _w b_w^2-2\sum _w h(w\oplus k) b_w+\sum _w h(w\oplus k)^2\), and the term \(\sum _{w=0}^{N-1} h(w\oplus k)^2=N\delta ^2\) does not depend on k. The success rate as defined in [1] is

$$\begin{aligned} \hbox {Pr}(X_{k_c} > X_k \hbox { for all } k\ne k_c) \end{aligned}$$

where \(X_k\) is the random variable

$$\begin{aligned} X_k=\sum _{w=0}^{N-1} h(w\oplus k) ({h}({w}\oplus k_c)+T_{w}). \end{aligned}$$

This success rate can certainly be computed by numerical simulation of the \(T_w\).
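For instance, such a simulation can be sketched as follows (Python/NumPy; the function name and defaults are our own):

```python
import numpy as np

def ml_success_rate(h, k_c, trials=10_000, seed=0):
    """Monte Carlo estimate of Pr(X_{k_c} > X_k for all k != k_c).

    h: length-N NumPy array with sum(h) == 0 and sum(h**2) == N * delta**2.
    """
    rng = np.random.default_rng(seed)
    N = len(h)
    k = np.arange(N)
    H = h[np.bitwise_xor.outer(k, k)]         # H[k, w] = h(w xor k)
    hits = 0
    for _ in range(trials):
        b = H[k_c] + rng.standard_normal(N)   # b_w = h(w xor k_c) + T_w
        hits += int(np.argmax(H @ b) == k_c)  # decide via the correlation sums
    return hits / trials
```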

3 An approximation of the success rate

Let A be the \(N\times N\) matrix with entries \(A_{k,w}=h(w\oplus k)\). The rows of A are

$$\begin{aligned}a_k=(h(k),\ldots ,h(w\oplus k),\ldots ,h((N-1)\oplus k)). \end{aligned}$$

Let T be the random column vector of length N with entries \(T_w\). Let \(d = A \cdot a_{k_c}^t\) with entries \(d_k\). We define the set R of all vectors of length N with entries \(y_k\) that fulfill

$$\begin{aligned}y_{k} < y_{k_c} +N \delta ^2-d_k \hbox { for all } k\ne k_c.\end{aligned}$$

An easy calculation shows that the success rate can be written as

$$\begin{aligned}\hbox {Pr}(X_{k_c} > X_k \hbox { for all } k\ne k_c)= \hbox {Pr}(A \cdot T \in R).\end{aligned}$$

A is a symmetric matrix, and therefore there exists an orthonormal basis of eigenvectors \(v_0,\ldots , v_{N-1}\) with corresponding eigenvalues \(\lambda _0,\ldots ,\lambda _{N-1}\) of A. T can be written in the basis of eigenvectors in the form

$$\begin{aligned}T= X_0v_0+\cdots + X_{N-1}v_{N-1}\end{aligned}$$

where the \(X_i\) are independent random variables with standard normal distribution. The distribution of \(A \cdot T\) is the image of the standard normal distribution under A: each vector in the distribution of T is stretched in the direction of the eigenvectors of A, with the corresponding eigenvalue as factor:

$$\begin{aligned}A \cdot T= \lambda _0 X_0v_0+\cdots +\lambda _{N-1}X_{N-1}v_{N-1}. \end{aligned}$$

We easily compute

$$\begin{aligned} E(||A \cdot T||^2)= \lambda _0^2+\cdots +\lambda _{N-1}^2 = \hbox {trace}(A^2) = N^2 \delta ^2 = N \delta ^2 \, E(|| T||^2). \end{aligned}$$

For values like \(n=6\) or \(n=8\), \(N=2^n\) is relatively large, so that the typical vector in the distribution of \(A \cdot T\) has squared norm close to \(N^2 \delta ^2\). As a heuristic approximation for the success rate, we simply replace the distribution of \(A\cdot T\) by the standard normal distribution stretched by the constant factor \(2^{n/2} \delta \):

$$\begin{aligned}\hbox {1st approx. formula: }\qquad \hbox {Pr}(2^{n/2} \delta \cdot T \in R).\end{aligned}$$

In addition, we neglect the influence of d and get

$$\begin{aligned}\hbox {2nd approx. formula: }\qquad \hbox {Pr}( T \in \tilde{R})\end{aligned}$$

where \(\tilde{R}\) is the set of all vectors with entries \(t_k\) that fulfill

$$\begin{aligned}t_{k} < t_{k_c} +2^{n/2} \delta \hbox { for all } k\ne k_c. \end{aligned}$$

The last probability can in fact be computed as a two-dimensional integral

$$\begin{aligned}\hbox {Pr}( T \in \tilde{R}) =\int _{-\infty }^{\infty } \frac{1}{\sqrt{2\pi }} \exp \left( -\frac{1}{2} a^2 \right) \left[ \int _{-\infty }^{a+2^{n/2}\delta } \frac{1}{\sqrt{2\pi }} \exp \left( -\frac{1}{2} t^2 \right) \mathrm{d}t \right] ^{N-1} \mathrm{d}a. \end{aligned}$$

This expression only depends on \(\delta \) for fixed n, so that it can easily be tabulated for different \(\delta \) by numerical methods. Figure 1 plots this approximated success rate, as computed with the MAPLE software, for \(n=8\).
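Since the inner integral is just the standard normal distribution function, the expression reduces to a one-dimensional quadrature. A minimal sketch using SciPy (our own naming; the article used MAPLE):

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def approx_success_rate(delta, n):
    """Second approximating formula for Pr(T in R~)."""
    N = 2 ** n
    shift = 2 ** (n / 2) * delta
    integrand = lambda a: norm.pdf(a) * norm.cdf(a + shift) ** (N - 1)
    value, _ = integrate.quad(integrand, -np.inf, np.inf)
    return value
```

For \(\delta =0\) this evaluates to \(\frac{1}{N}\), consistent with the last remark below.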

Fig. 1 Second approximating formula: success rate as a function of \(\delta \) for \(n=8\)

Remarks:

  • If we start with the representation of \(\tilde{b}_w\), the success rate as computed by the second approximating formula only depends on

    $$\begin{aligned} \delta ^2=\frac{m}{N} \frac{\tilde{\delta }^2}{\sigma ^2}. \end{aligned}$$
  • The approximating formulas are only valid if the eigenvalues do not vary too much. As an extreme example, we can consider the case that only one eigenvalue is large, whereas the others can be neglected. Let \(\lambda _0> 0\) be this large eigenvalue. Then, \(A \cdot T\) is roughly distributed as \(\lambda _0 X_0v_0\). \(\hbox {Pr}(A \cdot T \in R)\) can be written as a one-dimensional integral over the random variable \(X_0\).

  • In our approach, we replaced the covariance matrix \(A^2\) of \(A\cdot T\) by a diagonal matrix. In effect, we treated the \(X_k\) as independent random variables.

  • \(\hbox {Pr}( T \in \tilde{R}) \ge \frac{1}{N}\) with equality for \(\delta =0\). The probability of \(\frac{1}{N}\) for \(\delta =0\) follows from the symmetry of the set \(\tilde{R}\).

4 More on the matrix A

The matrix A appears in the context of dyadic codes; see [2]. In [3], such a matrix is called a dyadic matrix. Due to the structure of A, we can compute its eigenvectors explicitly: there are N \(\hbox {GF}(2)\)-linear functions L

$$\begin{aligned} L:\hbox {GF}(2)^n \longrightarrow \hbox {GF}(2). \end{aligned}$$

For every L, \(v_L=[(-1)^{L(w)}]_w\) is a vector of length N. For every k, we have

$$\begin{aligned}\sum _w h(k\oplus w) (-1)^{L(w)} =\sum _y h(y) (-1)^{L(y\oplus k)} = (-1)^{L(k)} \sum _y h(y) (-1)^{L(y)}. \end{aligned}$$

Therefore, \(v_L\) is an eigenvector with eigenvalue \(\sum _y h(y) (-1)^{L(y)}\). The rank of A is the number of nonzero eigenvalues.
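Consequently, the full eigenvalue spectrum of A is the Walsh-Hadamard transform of the value table of h. A sketch of the standard fast transform (our own code, not taken from [2, 3]):

```python
import numpy as np

def walsh_spectrum(h):
    """Entry l holds sum_y h(y) * (-1)^{<l,y>}, the eigenvalue for L(y) = <l,y>.

    h must have length N = 2**n.
    """
    a = np.array(h, dtype=float)
    step = 1
    while step < len(a):
        for i in range(0, len(a), 2 * step):
            x = a[i:i + step].copy()
            y = a[i + step:i + 2 * step].copy()
            a[i:i + step] = x + y               # butterfly stage of the FWHT
            a[i + step:i + 2 * step] = x - y
        step *= 2
    return a
```

By Parseval, the squares of these eigenvalues sum to \(N\sum _y h(y)^2=N^2\delta ^2\), matching the trace computation in paragraph 3.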

5 Example: h depends on a single bit

Let S be the S-Box of AES and G a fixed \(\hbox {GF}(2)\)-linear function. We assume that the leakage function h depends only on \(G \circ S\), i.e., after normalization

$$\begin{aligned}h(w\oplus k)=\delta (-1)^{G (S(w \oplus k))}. \end{aligned}$$

The eigenvalues of A are now

$$\begin{aligned} \sum _y h(y) (-1)^{L(y)} =\delta \sum _y (-1)^{G (S(y))} (-1)^{L(y)}. \end{aligned}$$

In other words: the set of eigenvalues is exactly the Walsh spectrum of the Boolean function \(G\circ S\), multiplied by \(\delta \). Each eigenvalue measures how well \(G\circ S\) can be approximated by the linear function L. S is the composition of the inversion over \(F=\hbox {GF}(256)\) and an affine function. The Walsh spectrum of any function of the form \(G\circ S\) is well known: it can be expressed by the so-called Kloosterman sums (see [4])

$$\begin{aligned}K(a)= \sum _{y \in F^{\times }} (-1)^{{tr}(y^{-1}+a y)} \end{aligned}$$

where tr denotes the trace map of F over \(\hbox {GF}(2)\). Any \(\hbox {GF}(2)\)-linear function \(L: F \longrightarrow \hbox {GF}(2)\) can be written as \(L(y)=tr(l y)\) for exactly one \(l \in F\). Therefore, we find \(c \in F\) such that

$$\begin{aligned}G (S(y)) \oplus L(y)=tr(c y^{-1} \oplus l y) \hbox { for all } y \in F^{\times }\end{aligned}$$

or

$$\begin{aligned} G (S(y)) \oplus L(y)=tr(c y^{-1} \oplus l y) \oplus 1 \hbox { for all } y \in F^{\times }.\end{aligned}$$

Note that for \(c\ne 0\)

$$\begin{aligned}\sum _{y \in F^{\times }} (-1)^{{tr}(c y^{-1}+l y)}=K(c \cdot l). \end{aligned}$$

The distribution of the Kloosterman sums can be described by values of certain class numbers (see [4, Prop. 9.1]), which can be interpreted in terms of the Walsh spectrum.
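The Kloosterman sums can also be evaluated directly by brute force over F. The sketch below fixes the AES representation of GF(256) (reduction polynomial \(x^8+x^4+x^3+x+1\), i.e., 0x11B; this representation choice is ours) and uses \(y^{-1}=y^{254}\):

```python
def gf_mul(a, b):
    """Multiplication in GF(256), AES representation (reduction by 0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(256)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def gf_trace(y):
    """tr(y) = y + y^2 + y^4 + ... + y^128; the result lies in {0, 1}."""
    acc, t = y, y
    for _ in range(7):
        t = gf_mul(t, t)
        acc ^= t
    return acc

def kloosterman(a):
    """K(a) = sum over y in F^x of (-1)^{tr(y^{-1} + a*y)}."""
    return sum((-1) ** gf_trace(gf_pow(y, 254) ^ gf_mul(a, y))
               for y in range(1, 256))
```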

6 Example: h depends on the Hamming weight of the input

In this example, h does not depend on the output of the substitution box, but on the Hamming weight of the input. After normalization, we can write

$$\begin{aligned}h(w\oplus k)=\delta \, g(w \oplus k) \hbox { with } g(z) =\frac{1}{\sqrt{n}} \left( (-1)^{z_1} + \cdots + (-1)^{z_n}\right) ,\quad z=(z_1,\ldots ,z_n). \end{aligned}$$

In this case, A has exactly n eigenvectors with nonzero eigenvalues, and these are given by the n linear projections

$$\begin{aligned}v_j=\frac{1}{2^{n/2}}[(-1)^{z_j}]_{z=(z_1,\ldots ,z_n)}. \end{aligned}$$

The eigenvalues for these n eigenvectors are all equal to \(\delta \frac{N}{\sqrt{n}}\). Since only a few eigenvalues are nonzero, we cannot expect the second approximating formula to be a good approximation in this case.

However, we can derive an exact formula for the success rate: since \((-1)^{(w\oplus k)_j}=(-1)^{w_j}(-1)^{k_j}\), we have

$$\begin{aligned} \sum _{w=0}^{N-1} h(w\oplus k) b_w = \frac{\delta }{\sqrt{n}} \sum _{j=1}^n (-1)^{k_j} \left( \sum _{w=0}^{N-1} (-1)^{w_j} b_w \right) . \end{aligned}$$

The sums in brackets do not depend on k, so that

$$\begin{aligned}\max _k \sum _{w=0}^{N-1} h(w\oplus k) b_w = \frac{\delta }{\sqrt{n}} \sum _{j=1}^n \left| \sum _{w=0}^{N-1} (-1)^{w_j} b_w \right| . \end{aligned}$$

The maximum likelihood attack is therefore successful exactly when

$$\begin{aligned}(-1)^{k_{c,j}} \left( \sum _{w=0}^{N-1} (-1)^{w_j} b_w \right) \ge 0 \hbox { for all } j=1,\ldots ,n. \end{aligned}$$

In other words: the success rate is the probability that the random variables \(Y_j\) fulfill

$$\begin{aligned} Y_j= (-1)^{k_{c,j}} \left( \sum _{w=0}^{N-1} (-1)^{w_j} ({h}({w}\oplus k_c)+T_{w}) \right) \ge 0 \hbox { for all } j=1,\ldots ,n. \end{aligned}$$

\(Y_j\) is normally distributed with expectation \(\frac{\delta N }{\sqrt{n}}\) and variance N. Since the \(Y_j\) are jointly normally distributed and the covariance between \(Y_j\) and \(Y_{\tilde{j}}\) is 0 for \(j \ne \tilde{j}\), they are independent, and the success rate is given by the formula

$$\begin{aligned} \hbox {Pr}( Y_j \ge 0,\ j=1,\ldots ,n) = \left[ \int _{-\frac{\delta 2^{n/2}}{\sqrt{n}}}^{\infty } \frac{1}{\sqrt{2\pi }} \exp \left( -\frac{1}{2} t^2 \right) \mathrm{d}t \right] ^{n}. \end{aligned}$$
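Written with the standard normal distribution function \(\Phi \), this is \(\Phi \left( \frac{\delta 2^{n/2}}{\sqrt{n}}\right) ^n\); as a one-line sketch (our naming):

```python
import math
from scipy.stats import norm

def hw_success_rate(delta, n):
    """Exact success rate for Hamming-weight leakage: Phi(delta*2^(n/2)/sqrt(n))**n."""
    return norm.cdf(delta * 2 ** (n / 2) / math.sqrt(n)) ** n
```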

7 Simulation results

We computed the success rate for different n, h, and \(\delta \) by numerical simulation of the \(T_w\). Table 1 compares the success rates for \(n=8\), and Table 2 does the same for \(n=6\). In both tables, f is a function \(\hbox {GF}(2)^n \longrightarrow \hbox {GF}(2)\) chosen uniformly at random, and P is a random permutation on \(\hbox {GF}(2)^n\). g is the function from paragraph 6. We repeated the simulation 1000 times with different f and P; the tables report the mean values.

Table 1 Comparison of success rates, \(n=8\)
Table 2 Comparison of success rates, \(n=6\)

We note that the second approximating formula and the Hamming weight formula from paragraph 6 give different values for identical \(\delta \), but both formulas match the numerical values very well. In all experiments, the numerical values in each of the 1000 repetitions were very close to the mean given in the tables. For \(n=8\) (Table 1), the empirical standard deviation was less than 0.004; for \(n=6\) (Table 2), it was less than 0.02.

Table 3 Success rates in the case of masking (\(m=10\cdot N^2\), input dependency, \(n=8\))
Table 4 Success rates in the case of masking (\(m=N^2\), input dependency, \(n=8\))
Table 5 Success rates in the case of masking (\(m=10\cdot N^2\), output dependency, \(n=8\))
Table 6 Success rates in the case of masking (\(m=N^2\), output dependency, \(n=8\))

8 Success rate in the case of masking

Similar to [5], we can apply the second approximating formula to the case of masking. For a concrete example, we adapt our leakage model in the following way:

  • We have m measurements. m is a multiple of N, and all plaintext inputs \(p_w\) of this S-Box are equally distributed over these m measurements.

  • There are exactly two points in time at which meaningful leakage occurs. Both points are known to the attacker. One leakage is mask-dependent; the other is key-dependent and occurs at the input of an S-Box computation.

  • The measurements can be written in the form

    $$\begin{aligned} \tilde{b}'_{w}=\mu (p_{w}\oplus k_c \oplus m_w)+\tilde{\tau }'_{w},\qquad \tilde{b}''_{w}=\mu (m_w)+\tilde{\tau }''_{w}. \end{aligned}$$

    \(\mu \) is a centered version of the Hamming weight, i.e.,

    $$\begin{aligned}\mu (z)=(-1)^{z_1}+ \cdots +(-1)^{z_n}. \end{aligned}$$

    \(\tilde{\tau }'_w\) and \(\tilde{\tau }''_{w}\) describe the noise of the measurements. We assume that \(\tilde{\tau }'_w\) and \(\tilde{\tau }''_{w}\) are realizations of 2m independent random variables \(\tilde{T}'_w\), \(\tilde{T}''_w\), each normally distributed with expectation 0 and variance \(\sigma ^2\). \(m_w\) denotes the mask; the \(m_w\) are realizations of m independent random variables \(M_w\), uniformly distributed on \(\hbox {GF}(2)^n\).

We set

$$\begin{aligned}c_\nu = \frac{N}{m}\sum _{w, p_w= \nu } \tilde{b}'_{w}\tilde{b}''_{w}. \end{aligned}$$

The sum is taken over \(\frac{m}{N}\) realizations of independent random variables. For any fixed mask \(m_w\), we compute

$$\begin{aligned} E\left( ( \mu (p_{w}\oplus k_c \oplus m_w) +\tilde{T}'_w) (\mu (m_w) +\tilde{T}''_w) \right) =\mu (p_{w}\oplus k_c \oplus m_w)\,\mu (m_w) \end{aligned}$$

and

$$\begin{aligned} V\left( ( \mu (p_{w}\oplus k_c \oplus m_w) +\tilde{T}'_w) (\mu (m_w) +\tilde{T}''_w) \right)&= E\left( (\mu (p_{w}\oplus k_c \oplus m_w) +\tilde{T}'_w)^2 (\mu (m_w) +\tilde{T}''_w)^2 \right) - \mu (p_{w}\oplus k_c \oplus m_w)^2\mu (m_w)^2 \\&= \sigma ^2\left( \mu (p_{w}\oplus k_c \oplus m_w)^2+\mu (m_w)^2\right) +\sigma ^4. \end{aligned}$$

If \(\frac{m}{N}\) is not too small, we approximate the \(c_\nu \) as realizations of N independent normally distributed random variables, each with expectation

$$\begin{aligned} \frac{N}{m}\sum _{w,\, p_w= \nu } \mu (p_{w}\oplus k_c \oplus m_w)\mu (m_w) =\frac{N}{m}\sum _{w,\, p_w= \nu } \mu (\nu \oplus k_c \oplus m_w)\mu (m_w) \end{aligned}$$

and variance

$$\begin{aligned}\left( \frac{N}{m} \right) ^2\sum _{w, p_w= \nu } \left( \sigma ^2(\mu (\nu \oplus k_c \oplus m_w)^2+\mu (m_w)^2)+\sigma ^4\right) . \end{aligned}$$

Again, if \(\frac{m}{N}\) is not too small, we approximate these sums by their expectations over the random variables \(M_w\). An easy calculation shows

$$\begin{aligned} \frac{N}{m}\sum _{w, p_w= \nu } \mu (p_{w}\oplus k_c \oplus m_w)\mu (m_w) \approx \mu (\nu \oplus k_c) \end{aligned}$$

and

$$\begin{aligned} \left( \frac{N}{m} \right) ^2\sum _{w,\, p_w= \nu } \left( \sigma ^2(\mu (\nu \oplus k_c \oplus m_w)^2+\mu (m_w)^{2})+\sigma ^4\right) \approx \frac{N}{m} (2 n \sigma ^2+\sigma ^4). \end{aligned}$$

Since \(\sum _{z}\mu (z)^2=n\cdot N\), we can apply the leakage model of paragraph 2 with

$$\begin{aligned}\delta ^2=\frac{n m}{N(2 n \sigma ^2+\sigma ^4)}. \end{aligned}$$
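As a small helper (our naming), the effective parameter reads:

```python
def masking_delta2(m, n, sigma2):
    """delta^2 = n*m / (N * (2*n*sigma2 + sigma2**2)) with N = 2**n, sigma2 = sigma^2."""
    return n * m / (2 ** n * (2 * n * sigma2 + sigma2 ** 2))
```

Plugging the square root of this value into the Hamming-weight formula of paragraph 6 gives the reference values we compare against below.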

Given the measurements \(\tilde{b}'_w, \tilde{b}''_w\), we directly compare the values

$$\begin{aligned}\sum _{\nu } \mu (\nu \oplus k) c_{\nu } \end{aligned}$$

for different k and choose the k with the largest value. For large m, we can expect that the success rate of this ad hoc attack depends only on \(\delta ^2= \frac{n m}{N(2 n \sigma ^2+\sigma ^4)}\).
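The complete ad hoc attack on simulated masked measurements can be sketched as follows (Python/NumPy; all names and defaults are ours):

```python
import numpy as np

def masked_attack_success(n, m, k_c, sigma, seed=0):
    """Simulate b'_w, b''_w once and test the decision rule above."""
    rng = np.random.default_rng(seed)
    N = 2 ** n
    mu = np.array([n - 2 * bin(z).count("1") for z in range(N)], float)
    p = np.tile(np.arange(N), m // N)        # plaintexts, equally distributed
    masks = rng.integers(0, N, size=m)       # uniform masks m_w
    b1 = mu[p ^ k_c ^ masks] + sigma * rng.standard_normal(m)
    b2 = mu[masks] + sigma * rng.standard_normal(m)
    c = np.array([(N / m) * np.sum(b1[p == v] * b2[p == v]) for v in range(N)])
    scores = [np.sum(mu[np.arange(N) ^ k] * c) for k in range(N)]
    return int(np.argmax(scores)) == k_c     # True iff the attack succeeds
```

Averaging this indicator over many seeds estimates the simulated success rates reported in Tables 3 and 4.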

Table 3 gives the success rates of this attack, computed by numerical simulation for \(n=8\). We compare these success rates with the values for the example from paragraph 6 (\(h=\delta g\)). Since the numerical simulations are rather slow, we ran only a few instances; however, in all instances the values matched very well.

Table 4 gives similar data, but for \(m=N^2\).

Remark:

The leakage in \(\tilde{b}'_{w}\) depends on the input of an S-Box computation. We can also consider the case in which the leakage depends on the output of an S-Box computation, i.e.,

$$\begin{aligned} \tilde{b}'_{w} = \mu (S(p_{w}\oplus k_c) \oplus m_w)+\tilde{\tau }'_{w}.\end{aligned}$$

The computation is completely analogous, but now we expect the second approximating formula to apply. Tables 5 and 6 compare the numerical values for the success rate with the second approximating formula. Again, we computed only a few instances, but in all instances the values matched very well.