A remark on a success rate model for side-channel attack analysis

The success rate is the most common evaluation metric for measuring the performance of a particular side-channel attack scenario. We improve on an analytic formula for the success rate.


Introduction
In [1], a general statistical model for side-channel attack analysis is proposed. Based on this model, one can calculate the success rate of an attack by numerical simulation. The success rate is the most common evaluation metric for measuring the performance of a particular attack scenario. In [5], it is stated: "Closed-form expressions of success rate are desirable because they provide an explicit functional dependence on relevant parameters such as number of measurements and signal-to-noise ratio which help to understand the effectiveness of a given attack and how one can mitigate its threat by countermeasures. However, such closed-form expressions involve high-dimensional complex statistical functions that are hard to estimate". In the following, we derive an analytic approximation formula for the success rate. Simulation experiments confirm that this formula is a good approximation of the success rate for a wide class of leakage functions.

Leakage model
We consider a side-channel attack against a typical block cipher. We assume that this block cipher consists of several rounds for encryption and decryption. In each round, the block cipher computes substitution boxes (S-Boxes) of small input size n (e.g., n = 6 for DES or n = 8 for AES), where the key is mixed with intermediate values.
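Andreas Wiemers (andreas.wiemers@bsi.bund.de), BSI, Bonn, Germany

We further restrict ourselves to the simplest setting:
- The attacker tries to find an n-bit subkey $k_c$ of the S-Box computation in the first round of the block cipher. The input of this S-Box computation is of the form $p_w \oplus k_c$ with plaintext inputs $p_w$.
- We have m measurements. m is a multiple of $N = 2^n$, and all plaintext inputs $p_w$ of this S-Box are equally distributed over these m measurements, i.e., each value occurs exactly $m/N$ times.
- The side-channel measurement is a trace of a certain number of points. We assume that the key-dependent leakage occurs at just one point in time, which is known to the attacker.
- The measurement at this point in time is the sum of a deterministic signal and Gaussian noise. It can be written in the form
$$\tilde b_w = \tilde h(p_w \oplus k_c) + \tilde\tau_w, \qquad w = 1, \ldots, m.$$
$\tilde h$ is a deterministic function that only depends on the input $p_w \oplus k_c$ of the S-Box computation; $\tilde h$ is completely known to the attacker. $\tilde\tau_w$ describes the noise of the measurement. We assume that the $\tilde\tau_w$ are realizations of m independent random variables $\tilde T_w$, each normally distributed with known expectation (after subtracting it, without loss of generality 0) and the same known variance $\tilde\sigma^2$.

For ease of notation, we identify the sets $\{0,1\}^n$ and $\{0, 1, \ldots, N-1\}$ via the binary representation of an integer. We further assume
$$\sum_{z=0}^{N-1} \tilde h(z) = 0.$$

We can calculate the mean value of all $\tilde b_w$ with the same $p_w$. In the representation of $\tilde b_w$, this just reduces the variance of $\tilde T_w$ by the factor $N/m$. Additionally, by applying a constant factor to each mean value, we can normalize the representation. Indexing the N mean values by the plaintext value $w \in \{0, \ldots, N-1\}$, we get a representation in the form
$$b_w = h(w \oplus k_c) + \tau_w, \qquad w = 0, \ldots, N-1,$$
where the $\tau_w$ are realizations of independent random variables $T_w$ with standard normal distribution, and the parameter $\delta \ge 0$ is defined by
$$\sum_{z=0}^{N-1} h(z)^2 = N\delta^2.$$
If we start with the representation of $\tilde b_w$, the normalized representation is obtained with $h = \sqrt{m/N}\,\tilde h/\tilde\sigma$, and $b_w$ has parameter δ with
$$\delta^2 = \frac{m}{N^2\tilde\sigma^2}\sum_{z=0}^{N-1}\tilde h(z)^2.$$

As in [1], we now apply the maximum likelihood attack: we compute the conditional probability density function of the observations $b_w$ under each hypothesis k, and we choose as the correct key the k that maximizes this probability density function. An easy calculation shows that we have to compare the values
$$\sum_{w=0}^{N-1}\bigl(b_w - h(w \oplus k)\bigr)^2$$
and choose the k with the smallest value. This can further be reduced to the values
$$\sum_{w=0}^{N-1} b_w\, h(w \oplus k),$$
choosing the k with the largest value, since $\sum_{w} h(w \oplus k)^2$ does not depend on k.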
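The averaging and rescaling step of this section can be sketched in Python. The helpers `normalize_means` and `delta_from_raw` are hypothetical names chosen for this illustration; the sketch assumes zero-mean noise with a common, known standard deviation `sigma_tilde` and plaintexts already encoded as integers in {0, ..., N-1}.

```python
import math


def normalize_means(plaintexts, values, sigma_tilde, n):
    """Average raw leakage values per plaintext and rescale the N means.

    Assumes zero-mean noise with common standard deviation sigma_tilde and that
    every plaintext value in {0, ..., N-1} occurs exactly m/N times.
    """
    N = 2 ** n
    m = len(values)
    if m % N != 0:
        raise ValueError("m must be a multiple of N")
    sums = [0.0] * N
    counts = [0] * N
    for p, v in zip(plaintexts, values):
        sums[p] += v
        counts[p] += 1
    if any(c != m // N for c in counts):
        raise ValueError("each plaintext must occur exactly m/N times")
    # Each mean carries noise variance sigma_tilde**2 * N / m; this factor makes it 1.
    scale = math.sqrt(m / N) / sigma_tilde
    return [scale * s / c for s, c in zip(sums, counts)]


def delta_from_raw(h_tilde, sigma_tilde, m):
    """Return delta with delta^2 = m / (N^2 sigma_tilde^2) * sum_z h_tilde(z)^2."""
    N = len(h_tilde)
    energy = sum(x * x for x in h_tilde)
    return math.sqrt(m * energy / (N * N * sigma_tilde ** 2))


# Example with noise-free data: each mean is exactly h_tilde(w xor k_c).
n, k_c, sigma_tilde = 3, 5, 0.7
h_tilde = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5, 1.5, -1.5]
plaintexts = [w % 2 ** n for w in range(4 * 2 ** n)]
values = [h_tilde[p ^ k_c] for p in plaintexts]
b = normalize_means(plaintexts, values, sigma_tilde, n)
delta = delta_from_raw(h_tilde, sigma_tilde, len(values))
```

For noise-free input the normalized means satisfy $\sum_w b_w^2 = N\delta^2$, which is a quick consistency check of the scaling factor $\sqrt{m/N}/\tilde\sigma$.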
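The success rate as defined in [1] is the probability that
$$X_{k_c} > X_k \quad \text{for all } k \neq k_c,$$
where $X_k$ is the random variable
$$X_k = \sum_{w=0}^{N-1}\bigl(h(w \oplus k_c) + T_w\bigr)\,h(w \oplus k).$$
This success rate can certainly be computed by numerical simulation of the $T_w$.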
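A direct Monte Carlo estimate of this success rate can be sketched as follows. `simulate_success_rate` is a hypothetical helper, not the simulation code used for the tables; it works in the normalized model with standard normal noise, uses the maximum likelihood scores $\sum_w b_w h(w \oplus k)$, and counts ties as failures.

```python
import random


def simulate_success_rate(h, trials=2000, seed=1):
    """Monte Carlo estimate of the ML-attack success rate for b_w = h(w xor k_c) + tau_w.

    h is a list of length N = 2**n; the noise tau_w is standard normal.
    Success requires X_{k_c} > X_k strictly for every k != k_c.
    """
    rng = random.Random(seed)
    N = len(h)
    successes = 0
    for _ in range(trials):
        k_c = rng.randrange(N)  # the success rate does not depend on k_c
        b = [h[w ^ k_c] + rng.gauss(0.0, 1.0) for w in range(N)]
        # Maximum likelihood scores X_k = sum_w b_w * h(w xor k)
        scores = [sum(b[w] * h[w ^ k] for w in range(N)) for k in range(N)]
        if all(scores[k_c] > scores[k] for k in range(N) if k != k_c):
            successes += 1
    return successes / trials
```

For example, a Hamming-weight-type leakage with n = 4 can be passed as `h = [delta / 2 * mu(z) for z in range(16)]`, where `mu(z)` is the sum of $(-1)^{z_j}$ over the four bits.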

An approximation of the success rate
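Let A be the N×N matrix with entries $A_{k,w} = h(w \oplus k)$. The rows of A are
$$a_k = \bigl(h(0 \oplus k), h(1 \oplus k), \ldots, h((N-1) \oplus k)\bigr), \qquad k = 0, \ldots, N-1.$$
Let T be the random vector (as column) of length N with entries $T_w$. Let $d = A \cdot a_{k_c}^t$ with entries $d_k = \sum_w h(w \oplus k)\,h(w \oplus k_c)$, so that the vector with entries $X_k$ equals $d + A \cdot T$. We define the set R of all vectors of length N with entries $y_k$ that fulfill
$$d_{k_c} + y_{k_c} > d_k + y_k \quad \text{for all } k \neq k_c.$$
An easy calculation shows that the success rate can be written as
$$\Pr(A \cdot T \in R).$$
A is a symmetric matrix, and therefore there exists an orthonormal basis of eigenvectors $v_0, \ldots, v_{N-1}$ of A with corresponding eigenvalues $\lambda_0, \ldots, \lambda_{N-1}$. T can be written in the basis of eigenvectors in the form
$$T = \sum_{i=0}^{N-1} X_i v_i,$$
where the $X_i$ are independent random variables with standard normal distribution. Hence $A \cdot T = \sum_i \lambda_i X_i v_i$: the distribution of $A \cdot T$ is the image of the standard normal distribution under A. Each vector in the distribution of T is stretched in the direction of the eigenvectors of A, with the corresponding eigenvalue as factor.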
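The matrix formulation can be checked numerically. The sketch below uses hypothetical helper names and plain Python lists instead of a matrix library; it builds A, confirms that the scores $X_k$ computed from their definition coincide with the entries of $d + A \cdot T$, and evaluates the membership test for R.

```python
def dyadic_matrix(h):
    """A[k][w] = h(w xor k); symmetric because w xor k = k xor w."""
    N = len(h)
    return [[h[w ^ k] for w in range(N)] for k in range(N)]


def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def scores_direct(h, k_c, t):
    """X_k = sum_w (h(w xor k_c) + t_w) * h(w xor k), straight from the definition."""
    N = len(h)
    return [sum((h[w ^ k_c] + t[w]) * h[w ^ k] for w in range(N)) for k in range(N)]


def scores_via_matrix(h, k_c, t):
    """X = d + A t with d = A a_{k_c}^t, where a_{k_c} is row k_c of A."""
    A = dyadic_matrix(h)
    d = mat_vec(A, A[k_c])
    return [dk + yk for dk, yk in zip(d, mat_vec(A, t))]


def in_region_R(h, k_c, y):
    """True iff d_{k_c} + y_{k_c} > d_k + y_k for all k != k_c."""
    A = dyadic_matrix(h)
    d = mat_vec(A, A[k_c])
    return all(d[k_c] + y[k_c] > d[k] + y[k] for k in range(len(h)) if k != k_c)
```

The attack succeeds for a noise realization t exactly when `in_region_R(h, k_c, mat_vec(A, t))` is true.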
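We easily compute
$$\sum_{i=0}^{N-1}\lambda_i^2 = \operatorname{tr}(A^2) = \sum_{k,w} h(w \oplus k)^2 = N\sum_{z} h(z)^2 = N^2\delta^2.$$
For values like n = 6 or n = 8, $N = 2^n$ is a relatively large number, so that the typical vector in the distribution of $A \cdot T$ has squared norm close to its expectation $N^2\delta^2$. As a heuristic approximation for the success rate, we just replace the distribution of $A \cdot T$ by the standard normal distribution stretched by the constant factor $2^{n/2}\delta$ (the common value of $|\lambda_i|$ if all eigenvalues had the same absolute value):

1st approx. formula:
$$\Pr(A \cdot T \in R) \approx \Pr\bigl(2^{n/2}\delta\, T \in R\bigr).$$

In addition, we omit the influence of d, i.e., we keep $d_{k_c} = \sum_w h(w \oplus k_c)^2 = N\delta^2$ and replace $d_k$ by 0 for $k \neq k_c$, and get

2nd approx. formula:
$$\Pr(A \cdot T \in R) \approx \Pr(T \in \tilde R),$$
where $\tilde R$ is the set of all vectors with entries $t_k$ that fulfill
$$t_{k_c} + 2^{n/2}\delta > t_k \quad \text{for all } k \neq k_c.$$
The last probability can in fact be computed as a two-dimensional integral
$$\Pr(T \in \tilde R) = \int_{-\infty}^{\infty}\varphi(x)\,\Phi\bigl(x + 2^{n/2}\delta\bigr)^{N-1}\,dx, \qquad \Phi(x) = \int_{-\infty}^{x}\varphi(y)\,dy,$$
where $\varphi(x) = e^{-x^2/2}/\sqrt{2\pi}$ denotes the standard normal density. For fixed n, this expression only depends on δ, so that it can easily be tabulated for different δ by numerical methods. Figure 1 plots this approximated success rate as computed by MAPLE software for n = 8.

Remarks:
- If we start with the representation of $\tilde b_w$, the success rate as computed by the second approximating formula only depends on n and on $N\delta^2 = \frac{m}{N\tilde\sigma^2}\sum_z \tilde h(z)^2$, where $\tilde\sigma^2$ denotes the variance of the $\tilde T_w$.
- The approximating formulas are only valid if the eigenvalues do not vary too much. As an extreme example, we can consider the case that only one eigenvalue is large, whereas the others can be neglected. Let $\lambda_0 > 0$ be this large eigenvalue. Then $A \cdot T$ is roughly distributed as $\lambda_0 X_0 v_0$, and $\Pr(A \cdot T \in R)$ can be written as a one-dimensional integral over the random variable $X_0$.
- In our approach, we replaced the covariance matrix $A^2$ of $A \cdot T$ by the diagonal matrix $N\delta^2 I$. In effect, we treated the $X_k$ as independent random variables.
- $\Pr(T \in \tilde R) \ge 1/N$, with equality for δ = 0. The probability 1/N for δ = 0 follows from the symmetry of the set $\tilde R$.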
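The two-dimensional integral of the second approximating formula reduces to a one-dimensional quadrature once Φ is expressed through the error function. The following sketch uses the trapezoidal rule on a truncated interval; the interval [-10, 10], the step count, and the function name `second_approx_success_rate` are arbitrary choices for illustration, not the MAPLE computation behind Figure 1.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def second_approx_success_rate(delta, n, lo=-10.0, hi=10.0, steps=4000):
    """Integral of phi(x) * Phi(x + 2^(n/2) delta)^(N-1) dx by the trapezoidal rule."""
    N = 2 ** n
    shift = 2.0 ** (n / 2.0) * delta
    step = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * step
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end-point weights
        total += weight * phi(x) * Phi(x + shift) ** (N - 1)
    return total * step
```

For δ = 0 the function returns 1/N up to quadrature error, in line with the last remark, and the value increases with δ.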

More on the matrix A
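Matrices of the form of A are used in the context of dyadic codes; see [2]. In [3], such a matrix is called a dyadic matrix. Due to the structure of A, we can compute the eigenvectors of A explicitly: there are N GF(2)-linear functions $L: GF(2)^n \to GF(2)$. For each L, let $v_L$ be the vector with entries $(-1)^{L(w)}$, $w = 0, \ldots, N-1$. Since L is linear, $L(w) = L(w \oplus k) \oplus L(k)$, and therefore
$$(A \cdot v_L)_k = \sum_{w} h(w \oplus k)(-1)^{L(w)} = (-1)^{L(k)} \sum_{y} h(y)(-1)^{L(y)}.$$
Therefore, $v_L$ is an eigenvector with eigenvalue $\sum_y h(y)(-1)^{L(y)}$. The N vectors $v_L$ are pairwise orthogonal, and the rank of A is the number of nonzero eigenvalues.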
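Because the eigenvectors are the characters $(-1)^{L(y)}$, all eigenvalues of A can be obtained at once with a fast Walsh-Hadamard transform of h. In the sketch below (hypothetical names), L is encoded as an integer mask so that $L(y)$ is the parity of `L & y`; the vectors returned are unnormalized, with norm $\sqrt{N}$.

```python
def walsh_hadamard(f):
    """Fast Walsh-Hadamard transform: F[L] = sum_y f[y] * (-1)^{parity(L & y)}."""
    F = list(f)
    step = 1
    while step < len(F):
        for i in range(0, len(F), 2 * step):
            for j in range(i, i + step):
                a, b = F[j], F[j + step]
                F[j], F[j + step] = a + b, a - b
        step *= 2
    return F


def character(L, y):
    """(-1)^{L(y)} for the linear function L(y) = parity(L & y)."""
    return -1 if bin(L & y).count("1") % 2 else 1


def eigenpairs(h):
    """Eigenvalues lambda_L and unnormalized eigenvectors v_L of the dyadic matrix of h."""
    N = len(h)
    lambdas = walsh_hadamard(h)
    vectors = [[character(L, y) for y in range(N)] for L in range(N)]
    return lambdas, vectors


def dyadic_matrix(h):
    """A[k][w] = h(w xor k)."""
    N = len(h)
    return [[h[w ^ k] for w in range(N)] for k in range(N)]
```

By Parseval's identity, the squared eigenvalues sum to $N\sum_y h(y)^2 = N^2\delta^2$, which matches the computation of $\operatorname{tr}(A^2)$.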

Example: h depends on a single bit
Let S be the S-Box of the AES and G a fixed nonzero GF(2)-linear function $G: GF(2)^8 \to GF(2)$. We assume that the leakage function h only depends on $G \circ S$, i.e., after normalization
$$h(z) = \delta\,(-1)^{G(S(z))}.$$

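The eigenvalues of A are therefore
$$\lambda_L = \delta\sum_{y}(-1)^{G(S(y)) \oplus L(y)},$$
where L runs through all GF(2)-linear functions $GF(2)^8 \to GF(2)$.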
In other words, the set of eigenvalues is exactly the Walsh spectrum of the Boolean function $G \circ S$ multiplied by δ. Each eigenvalue measures how well $G \circ S$ can be approximated by the linear function L. S is the composition of the inversion over F = GF(256) (with $0^{-1} := 0$) and an affine function. The Walsh spectrum of any function of the form $G \circ S$ is well known: it can be expressed by the so-called Kloosterman sums; see [4].
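$$K(a) = \sum_{y \in F^\times}(-1)^{\operatorname{tr}(y^{-1} \oplus ay)}, \qquad a \in F,$$
where tr(y) denotes the absolute trace of y from F to GF(2). Any GF(2)-linear function $L: F \to GF(2)$ can be written as $L(y) = \operatorname{tr}(ly)$ for exactly one $l \in F$. Therefore, we find $c \in F$ such that $G(S(y)) \oplus L(y) = \operatorname{tr}(cy^{-1} \oplus ly)$ for all $y \in F^\times$, or $G(S(y)) \oplus L(y) = \operatorname{tr}(cy^{-1} \oplus ly) \oplus 1$ for all $y \in F^\times$.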
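Note that for $c \neq 0$, the substitution $y \mapsto cy$ gives
$$\sum_{y \in F^\times}(-1)^{\operatorname{tr}(cy^{-1} \oplus ly)} = K(cl).$$
The distribution of the Kloosterman sums can be described by values of certain class numbers (see [4, Prop. 9.1]), which can be interpreted in terms of the Walsh spectrum.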
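The eigenvalues in this example can be computed directly. The sketch below builds the AES S-Box from inversion in GF(2^8) (AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$, with 0 mapped to 0) followed by the affine map with constant 0x63, encodes G as a bit mask g with $G(x)$ equal to the parity of `g & x`, and returns the Walsh spectrum, i.e., the eigenvalues divided by δ. The function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a):
    """Inverse in GF(2^8) computed as a^254; maps 0 to 0 as in the AES S-Box."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def rotl8(x, s):
    """Rotate an 8-bit value left by s positions."""
    return ((x << s) | (x >> (8 - s))) & 0xFF


def aes_sbox(x):
    """AES S-Box: field inversion followed by the affine map with constant 0x63."""
    y = gf_inv(x)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63


SBOX = [aes_sbox(x) for x in range(256)]


def parity(x):
    return bin(x).count("1") & 1


def walsh_spectrum(bits):
    """W[L] = sum_y (-1)^(bits[y] xor parity(L & y)) via the fast Walsh-Hadamard transform."""
    W = [-1 if bit else 1 for bit in bits]
    step = 1
    while step < len(W):
        for i in range(0, len(W), 2 * step):
            for j in range(i, i + step):
                a, b = W[j], W[j + step]
                W[j], W[j + step] = a + b, a - b
        step *= 2
    return W


def component_spectrum(g):
    """Walsh spectrum of G o S with G(x) = parity(g & x); eigenvalues of A are delta * W."""
    return walsh_spectrum([parity(g & SBOX[y]) for y in range(256)])
```

For every nonzero mask g, the spectrum should vanish at L = 0 ($G \circ S$ is balanced), have maximal absolute value 32 (the well-known nonlinearity 112 of the AES S-Box), and consist of multiples of 4, in agreement with the Kloosterman sum description.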
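Example: h depends on the Hamming weight
Let $g(z) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}(-1)^{z_j}$ for $z = (z_1, \ldots, z_n)$, a centralized and scaled form of the Hamming weight, and let $h = \delta g$, so that $\sum_z h(z)^2 = N\delta^2$. In this case, A has exactly n eigenvectors with eigenvalues $\neq 0$, and these are given by the n linear projections $(z_1, \ldots, z_n) \mapsto z_j$.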
The eigenvalues of these n eigenvectors are equal to $\delta N/\sqrt{n}$.
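Since we have only a few eigenvalues $\neq 0$, we cannot expect that the second approximating formula is a good approximation in this case. However, we can derive an exact formula for the success rate: since h is a linear combination of the functions $(-1)^{z_j}$, we have
$$X_k = \sum_w b_w\, h(w \oplus k) = \frac{\delta}{\sqrt n}\sum_{j=1}^{n}(-1)^{k_j}\Bigl[\sum_w b_w(-1)^{w_j}\Bigr].$$
The sums in brackets do not depend on k, so that
$$X_{k_c} - X_k = \frac{2\delta}{\sqrt n}\sum_{j:\,k_j \neq k_{c,j}} Y_j, \qquad Y_j = (-1)^{k_{c,j}}\sum_w b_w(-1)^{w_j}.$$
The maximum likelihood attack is therefore successful exactly in the event that
$$Y_j > 0 \quad \text{for all } j = 1, \ldots, n$$
(necessity follows by choosing a k that differs from $k_c$ only in bit j). In other words, the success rate is the probability that all random variables $Y_j$ fulfill $Y_j > 0$. Each $Y_j$ is normally distributed with expectation $\delta N/\sqrt{n}$ and variance N. Since the $Y_j$ are jointly normal and the covariance between $Y_j$ and $Y_{\tilde j}$ is 0 for $j \neq \tilde j$, they are independent, and the success rate is given by the formula
$$\Pr(\text{success}) = \Phi\Bigl(\delta\sqrt{N/n}\Bigr)^{n} = \Phi\Bigl(\frac{2^{n/2}\delta}{\sqrt n}\Bigr)^{n},$$
where Φ denotes the standard normal distribution function.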
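The closed form can be compared with a simulation of the full maximum likelihood attack (the argmax over all $X_k$, not only the signs of the $Y_j$). This sketch uses hypothetical helper names and n = 4 in the checks so that it runs quickly; Φ is computed via `math.erf`.

```python
import math
import random


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mu(z, n):
    """Centralized Hamming weight sum_j (-1)^{z_j} of an n-bit integer z."""
    return sum(-1 if (z >> j) & 1 else 1 for j in range(n))


def exact_hw_success_rate(delta, n):
    """Closed form Phi(delta * sqrt(N / n))^n for h = delta * mu / sqrt(n)."""
    N = 2 ** n
    return Phi(delta * math.sqrt(N / n)) ** n


def simulated_hw_success_rate(delta, n, trials=4000, seed=7):
    """Monte Carlo estimate of the full maximum likelihood attack."""
    rng = random.Random(seed)
    N = 2 ** n
    h = [delta * mu(z, n) / math.sqrt(n) for z in range(N)]
    hits = 0
    for _ in range(trials):
        k_c = rng.randrange(N)
        b = [h[w ^ k_c] + rng.gauss(0.0, 1.0) for w in range(N)]
        scores = [sum(b[w] * h[w ^ k] for w in range(N)) for k in range(N)]
        hits += all(scores[k_c] > scores[k] for k in range(N) if k != k_c)
    return hits / trials
```

For δ = 0 the closed form gives $(1/2)^n = 1/N$, the success rate of guessing.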

Simulation results
We computed the success rate for different n, h and δ by numerical simulation of the $T_w$. Table 1 compares the success rates for n = 8, and Table 2 the same for n = 6. In both tables, f is chosen as a random balanced function $GF(2)^n \to GF(2)$, i.e., f takes the values 0 and 1 equally often, and P is chosen as a random permutation on $GF(2)^n$. g is the function from paragraph 6. We repeated the simulation 1000 times with different f and P; both tables give the mean. We note that the second approximating formula and the Hamming weight formula from paragraph 6 give different values for identical δ, but both formulas match the numerical values very well. In all experiments, the numerical values in each of the 1000 repetitions were very close to the mean given in the tables: for n = 8 (Table 1), the empirical standard deviation was less than 0.004; for n = 6 (Table 2), it was less than 0.02.
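Finally, we consider a first-order masked implementation, in which each measurement provides two leakage points
$$\tilde b_w = \mu(p_w \oplus k_c \oplus m_w) + \tilde\tau_w, \qquad \tilde b'_w = \mu(m_w) + \tilde\tau'_w, \qquad w = 1, \ldots, m.$$
μ is a centralized form of the Hamming weight, i.e., $\mu(z) = (-1)^{z_1} + \cdots + (-1)^{z_n}$. $\tilde\tau_w$ and $\tilde\tau'_w$ describe the noise of the measurement. We assume that $\tilde\tau_w$ and $\tilde\tau'_w$ are realizations of 2m independent random variables $\tilde T_w, \tilde T'_w$, each normally distributed with expectation 0 and variance $\sigma^2$. $m_w$ describes the mask. The $m_w$ are realizations of m independent random variables $M_w$, uniformly distributed on GF(N).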
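We set
$$c_\nu = \sum_{w:\,p_w = \nu}\tilde b_w\,\tilde b'_w, \qquad \nu = 0, \ldots, N-1.$$
The sum is taken over m/N realizations of independent random variables. For any fixed mask $m_w$, we compute
$$E\bigl(\tilde b_w\tilde b'_w\bigr) = \mu(p_w \oplus k_c \oplus m_w)\,\mu(m_w)$$
and
$$\operatorname{Var}\bigl(\tilde b_w\tilde b'_w\bigr) = \sigma^2\bigl(\mu(p_w \oplus k_c \oplus m_w)^2 + \mu(m_w)^2\bigr) + \sigma^4.$$
If m/N is not too small, we approximate the $c_\nu$ as realizations of N independent normally distributed random variables, each with expectation
$$\sum_{w:\,p_w = \nu}\mu(\nu \oplus k_c \oplus m_w)\,\mu(m_w)$$
and variance
$$\sum_{w:\,p_w = \nu}\Bigl(\sigma^2\bigl(\mu(\nu \oplus k_c \oplus m_w)^2 + \mu(m_w)^2\bigr) + \sigma^4\Bigr).$$
Again, if m/N is not too small, we approximate these sums by the expectation over the random variables $M_w$. An easy calculation shows
$$E\bigl(\mu(z \oplus M_w)\,\mu(M_w)\bigr) = \mu(z), \qquad E\bigl(\mu(z \oplus M_w)^2\bigr) = E\bigl(\mu(M_w)^2\bigr) = n,$$
so that $c_\nu$ has approximately expectation $\frac{m}{N}\mu(\nu \oplus k_c)$ and variance $\frac{m}{N}(2n\sigma^2 + \sigma^4)$. Since $\sum_z \mu(z)^2 = n \cdot N$, we can apply the leakage model of paragraph 2 with
$$\delta^2 = \frac{nm}{N(2n\sigma^2 + \sigma^4)}.$$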
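The "easy calculation" for the mask averages can be confirmed by exhaustive enumeration over all masks. The sketch below (hypothetical helper names) checks $E(\mu(z \oplus M)\mu(M)) = \mu(z)$ and $E(\mu(z \oplus M)^2) = E(\mu(M)^2) = n$, and recomputes δ² by normalizing the approximate expectation $\frac{m}{N}\mu$ of $c_\nu$ with its approximate variance $\frac{m}{N}(2n\sigma^2 + \sigma^4)$.

```python
import math


def mu(z, n):
    """Centralized Hamming weight sum_j (-1)^{z_j}."""
    return sum(-1 if (z >> j) & 1 else 1 for j in range(n))


def mask_moments(z, n):
    """Exact averages over a uniform mask M on GF(2)^n."""
    N = 2 ** n
    cross = sum(mu(z ^ m, n) * mu(m, n) for m in range(N)) / N  # E[mu(z+M) mu(M)]
    sq_a = sum(mu(z ^ m, n) ** 2 for m in range(N)) / N         # E[mu(z+M)^2]
    sq_b = sum(mu(m, n) ** 2 for m in range(N)) / N             # E[mu(M)^2]
    return cross, sq_a, sq_b


def masked_delta_squared(n, m, sigma):
    """delta^2 = n m / (N (2 n sigma^2 + sigma^4))."""
    N = 2 ** n
    return n * m / (N * (2 * n * sigma ** 2 + sigma ** 4))


def delta_squared_from_normalization(n, m, sigma):
    """Normalize E[c_nu] = (m/N) mu by the standard deviation of c_nu; return sum h^2 / N."""
    N = 2 ** n
    var = (m / N) * (2 * n * sigma ** 2 + sigma ** 4)
    h = [(m / N) * mu(z, n) / math.sqrt(var) for z in range(N)]
    return sum(x * x for x in h) / N
```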
Given the measurements $\tilde b_w, \tilde b'_w$, we directly compare the values $\sum_\nu \mu(\nu \oplus k)\,c_\nu$ for different k and decide for the k with the largest value. For large m, we can expect that the success rate of this ad hoc attack only depends on $\delta^2 = \frac{nm}{N(2n\sigma^2 + \sigma^4)}$. Table 3 gives the success rates of this attack computed by numerical simulation for n = 8. We compare these success rates with the values for the example from paragraph 6 (h = δg). Since the numerical simulations are rather slow, we repeated the simulation only for a few instances. However, in all instances the values matched very well. Table 4 gives similar data, but for $m = N^2$.

Remark: The leakage in $\tilde b_w$ depends on the input of an S-Box computation. We can certainly consider the case that the leakage depends on the output of an S-Box computation S, i.e.,
$$\tilde b_w = \mu\bigl(S(p_w \oplus k_c) \oplus m_w\bigr) + \tilde\tau_w.$$
The computation is completely analogous, but since $\mu \circ S$ is not a linear function of the bits, we expect that the second approximating formula applies. Tables 4 and 5 compare the numerical values for the success rate with the second approximating formula. Again, we computed only a few instances, but in all instances the values matched very well (Table 6).
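The ad hoc attack on the masked leakage can be simulated as follows. This is a small-scale sketch (hypothetical function name, n = 4 for speed, leakage on the S-Box input only), not the simulation behind Tables 3 and 4; `reps_per_plaintext` plays the role of m/N.

```python
import random


def masked_attack_success(n, reps_per_plaintext, sigma, trials=40, seed=3):
    """Estimate the success rate of the ad hoc attack on first-order masked leakage.

    Each measurement yields b = mu(p ^ k_c ^ mask) + noise and b2 = mu(mask) + noise2;
    c[p] accumulates the products b * b2 over the m/N measurements with plaintext p.
    """
    rng = random.Random(seed)
    N = 2 ** n
    MU = [sum(-1 if (z >> j) & 1 else 1 for j in range(n)) for z in range(N)]
    hits = 0
    for _ in range(trials):
        k_c = rng.randrange(N)
        c = [0.0] * N
        for p in range(N):
            for _ in range(reps_per_plaintext):
                mask = rng.randrange(N)
                b = MU[p ^ k_c ^ mask] + rng.gauss(0.0, sigma)
                b2 = MU[mask] + rng.gauss(0.0, sigma)
                c[p] += b * b2
        # Ad hoc distinguisher: sum_nu mu(nu xor k) * c_nu, pick the largest.
        scores = [sum(MU[v ^ k] * c[v] for v in range(N)) for k in range(N)]
        hits += all(scores[k_c] > scores[k] for k in range(N) if k != k_c)
    return hits / trials
```

With little noise and many measurements per plaintext the attack should almost always succeed; for smaller δ² the estimate can be set against the Hamming weight formula of paragraph 6.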