1 Introduction

A large effort is currently underway to replace standardized public key cryptosystems, which are quantum-insecure, with newly developed “post-quantum” cryptosystems, conjectured to be secure against quantum attack. Lattice-based cryptography has been widely recognized as a foremost candidate for practical, post-quantum security, and accordingly a large effort has been made to develop and analyze lattice-based cryptosystems. The ongoing standardization process and anticipated deployment of lattice-based cryptography raise an important question: How resilient are lattices to side-channel attacks or other forms of side information? While there are numerous works addressing this question for specific cryptosystems (see  [2, 9, 17, 18, 32, 33] for side-channel attacks targeting lattice-based NIST candidates), these works use rather ad-hoc methods to reconstruct the secret key, requiring new techniques and algorithms to be developed for each setting. For example, the work of  [9] uses brute-force methods for a portion of the attack, while  [7] exploits linear regression techniques. Such ad-hoc methods have two drawbacks. First, they do not allow one to take advantage of decades’ worth of research on, and optimization of, standard lattice attacks. Second, most of the side-channel attacks from prior work consider substantial amounts of information leakage and show that it leads to feasible recovery of the entire key, whereas one may be interested in more precise tradeoffs between information leakage and the concrete security of the scheme. The above motivates the focus of this work: Can one integrate side information into a standard lattice attack, and if so, by how much does the information reduce the cost of this attack? Given that side-channel resistance is the next step toward the technological readiness of lattice-based cryptography, and that we expect numerous works in this growing area, we believe that a general framework and prediction software are in order.

Fig. 1. Primal attack without hints (prior art).

Contributions. First, we propose a framework that generalizes the so-called primal lattice reduction attack, and allows the progressive integration of “hints” (i.e. side information that takes one of several forms) before running the final lattice reduction step. This contribution is summarized in Figs. 1 and 2 and developed in Sect. 3.

Second, we implement a Sage 9.0 toolkit to actually mount such attacks with hints when computationally feasible, and to predict their performance on larger instances. Our predictions are validated by extensive experiments. Our tool and these experiments are described in Sect. 5. Our toolkit is open-source, available at: https://github.com/lducas/leaky-LWE-Estimator.

Third, we demonstrate the usefulness of our framework and tool via three example applications. Our main example (Sect. 6.1) revisits the side-channel information obtained from the first side-channel attack of  [9] against Frodo. In that article, it was concluded that a divide-and-conquer side-channel template attack would not lead to a meaningful attack using standard combinatorial search for reconstruction of the secret. Our techniques allow us to integrate this side-channel information into lattice attacks, and to predict the exact security drop. For example, the CCS2 parameter set very conservatively aims for 128 bits of post-quantum security (or 448 “bikz” as defined in Sect. 3.4); but after the leakage of  [9] we predict that its security drops to 29 “bikz”, i.e. that it can be broken with BKZ-29, a computation that should be more than feasible, but would require a dedicated re-implementation of our framework.

Interestingly, we note that our framework is not only useful in the side-channel scenario; we are for example also able to model decryption failures as hints fitting our framework. This allows us to reproduce some predictions from  [14]. This is discussed in Sect. 6.2.

Perhaps more surprisingly, we also find a novel improvement to attacks on a few schemes (LAC  [25], Round5  [16], NTRU  [35]) without any side-channel or oracle queries. Indeed, such schemes use a ternary distribution for secrets, with a prescribed number of 1’s and \(-1\)’s: this hint fits our framework, and leads to a (very) minor improvement, discussed in Sect. 6.3.

Lastly, our framework also encompasses and streamlines existing tweaks of the primal attack: the choice of ignoring certain LWE equations to optimize the volume-dimension trade-off, as well as the re-centering  [30] and isotropization  [12, 19] accounting for potential a-priori distortions of the secret. It also implicitly solves the question of the optimal choice of the coefficient for Kannan’s Embedding from the Bounded Distance Decoding problem (BDD) to the unique Shortest Vector Problem (uSVP)  [21] (See Remark 22).

As a side contribution, we also propose in the full version of our paper [13] a refined method to estimate the required blocksize to solve an LWE/BDD/uSVP instance. This refinement was motivated by the inaccuracy of the standard method from the literature  [3, 4] in experimentally reachable blocksizes, which was making the validation of our contribution difficult. While experimentally much more accurate, this new methodology certainly deserves further scrutiny.

Fig. 2. The primal attack with hints (our work).

Technical overview. Our work is based on a generalization of the Bounded Distance Decoding problem (BDD) to a Distorted version (DBDD), which allows us to account for the potentially non-spherical covariance of the secret vector to be found.

Each hint will affect the lattice itself, the mean and/or the covariance parameter of the DBDD instance, making the problem easier (see Fig. 2). Finally, we make the distribution spherical again by applying a well-chosen linear transformation, reverting to a spherical BDD instance before running the attack. Thanks to the hints, this new instance will be easier than the initial one. Let us assume that \(\mathbf{v}\), l, k and \(\sigma \) are parameters known by the attacker. Our framework can handle four types of hints on the secret \(\mathbf{s}\) or on the lattice \(\varLambda \):

  • Perfect hints: \(\langle \mathbf{s}, \mathbf{v} \rangle = l\);

  • Modular hints: \(\langle \mathbf{s}, \mathbf{v} \rangle = l \bmod k\);

  • Approximate hints: \(\langle \mathbf{s}, \mathbf{v} \rangle + e = l\), where \(e \) is an error term of variance \(\sigma ^2\);

  • Short vector hints: \(\mathbf{v} \in \varLambda \).

While the first three hints are clear wins for the performance of lattice attacks, the last one is a trade-off between the dimension and the volume of the lattice. This last type of hint is in fact meant to generalize the standard trick consisting of ‘ignoring’ certain LWE equations; ignoring such an equation can be interpreted geometrically as a projection orthogonally to a so-called q-vector.

All the transformations of the lattice above can be computed in polynomial time. However, computing with general distributions in large dimension is not possible; we restrict our study to the case of Gaussian distributions of arbitrary covariance, for which such computations are also poly-time.

Some of these transformations remain quite expensive, in particular because they involve rational numbers with very large denominators, and it remains rather impractical to run them on cryptographic-grade instances. Fortunately, up to a necessary hypothesis of primitivity of the vector \(\mathbf{v}\) (with respect to either \(\varLambda \) or its dual depending on the type of hint), we can also predict the effect of each hint on the lattice parameters, and therefore run faster predictions of the attack cost.

From Leaks to Hints. At first, it may not be so clear that the types of hints above are so useful in realistic applications, in particular since they need to be linear on the secret. Of course our framework can handle rather trivial hints such as the perfect leak of a secret coefficient \(\mathbf{s}_i = l\). Slightly less trivial is the case where only the low-order bit leaks, a hint of the form \(\mathbf{s}_i = l \bmod 2\).

We note that most of the computations done during an LWE decryption are linear: leaking any intermediate register during a matrix-vector product leads to a hint of the same form (possibly \(\bmod ~q\)). Similarly, the leak of an NTT coefficient of a secret in a Ring/Module variant can also be viewed as such.

Admittedly, such ideal leaks of a full register are not the typical scenario, and leaks are typically not linear on the content of the register. However, such non-linearities can be handled by approximate hints. For instance, let \(\mathbf{s}_0\) be a secret coefficient (represented by a signed 16-bit integer), whose a priori distribution is supported by \(\{-5, \dots , 5\}\). Consider the case where we learn the Hamming weight of \(\mathbf{s}_0\), say \(H(\mathbf{s}_0) = 2\). Then, we can narrow down the possibilities to \(\mathbf{s}_0 \in \{3, 5\}\). This leads to two hints (made concrete in the sketch following the list below):

  • a modular hint: \({\mathbf {s}}\textstyle _0 = 1 \bmod 2\),

  • an approximate hint: \({\mathbf {s}}\textstyle _0 = 4 + \epsilon _1\), where \(\epsilon _1\) has variance 1.
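For concreteness, the following Python sketch (our illustration, not part of the Sage toolkit of Sect. 5; the function name and uniform prior are our own choices) derives both hints from such a Hamming-weight leak: it enumerates the prior support, keeps the values matching the leaked weight, and extracts the common residue plus the posterior mean and variance.

```python
import numpy as np

def hints_from_hamming_weight(leaked_hw, prior=range(-5, 6), bits=16):
    # Keep the prior values whose 16-bit two's-complement representation
    # matches the leaked Hamming weight (negative values have high weight).
    candidates = [s for s in prior
                  if bin(s & ((1 << bits) - 1)).count("1") == leaked_hw]
    # Modular hint: all remaining candidates agree modulo 2.
    residues = {s % 2 for s in candidates}
    assert len(residues) == 1, "no common residue: no modular hint here"
    # Approximate hint: mean and variance of the posterior distribution.
    return (candidates, residues.pop(),
            float(np.mean(candidates)), float(np.var(candidates)))

print(hints_from_hamming_weight(2))  # ([3, 5], 1, 4.0, 1.0)
```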

While closer to a realistic scenario, the above example remains rather simplified. A detailed example of how realistic leaks can be integrated as hints will be given in Sect. 6.1, based on the leakage data from  [9].

2 Preliminaries

2.1 Linear Algebra

We use bold lower case letters to denote vectors, and bold upper case letters to denote matrices. We use row notations for vectors, and start indexing from 0. Let \(\mathbf{I}_d\) denote the d-dimensional identity matrix. Let \(\langle \cdot , \cdot \rangle \) denote the inner product of two vectors of the same size. Let us introduce the row span of a matrix (denoted \({\text {Span}}(\cdot )\)) as the subspace generated by all \(\mathbb {R}\)-linear combinations of the rows of its input.

Definition 1

(Positive Semidefinite). An \(n\times n\) symmetric real matrix \(\mathbf{M}\) is positive semidefinite if \(\mathbf{x}\mathbf{M}\mathbf{x}^T \ge 0\) for all \(\mathbf{x} \in \mathbb {R}^n\); if so, we write \(\mathbf{M} \ge 0\). Given two \(n\times n\) real matrices \(\mathbf{A}\) and \(\mathbf{B}\), we write \(\mathbf{A} \ge \mathbf{B}\) if \(\mathbf{A} - \mathbf{B}\) is positive semidefinite.

Definition 2

A matrix \(\mathbf{M}\) is a square root of \(\mathbf {\Sigma } \), denoted \(\sqrt{\mathbf {\Sigma }}\), if

$$\begin{aligned} \mathbf{M}^T \cdot \mathbf{M} = \mathbf {\Sigma }. \end{aligned}$$

Our techniques involve keeping track of the covariance matrix \(\mathbf {\Sigma } \) of the secret and error vectors as hints are progressively integrated. The covariance matrix may become singular during this process and will not have an inverse. Therefore, in the following we introduce some degenerate notions for the inverse and the determinant of a square matrix. Essentially, we restrict these notions to the row span of their input. For \(\mathbf{X}\in \mathbb {R}^{d \times k}\) (with any \(d,k \in \mathbb {N}\)), we will denote \(\mathbf {\Pi }_{\mathbf{X}}\) the orthogonal projection matrix onto \({\text {Span}}(\mathbf{X})\). More formally, let \(\mathbf{Y}\) be a maximal set of independent row-vectors of \(\mathbf{X}\); the orthogonal projection matrix is given by \(\mathbf {\Pi }_{\mathbf{X}} = \mathbf{Y}^T \cdot (\mathbf{Y} \cdot \mathbf{Y}^T)^{-1} \cdot \mathbf{Y} \). Its complement (the projection orthogonally to \({\text {Span}}(\mathbf{X})\)) is denoted \(\mathbf {\Pi }^\bot _{\mathbf{X}} := \mathbf{I}_d - \mathbf {\Pi }_{\mathbf{X}}\). We naturally extend the notation \(\mathbf {\Pi }_{F}\) and \(\mathbf {\Pi }^\bot _{F}\) to subspaces \(F \subset \mathbb {R}^d\). By definition, the projection matrices satisfy \(\mathbf {\Pi }_{F}^2 = \mathbf {\Pi }_{F}\), \(\mathbf {\Pi }_{F}^T= \mathbf {\Pi }_{F}\) and \(\mathbf {\Pi }_{F}\cdot \mathbf {\Pi }^\bot _{F} = \mathbf {\Pi }^\bot _{F}\cdot \mathbf {\Pi }_{F}= \mathbf{0}\).
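For intuition, the following numpy sketch (our illustration; the helper name proj is ours) computes \(\mathbf {\Pi }_{\mathbf{X}}\) from an orthonormal basis \(\mathbf{Y}\) of \({\text {Span}}(\mathbf{X})\), in which case \(\mathbf{Y}\mathbf{Y}^T = \mathbf{I}\) and the formula reduces to \(\mathbf{Y}^T\mathbf{Y}\); it then checks the stated projector identities.

```python
import numpy as np

def proj(X):
    # Orthonormal rows Y spanning Span(X), obtained from the SVD; then
    # Y^T (Y Y^T)^{-1} Y simplifies to Y^T Y since Y Y^T = I.
    _, s, Vt = np.linalg.svd(np.atleast_2d(X), full_matrices=False)
    Y = Vt[s > 1e-9]
    return Y.T @ Y

X = np.array([[1., 0., 1.], [2., 0., 2.]])       # rank-1 example
Pi, Pi_perp = proj(X), np.eye(3) - proj(X)
assert np.allclose(Pi @ Pi, Pi) and np.allclose(Pi, Pi.T)   # Pi^2 = Pi = Pi^T
assert np.allclose(Pi @ Pi_perp, np.zeros((3, 3)))          # Pi * Pi_perp = 0
```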

Definition 3

(Restricted inverse and determinant). Let \(\mathbf {\Sigma } \) be a symmetric matrix. We define a restricted inverse denoted \(\mathbf {\Sigma } ^{\sim }\) as

$$\begin{aligned} \mathbf {\Sigma } ^{\sim } := (\mathbf {\Sigma } + \mathbf {\Pi }^\bot _{\mathbf {\Sigma }})^{-1} - \mathbf {\Pi }^\bot _{\mathbf {\Sigma }}. \end{aligned}$$

It satisfies \({\text {Span}}(\mathbf {\Sigma } ^{\sim }) = {\text {Span}}(\mathbf {\Sigma })\) and \(\mathbf {\Sigma } \cdot \mathbf {\Sigma } ^{\sim } = \mathbf {\Pi }_{\mathbf {\Sigma }}\).

We also denote \({\text {rdet}}(\mathbf {\Sigma })\) as the restricted determinant defined as follows.

$$\begin{aligned} {\text {rdet}}(\mathbf {\Sigma }) := \det (\mathbf {\Sigma } + \mathbf {\Pi }^\bot _{\mathbf {\Sigma }}). \end{aligned}$$

The idea behind Definition 3 is to provide an (artificial) invertibility property to the input \(\mathbf {\Sigma } \) by adding the missing orthogonal part and to remove it afterwards. For example, if \(\mathbf {\Sigma } = \begin{bmatrix} \mathbf{A} & 0\\ 0 & 0 \end{bmatrix}\) where \(\mathbf{A}\) is invertible, then \(\mathbf {\Sigma } ^{\sim } = \begin{bmatrix} \mathbf{A}^{-1} & 0\\ 0 & 0 \end{bmatrix}\) and \({\text {rdet}}(\mathbf {\Sigma }) = \det (\mathbf{A})\).
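A small numpy sketch of Definition 3 (our illustration, assuming the proj() helper above) makes this construction concrete: add the missing orthogonal part, invert, then remove it again.

```python
import numpy as np

def restricted_inverse(Sigma):
    Pi_perp = np.eye(Sigma.shape[0]) - proj(Sigma)
    return np.linalg.inv(Sigma + Pi_perp) - Pi_perp

def rdet(Sigma):
    Pi_perp = np.eye(Sigma.shape[0]) - proj(Sigma)
    return np.linalg.det(Sigma + Pi_perp)

A = np.array([[2., 1.], [1., 2.]])   # invertible upper-left block
Sigma = np.block([[A, np.zeros((2, 1))], [np.zeros((1, 3))]])
assert np.allclose(restricted_inverse(Sigma)[:2, :2], np.linalg.inv(A))
assert np.isclose(rdet(Sigma), np.linalg.det(A))   # rdet(Sigma) = det(A) = 3
```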

2.2 Statistics

Random variables, i.e. variables whose values depend on outcomes of a random phenomenon, are denoted by lowercase calligraphic letters, and random vectors by uppercase calligraphic letters.

Before hints are integrated, we will assume that the secret and error vectors follow a multidimensional normal (Gaussian) distribution. Hints will typically correspond to learning a (noisy, modular or perfect) linear equation on the secret. We must then consider the altered distribution on the secret, conditioned on this information. Fortunately, this will also be a multidimensional normal distribution with an altered covariance and mean. In the following, we present the precise formulae for the covariance and mean of these conditional distributions.

Definition 4

(Multidimensional normal distribution). Let \(d\in \mathbb {Z}\). For \( \varvec{\mu } \in \mathbb {Z}^d\) and \(\mathbf {\Sigma } \) a symmetric matrix of dimension \(d \times d\), we denote by \(D_{\mathbf {\Sigma }, \varvec{\mu }}^{d}\) the multidimensional normal distribution supported on \( \varvec{\mu } + {\text {Span}}(\mathbf {\Sigma })\), with density

$$\begin{aligned} \mathbf{x} \mapsto \frac{1}{\sqrt{(2\pi )^{{\text {rank}}(\mathbf {\Sigma })}\cdot {\text {rdet}}(\mathbf {\Sigma })}}\exp \left( -\frac{1}{2}(\mathbf{x} - \varvec{\mu })\cdot \mathbf {\Sigma } ^{\sim } \cdot (\mathbf{x} - \varvec{\mu })^T \right) . \end{aligned}$$

The following states how a normal distribution is altered under a linear transformation.

Lemma 5

Suppose the random vector \(\mathbf{x}\) has a \(D_{\mathbf {\Sigma }, \varvec{\mu }}^{d}\) distribution. Let \(\mathbf{A}\) be an \(n \times d\) matrix. Then \(\mathbf{x}\mathbf{A}^T\) has a \(D_{\mathbf {A}\mathbf {\Sigma } \mathbf{A}^T, \varvec{\mu } \mathbf{A}^T}^{n}\) distribution.

Lemma 6 shows the altered distribution of a normal random variable conditioned on its noisy linear transformation value, following from  [24, Equations (6) and (7)].

Lemma 6

(Conditional distribution, from  [24]). Suppose that \(\mathbf{x}\) has a \(D_{ \mathbf {\Sigma }, \varvec{\mu }}^{d}\) distribution, and that \(\mathbf{e}\) has a \(D_{\mathbf {\Sigma } _{\mathbf{e}}, \mathbf{0}}^{n}\) distribution independent of \(\mathbf{x}\). Let us fix \(\mathbf{A}\) an \(n \times d\) matrix and \(\mathbf {z} \in \mathbb {Z}^n\). The conditional distribution of \(\mathbf{x}\) given \(\mathbf{x}\mathbf{A}^T + \mathbf{e} = \mathbf{z}\) is \(D_{\mathbf {\Sigma } ', \varvec{\mu } '}^{d}\), where

$$\begin{aligned} \varvec{\mu } ' = \varvec{\mu } + (\mathbf{z} - \varvec{\mu } \mathbf{A}^T)(\mathbf{A}\mathbf {\Sigma } \mathbf{A}^T + \mathbf {\Sigma } _{\mathbf{e}})^{-1}\mathbf{A}\mathbf {\Sigma }, \qquad \mathbf {\Sigma } ' = \mathbf {\Sigma }- \mathbf {\Sigma } \mathbf{A}^T(\mathbf{A}\mathbf {\Sigma } \mathbf{A}^T + \mathbf {\Sigma } _{\mathbf{e}})^{-1}\mathbf{A}\mathbf {\Sigma }. \end{aligned}$$

Corollary 7

(Conditional distribution). Suppose that \(\mathbf{x}\) has a \(D_{\mathbf {\Sigma }, \varvec{\mu }}^{d}\) distribution and that \(e \) has a \(D_{\sigma _{e}^2, 0}^{1}\) distribution independent of \(\mathbf{x}\). Let us fix \(\mathbf{v} \in \mathbb {R}^d\) a nonzero vector and \(z \in \mathbb {Z}\). We define the following scalars: \(\mu _{2} := \langle \mathbf{v}, \varvec{\mu } \rangle \) and \(\sigma _{2} := \mathbf{v} \mathbf {\Sigma } \mathbf{v}^T + \sigma _{e}^2\), i.e. the mean and the variance of \(\langle \mathbf{x}, \mathbf{v} \rangle + e \).

If \(\sigma _{2} \ne 0\), the conditional distribution of \(\mathbf{x}\) given \(\langle \mathbf{x}, \mathbf{v} \rangle + e = z\) is \(D_{\mathbf {\Sigma } ', \varvec{\mu } '}^{d}\), where

$$\begin{aligned} \varvec{\mu } ' = \varvec{\mu } + \frac{\left( z - \mu _2\right) }{\sigma _{2}}\mathbf{v}\mathbf {\mathbf {\Sigma }}, \qquad \mathbf {\Sigma } ' = \mathbf {\Sigma }- \frac{\mathbf {\Sigma } \mathbf{v}^T \mathbf{v} \mathbf {\Sigma }}{\sigma _{2}}. \end{aligned}$$
(1)

If \(\sigma _{2} = 0\), the conditional distribution of \(\mathbf{x}\) remains \(D_{\mathbf {\Sigma }, \varvec{\mu }}^{d}\).

Remark 8

We note that Corollary 7 is also useful to describe the conditional distribution of \(\mathbf{x}\) given a noiseless hint \(\langle \mathbf{x}, \mathbf{v} \rangle = z\), by letting \(\sigma _{e} = 0\).
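For reference, Eq. (1) translates directly into the following numpy sketch (our illustration; the function name is ours). It also covers Remark 8 via sigma_e2 = 0, and the noisy case used for approximate hints (Sect. 4.3) via sigma_e2 > 0.

```python
import numpy as np

def condition_on_hint(mu, Sigma, v, z, sigma_e2=0.0):
    mu2 = float(v @ mu)                        # mean of <x, v> + e
    sigma2 = float(v @ Sigma @ v) + sigma_e2   # variance of <x, v> + e
    if sigma2 == 0.0:
        return mu, Sigma                       # hint carries no information
    Sv = Sigma @ v                             # = (v Sigma)^T, Sigma symmetric
    return mu + (z - mu2) / sigma2 * Sv, Sigma - np.outer(Sv, Sv) / sigma2
```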

2.3 Lattices

A lattice, denoted as \(\varLambda \), is a discrete additive subgroup of \(\mathbb {R}^m\), which is generated as the set of all linear integer combinations of \(n \ (m \ge n)\) linearly independent basis vectors \(\lbrace {\mathbf {b}}_j \rbrace \, \subset \, \mathbb {R}^m\), namely,

$$\begin{aligned} \varLambda \, := \, \left\{ {\sum }_j z_j {\mathbf {b}}_j : z_j \in \mathbb {Z} \right\} . \end{aligned}$$

We say that m is the dimension of \(\varLambda \) and n is its rank. A lattice is full rank if \(n=m\). A matrix \(\mathbf{B}\) having the basis vectors as rows is called a basis. The volume of a lattice \(\varLambda \) is defined as \({\text {Vol}}(\varLambda ) := \sqrt{\text {det}(\mathbf {BB}^T)}\). The dual lattice of \(\varLambda \) in \(\mathbb {R}^n\) is defined as follows.

$$\begin{aligned} \varLambda ^* \, := \, \lbrace \mathbf {y} \in {\text {Span}}(\mathbf{B}) \mid \forall \mathbf {x} \in \varLambda , \langle \mathbf {x}, \mathbf {y} \rangle \in \mathbb {Z} \rbrace . \end{aligned}$$

Note that \((\varLambda ^*)^*\,=\,\varLambda \) and \({\text {Vol}}(\varLambda ^*) = 1/{\text {Vol}}(\varLambda )\).

Lemma 9

( [26, Proposition 1.3.4]). Let \(\varLambda \) be a lattice and let F be a subspace of \(\mathbb {R}^n\). If \(\varLambda \cap F\) is a lattice, then the dual of \(\varLambda \cap F\) is the orthogonal projection onto F of the dual of \(\varLambda \). In other words, each element of \(\varLambda ^*\) is multiplied by the projection matrix \(\mathbf {\Pi }_{F}\):

$$(\varLambda \cap F)^* =\varLambda ^* \cdot \mathbf {\Pi }_{F}.$$

Definition 10

(Primitive vectors). A set of vectors \(\mathbf{y}_1, \dots , \mathbf{y}_k \in \varLambda \) is said to be primitive with respect to \(\varLambda \) if \(\varLambda \,\cap \,{\text {Span}}(\mathbf{y}_1, \dots , \mathbf{y}_k)\) is equal to the lattice generated by \(\mathbf{y}_1, \dots , \mathbf{y}_k\). Equivalently, it is primitive if it can be extended to a basis of \(\varLambda \). For \(k=1\), this is equivalent to \(\mathbf{y}_1/i \not \in \varLambda \) for any integer \(i\ge 2\).

To predict the hardness of the lattice reduction on altered instances, we must compute the volume of the final transformed lattice. We devise a highly efficient way to do this, by observing that each time a hint is integrated, we can update the volume of the transformed lattice, given only the volume of the previous lattice and information about the current hint (under mild restrictions on the form of the hint). Lemmas 11 and 12 are proved in the full version of our paper  [13].

Lemma 11

(Volume of a lattice slice). Let \(\varLambda \) be a lattice with volume \({\text {Vol}}(\varLambda )\), and let \(\mathbf {v}\) be a primitive vector with respect to \(\varLambda ^*\). Let \( \mathbf {v}^{\bot }\) denote the subspace orthogonal to \(\mathbf {v}\). Then \(\varLambda \cap \mathbf {v}^{\bot }\) is a lattice with volume \({\text {Vol}}(\varLambda \cap \mathbf {v}^{\bot }) = \Vert \mathbf{v}\Vert \cdot {\text {Vol}}(\varLambda ).\)

Lemma 12

(Volume of a sparsified lattice). Let \(\varLambda \) be a lattice, \(\mathbf{v}\in \varLambda ^*\) be a primitive vector of \(\varLambda ^*\), and \(k>0\) be an integer. Let \(\varLambda ' = \{\mathbf{x} \in \varLambda \mid \langle \mathbf{x}, \mathbf{v}\rangle = 0 \bmod k\}\) be a sublattice of \(\varLambda \). Then \({\text {Vol}}(\varLambda ') = k \cdot {\text {Vol}}(\varLambda )\).

Fact 13

(Volume of a projected lattice). Let \(\varLambda \) be a lattice and \(\mathbf{v} \in \varLambda \) be a primitive vector of \(\varLambda \). Let \(\varLambda ' = \varLambda \cdot \mathbf {\Pi }^\bot _{\mathbf{v}}\) be the projection of \(\varLambda \) orthogonally to \(\mathbf{v}\). Then \({\text {Vol}}(\varLambda ') = {\text {Vol}}(\varLambda ) / \Vert \mathbf{v}\Vert \). More generally, if \(\mathbf{V}\) is a primitive set of vectors of \(\varLambda \), then \(\varLambda ' = \varLambda \cdot \mathbf {\Pi }^\bot _{\mathbf{V}}\) has volume \({\text {Vol}}(\varLambda ') = {\text {Vol}}(\varLambda ) / \sqrt{\det (\mathbf{V} \mathbf{V}^T)}\).

Fact 14

(Lattice volume under linear transformations). Let \(\varLambda \) be a lattice in \(\mathbb {R}^n\), and \(\mathbf{M} \in \mathbb {R}^{n\times n}\) a matrix such that \(\ker {\mathbf{M}} = {\text {Span}}(\varLambda )^\bot \). Then we have \({\text {Vol}}(\varLambda \cdot \mathbf{M}) = {\text {rdet}}(\mathbf{M}) {\text {Vol}}(\varLambda )\).
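Since only the dimension and volume of the transformed lattice matter for predictions (see Remark 20), Lemmas 11 and 12 and Fact 13 boil down to the following lightweight bookkeeping sketch (our illustration of this style of computation, valid under the relevant primitivity assumptions).

```python
import numpy as np

def slice_volume(vol, v_norm):     # Lemma 11: intersect with v^perp
    return vol * v_norm            # (dimension decreases by 1)

def sparsify_volume(vol, k):       # Lemma 12: keep <x, v> = 0 mod k
    return vol * k                 # (dimension unchanged)

def project_volume(vol, V):        # Fact 13: project orthogonally to V
    V = np.atleast_2d(V)           # (dimension decreases by |V|)
    return vol / np.sqrt(np.linalg.det(V @ V.T))
```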

3 Distorted Bounded Distance Decoding

3.1 Definition

We first recall the definition of the (search) LWE problem, in its short-secret variant which is the most relevant to practical LWE-based encryption.

Definition 15

(Search \(\varvec{\mathsf{LWE}}\) problem with short secrets). Let nm and q be positive integers, and let \(\chi \) be a distribution over \(\mathbb {Z}\). The search LWE problem (with short secrets) for parameters \((n, m, q, \chi )\) is:

  • Given the pair \(\left( \mathbf{A} \in \mathbb {Z}_q^{m \times n}, \mathbf{b} = \mathbf{z} \mathbf{A}^T + \mathbf{e} \in \mathbb {Z}_q^m\right) \) where:

    1. \(\mathbf{A} \in \mathbb {Z}_q^{m \times n}\) is sampled uniformly at random,

    2. \(\mathbf{z} \leftarrow \chi ^n\) and \(\mathbf{e} \leftarrow \chi ^m\) are sampled with independent and identically distributed coefficients following the distribution \(\chi \).

  • Find \(\mathbf{z}\).

The primal attack (See for example  [3]) against (search)-LWE proceeds by viewing the LWE instance as an instance of a Bounded Distance Decoding (BDD) problem, converting it to a uSVP instance (via Kannan’s embedding  [21]), and finally applying a lattice reduction algorithm to solve the uSVP instance. The central tool of our framework is a generalization of \(\textsf {BDD} \) that accounts for potential distortion in the distribution of the secret noise vector that is to be recovered.

Definition 16

(Distorted Bounded Distance Decoding problem). Let \(\varLambda \subset \mathbb {R}^d\) be a lattice, \(\mathbf {\Sigma } \in \mathbb {R}^{d\times d}\) be a symmetric matrix and \( \varvec{\mu } \in {\text {Span}}(\varLambda ) \subset \mathbb {R}^d\) such that

$$\begin{aligned} {\text {Span}}(\mathbf {\Sigma })\subsetneq {\text {Span}}(\mathbf {\Sigma } + \varvec{\mu } ^T\cdot \varvec{\mu }) = {\text {Span}}(\varLambda ). \end{aligned}$$
(2)

The Distorted Bounded Distance Decoding problem is the following problem:

  • Given \( \varvec{\mu }, \mathbf {\Sigma } \) and a basis of \(\varLambda \).

  • Find the unique vector \(\mathbf{x} \in \varLambda \cap E( \varvec{\mu }, \mathbf {\Sigma })\)

where \(E( \varvec{\mu }, \mathbf {\Sigma })\) denotes the ellipsoid

$$\begin{aligned} E( \varvec{\mu }, \mathbf {\Sigma }) := \{\mathbf{x} \in \varvec{\mu } + {\text {Span}}(\mathbf {\Sigma }) | (\mathbf{x} - \varvec{\mu })\cdot \mathbf {\Sigma } ^{\sim } \cdot (\mathbf{x} - \varvec{\mu })^T \le {\text {rank}}(\mathbf {\Sigma })\}. \end{aligned}$$

We will refer to the triple \(\mathcal {I}= (\varLambda , \varvec{\mu }, \mathbf {\mathbf {\Sigma }})\) as the instance of the problem.

Intuitively, Definition 16 corresponds to knowing that the secret vector \(\mathbf{x}\) to be recovered follows a distribution of variance \(\mathbf {\Sigma } \) and average \( \varvec{\mu } \). The quantity \((\mathbf{x} - \varvec{\mu })\cdot \mathbf {\Sigma } ^{\sim } \cdot (\mathbf{x} - \varvec{\mu })^T\) can be interpreted as a non-canonical Euclidean squared distance \({\Vert \mathbf{x} - \varvec{\mu } \Vert }_{\mathbf {\Sigma }}^{2}\), and the expected value of such a distance for a Gaussian \(\mathbf{x}\) of variance \(\mathbf {\Sigma } \) and average \( \varvec{\mu } \) is \({\text {rank}}(\mathbf {\Sigma })\). One can argue that, for such a Gaussian, there is a constant probability that \(\Vert \mathbf{x} - \varvec{\mu } \Vert _{\mathbf {\Sigma }}^{2}\) is slightly greater than \({\text {rank}}(\mathbf {\Sigma })\). Since we are interested in the average behavior of our attack, we ignore this benign technical detail. In fact, we will typically interpret \(\textsf {DBDD} \) as the promise that the secret follows a Gaussian distribution of center \( \varvec{\mu } \) and covariance \(\mathbf {\Sigma } \).

The ellipsoid can be seen as an affine transformation (that we call “distortion”) of the centered hyperball of radius \(\sqrt{{\text {rank}}(\mathbf {\mathbf {\Sigma }})}\). Let us introduce a notation for the hyperball; for any \(d\in \mathbb {N}\)

$$\begin{aligned} B_{d} := \{\mathbf{x} \in \mathbb {R}^{d}\ |\ \Vert \mathbf{x}\Vert _2 \le \sqrt{d}\}. \end{aligned}$$
(3)

One can thus write using Definition 2:

$$\begin{aligned} E( \varvec{\mu }, \mathbf {\Sigma }) = B_{{\text {rank}}(\mathbf {\mathbf {\Sigma }})} \cdot \sqrt{\mathbf {\mathbf {\Sigma }}} + \varvec{\mu }. \end{aligned}$$
(4)

From the Span inclusion in Eq. (2), one can deduce that the condition is equivalent to requiring \( \varvec{\mu } \notin {\text {Span}}(\mathbf {\Sigma })\) and \({\text {rank}}(\mathbf {\Sigma } + \varvec{\mu } ^T\cdot \varvec{\mu })={\text {rank}}(\mathbf {\Sigma }) + 1 = {\text {rank}}(\varLambda )\). This technical detail is necessary for embedding it properly into a uSVP instance (See later in Sect. 3.3).

Particular cases of Definition 16. Let us temporarily ignore the condition in Eq. (2) to study some particular cases. As shown in Fig. 3, when \(\mathbf {\mathbf {\Sigma }}=\mathbf{I}_d\), \(\textsf {DBDD} _{\varLambda , \varvec{\mu }, \mathbf{I}_d}\) is a \(\textsf {BDD} \) instance. Indeed, the ellipsoid becomes a shifted hyperball \(E( \varvec{\mu }, \mathbf{I}_d)=\{\mathbf{x} \in \varvec{\mu } + \mathbb {R}^{d}\ |\ \Vert \mathbf{x} - \varvec{\mu } \Vert _2 \le \sqrt{d}\} = B_{d} + \varvec{\mu } \). If in addition \( \varvec{\mu } =\mathbf{0}\), \(\textsf {DBDD} _{\varLambda , \mathbf{0}, \mathbf{I}_d}\) becomes a uSVP instance on \(\varLambda \).

Fig. 3. Graphical intuition of DBDD, BDD and uSVP in dimension two: the problem consists in finding a nonzero element of \(\varLambda \) in the colored zone. The identity hyperball is larger for uSVP to represent the fact that, during the reduction, the uSVP lattice has one dimension more than for BDD.

3.2 Embedding LWE into DBDD

In the typical primal attack framework (Fig. 1), one directly views LWE as a BDD instance of the same dimension. For our purposes, however, it will be useful to apply Kannan’s Embedding at this stage and therefore increase the dimension of the lattice by 1. While it could be delayed to the last stage of our attack, this extra fixed coefficient 1 will be particularly convenient when we integrate hints (see Remark 22 in Sect. 4). It should be noted that no information is lost through this transformation, since the parameters \( \varvec{\mu } \) and \(\mathbf {\Sigma } \) allow us to encode the knowledge that the solution we are looking for has its last coefficient set to 1 and nothing else. In more detail, the solution \(\mathbf{s} := (\mathbf{e},\mathbf{z})\) of an \(\textsf {LWE} \) instance is extended to

$$\begin{aligned} \bar{\mathbf {{\mathbf {s}}\textstyle }} := (\mathbf{e},\mathbf{z}, 1) \end{aligned}$$
(5)

which is a short vector in the lattice \(\varLambda = \left\{ \left( \mathbf{x },\mathbf{y },w\right) | \mathbf{x }+\mathbf{y }\mathbf{A}^{\text {T}}-\mathbf{b }w = 0\mod q\right\} \).

A basis of this lattice is given by the row vectors of

$$\begin{aligned} \begin{bmatrix} q \mathbf{I}_{m} & 0 & 0\\ \mathbf{A}^{\text {T}} & -\mathbf{I}_n & 0\\ \mathbf{b} & 0 & 1\\ \end{bmatrix}. \end{aligned}$$

Denoting by \(\mu _\chi \) and \(\sigma _\chi ^2\) the average and variance of the LWE distribution \(\chi \) (see Definition 15), we can convert this LWE instance to a \(\textsf {DBDD} _{\varLambda , \varvec{\mu }, \mathbf {\Sigma }}\) instance with \( \varvec{\mu } = \left[ \mu _\chi \cdots \mu _\chi \; 1\right] \) and \( \mathbf {\Sigma } = \left[ {\begin{matrix} \sigma _\chi ^2 \mathbf{I}_{m+n} & 0\\ 0 & 0 \end{matrix}} \right] \). The lattice \(\varLambda \) is of full rank in \(\mathbb {R}^d\) where \(d := m + n + 1\), and its volume is \(q^m \). Note that the rank of \(\mathbf {\Sigma }\) is only \(d-1\): the ellipsoid has one less dimension than the lattice. This validates the requirement of Eq. (2).
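The construction above is mechanical; the following numpy sketch (our illustration, with integer entries small enough for machine integers) builds the basis of \(\varLambda \) together with \( \varvec{\mu } \) and \(\mathbf {\Sigma } \) from an LWE instance.

```python
import numpy as np

def lwe_to_dbdd(A, b, q, mu_chi, var_chi):
    m, n = A.shape
    d = m + n + 1
    B = np.zeros((d, d), dtype=int)
    B[:m, :m] = q * np.eye(m, dtype=int)     # [ q*I_m    0    0 ]
    B[m:m + n, :m] = A.T                     # [  A^T   -I_n   0 ]
    B[m:m + n, m:m + n] = -np.eye(n, dtype=int)
    B[d - 1, :m] = b                         # [  b       0    1 ]
    B[d - 1, d - 1] = 1
    mu = np.full(d, float(mu_chi)); mu[-1] = 1.0         # [mu_chi ... mu_chi 1]
    Sigma = var_chi * np.eye(d); Sigma[-1, -1] = 0.0     # rank d-1
    return B, mu, Sigma                      # Vol(Lambda) = q**m
```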

Remark 17

Typically, Kannan’s embedding from BDD to uSVP leaves the bottom right matrix coefficient as a free parameter, say c, to be chosen optimally. The optimal value is the one minimizing

$$\begin{aligned} \frac{\Vert (\mathbf{e} ; \mathbf{z} ; c)\Vert }{\det (\varLambda )^{1/d}} = \frac{\sqrt{(m+n)\sigma _\chi ^2 + c^2}}{(c \cdot q^m)^{1/d}}, \end{aligned}$$

namely, \(c = \sigma _\chi \) according to the arithmetic-geometric mean inequality. Some prior works  [3, 5] instead chose \(c=1\). While this is benign since \(\sigma _\chi \) is typically not too far from 1, it remains a sub-optimal choice. Looking ahead, in our DBDD framework, this choice becomes irrelevant thanks to the isotropization step introduced in the next section; we can therefore choose \(c=1\) without worsening the attack.

3.3 Converting DBDD to uSVP

In this section, we explain how a DBDD instance \((\varLambda , \varvec{\mu }, \mathbf {\Sigma })\) is converted into a uSVP one. Two modifications are necessary. First, we need to homogenize the problem. Let us show that the ellipsoid in Definition 16 is contained in a larger centered ellipsoid (with one more dimension) as follows:

$$\begin{aligned} E(\mathbf { \varvec{\mu }}, \mathbf {\mathbf {\Sigma }}) \subset E(\mathbf{0}, \mathbf {\mathbf {\Sigma }}+ \varvec{\mu } ^T \cdot \varvec{\mu }). \end{aligned}$$
(6)

Using Eq. (4), one can write

$$\begin{aligned} E( \varvec{\mu }, \mathbf {\Sigma }) = B_{{\text {rank}}(\mathbf {\mathbf {\Sigma }})} \cdot \sqrt{\mathbf {\mathbf {\Sigma }}} + \varvec{\mu } \subset B_{{\text {rank}}(\mathbf {\mathbf {\Sigma }})} \cdot \sqrt{\mathbf {\mathbf {\Sigma }}} \pm \varvec{\mu }, \end{aligned}$$

where \(B_{{\text {rank}}(\mathbf {\mathbf {\Sigma }})}\) is defined in Eq. (3). And, with Eq. (2), one can deduce \({\text {rank}}(\mathbf {\mathbf {\Sigma }}+ \varvec{\mu } ^T \cdot \varvec{\mu }) = {\text {rank}}(\mathbf {\Sigma })+1\), then:

$$\begin{aligned} B_{{\text {rank}}(\mathbf {\mathbf {\Sigma }})} \cdot \sqrt{\mathbf {\mathbf {\Sigma }}} \pm \varvec{\mu } \subset B_{{\text {rank}}(\mathbf {\mathbf {\Sigma }})+1} \cdot \begin{bmatrix} \sqrt{\mathbf {\mathbf {\Sigma }}}\\ \varvec{\mu } \end{bmatrix}. \end{aligned}$$

We apply Definition 2 which confirms the inclusion of Eq. (6):

$$\begin{aligned} E( \varvec{\mu }, \mathbf {\Sigma }) \subset B_{{\text {rank}}(\mathbf {\mathbf {\Sigma }})+1} \cdot \begin{bmatrix} \sqrt{\mathbf {\mathbf {\Sigma }}}\\ \varvec{\mu } \end{bmatrix}=E(\mathbf{0}, \mathbf {\mathbf {\Sigma }}+ \varvec{\mu } ^T \cdot \varvec{\mu }). \end{aligned}$$

Thus, we can homogenize and transform the instance into a centered one with \(\mathbf {\mathbf {\Sigma }}' := \mathbf {\mathbf {\Sigma }}+ \varvec{\mu } ^T \cdot \varvec{\mu } \).

Secondly, to get an isotropic distribution (i.e. with all its eigenvalues being 1), one can just multiply every element of the lattice with the pseudoinverse of \(\sqrt{\mathbf {\mathbf {\Sigma }}'}\). We get a new covariance matrix \(\mathbf {\Sigma } '' =\sqrt{\mathbf {\Sigma } '}^{\sim }\cdot \mathbf {\Sigma } '\cdot {\sqrt{\mathbf {\Sigma } '}^{\sim }}^T= \mathbf {\Pi }_{\mathbf {\Sigma } '}\cdot {{\mathbf {\Pi }_{\mathbf {\Sigma } '}}}^T\). And with orthogonal projection properties (see Sect. 2.1), \(\mathbf {\Sigma } ''=\mathbf {\Pi }_{\mathbf {\Sigma } '} =\mathbf {\Pi }_{\varLambda }\), the last equality coming from Eq. (2).

In summary, one applies the two following changes:

$$\begin{aligned} \text {homogenize: }&(\varLambda , \varvec{\mu }, \mathbf {\mathbf {\Sigma }}) \mapsto (\varLambda , \mathbf{0} , \mathbf {\mathbf {\Sigma }}' := \mathbf {\mathbf {\Sigma }}+ \varvec{\mu } ^T \cdot \varvec{\mu })\\ \text {isotropize: }&(\varLambda , \mathbf{0} , \mathbf {\mathbf {\Sigma }}') \mapsto (\varLambda \cdot \mathbf{M}, \mathbf{0}, \mathbf {\Pi }_{\varLambda }) \end{aligned}$$

where \(\mathbf{M}:= (\sqrt{\mathbf {\mathbf {\Sigma }}'})^{\sim }\). From the solution \(\mathbf{x}\) to the \(\textsf {uSVP} _{\varLambda \cdot \mathbf{M}}\) problem, one can derive \(\mathbf{x}' = \mathbf{x} \mathbf{M}^{\sim }\) the solution to the \(\textsf {DBDD} _{\varLambda , \varvec{\mu }, \mathbf {\Sigma }}\) problem.
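A compact numpy sketch of these two steps (our illustration, assuming the proj() and restricted_inverse() helpers above; sqrtm_sym is our symmetric square root, satisfying \(\mathbf{M}^T \mathbf{M} = \mathbf {\Sigma }\) as in Definition 2):

```python
import numpy as np

def sqrtm_sym(Sigma):
    # Symmetric PSD square root via the spectral decomposition.
    w, Q = np.linalg.eigh(Sigma)
    return Q @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ Q.T

def dbdd_to_usvp(B, mu, Sigma):
    Sigma_h = Sigma + np.outer(mu, mu)           # homogenize
    M = restricted_inverse(sqrtm_sym(Sigma_h))   # isotropize: M = sqrt(Sigma')~
    return B @ M                                 # row basis of Lambda * M
```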

Remark 18

One may note that we could solve a DBDD instance without isotropization simply by including the ellipsoid in a larger ball, and directly apply lattice reduction before the second step. This leads, however, to less efficient attacks. One may also note that the first homogenization step “forgets” some information about the secret’s distribution. This, however, is inherent to the conversion to a unique-SVP problem which is geometrically homogeneous, and is already present in the original primal attack.

3.4 Security Estimates of uSVP: Bikz versus Bits

The attack on a uSVP instance consists of applying BKZ-\(\beta \) on the uSVP lattice \(\varLambda \) for an appropriate block size parameter \(\beta \). The cost of the attack grows with \(\beta \), however, modeling this cost precisely is at the moment rather delicate, as the state of the art seems to still be in motion. Numerous NIST candidates choose to underestimate this cost, keeping a margin to accommodate future improvements, and there seems to be no clear consensus on which model to use (see  [1] for a summary of existing cost models).

While this problem is orthogonal to our work, we still wish to be able to formulate quantitative security losses. We therefore express all concrete security estimates using the blocksize \(\beta \), and treat the latter as a measurement of the security level in a unit called the bikz. We thereby leave the question of the exact bikz-to-bit conversion estimate outside the scope of this paper, and recall that those conversion formulae are not necessarily linear, and may have small dependency in other parameters. For the sake of concreteness, we note that certain schemes choose, for example, to claim 128 bits of security for 380 bikz, and in this range, most models suggest a security increase of one bit every 2 to 4 bikz.

Remark 19

We also clarify that the estimates given in this paper only concern the pure lattice attack via the uSVP embedding discussed above. In particular, we note that some NIST candidates with ternary secrets  [25] also consider the hybrid attack of  [20], which we ignore in this work. We nevertheless think that the compatibility with our framework is plausible, with some effort.

Predicting \(\beta \) from a uSVP instance. The state-of-the-art predictions for solving uSVP instances using BKZ were given in  [3, 4]. Namely, for \(\varLambda \) a lattice of dimension \(\dim (\varLambda )\), it is predicted that BKZ-\(\beta \) can solve a \(\textsf {uSVP} _{\varLambda }\) instance with secret \(\mathbf{s}\) when

$$\begin{aligned} \sqrt{\beta / \dim (\varLambda )} \cdot \Vert \mathbf {{\mathbf {s}}\textstyle } \Vert \le \delta _\beta ^{2 \beta - \dim (\varLambda ) - 1} \cdot {\text {Vol}}(\varLambda )^{1/\dim (\varLambda )} \end{aligned}$$
(7)

where \(\delta _\beta \) is the so-called root-Hermite factor of BKZ-\(\beta \). For \(\beta \ge 50\), the root-Hermite factor is predictable using the Gaussian Heuristic  [11]:

$$\begin{aligned} \delta _{\beta } = \left( (\pi \beta )^{\frac{1}{\beta }} \cdot \frac{\beta }{2 \pi e} \right) ^{1/(2\beta -2)}. \end{aligned}$$
(8)

Note that the uSVP instances we generate are isotropic and centered so that the secret has covariance \(\mathbf {\Sigma } = \mathbf{I}\) (or \(\mathbf {\Sigma } = \mathbf {\Pi }_{\varLambda } \) if \(\varLambda \) is not of full rank) and \(\mu = \mathbf{0}\). Thus, on average, we have \(\Vert \mathbf {\mathbf {{\mathbf {s}}\textstyle }}\Vert ^2 = {\text {rank}}(\mathbf {\Sigma }) = \dim (\varLambda )\). Therefore, \(\beta \) can be estimated as the minimum integer that satisfies

$$\begin{aligned} \sqrt{\beta } \le \delta _\beta ^{2 \beta - \dim (\varLambda )- 1} \cdot {\text {Vol}}(\varLambda )^{1/\dim (\varLambda )}. \end{aligned}$$
(9)

While \(\beta \) must be an integer as a BKZ parameter, we nevertheless provide a continuous value, for a finer comparison of the difficulty of an instance. Below, we will call this method the “GSA-Intersect” method.
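Concretely, this estimation requires only a few lines; the following Python sketch (our illustration of the “GSA-Intersect” method, returning an integer rather than the finer continuous value) scans for the smallest \(\beta \) satisfying Eq. (9), with \(\delta _\beta \) as in Eq. (8).

```python
import math

def delta(beta):                       # Eq. (8), valid for beta >= 50
    return ((math.pi * beta) ** (1.0 / beta) * beta / (2 * math.pi * math.e)) \
        ** (1.0 / (2 * beta - 2))

def predict_beta(dim, log_vol):        # Eq. (9), in logarithmic form
    for beta in range(50, dim + 1):
        lhs = 0.5 * math.log(beta)
        rhs = (2 * beta - dim - 1) * math.log(delta(beta)) + log_vol / dim
        if lhs <= rhs:
            return beta
    return dim                         # no blocksize below dim suffices
```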

Remark 20

To predict security, one does not need the basis of \(\varLambda \), but only its dimension and its volume. Similarly, it is not necessary to explicitly compute the isotropization matrix \(\mathbf{M}\) of Sect. 3.3, thanks to Fact 14: \({\text {Vol}}(\varLambda \cdot \mathbf{M}) = {\text {rdet}}(\mathbf{M}) {\text {Vol}}(\varLambda ) = {\text {rdet}}(\mathbf {\Sigma } ')^{-1/2} {\text {Vol}}(\varLambda )\). These two shortcuts will allow us to efficiently make predictions for cryptographically large instances, in our lightweight implementation of Sect. 5.

Refined prediction for small blocksizes. For experimental validation purposes of our work, we prefer to have accurate prediction even for small blocksizes; a regime where those predictions are not accurate with the current state of the art. We therefore present a refined strategy using BKZ-simulation and a probabilistic model in the full version of our paper [13].

As depicted in Fig. 4, this methodology (coined Probabilistic-simulation) leads to much more satisfactory estimates compared to the model from the literature  [3, 4]. In particular, for low blocksize the literature widely underestimates the required blocksize, which is due to only considering detectability at position \(d-\beta \). For large blocksize, it somewhat overestimates it, which could be attributed to the fact that it does not account for luck. On the contrary, our new methodology seems quite precise in all regimes, making errors of at most 1 bikz. This new methodology certainly deserves further study and refinement, which we leave to future work.

Fig. 4. The difference \(\varDelta \beta = \text {real} - \text {predicted}\), as a function of the average experimental blocksize \(\beta \). The experiment consists in running a single tour of BKZ-\(\beta \) for \(\beta =2,3,4,\dots \) until the secret short vector is found, averaged over 256 LWE instances per data point, for parameters \(q=3301\), \(\sigma =20\) and \(n=m \in \{30,32,34, \dots ,88\}\).

4 Hints and Their Integration

In this Section, we define several categories of hints—perfect hints, modular hints, approximate hints (conditioning and a posteriori), and short vector hints—and show that these types of hints can be integrated into a DBDD instance. Hints belonging to these categories typically have the form of a linear equation in \(\mathbf {{\mathbf {s}}\textstyle } \) (and possibly additional variables). As emphasized in Sect. 1, these hints have lattice-friendly forms and their usefulness in realistic applications may not be obvious. We refer to Sect. 6 for detailed applications of these hints.

The technical challenge, therefore, is to characterize the effect of such hints on the DBDD instance—i.e. determine the resulting \((\varLambda ', \varvec{\mu } ', \mathbf {\Sigma } ')\) of the new DBDD instance, after the hint is incorporated.

Henceforth, let \(\mathcal {I}= \textsf {DBDD} _{\varLambda , \varvec{\mu }, \mathbf {\Sigma }}\) be a fixed instance constructed from an LWE instance with secret \(\mathbf {{\mathbf {s}}\textstyle } = (\mathbf{z}, \mathbf{e})\). Each hint will introduce new constraints on \(\mathbf {{\mathbf {s}}\textstyle } \) and will ultimately decrease the security level.

Non-Commutativity. It should be noted that many types of hints commute: integrating them in any order leads to the same DBDD instance. Potential exceptions are non-smooth modular hints (see later in Sect. 4.2) and a posteriori approximate hints (see later in Sect. 4.4): they do not always commute with the other types of hints, and do not always commute between themselves, unless the vectors \(\mathbf{v}\) of those hints are all orthogonal to each other. The reason is that, in these cases, the distribution in the direction of \(\mathbf{v}\) is redefined, which erases the prior information.

4.1 Perfect Hints

Definition 21

(Perfect hint). A perfect hint on the secret \(\mathbf {{\mathbf {s}}\textstyle } \) is the knowledge of \(\mathbf{v} \in \mathbb {Z}^{d-1}\) and \(l \in \mathbb {Z}\), such that

$$\begin{aligned} \left\langle \mathbf {{\mathbf {s}}\textstyle },\ \mathbf{v} \right\rangle = l. \end{aligned}$$

A perfect hint is quite strong in terms of additional knowledge. It allows decreasing the dimension of the lattice by one and increases its volume. One could expect such hints to arise from the following scenarios:

  • The full leak without noise of an original coefficient, or even an unreduced intermediate register since most of the computations are linear. For the second case, one may note that optimized implementations of NTT typically attempt to delay the first reduction modulo q, so leaking a register on one of the first few levels of the NTT would indeed lead to such a hint.

  • A noisy leakage of the same registers, but still with a rather high guessing confidence. In that case it may be worth making the guess, at the cost of decreasing the success probability of the attack. This could happen in a cold-boot attack scenario. This is also the case in the single trace attack on Frodo  [9] that we will study as one of our examples in Sect. 6.1.

  • More surprisingly, certain schemes, including some NIST candidates offer such a hint ‘by design’. Indeed, LAC, Round5 and NTRU-HPS all choose ternary secret vectors with a prescribed number of 1’s and \(-1\)’s, which directly induce one or two such perfect hints. This will be detailed in Sect. 6.3.

Integrating a perfect hint into a DBDD instance. Let \(\mathbf{v} \in \mathbb {Z}^{d-1}\) and \(l \in \mathbb {Z}\) be such that \(\left\langle \mathbf {{\mathbf {s}}\textstyle }, \mathbf{v} \right\rangle = l\). Note that the hint can also be written as

$$\begin{aligned} \left\langle \bar{\mathbf {{\mathbf {s}}\textstyle }},\ \bar{\mathbf {v}} \right\rangle = 0, \end{aligned}$$

where \(\bar{\mathbf {{\mathbf {s}}\textstyle }} \) is the extended LWE secret as defined in Eq. (5) and \(\bar{\mathbf {v}} := (\mathbf{v} \, ; \, -l)\).

Remark 22

Here we understand the interest of using Kannan’s embedding before integrating hints rather than after: it allows us to also homogenize the hint, and therefore to make \(\varLambda '\) a proper lattice rather than a lattice coset (i.e. a shifted lattice).

Including this hint is done by modifying the \(\textsf {DBDD} _{\varLambda , \varvec{\mu }, \mathbf {\Sigma }}\) to \(\textsf {DBDD} _{\varLambda ', \varvec{\mu } ', \mathbf {\Sigma } '}\), where:

$$\begin{aligned} \varLambda '&= \varLambda \cap \left\{ \mathbf{x}\in \mathbb {Z}^{d} \mid \left\langle \mathbf{x}, \bar{\mathbf {v}} \right\rangle = 0 \right\} \nonumber \\ \mathbf {\Sigma } '&= \mathbf {\Sigma }- \frac{(\bar{\mathbf {v}} \mathbf {\Sigma })^T\bar{\mathbf {v}} \mathbf {\Sigma }}{\bar{\mathbf {v}} \mathbf {\mathbf {\Sigma }}\bar{\mathbf {v}} ^T} \end{aligned}$$
(10)
$$\begin{aligned} \varvec{\mu } '&= \varvec{\mu }- \frac{\langle \bar{\mathbf {v}},\mathbf { \varvec{\mu }}\rangle }{\bar{\mathbf {v}} \mathbf {\mathbf {\Sigma }}\bar{\mathbf {v}} ^T} \bar{\mathbf {v}} \mathbf {\Sigma } \end{aligned}$$
(11)

We now explain how to derive the new mean \( \varvec{\mu } '\) and the new covariance \(\mathbf {\Sigma } '\). Let \(y\) be the random variable \(\langle \bar{\mathbf {{\mathbf {s}}\textstyle }}, \bar{\mathbf {v}} \rangle \), where \(\bar{\mathbf {{\mathbf {s}}\textstyle }} \) has mean \( \varvec{\mu } \) and covariance \(\mathbf {\Sigma } \). Then \( \varvec{\mu } '\) is the mean of \(\bar{\mathbf {{\mathbf {s}}\textstyle }} \) conditioned on \(y = 0\), and \(\mathbf {\Sigma } '\) is the covariance of \(\bar{\mathbf {{\mathbf {s}}\textstyle }} \) conditioned on \(y = 0\). Using Corollary 7, we obtain the corresponding conditional mean and covariance.

We note that the lattice \(\varLambda '\) is the intersection of \(\varLambda \) and the hyperplane orthogonal to \(\bar{\mathbf {v}} \). Given a basis \(\mathbf{B}\) of \(\varLambda \), by Lemma 9 a basis of \(\varLambda '\) can be computed as follows:

  1. Let \(\mathbf{D}\) be the dual basis of \(\mathbf{B}\). Compute \(\mathbf{D}_{\bot } := \mathbf{D} \cdot \mathbf {\Pi }^\bot _{\bar{\mathbf {v}}}\).

  2. Apply the LLL algorithm on \(\mathbf{D}_{\bot }\) to eliminate linear dependencies, then delete the first row of \(\mathbf{D}_{\bot }\) (which is \(\mathbf{0}\), because the hyperplane intersection decrements the dimension of the lattice).

  3. Output the dual of the resulting matrix.

While polynomial time, the above computation is quite heavy, especially as there is no convenient library offering a parallel version of LLL. Fortunately, for predicting attack costs, one only needs the dimension of the lattice \(\varLambda '\) and its volume. These can easily be computed assuming \(\bar{\mathbf {v}} \) is a primitive vector (see Definition 10) of the dual lattice: the dimension decreases by 1, and the volume increases by a factor \(\Vert \bar{\mathbf {v}} \Vert \). This is stated and proved in Lemma 11. Intuitively, the primitivity condition is needed since one can scale the leak to \(\langle \mathbf{s}, f\mathbf{v}\rangle = fl\) for any nonzero factor \(f \in \mathbb {R}\) and obtain an equivalent leak; however, there is only one factor f ensuring that \(f\bar{\mathbf {v}} \in \varLambda ^*\) and that \(f\bar{\mathbf {v}} \) is primitive in it.
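Combining Corollary 7 with Lemma 11 gives the following lightweight sketch of perfect-hint integration (our illustration, assuming the condition_on_hint() helper above and assuming \(\bar{\mathbf {v}} \) is primitive with respect to the dual lattice): only \(( \varvec{\mu }, \mathbf {\Sigma })\) and the pair (dimension, volume) are updated.

```python
import numpy as np

def integrate_perfect_hint(mu, Sigma, dim, log_vol, v, l):
    v_bar = np.append(v, -l)                   # homogenized hint vector
    # Condition on <s_bar, v_bar> = 0 with no noise (Eqs. (10) and (11)).
    mu2, Sigma2 = condition_on_hint(mu, Sigma, v_bar, 0.0, sigma_e2=0.0)
    log_vol2 = log_vol + np.log(np.linalg.norm(v_bar))   # Lemma 11
    return mu2, Sigma2, dim - 1, log_vol2
```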

Remark 23

Note that if \(\bar{\mathbf {v}} \) is not in the span of \(\varLambda \)—as typically occurs if other non-orthogonal perfect hints have already been integrated—Lemma 11 should be applied to the orthogonal projection \(\bar{\mathbf {v}} ' = \bar{\mathbf {v}} \cdot \mathbf {\Pi }_{\varLambda }\) of \(\bar{\mathbf {v}} \) onto \(\varLambda \). Indeed, the perfect hint \(\left\langle \bar{\mathbf {{\mathbf {s}}\textstyle }},\ \bar{\mathbf {v}} ' \right\rangle = 0\) replacing \(\bar{\mathbf {v}} \) by \(\bar{\mathbf {v}} '\) is equally valid.

4.2 Modular Hints

Definition 24

(Modular hint). A modular hint on the secret \(\mathbf {{\mathbf {s}}\textstyle } \) is the knowledge of \(\mathbf{v} \in \mathbb {Z}^{d-1}\), \(k \in \mathbb {Z}\) and \(l \in \mathbb {Z}\), such that

$$\begin{aligned} \left\langle \mathbf {{\mathbf {s}}\textstyle },\ \mathbf{v} \right\rangle = l \mod k. \end{aligned}$$

We can expect such hints to arise from several scenarios:

  • obtaining the value of an intermediate register during LWE decryption would likely correspond to giving such a modular equation modulo q. This is also the case if an NTT coefficient leaks in a Ring-LWE scheme. It can also occur “by design” if the LWE secret is chosen so that certain NTT coordinates are fixed to 0 modulo q, as is the case in some instances of Order LWE  [6].

  • obtaining the absolute value \(a = |s|\) of a coefficient s implies \(s = a \bmod 2a\), and such a hint could be obtained by a timing attack on an unprotected implementation of a table-based sampler, in the spirit of  [17].

  • obtaining the Hamming weight of the string \(b_1b_2 \dots b_1'b_2'\dots \) used to sample a centered binomial coefficient \(s = \sum b_i - \sum b'_i\) (as done in NewHope and Kyber  [31, 34]) reveals in particular \(s \bmod 2\). Indeed, the latter string (or at least some parts of it) is more likely to be leaked than the Hamming weight of s.

Integrating a modular hint into a DBDD instance. Let \(\mathbf{v} \in \mathbb {Z}^{d-1}\), \(k\in \mathbb {Z}\) and \(l \in \mathbb {Z}\) be such that \(\left\langle \mathbf {{\mathbf {s}}\textstyle }, \mathbf{v}\right\rangle = l \mod k\). Note that the hint can also be written as

$$\begin{aligned} \left\langle \bar{\mathbf {{\mathbf {s}}\textstyle }},\ \bar{\mathbf {v}} \right\rangle = 0 \mod k \end{aligned}$$
(12)

where \(\bar{\mathbf {{\mathbf {s}}\textstyle }} \) is the extended LWE secret as defined in Eq. (5) and \(\bar{\mathbf {v}} := (\mathbf{v} \, ; \, -l)\). We refer to Remark 22 for the legitimacy of such a dimension increase.

Smooth case. Intuitively, such a hint should only sparsify the lattice, and leave the average and the variance unchanged. This is only (approximately) true when the variance is sufficiently large in the direction of \(\mathbf{v}\) to ensure smoothness, i.e. when \(k^2 \ll \mathbf{v} \mathbf {\Sigma } \mathbf{v}^T\); one can refer to [28, Lemma 3.3 and Lemma 4.2] for the quality of that approximation. In this smooth case, we therefore have:

$$\begin{aligned} \varLambda '&= \varLambda \cap \left\{ \mathbf{x}\in \mathbb {Z}^{d}\ |\ \left\langle \mathbf{x}, \bar{\mathbf {v}} \right\rangle = 0\mod k \right\} \end{aligned}$$
(13)
$$\begin{aligned} \varvec{\mu } '&= \varvec{\mu } \end{aligned}$$
(14)
$$\begin{aligned} \mathbf {\Sigma } '&= \mathbf {\Sigma } \end{aligned}$$
(15)

On the other hand, if \(k^2 \gg \mathbf{v} \mathbf {\Sigma } \mathbf{v}^T\), then the residual distribution will be highly concentrated on a single value, and one should instead use a perfect hint \( \left\langle \mathbf {{\mathbf {s}}\textstyle },\ \mathbf{v} \right\rangle = l + ik\) for the appropriate i.

General case. In the general case, one can resort to a numerical computation of the average \(\mu _c\) and the variance \(\sigma _c^2\) of the one-dimensional centered discrete Gaussian of variance \(\sigma ^2 = \mathbf{v} \mathbf {\Sigma } \mathbf{v}^T\) over the coset \(l + k\mathbb {Z}\), and apply the corrections:

$$\begin{aligned} \varvec{\mu } '&= \varvec{\mu } + \frac{\mu _c-\langle \bar{\mathbf {v}}, \varvec{\mu } \rangle }{\bar{\mathbf {v}} \mathbf {\Sigma } \bar{\mathbf {v}} ^T} \bar{\mathbf {v}} \mathbf {\Sigma } \end{aligned}$$
(16)
$$\begin{aligned} \mathbf {\Sigma } '&= \mathbf {\Sigma } + \left( \frac{\sigma _c^2}{(\bar{\mathbf {v}} \mathbf {\Sigma } \bar{\mathbf {v}} ^T)^2} - \frac{1}{\bar{\mathbf {v}} \mathbf {\Sigma } \bar{\mathbf {v}} ^T}\right) ( \bar{\mathbf {v}} \mathbf {\Sigma })^T( \bar{\mathbf {v}} \mathbf {\Sigma }) \end{aligned}$$
(17)

Intuitively, these formulae completely erase the prior information on \(\langle \mathbf {{\mathbf {s}}\textstyle }, \bar{\mathbf {v}} \rangle \) before replacing it by the new average and variance in the adequate direction. Both can be derived using Corollary 7.
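The numerical computation of \(\mu _c\) and \(\sigma _c^2\) can be done by direct summation over the coset, as in the following sketch (our illustration; the truncation window is an arbitrary choice).

```python
import math

def coset_moments(sigma2, l, k, tail=20):
    # Weights of the centered Gaussian of variance sigma2 on the coset
    # l + k*Z, truncated at roughly `tail` standard deviations.
    R = int(tail * math.sqrt(sigma2)) + abs(l) + k
    pts = range(l - (R // k) * k, R + 1, k)
    w = [math.exp(-x * x / (2.0 * sigma2)) for x in pts]
    Z = sum(w)
    mu_c = sum(x * wx for x, wx in zip(pts, w)) / Z
    sigma2_c = sum((x - mu_c) ** 2 * wx for x, wx in zip(pts, w)) / Z
    return mu_c, sigma2_c
```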

As for perfect hints, the computation of \(\varLambda '\) can be done by working on the dual lattice. More specifically:

  1. Let \(\mathbf{D}\) be the dual basis of \(\mathbf{B}\).

  2. Redefine \(\bar{\mathbf {v}} \leftarrow \bar{\mathbf {v}} \cdot \mathbf {\Pi }_{\varLambda }\), noting that this does not affect the validity of the hint.

  3. Append \(\bar{\mathbf {v}}/k\) to \(\mathbf{D}\) and obtain \(\mathbf{D}'\).

  4. Apply the LLL algorithm on \(\mathbf{D}'\) to eliminate linear dependencies, then delete the first row of \(\mathbf{D}'\) (which is \(\mathbf{0}\) since we introduced a linear dependency).

  5. Output the dual of the resulting matrix.

Also, as for perfect hints, the parameters of the new lattice \(\varLambda '\) can be predicted: the dimension is unchanged, and the volume increases by a factor k under a primitivity condition, as proved by Lemma 12.

4.3 Approximate Hints (conditioning)

Definition 25

(Approximate hint). An approximate hint on the secret \(\mathbf {{\mathbf {s}}\textstyle } \) is the knowledge of \(\mathbf{v} \in \mathbb {Z}^{d-1}\) and \(l \in \mathbb {Z}\), such that

$$\begin{aligned} \left\langle \mathbf {{\mathbf {s}}\textstyle },\ \mathbf{v} \right\rangle + e = l, \end{aligned}$$

where \(e \) models noise following a distribution \(N_1(0, \sigma _{e}^2)\), independent of \(\mathbf{s}\).

One can expect such hints from:

  • any noisy side channel information about a secret coefficient. This is the case of our study in Sect. 6.1.

  • decryption failures. In Sect. 6.2, we show how this type of hint can represent the information gained by a decryption failure.

To include this knowledge in the DBDD instance, we must combine this knowledge with the prior knowledge on the solution \(\mathbf {{\mathbf {s}}\textstyle } \) of the instance.

Integrating an approximate hint into a DBDD instance. Let \(\mathbf{v} \in \mathbb {Z}^{d-1}\) and \(l \in \mathbb {Z}\) be such that \(\left\langle \mathbf {{\mathbf {s}}\textstyle }, \mathbf{v} \right\rangle \approx l\). Note that the hint can also be written as

$$\begin{aligned} \left\langle \bar{\mathbf {{\mathbf {s}}\textstyle }},\ \bar{\mathbf {v}} \right\rangle +e = 0 \end{aligned}$$
(18)

where \(\bar{\mathbf {{\mathbf {s}}\textstyle }} \) is the extended LWE secret as defined in Eq. (5), \(\bar{\mathbf {v}} := (\mathbf{v} \, ; \, -l)\), and \(e \) has a \(N_1(0, \sigma _{e}^2)\) distribution. The unique shortest non-zero solution of \(\textsf {DBDD} _{\varLambda , \varvec{\mu }, \mathbf {\Sigma }}\) is also the unique solution of the instance \(\textsf {DBDD} _{\varLambda ', \varvec{\mu } ', \mathbf {\Sigma } '}\) where

$$\begin{aligned} \varLambda '&= \varLambda \end{aligned}$$
(19)
$$\begin{aligned} \mathbf {\Sigma } '&= \mathbf {\Sigma }- \frac{(\bar{\mathbf {v}} \mathbf {\Sigma })^T\bar{\mathbf {v}} \mathbf {\Sigma }}{\bar{\mathbf {v}} \mathbf {\mathbf {\Sigma }}\bar{\mathbf {v}} ^T + \sigma _{e}^2}\end{aligned}$$
(20)
$$\begin{aligned} \varvec{\mu } '&= \varvec{\mu }- \frac{ \langle \bar{\mathbf {v}},\mathbf { \varvec{\mu }}\rangle }{\bar{\mathbf {v}} \mathbf {\mathbf {\Sigma }}\bar{\mathbf {v}} ^T + \sigma _{e}^2} \bar{\mathbf {v}} \mathbf {\Sigma } \end{aligned}$$
(21)

We note that Eq. (19) comes from

$$\varLambda ' := \varLambda \cap \left\{ \mathbf{x}\in \mathbb {Z}^{d} \mid \left\langle \mathbf{x}, \bar{\mathbf {v}} \right\rangle + e = 0, \text { for some } e \sim N_1(0, \sigma _{e}^2) \right\} = \varLambda .$$

The new covariance and mean follow from Corollary 7.

Consistency with Perfect Hints. Note that if \(\sigma _{e} = 0\), we fall back to a perfect hint \(\langle \mathbf {{\mathbf {s}}\textstyle }, \mathbf{v} \rangle = l\): the above computation of \(\mathbf {\Sigma } '\) (20) (resp. \( \varvec{\mu } '\) (21)) is indeed equivalent to Eq. (10) (resp. Eq. (11)) from Sect. 4.1. Note however that, to avoid singularities, our implementation requires \({\text {Span}}(\mathbf {\Sigma } + \varvec{\mu } ^T \varvec{\mu }) = {\text {Span}}(\varLambda )\) (see the requirement in Eq. (2)): if \(\sigma _{e} = 0\), one must instead use a perfect hint.

Multi-dimensional approximate hints. The formulae of [24] are even more general, and one could consider a multidimensional hint of the form \(\mathbf {{\mathbf {s}}\textstyle } \mathbf{V} + \mathbf{e} = \mathbf{l}\), where \(\mathbf{V} \in \mathbb {R}^{n \times k}\) and \(\mathbf{e}\) is a Gaussian noise of arbitrary covariance \(\mathbf {\Sigma } _{\mathbf{e}}\). However, those general formulae require explicit matrix inversion, which becomes impractical in large dimension. We therefore only implemented full-dimensional (\(k=n\)) hint integration in the super-lightweight version of our tool, which assumes all covariance matrices to be diagonal. These will be used for hints obtained from decryption failures in Sect. 6.2.
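In the diagonal case, the update degenerates to a coordinate-wise product of Gaussians; a sketch (our illustration of what such a lightweight, diagonal-covariance computation looks like for \(\mathbf{V} = \mathbf{I}_n\)) is:

```python
import numpy as np

def full_dim_hint_diag(mu, sig2, l, sig2_e):
    # Posterior of s given s + e = l, coordinate-wise, where sig2 and
    # sig2_e are the diagonal prior and noise variances (as arrays).
    mu_new = (sig2_e * mu + sig2 * l) / (sig2 + sig2_e)
    sig2_new = sig2 * sig2_e / (sig2 + sig2_e)
    return mu_new, sig2_new
```

One can check that this matches Corollary 7 applied coordinate-wise with \(\mathbf{v} = \mathbf{e}_i\) for each coordinate i.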

4.4 Approximate Hint (a posteriori)

In certain scenarios, one may more naturally obtain directly the a posteriori distribution of \(\langle \mathbf {{\mathbf {s}}\textstyle }, \mathbf{v}\rangle \), rather than a hint \(\left\langle \mathbf {{\mathbf {s}}\textstyle }, \mathbf{v} \right\rangle + e = l\) for some error e independent of \(\mathbf {{\mathbf {s}}\textstyle } \). Such a scenario is typical in template attacks, as we exemplify via the single trace attack on Frodo from  [9], which we study in Sect. 6.1.

Given the a posteriori distribution of \(\langle \bar{\mathbf {{\mathbf {s}}\textstyle }}, \bar{\mathbf {v}} \rangle \), one can derive its mean \(\mu _{\text {ap}}\) and variance \(\sigma ^2_{\text {ap}}\) and apply the corrections to compute the new mean and covariance exactly as in Eqs. (16) and (17).

4.5 Short Vector Hints

Definition 26

(Short vector hint). A short vector hint on the lattice \(\varLambda \) is the knowledge of a short vector \(\bar{\mathbf {v}} \) such that

$$\begin{aligned} \bar{\mathbf {v}} \in \varLambda . \end{aligned}$$

Note that such hints are not related to the secret, and are not expected to be obtained by side-channel information, but rather by the very design of the scheme. In particular, the lattice \(\varLambda \) underlying an LWE instance modulo q contains the so-called q-vectors, i.e. the vector \((q, 0, 0, \dots ,0)\) and its permutations. These vectors are in fact implicitly exploited in the literature on the cryptanalysis of LWE since at least  [23]. Indeed, in some regimes, the best attacks are obtained by ‘forgetting’ certain LWE equations, which can be geometrically interpreted as a projection orthogonally to a q-vector. Note that, among all hints, the short vector hints should be the last to be integrated. In our context, we need to generalize this idea beyond q-vectors because the q-vectors may simply disappear after the integration of a perfect or modular hint. For example, after the integration of a perfect hint \(\langle \mathbf {{\mathbf {s}}\textstyle }, (1, 1, \dots , 1)\rangle = 0\), all the q-vectors are no longer in the lattice, but \((q, -q, 0, \dots ,0)\) still is, and so are all its permutations.

Resolving the DBDD problem resulting from this projection will not directly lead to the original secret, as projection is not injective. However, as long as we keep \(n+1\) dimensions out of the \(n+m+1\) dimensions of the original LWE instance, we can still efficiently reconstruct the full LWE secret by solving a linear system over the rationals.

Integrating a short vector hint into a DBDD instance. The hint is integrated by applying the orthogonal projection \(\mathbf {\Pi }^\bot _{\bar{\mathbf {v}}}\) to the instance \(\textsf {DBDD} _{\varLambda , \mathbf {\Sigma }, \varvec{\mu }}\); this is valid as long as the secret vector remains short enough to still be a solution after the projection:

$$\begin{aligned} \varLambda '&= \varLambda \cdot \mathbf {\Pi }^\bot _{\bar{\mathbf {v}}}\end{aligned}$$
(22)
$$\begin{aligned} \mathbf {\Sigma } '&= (\mathbf {\Pi }^\bot _{\bar{\mathbf {v}}})^T \cdot \mathbf {\Sigma } \cdot \mathbf {\Pi }^\bot _{\bar{\mathbf {v}}}\end{aligned}$$
(23)
$$\begin{aligned} \varvec{\mu } '&= \varvec{\mu } \cdot \mathbf {\Pi }^\bot _{\bar{\mathbf {v}}} \end{aligned}$$
(24)

To compute a basis of \(\varLambda '\), one can simply apply the projection to all the vectors of its current basis, and then eliminate the linear dependencies in the resulting generating set using LLL, as in the following sketch.
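For concreteness, here is a minimal Sage sketch of this step (ours, not an extract of the toolkit), assuming a row-vector convention and a rational basis matrix:

```python
# Minimal Sage sketch: project a basis of Lambda orthogonally to v in Lambda,
# then remove the induced linear dependency with LLL.
def project_lattice_basis(B, v):
    """B: matrix over QQ whose rows form a basis of Lambda; v: a vector of Lambda."""
    v = vector(QQ, v)
    Pi = identity_matrix(QQ, len(v)) - v.column() * v.row() / (v * v)  # Pi_v^perp
    Bp = B * Pi                                   # projected generating set
    d = lcm([x.denominator() for x in Bp.list()])
    Bz = (d * Bp).change_ring(ZZ).LLL()           # LLL zeroes out dependent rows
    rows = [i for i in range(Bz.nrows()) if not Bz.row(i).is_zero()]
    return Bz.matrix_from_rows(rows).change_ring(QQ) / d
```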

Remark 27

Once a short vector hint \(\bar{\mathbf {v}} \in \varLambda \) has been integrated, \(\varLambda \) has been transformed into \(\varLambda '\). If one then has to integrate another short vector hint \(\bar{\mathbf {v}} _1 \in \varLambda \), \(\bar{\mathbf {v}} _1\) should first be projected onto \(\varLambda '\), i.e. one integrates \(\bar{\mathbf {v}} _1 \cdot \mathbf {\Pi }_{\varLambda '} \in \varLambda '\). Our implementation takes this into account, and one can simply apply the same transformation as above, replacing the single vector \(\bar{\mathbf {v}} \) by a matrix \(\mathbf{V}\).

The dimension of the lattice decreases by one (or by k, if one directly integrates a matrix of k vectors) and the volume of the lattice also decreases according to Fact 13. One can also predict the decrease of the determinant of \(\mathbf {\Sigma } \) via the identity:

$$\begin{aligned} \det (\mathbf {\Sigma } ') = \det (\mathbf {\Sigma }) \cdot \frac{\bar{\mathbf {v}} \, \mathbf {\Sigma } ^{-1} \bar{\mathbf {v}} ^T}{\langle \bar{\mathbf {v}}, \bar{\mathbf {v}} \rangle }, \end{aligned}$$
(25)

where the determinant of the (singular) projected covariance \(\mathbf {\Sigma } '\) is taken restricted to its support.

Worthiness and choice of short vector hints. Integrating such a hint induces a trade-off between the dimension and the volume, and it is therefore not always advantageous to integrate it.

This raises the following potentially hard problem: given a set \(\mathbf{W}\) of short vectors of \(\varLambda \) (viewed as a matrix), which subset \(\mathbf{V} \subset \mathbf{W}\) of size k leads to the easiest \(\textsf {DBDD} \) instance? Indeed, the hardness of the new problem grows with

(26)

In the case of an un-hinted DBDD instance directly obtained from the LWE problem, with \(\mathbf{W}\) being the set of (primitive) q-vectors, the problem is easy: all subsets of size k lead to instances with the same parameters.

But this is no longer true as soon as \(\mathbf {\Sigma } \) has been altered or if the set \(\mathbf{W}\) is arbitrary. For example, setting \(\mathbf {\Sigma } =\mathbf{I}\), one simply wishes to minimize \(\det (\mathbf{V} \mathbf{V}^T)\); but for an arbitrary set \(\mathbf{W}\), the problem of finding the optimal subset \(\mathbf{V} \subset \mathbf{W}\) is NP-hard [22], and remains NP-hard even up to exponential approximation factors.

A natural approach to obtaining an approximate solution in polynomial time consists in making sequential greedy choices, as in the sketch below. This involves computing \(|\mathbf{V}|\cdot |\mathbf{W}|\) many matrix-vector products over increasingly large rationals, which proved painfully slow in practice for making predictions on cryptographically large instances. Fortunately, in the typical case where the vectors of \(\mathbf{W}\) are the q-vectors, this can be made somewhat practical (see Sect. 6.3 for an example).
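For the simplified objective \(\mathbf {\Sigma } = \mathbf{I}\) mentioned above, the greedy heuristic amounts to the following sketch (ours, for illustration; a real implementation would work over exact rationals rather than floats):

```python
# Greedy selection sketch for the case Sigma = I: iteratively add to V the
# vector of W that keeps the Gram determinant det(V V^T) smallest.
import numpy as np

def greedy_subset(W, k):
    """W: (N, n) array of candidate short vectors; returns indices of a size-k subset."""
    chosen = []
    for _ in range(k):
        best_i, best_det = None, None
        for i in range(W.shape[0]):
            if i in chosen:
                continue
            V = W[chosen + [i], :]
            d = np.linalg.det(V @ V.T)      # Gram determinant of candidate subset
            if best_det is None or d < best_det:
                best_i, best_det = i, d
        chosen.append(best_i)
    return chosen
```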

Remark 28

When the basis of an LWE lattice is given in systematic form, the q-vectors are already explicitly given to lattice reduction algorithms, and these algorithms will implicitly make use of them when they are worthy, as if we had integrated them. The reason is that lattice reduction algorithms naturally work with projected sublattices: if a q-vector is shorter than what the algorithm can produce, those q-vectors will remain untouched at the beginning of the basis, and the reduction algorithm will effectively work on the lattice projected orthogonally to them. In other words, integrating q-vectors is important to understand and predict how lattice reduction algorithms behave, but in certain cases they may be automatically detected and exploited by the lattice reduction algorithms themselves.

5 Implementation

5.1 Our Sage Implementation

We propose three implementations of our framework, all following the same Python/Sage 9.0 API. More specifically, the API and some common functions are defined in DBDD_generic.sage, as a class DBDD_Generic. Three derived classes are then given:

  1. The class DBDD (provided in DBDD.sage) is the full-fledged implementation: it fully maintains all information about a DBDD instance as one integrates hints: the lattice \(\varLambda \), the covariance matrix \(\mathbf {\Sigma } \) and the average \( \varvec{\mu } \). While polynomial time, maintaining the lattice information can be quite slow, especially since consecutive intersections with hyperplanes can lead to manipulations on rationals with large denominators. It also allows one to finalize the attack, running the homogenization, isotropization and lattice reduction, based on the fplll [15] library available through Sage.

     We note that if one were to repeatedly use perfect or modular hints, a lot of effort would be spent on uselessly alternating between the primal and the dual lattice. Instead, we implement a caching mechanism for the primal and dual bases, and only update them when necessary.

  2. The class DBDD_predict (provided in DBDD_predict.sage) is the lightweight implementation: it only maintains the covariance information and the parameters of the lattice (dimension, volume). It must therefore work under assumptions about the primitivity of the vector \(\mathbf{v}\); in particular, it cannot detect hints that are redundant. If one must resort to this faster variant on large instances, it is advised to consider potential (even partial) redundancy between the given hints, and to run a comparison with the full-fledged variant on small instances with similarly generated hints.

  3. The class DBDD_predict_diag (provided in DBDD_predict_diag.sage) is the super-lightweight implementation. It maintains the same information as the above, but requires the covariance matrix to remain diagonal at all times. In particular, one can only integrate hints for which the directional vector \(\mathbf{v}\) is collinear with a canonical basis vector.

5.2 Tests and Validation

In the full version of our paper, we present a demonstration of our tool with some extracts of Sage 9.0 code; the following condensed sketch gives a flavor of it (function and method names follow our toolkit's API, but the exact parameters and calls may differ from the released scripts).
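```python
# Condensed workflow sketch: build a small LWE-based DBDD instance, integrate
# one hint of each kind, then estimate and run the attack. Parameters are
# illustrative.
load("framework/instance_gen.sage")

n, m, q = 70, 70, 3301
D_s = build_centered_binomial_law(40)   # secret distribution
D_e = D_s                               # error distribution
A, b, dbdd = initialize_from_LWE_instance(DBDD, n, q, m, D_e, D_s)

v = vec([1] + [0] * (m + n - 1))        # a direction for the hints
dbdd.integrate_perfect_hint(v, dbdd.leak(v))         # <s, v> = l
dbdd.integrate_modular_hint(v, dbdd.leak(v) % 2, 2)  # <s, v> = l mod 2
dbdd.integrate_approx_hint(v, dbdd.leak(v), 3)       # <s, v> + e = l
dbdd.integrate_q_vectors(q)             # short vector hints, when worthy

dbdd.estimate_attack()                  # predicted hardness in bikz
dbdd.attack()                           # actually run the lattice reduction
```

We implement two tests to verify the correctness of our scripts, and more generally the validity of our predictions.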

Consistency checks. Our first test (check_consistency.sage) simply verifies that all three classes always agree perfectly. More specifically, we run all three versions on a given instance, integrating the same random hint in each of them, and compare their hardness predictions. Using the full-fledged version, we first test that the primitivity condition holds, and discard the hint if it does not, as we know that predictions cannot be correct on such hints. This verification passes.

Prediction verifications. We then experimentally verify the predictions made by our tool for the various types of hints, by comparing those predictions to actual attack experiments (see compare_usvp_models.sage for the predictions without hints and prediction_verifications.sage for the predictions with hints). This is done for a fixed set of LWE parameters and an increasing number of hints. The details of the experiments and the results are given in Fig. 5.

Fig. 5. Experimental verification of the security decay predictions for each type of hint. Each data point was averaged over 256 samples.

While our predictions seem accurate overall, we still note a minor discrepancy of up to 2 or 3 bikz in the low-blocksize regime. This exceeds the error made by the prediction of the attack without any hint, which was below 1 bikz even in the same low-blocksize regime. We suspected that this discrepancy was due to residual q-vectors, or small combinations of them, that are hard to predict for randomly generated hints but would still benefit lattice reduction. We tested that hypothesis by running similar experiments while leaving certain coordinates untouched by hints, so as to still explicitly know some q-vectors for short vector hint integration, should they be “worthy”. This did not improve the accuracy of our predictions, which invalidates our suspected explanation. We are at the moment unable to explain this inaccuracy. We nevertheless find our predictions satisfactory, considering that even without hints, previous predictions [3] were much less accurate (see Fig. 4).

6 Applications Examples

6.1 Hints from Side Channels

In [9], Bos et al. study the feasibility of a single-trace power analysis of the Frodo Key Encapsulation Mechanism (FrodoKEM) [29]. Specifically, in their first approach, they analyze the possibility of a divide-and-conquer attack targeting a multiplication in the key generation. This attack was deemed unsuccessful in [9] because the brute-force phase after recovering a candidate for the private key was too expensive. Alongside this unsuccessful attempt, a successful and more powerful extend-and-prune attack is provided in [9].

We emphasize that the purpose of this section is to exemplify our tool on a standard side-channel attack, which is why we choose the former, unsuccessful divide-and-conquer attack of [9]. The point of this section is to show that our framework can indeed lead to improvements in the algorithmic phase of a side-channel attack, once the leakage has been obtained.

FrodoKEM. FrodoKEM is based on small-secret LWE; we outline here the details necessary to understand the attack. Note that, for consistency, our letter notation differs from that of [29]. For parameters n and q, the private key is \((\mathbf{z} \in \mathbb {Z}_q^{n}, \mathbf{e} \in \mathbb {Z}_q^{n})\), where the coefficients of \(\mathbf{z}\) and \(\mathbf{e}\), denoted \(\mathbf{z}_i\) and \(\mathbf{e}_i\), take values in a small set that we denote L. The public key is \(\left( \mathbf{A} \in \mathbb {Z}_q^{n \times n}, \mathbf{b} = \mathbf{z} \mathbf{A} + \mathbf{e} \right) \). The goal of the attack is to recover \(\mathbf{z}\) by making measurements during the multiplication between \(\mathbf{z}\) and \(\mathbf{A}\) when computing \(\mathbf{b}\) in the key generation. Note that there is no multiplication involving \(\mathbf{e}\), and thus it is not targeted by this attack. Six sets of parameters are considered: CCS1, CCS2, CCS3 and CCS4, introduced in [8], and NIST1 and NIST2, introduced in [29]. For example, with NIST1 parameters, \(n=640,\ q=2^{15}\text { and }L = \{-11, \cdots , 11\}\).

Side-channel simulation. The divide-and-conquer attack of [9] simulates side-channel information using ELMO, a power simulator for a Cortex M0 [27]. This tool outputs simulated power traces using an elaborate leakage model with Gaussian noise; it is thus parametrized by the standard deviation of the side-channel noise. For proofs of concept, the authors of [27] suggest choosing the standard deviation of the simulated noise as \(\sigma _\mathrm{SimNoise} := 0.0045\) for realistic leakage modeling. This is also the standard deviation chosen in [9, Fig. 2b], and Bos et al. implemented a Matlab script that calls ELMO to simulate the side-channel information applied to Frodo. This side-channel simulator was provided to us by the authors of [9], and we were able to re-generate all their data with Matlab, again using \(\sigma _\mathrm{SimNoise} = 0.0045\).

Template attack. The divide-and-conquer side-channel attack proposed by Bos et al. belongs to the family of template attacks, introduced in [10]. In a nutshell, these attacks consist of a profiling phase and an online phase. Let us detail the template attack on Frodo implemented in [9]; a toy sketch of both phases follows the list below.

  1. The profiling phase consists in using a copy of the device and recording a large number of traces for many different known secret values. From these measurements, the attacker can derive the multidimensional distribution of several points of interest among traces sharing the same secret coefficient. More precisely, in the case of FrodoKEM, for a given index \(i\in [0, n-1]\), the points of interest are the instants in the trace when \(\mathbf{z}_i\) is multiplied by the coefficients of \(\mathbf{A}\) (\(n\) points of interest in total). Let us define

    $$\begin{aligned} \mathbf{c}_i := (T[t_{i,0}], \ldots , T[t_{i, n-1}]) \quad \mathbf{c}_i \in \mathbb {R}^n, \end{aligned}$$
    (27)

    where \(T\) denotes the trace measurement and \((t_{i,k})\) denotes the instants of the multiplication of \(\mathbf{z}_i\) with the coefficients \(\mathbf{A}_{i,k}\), for \((i,k) \in [0, n-1]^2\). The random variable vector associated to \(\mathbf{c}_i\) is denoted by \(\mathbf{C}_i\). For each \(i\in [0, n-1]\) and \(x \in L\), the goal of the profiling phase is to learn the center of the probability distribution \(A_{i,x}\) of \(\mathbf{C}_i\) conditioned on \(\mathbf{z}_i = x\).

    By hypothesis, for template attacks (see [10, Section 2.1]), \(A_{i,x}\) is assumed to be a multidimensional normal distribution of covariance \(\sigma _\mathrm{SimNoise}^2 \cdot \mathbf{I}_n\). Thus, the attacker recovers the center of \(A_{i,x}\) for each \(i\in [0, n-1]\) and \(x \in L\) by averaging all the measured \(\mathbf{c}_i\) that satisfy \(\mathbf{z}_i = x\). The center of \(A_{i,x}\) is denoted \(\mathbf{t}_{i,x}\), and we call it a template. Bos et al. [9] actually assume that \(\mathbf{t}_{i,x}\) depends only on x and is independent of the index i; thus \(\mathbf{t}_{i,x}= \mathbf{t}_{x}\). Essentially, this common assumption means that the index \(i\in [0, n-1]\) of the target coefficient does not influence the leakage. Consequently, the attacker only has to derive \(\mathbf{t}_{0,x}\), for example.

  2. In the online phase, the attacker knows the templates \(\mathbf{t}_{x}\) for all \(x\in L\). She also knows the points of interest \(t_{i,k}\) defined above in Eq. 27. She constructs a candidate \(\tilde{\mathbf{z}}\) for the secret \(\mathbf{z}\) by recovering the coefficients one by one. For each unknown secret coefficient \(\mathbf{z}_i\), she takes the measurement \(\mathbf{c}_i\) defined in Eq. 27. Using this measurement, she can derive an a posteriori probability distribution: with her fixed \(i \in [0, n-1]\) and measured \(\mathbf{c}_i \in \mathbb {R}^n\), she computes, for all \(x \in L\),

    $$\begin{aligned} P\left[ \mathbf{z}_i = x \mid \mathbf{C}_i = \mathbf{c}_i\right]&\propto P\left[ \mathbf{z}_i = x\right] \cdot P\left[ \mathbf{C}_i = \mathbf{c}_i \mid \mathbf{z}_i = x\right] \end{aligned}$$
    (28)
    $$\begin{aligned}&\propto P\left[ \mathbf{z}_i = x\right] \cdot \exp \left( -\frac{\Vert \mathbf{c}_i - \mathbf{t}_x\Vert _2^2}{2\sigma _\mathrm{SimNoise}^2} \right) \end{aligned}$$
    (29)

    In [9], a score table, denoted \((S_i[x])_{x\in L}\), is derived from the a posteriori distribution as follows:

    $$\begin{aligned} S_i[x]&:= \ln \left( P\left[ \mathbf{z}_i = x\right] \cdot \exp \left( -\frac{\Vert \mathbf{c}_i - \mathbf{t}_x\Vert _2^2}{2\sigma _\mathrm{SimNoise}^2} \right) \right) \end{aligned}$$
    (30)
    $$\begin{aligned}&= \ln \left( P\left[ \mathbf{z}_i = x\right] \right) -\frac{\Vert \mathbf{c}_i - \mathbf{t}_x\Vert _2^2}{2\sigma _\mathrm{SimNoise}^2}. \end{aligned}$$
    (31)

    Finally, the output candidate for \(\mathbf{z}_i\) is \(\tilde{\mathbf{z}}_i := \text {argmax}_{x\in L}(S_i[x])\).
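The following toy sketch (ours, on simulated data; names are not from [9]) makes the two phases concrete: profiling averages the traces sharing a secret coefficient into templates, and the online phase computes the score table of Eqs. (30)–(31).

```python
# Toy NumPy sketch of the template attack: profiling (template estimation)
# and online scoring, following Eqs. (27)-(31).
import numpy as np

def build_templates(profiling_traces, profiling_secrets, L):
    """profiling_traces: (N, n) array of measurements c (Eq. 27);
    profiling_secrets: (N,) array of the corresponding secret values."""
    return {x: profiling_traces[profiling_secrets == x].mean(axis=0) for x in L}

def score_table(c_i, templates, prior, sigma):
    """c_i: (n,) measurement for coefficient i; prior: dict x -> P[z_i = x];
    returns the score table S_i of Eqs. (30)-(31)."""
    return {x: np.log(prior[x])
               - np.linalg.norm(c_i - t_x)**2 / (2 * sigma**2)
            for x, t_x in templates.items()}

# The candidate for z_i is the argmax of its score table:
#   S = score_table(c_i, templates, prior, 0.0045)
#   z_tilde_i = max(S, key=S.get)
```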

One can use the presented attack as a “black-box” to generate the score tables using the script from [9]. As an example, using the NIST1 parameters, we show in Table 1 several measured scores \((S[-11], \cdots , S[11])\) corresponding to several secret coefficients. The first line corresponds to a secret equal to 0, the second line to 1, and the third and fourth lines to \(-1\). The last line is an example of a failed guess: the output candidate is not \(-1\). We remark that values of the opposite sign are assigned a very low score; we conjecture that this is because the sign fills the register, so that the Hamming weight of the register is very far from the correct one.

Table 1. Examples of scores associated to the secret values \(\mathbf{z}_i \in \{0,\pm 1\}\), after the side-channel analysis of [9] for NIST1 parameters. The best score in each score table is highlighted. This best guess is correct for the first three score tables, but incorrect for the last one.

With this template attack, one can recover \(\tilde{\mathbf{z}} \approx \mathbf{z}\). However, Bos et al. [9] could not conclude the attack with a key recovery, even though much information about the secret had leaked. Frustratingly, a brute-force phase to derive \(\mathbf{z}\) from \(\tilde{\mathbf{z}}\) did not lead to any security threat, as stated in [9, Section 3]. The authors pointed out the interesting open question of whether “novel lattice reduction algorithms [can] take into account side-channel information”. Our work answers this question by combining the knowledge obtained in the divide-and-conquer template attack of [9] with our framework.

From scores to hints. We first instantiate a DBDD instance with a chosen set of parameters. Then we assume that, for each secret coefficient \(\mathbf{z}_i\), we are given the associated score table \(S_i\) produced by the template attack that has already been carried out. We go back to the a posteriori distribution of Eq. 29 by applying the \(\exp ()\) function and renormalizing the score table. As an example, we show the probability distributions derived from Table 1, along with their variances and centers, in Table 2.

Finally, we use our framework to introduce n a posteriori approximate hints into our DBDD instance, with the derived centers and variances for each score table; a sketch of this conversion is given below. When the variance is exactly 0, we integrate a perfect hint instead.
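The conversion can be sketched as follows (plain Python, ours for illustration; the integration calls in the trailing comments follow our toolkit's API, with \(\mathbf{e}_i\) denoting the i-th canonical vector):

```python
# Sketch: turn a score table S_i (Eq. 30) back into the a posteriori
# distribution (Eq. 29), then extract the center and variance of z_i.
from math import exp

def score_table_to_center_variance(S):
    """S: dict mapping each x in L to the score S_i[x]."""
    m = max(S.values())                       # shift for numerical stability
    p = {x: exp(s - m) for x, s in S.items()} # undo the logarithm
    Z = sum(p.values())
    p = {x: px / Z for x, px in p.items()}    # renormalized a posteriori law
    center = sum(x * px for x, px in p.items())
    variance = sum((x - center)**2 * px for x, px in p.items())
    return center, variance

# For each coefficient i:
#   center, variance = score_table_to_center_variance(S[i])
#   if variance == 0: dbdd.integrate_perfect_hint(e_i, center)
#   else:             dbdd.integrate_approx_hint(e_i, center, variance,
#                                                aposteriori=True)
```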

Table 2. Probability distributions derived from Table 1, along with variances and centers.

Results. One can reproduce this attack using the Sage 9.0 script \(\textsf {exploiting\_SCA\_from\_Bos\_et\_al.sage}\). The experimentally derived data containing the score tables is in the folder \(\textsf {Scores\_tables\_SCA}\); as mentioned earlier, it was generated with a simulated noise standard deviation of 0.0045. One can note that the obtained security fluctuates a bit from instance to instance, as it depends on the strength of the hints, which themselves depend on the randomness of the scheme. In the first two lines of Table 3, we show the new security with the inclusion of the approximate hints, averaged over 50 tests per set of parameters.

Table 3. Cost of the attacks, without and with hints, and without and with guesses.

Guessing. To improve the attack further, one can note from Table 2 that certain key values have a very high probability of being correct; assuming each of these values is correct, one can replace an approximate hint with a perfect one. For example, considering the second line of Table 2, the secret coefficient has probability 0.95 of being 1, and thus guessing it trades a \(5\%\) decrease of the attack's success probability for a perfect hint. This hybrid attack, exploiting hints, guesses and lattice reduction, works as follows (see also the sketch after the list). Let g be a parameter.

  1. Include all the approximate and perfect hints given by the score tables;

  2. Order the coefficients of the secret \(\mathbf{z}_i\) according to the maximum value of their a posteriori distribution tables;

  3. Include perfect hints for the g first coefficients, then solve and check the solution.

Increasing the number of guesses g leads to a trade-off between the cost of the attack and its success probability. Here we have targeted a success probability larger than 0.6, while reducing the attack cost by 38 to 145 bikz depending on the parameter set. Given that 1 bit of security corresponds roughly to 3 or 4 bikz, this is undoubtedly advantageous.
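For illustration, the guessing loop can be sketched as follows (ours; canonical_vector is a hypothetical helper returning the i-th canonical vector, and integrate_perfect_hint follows our toolkit's API):

```python
# Sketch of the hybrid guessing step: guess the g most reliable coefficients
# and integrate them as perfect hints, tracking the overall success probability.
def integrate_guesses(dbdd, posteriors, g):
    """posteriors: list of dicts, posteriors[i][x] = P[z_i = x] (as in Table 2)."""
    order = sorted(range(len(posteriors)),                  # step 2
                   key=lambda i: max(posteriors[i].values()),
                   reverse=True)
    success_probability = 1.0
    for i in order[:g]:                                     # step 3
        guess = max(posteriors[i], key=posteriors[i].get)   # most likely value
        success_probability *= posteriors[i][guess]
        dbdd.integrate_perfect_hint(canonical_vector(i), guess)
    return success_probability
```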

Remark 29

The refinements presented above are very recent (last improved in June 2020). We are grateful to the authors of [9] for helping us reconstruct the distributions from the score tables.

We remark that, with these results, the attacks with guesses on the parameter sets CCS1 and CCS2 seem feasible in practice, which was not the case with our original results. However, some implementation improvements remain to be done in order to actually mount the attack: the full-fledged implementation cannot handle the large matrices of the original DBDD instance in reasonable time. We would need another implementation class that fully maintains all information about the instance, like the DBDD class, while assuming that the covariance matrix \(\mathbf {\Sigma } \) is diagonal to simplify the computations, like the DBDD_predict_diag class. We hope to report on such an implementation in a future update of this report.

Remark 30

It should be noted that, given a single trace, one cannot naively retry the attack to boost its success probability. Indeed, the “second-best” guess may already have a much lower success probability than the first. Setting up such a hybrid attack, mixing lattice reduction within our framework and key-ranking, appears to be an interesting problem.

6.2 Hints from Decryption Failures

Another kind of hint our framework can model is the hints provided by decryption failures. Using our framework, we produce predictions for a decryption failure attack on FrodoKEM-976 that match very closely the ad-hoc analysis of [14]. Our analysis is deferred to the full version of this paper [13].

6.3 Structural Hints from Design

Interestingly, we can also incorporate structural information on the secret or the error that is present in certain schemes. In the full version of our paper [13], we present (slightly) improved attacks on several Round 2 NIST submissions (such as LAC, Round5, and NTRU) which use ternary distributions for secrets, with a prescribed number of 1's and \(-1\)'s.