# Coded-BKW: Solving LWE Using Lattice Codes

## Abstract

In this paper we propose a new algorithm for solving the Learning With Errors (LWE) problem based on the steps of the famous Blum-Kalai-Wasserman (BKW) algorithm. The new idea is to introduce an additional procedure of mapping subvectors into codewords of a lattice code, thereby increasing the number of positions that can be cancelled in each BKW step. The procedure introduces an additional noise term, but it is shown that by using a sequence of lattice codes with different rates the noise can be kept small. Developed theory shows that the new approach compares favorably to previous methods. It performs particularly well for the binary-LWE case, i.e., when the secret vector is sampled from \(\{0,1\}^n\).

## Keywords

LWE, binary-LWE, BKW, Coded-BKW, Lattice codes

## 1 Introduction

Learning with Errors (LWE) is a problem that has received a lot of attention recently and can be considered as a generalization of the Learning Parity with Noise (LPN) problem. Regev introduced LWE in [31], and it has proved to be a very useful tool for constructing cryptographic primitives. Although a great number of different constructions of cryptographic primitives have been given since the introduction of the LWE problem, one of the most interesting ones is the work on constructing fully homomorphic encryption schemes [8, 10, 19, 20].

There are several motivating reasons for the interest in LWE-based cryptography. One is the simplicity of the constructions, sometimes giving rise to very efficient implementations which run much faster than competing alternative solutions. Another reason is the well-developed theory on lattice problems, which gives insights into the hardness of the LWE problem. There are theoretical reductions from worst-case lattice problems to average-case LWE [31]. A third motivating reason is the fact that LWE-based cryptography is one of the areas where a quantum computer is not known to be able to break the primitives (contrary to factoring-based and discrete log-based primitives). This is sometimes referred to as being a tool in *post-quantum cryptography*.

Let us state the LWE problem.

### **Definition 1**

Let *n* be a positive integer, *q* an odd prime, and let \(\mathcal {X}\) be an error distribution selected as the discrete Gaussian distribution on \(\mathbb {Z}_q\). Fix \(\mathbf s\) to be a secret vector in \(\mathbb {Z}_q^n\), chosen according to a uniform distribution. Denote by \(L_{\mathbf{s},\mathcal {X}}\) the probability distribution on \(\mathbb {Z}_q^n\times \mathbb {Z}_q\) obtained by choosing \(\mathbf{a}\in \mathbb {Z}_q^n\) uniformly at random, choosing an error \(e\in \mathbb {Z}_q\) according to \(\mathcal {X}\) and returning \((\mathbf{a}, z) = (\mathbf{a}, \langle \mathbf{a},\mathbf{s}\rangle + e)\) in \(\mathbb {Z}_q^n\times \mathbb {Z}_q\). The LWE problem is to find the secret vector \(\mathbf s\) given a number of samples from \(L_{\mathbf{s},\mathcal {X}}\).

The definition above gives the *search* LWE problem, as the problem description asks for the recovery of the secret vector \(\mathbf s\). Another variant is the so-called *decision* LWE problem. In this case the problem is to distinguish samples drawn from \(L_{\mathbf{s},\mathcal {X}}\) and samples drawn from a uniform distribution on \(\mathbb {Z}_q^n\times \mathbb {Z}_q\). Typically, we are then interested in distinguishers with non-negligible advantage.

The parameters of an LWE instance are typically chosen with some internal relations. The prime *q* is chosen as a polynomial in *n*, and the discrete Gaussian distribution \(\mathcal {X}\) has mean zero and standard deviation \(\sigma =\alpha \cdot q\) for some small \(\alpha \). For example, in [31], Regev proposed to use parameters \(q\approx n^2\) and \(\alpha =1/(\sqrt{2\pi n}\cdot \log _2^2n)\).

### 1.1 Previous Work

A number of algorithms for solving the LWE problem have been given, using different approaches. As there is a strong connection to lattice problems, a direction for a subset of the algorithms has been to either rewrite the LWE problem as the problem of finding a short vector in a dual lattice, the Short Integer Solution (SIS) problem, or to solve the Bounded Distance Decoding (BDD) problem. Lattice reduction algorithms may be applied to solve these problems. Even though there has been a lot of research devoted to the study of lattice reduction algorithms, there still seems to be quite some uncertainty about the complexity and performance of such algorithms for higher dimensions.

Another very interesting approach was given by Arora and Ge in [5], where they proposed a novel algebraic approach to solve the LWE problem. The asymptotic complexity of this algorithm is subexponential when \(\sigma \le \sqrt{n}\), but fully exponential otherwise. The algorithm is mainly of asymptotic interest as applying it on specific instances gives higher complexity than other solvers.

Finally, much work has been done on combinatorial algorithms for solving LWE, all taking the famous Blum-Kalai-Wasserman (BKW) algorithm [7] as a basis. The BKW algorithm resembles the generalized birthday approach by Wagner [34] and was originally given as an algorithm for solving the LPN problem. These combinatorial algorithms have the advantage that their complexity can be analyzed in a standard way and we can get explicit values on the complexity for different instantiations of the LWE problem. Even though we use approximations in the analysis, the deviation between theoretical analysis and actual performance seems to be small [3, 17]. This approach tends to give algorithms with the best performance for some important parameter choices. A possible drawback with BKW-based algorithms is that they usually require a huge amount of memory, often of the same order as the time complexity. Some recent work in this direction is [1, 3, 17].

### 1.2 Motivation and Contributions

We know that the theoretical hardness of the LWE problem is well-established, through reductions to hard lattice problems [9, 30, 31]. This can be transferred to asymptotic statements on the security. In fact, most proposals of LWE-based cryptographic primitives rely only on asymptotics when arguing about security.

**Table 1.** Time complexity comparison for solving various LWE instances. Complexities are given in \(\log _2 \# \mathbb {Z}_q\) operations; the last two columns are lattice-reduction distinguishing estimates under the models of [11, 25, 26, 29].

| Instance | *n* | *q* | \(\sigma \) | This paper (Sect. 5) | Duc et al. [17] | Lattice reduction (I) | Lattice reduction (II) |
|---|---|---|---|---|---|---|---|
| Regev [31] | 128 | 16,411 | 11.81 | 84.5 | 95.0 | 61.6 | 61.9 |
| Regev [31] | 256 | 65,537 | 25.53 | 145.1 | 178.7 | 175.5 | 174.5 |
| Regev [31] | 512 | 262,147 | 57.06 | 287.6 | 357.5 | 386.8 | 518.6 |
| Lindner & Peikert [25] | 128 | 2,053 | 2.70 | 69.7 | 83.7 | 54.5 | 57.1 |
| Lindner & Peikert [25] | 256 | 4,099 | 3.34 | 123.8 | 154.2 | 156.2 | 151.2 |
| Lindner & Peikert [25] | 512 | 4,099 | 2.90 | 209.2 | 271.8 | 341.9 | 424.5 |

But there is also a huge interest in studying the actual hardness of specific instances of the LWE problem. How does the choice of parameters \((n, q, \sigma )\) influence the complexity of solving LWE? What are the smallest parameters we can have and still achieve, say, 80-bit security?

**Table 2.** Time complexity comparison for solving various binary-LWE instances.

We also apply the algorithm in a slightly modified form on the binary-LWE problem. The binary-LWE problem is the LWE problem when the secret vector \(\mathbf{s}\) is chosen uniformly from \(\{0,1\}^n\). In this case we have a huge improvement (see Table 2) in performance compared with other algorithms.

Tables 1 and 2 show comparisons of different algorithms for solving various LWE and binary-LWE instances, respectively. We compare the performance of the new algorithm with the previous best BKW variant (i.e., Duc et al. [17] for LWE or Albrecht et al. [3] for binary-LWE) and the estimates (under certain models [11, 25, 26, 29]) for distinguishing LWE (or binary-LWE) samples from uniform using lattice reduction algorithms, when LWE is reduced to SIS. The results consolidate the understanding that BKW is asymptotically efficient. For the toy LWE instances with \(n=128\), the SIS approach still beats all the BKW variants, including ours, but the new variant greatly narrows the gap. The situation changes as the parameter *n* increases.

We also obtain a significant improvement (i.e., with a factor of more than \(2^{11}\) in time) on solving an LWE (136, 2003, 5.19)-instance, which first appeared in [29] and was then adopted as an example in [25], compared with the estimates in [1] that use the BDD approach.

Thus, we are close to concluding that, when choosing LWE instances for today's cryptosystems (e.g., to achieve an 80-bit or higher security level), thwarting BKW-type attacks must be taken into consideration.

The remainder of the paper is organized as follows. In Sect. 2 we describe the basic theory around the LWE problem. We give a short description of the BKW algorithm in Sect. 3, and then present the novel modification in the next section. We detail the algorithm in Sect. 5, analyze its complexity in Sect. 6, and then propose a variant for binary-LWE in Sect. 7. This is followed by simulations in Sect. 8 and a summary of numerical results in Sect. 9. Finally, Sect. 10 concludes the paper.

## 2 Background

On an *n*-dimensional Euclidean space \({\mathbb R}^n\), the intuitive notion of length of a vector \(\mathbf{x} = (x_1, x_2, \ldots , x_n)\) is captured by the \(L_2\)-norm; \(||\mathbf{x}|| = \sqrt{x_1^2 + \cdots + x_n^2}\). The Euclidean distance between two vectors \(\mathbf{x}\) and \(\mathbf{y}\) in \({\mathbb R}^n\) is defined as \(||\mathbf{x}-\mathbf{y}||\). For a given set of vectors \(\mathcal{L}\), the minimum mean square error (MMSE) estimator assigns each vector \(\mathbf{x}\in {\mathbb R}^n\) to the vector \(\mathbf{l}\in \mathcal{L}\) such that \(||\mathbf{x}-\mathbf{l}||\) is minimized. Let us briefly introduce the discrete Gaussian distribution.

### 2.1 Discrete Gaussian Distribution

Let \(x\in \mathbb {Z}\). The discrete Gaussian distribution on \(\mathbb {Z}\) with mean 0 and variance \(\sigma ^2\), denoted \(D_{\mathbb {Z},\sigma }\), is the probability distribution obtained by assigning a probability proportional to \(\exp (-x^2/2\sigma ^2)\) to each \(x\in \mathbb {Z}\). The \(\mathcal {X}\) distribution^{1} with variance \(\sigma ^2\) is the distribution on \(\mathbb {Z}_q\) obtained by folding \(D_{\mathbb {Z},\sigma }\,\text {mod}\,q\), i.e., accumulating the value of the probability mass function over all integers in each residue class *mod* *q*. Similarly, we define the discrete Gaussian over \(\mathbb {Z}^{n}\) with variance \(\sigma ^2\), denoted \(D_{\mathbb {Z}^{n},\sigma }\), as the product distribution of *n* independent copies of \(D_{\mathbb {Z},\sigma }\).

In general, the discrete Gaussian distribution does not exactly inherit the usual properties from the continuous case, but in our considered cases it will be close enough and we will use properties from the continuous case, as they are approximately correct. For example, if *X* is drawn from \(\mathcal {X}_{\sigma _1}\) and *Y* is drawn from \(\mathcal {X}_{\sigma _2}\), then we consider \(X+Y\) to be drawn from \(\mathcal {X}_{\sqrt{\sigma _1^2+\sigma _2^2}}\). This follows the path of previous work [1].
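The folding and variance-addition approximations above can be checked numerically. The following is an illustrative sketch (all function names are ours): it computes the \(\mathcal{X}\) distribution by folding a truncated \(D_{\mathbb{Z},\sigma}\) mod *q*, and convolves two copies to see that the variances add.

```python
import math

def discrete_gaussian_pmf(sigma, tail_factor=10):
    """PMF of D_{Z,sigma}: probability of x proportional to exp(-x^2/(2 sigma^2)),
    truncated far out in the tails where the mass is negligible."""
    tail = int(tail_factor * sigma) + 1
    w = {x: math.exp(-x * x / (2.0 * sigma * sigma)) for x in range(-tail, tail + 1)}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}

def fold_mod_q(pmf, q):
    """The X distribution: fold D_{Z,sigma} mod q by accumulating the mass
    of every integer in each residue class."""
    folded = [0.0] * q
    for x, p in pmf.items():
        folded[x % q] += p
    return folded

def centered(x, q):
    """Representative of x mod q in the centered interval."""
    return x - q if x > q // 2 else x

def variance(folded, q):
    return sum(p * centered(i, q) ** 2 for i, p in enumerate(folded))

q, sigma = 101, 3.0
chi = fold_mod_q(discrete_gaussian_pmf(sigma), q)

# The sum X + Y of two independent X_sigma variables: convolve the pmfs,
# then fold.  Its variance is close to sigma_1^2 + sigma_2^2 = 2 sigma^2.
pmf = discrete_gaussian_pmf(sigma)
conv = {}
for x, px in pmf.items():
    for y, py in pmf.items():
        conv[x + y] = conv.get(x + y, 0.0) + px * py
chi_sum = fold_mod_q(conv, q)
```

For \(\sigma \ll q\), as in all parameter sets considered here, the folded variance is numerically indistinguishable from \(\sigma^2\).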

A central point in cryptanalysis is to estimate the number of samples required to distinguish between two distributions, in our case the uniform distribution on \(\mathbb {Z}_q\) and \(\mathcal {X}_{\sigma }\). The solution to this distinguishing problem leads to an efficient key recovery: we assume that for a right guess, the observed symbol is \(\mathcal {X}_{\sigma }\) distributed; otherwise, it is uniformly random. Thus, we need to distinguish the secret from *Q* candidates. We follow the theory from linear cryptanalysis [6] (also similar to that in correlation attacks [15]), that the number *M* of required samples to test is about \(\mathcal {O}\left( \frac{\ln (Q)}{\varDelta (\mathcal {X}_{\sigma } \Vert U)}\right) ,\) where \(\varDelta (\mathcal {X}_{\sigma }\Vert U)\) is the divergence^{2} between \(\mathcal {X}_{\sigma }\) and the uniform distribution *U* in \(\mathbb {Z}_q\).
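As a small sketch of this sample-complexity estimate (the parameter choices and names are ours, and the constant hidden in the \(\mathcal{O}(\cdot)\) is ignored), one can compute the divergence of a folded Gaussian from uniform directly:

```python
import math

def folded_gaussian(sigma, q):
    """X_sigma: discrete Gaussian on Z folded mod q (truncated tails)."""
    tail = int(10 * sigma) + 1
    w = [math.exp(-x * x / (2.0 * sigma * sigma)) for x in range(-tail, tail + 1)]
    total = sum(w)
    folded = [0.0] * q
    for i, x in enumerate(range(-tail, tail + 1)):
        folded[x % q] += w[i] / total
    return folded

def divergence_vs_uniform(p, q):
    """Kullback-Leibler divergence D(P || U) in nats, U uniform on Z_q:
    sum_x p(x) * ln(p(x) / (1/q))."""
    return sum(px * math.log(px * q) for px in p if px > 0.0)

# Toy Lindner-Peikert parameters from Table 1
q, sigma = 2053, 2.70
delta = divergence_vs_uniform(folded_gaussian(sigma, q), q)

# Samples needed to single out the right candidate among Q, up to a
# constant: M ~ ln(Q) / delta
Q = q
M = math.log(Q) / delta
```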

### 2.2 LWE Problem Description

Assume that we ask for *m* samples from the LWE distribution \(L_{\mathbf{s},\mathcal {X}}\), and denote the response as \((\mathbf {A}, \mathbf {z})\), where \(\mathbf {A} = ( \mathbf {a}_1^{\tiny \text {T}}, \mathbf {a}_2^{\tiny \text {T}}, \ldots , \mathbf {a}_m^{\tiny \text {T}})\) and \(\mathbf {z} = \mathbf {s}\mathbf {A} + \mathbf {e}\), i.e., \(z_i = \langle \mathbf {s}, \mathbf {a}_i\rangle + e_i\).

Assume that the first *n* columns are linearly independent and form the matrix \(\mathbf {A_0}\). Define \(\mathbf {D} = \mathbf {A_0}^{-1}\). With a change of variables \({ \hat{\mathbf {s}}} = \mathbf {s} \mathbf {D}^{-1}-( z_1,z_2,\ldots , z_n ) \) we get an equivalent problem described by \(\mathbf {\hat{A}}=( \mathbf {I}, \hat{{\mathbf {a}}}_{n+1}^{\tiny \text {T}}, \hat{{\mathbf {a}}}_{n+2}^{\tiny \text {T}}, \cdots , \hat{{\mathbf {a}}}_m^{\tiny \text {T}})\), where \(\mathbf {\hat{A}} =\mathbf {D}\mathbf {A}\). We compute \(\hat{\mathbf {z}} = \mathbf {z} - (z_1, \ldots , z_n)\hat{\mathbf {A}} = \hat{\mathbf {s}}\hat{\mathbf {A}} + \mathbf {e}\); note that \(\hat{\mathbf {s}} = -(e_1, \ldots , e_n)\), so the new secret is itself distributed as the error.

### 2.3 Lattice Codes and Construction A

A lattice \(\varLambda \) is a discrete additive subgroup of \({\mathbb R}^n\). Reformulated, \(\varLambda \) is a lattice iff there are linearly independent vectors \(\mathbf{v}_1,\ldots , \mathbf{v}_m\in {\mathbb R}^n\), such that any \(\mathbf{y}\in \varLambda \) can be written as \(\mathbf{y}=\sum _{i=1}^m \alpha _i \mathbf{v}_i\), where \(\alpha _i\in {\mathbb Z}\). The set \(\mathbf{v}_1,\ldots ,\mathbf{v}_m\) is called a basis for \(\varLambda \). A matrix whose columns are these vectors is said to be a generator matrix for \(\varLambda \).

There is a well-known connection between lattices and *q*-ary linear codes. If \(\mathcal {C}\) is a linear [*N*, *k*] code over the alphabet of size *q*, where *q* is a prime, then the lattice over this code is \(\varLambda (\mathcal {C}) = \{\mathbf {y}\in \mathbb {Z}^N : \mathbf {y} \equiv \mathbf {c} \pmod q \text { for some } \mathbf {c}\in \mathcal {C}\},\) obtained via the so-called Construction A. We call \(\varLambda (\mathcal {C})\) the *q*-ary lattice associated with \(\mathcal {C}\). Much is known about the covering properties of *q*-ary random linear codes [18, 27, 35].
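A minimal sketch of Construction A membership, using a toy [3, 1] code over \(\mathbb{Z}_5\) (the code and names are our illustrative choices):

```python
import itertools

q = 5
G = [[1, 1, 1]]          # generator matrix of a toy [3,1] code over Z_5

def codewords(G, q):
    """Enumerate all q^k codewords m * G over Z_q."""
    k, N = len(G), len(G[0])
    for msg in itertools.product(range(q), repeat=k):
        yield tuple(sum(msg[i] * G[i][j] for i in range(k)) % q
                    for j in range(N))

C = set(codewords(G, q))

def in_qary_lattice(y, C, q):
    """Construction A: y in Z^N lies in Lambda(C) iff y mod q is a codeword."""
    return tuple(v % q for v in y) in C
```

For instance, \((6, 1, -4)\) reduces to \((1, 1, 1)\) mod 5 and is therefore a lattice point, while \((1, 2, 1)\) is not.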

## 3 The BKW Algorithm

The BKW algorithm was proposed by Blum et al. [7] and was originally targeting the LPN problem. However, it is trivially adapted also to the LWE problem.

As with Wagner’s generalized birthday algorithm, the BKW approach uses an iterative collision procedure on the columns in the generator matrix \({\mathbf {A}}\), which step by step reduces the dimension of \(\mathbf {A}\). Summing together columns that collide in some subset of positions and keeping them as columns in a new matrix reduces the dimension but increases the size of the noise.

In the first step, one considers the last *b* entries. Assume that one finds two columns \({\mathbf {a}^{\tiny \text {T}}_{i_1}}, \mathbf {a}^{\tiny \text {T}}_{i_2}\) such that \(\mathbf {a}_{i_1} \pm \mathbf {a}_{i_2} = (*, \ldots , *, 0, \ldots , 0)\), i.e., their sum or difference is zero in the last *b* entries. Such a combination is kept as a column in a new matrix \(\mathbf {A}_2\).

There are different approaches to realizing the above merging procedure. We consider the approach called LF1 in [24], which computes the difference between one fixed column and any other column with the same last *b* entries (in absolute value), and forwards this to the next BKW step.

If *m* is the number of columns in \(\mathbf {A}\), then the number of columns in \(\mathbf {A}_2\) is \(m_2 = m-\frac{q^b-1}{2}\). Hence, using the LF1 approach, the number of samples (columns) forwarded to the next step of BKW slowly decreases (by \(\frac{q^b-1}{2}\) for each step). It is known from simulation that the LF2 approach [24], which gives more surviving samples, performs well and could be chosen in an actual implementation.

After this first step, the last *b* entries of the columns in \(\mathbf {A}_2\) are all zero. In connection to this matrix, the vector of observed symbols is updated with the corresponding sums and differences of the \(z_i\).

We then iterate the same for \(i=2,3,\ldots , t\), picking a new collision set of size \(\frac{q^b-1}{2}\) and finding colliding columns in \(\mathbf {A}_i\), giving new vectors with an additional *b* entries being zero, forming the columns of \(\mathbf {A}_{i+1}\). Repeating the same procedure an additional \(t-2\) times will reduce the number of unknowns in the secret vector \(\mathbf{s}\) to \(n-bt\) in the remaining problem.

After *t* BKW steps the noise connected to each column is of the form \(\sum _{j=1}^{2^t} \pm \, e_{i_j}\), i.e., a sum of \(2^t\) original noise variables.

Altogether we have reduced the LWE instance to a smaller instance, where now the length of the secret vector is \(n'=n-tb\), but the noise has variance \(2^t\cdot \sigma ^2\). The remaining unknown part of the secret vector \(\mathbf{s}\) is guessed (a total of \(q^{n-tb}\) candidates) and for each guess we check through a hypothesis test whether the remaining samples follow the Gaussian distribution. The number of remaining samples is at least \(m-t\cdot \frac{q^b-1}{2}\).
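The plain reduction step can be sketched as follows. This toy implementation (LF1-style; the function and variable names are ours) merges samples whose last *b* entries agree up to sign, and includes a noise-free sanity check:

```python
import random

def bkw_step_lf1(samples, b, q):
    """One LF1-style BKW step over Z_q.  samples: list of (a, z) with a a
    tuple in Z_q^n.  Samples are grouped by their last b entries, a class
    and its negation being merged; the difference (same form) or sum
    (opposite forms) with a fixed representative zeroes those b positions."""
    reps, out = {}, []
    for a, z in samples:
        key = a[-b:]
        if all(x == 0 for x in key):                 # already reduced
            out.append((a[:-b], z))
            continue
        neg = tuple((-x) % q for x in key)
        ckey = min(key, neg)                         # canonical class label
        flip = key != ckey                           # True if key is the negated form
        if ckey not in reps:
            reps[ckey] = (a, z, flip)                # fixed representative
            continue
        ra, rz, rflip = reps[ckey]
        if flip == rflip:                            # same form: subtract
            na = tuple((x - y) % q for x, y in zip(a, ra))
            nz = (z - rz) % q
        else:                                        # opposite forms: add
            na = tuple((x + y) % q for x, y in zip(a, ra))
            nz = (z + rz) % q
        out.append((na[:-b], nz))
    return out

# Noise-free sanity check: with e = 0, every reduced sample still satisfies
# z = <a, s> restricted to the surviving n - b positions.
random.seed(7)
q, n, b = 11, 4, 2
s = [random.randrange(q) for _ in range(n)]
samples = []
for _ in range(200):
    a = tuple(random.randrange(q) for _ in range(n))
    samples.append((a, sum(x * y for x, y in zip(a, s)) % q))
reduced = bkw_step_lf1(samples, b, q)
```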

Note that there is an improved version of BKW using lazy modulus reduction [3] and the very recent improvement in [17].

## 4 A Modified BKW Algorithm for the LWE Problem

The new algorithm we propose uses the same structure as the BKW algorithm. The new idea involves changing the BKW step to a more advanced step that can remove more positions in the treated vectors at the expense of leaving an additional noise term.

We introduce some additional notation. For the index set *I*, we make use of \(\mathbf {v}_I\) to denote the vector with entries indexed by *I*. Alternatively, we utilize the symbol \(\mathbf {v}_{[1,\ldots ,n]}\) to denote the vector containing the first *n* entries of \(\mathbf {v}\), etc.

### 4.1 A New BKW Step

Recall the BKW step, taking a large number of vectors \(\mathbf{a}_i\) and trying to collide them in a set of positions determined by an index set *I*. This part of the vector \(\mathbf{a}\) is written as \(\mathbf{a}_I\). The size of the collision set (\(\frac{q^b-1}{2}\)) and the number of vectors have to be of the same order, which essentially determines the complexity of the BKW algorithm, as the number of steps we can perform is determined by the variance of the noise.

We propose to do the BKW step in a different manner. Assuming that we are considering step *i* in the BKW process, we fix a *q*-ary linear code with parameters \([N_i, b]\), called \(\mathcal {C}_i\). The code gives rise to a lattice code. Now, for any given vector \(\mathbf {a}_I\) as input to this BKW step, we approximate the vector by one of the codewords in the code \(\mathcal {C}_i\).

Each vector \(\mathbf {a}_I\) is then sorted according to which codeword it was mapped to. Altogether, there are \(q^b\) possible codewords. Finally, generate new vectors for the next BKW step by subtracting vectors mapped to the same codeword (or adding to the zero codeword).
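A toy sketch of this new step (the code choice and names are ours): subvectors are quantized to the nearest codeword by brute force, and samples landing on the same codeword are subtracted, cancelling the codeword part exactly.

```python
q = 11
G = [1, 1, 1]                     # toy [3,1] code over Z_11, q codewords

def centered(x):
    """Representative of x mod q in the centered interval."""
    return x - q if x > q // 2 else x

def dist2(u, v):
    return sum(centered((x - y) % q) ** 2 for x, y in zip(u, v))

codebook = [tuple(c * g % q for g in G) for c in range(q)]

def quantize(a):
    """Approximate the subvector a_I by its nearest codeword (brute force)."""
    return min(codebook, key=lambda cw: dist2(a, cw))

def coded_bkw_merge(samples):
    """Subtract pairs of samples mapped to the same codeword: the codeword
    part cancels exactly, leaving only the small coding errors."""
    buckets, out = {}, []
    for a, z in samples:
        cw = quantize(a)
        if cw in buckets:
            ra, rz = buckets[cw]
            out.append((tuple((x - y) % q for x, y in zip(a, ra)),
                        (z - rz) % q))
        else:
            buckets[cw] = (a, z)
    return out
```

For example, both \((1,2,1)\) and \((1,1,2)\) quantize to the codeword \((1,1,1)\); their difference \((0, -1, 1)\) has small centered entries, which is exactly the residual coding noise.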

Assume that we perform *t* BKW steps of this kind. In step *i* we have removed \(N_i\) positions, so in total we have now removed \(\sum _{i=1}^t N_i\) positions (\(N_i\ge b\)). The received samples are created from summing \(2^t\) original samples, so after guessing the remaining symbols in the secret vector and adjusting for their contribution, a received symbol *z* can be written as a sum of noise variables, \(z = \sum _{j=1}^{2^t} \pm \, e_{i_j} + \sum _{i} s_i \sum _{h=1}^{t} E_i^{(h)}\), where \(E_i^{(h)}\) denotes the coding noise introduced in position *i* by step *h* of the modified BKW algorithm. Note that in any one position *i*, at most one error term \(E_i^{(h)}\) is non-zero.

We observe that noise introduced in early steps is increased exponentially in the remaining steps, so the procedure will use a sequence of codes with decreasing rate. In this way the error introduced in early steps will be small and then it will eventually increase.

### 4.2 Analyzing the Error Distribution

There are many approaches to estimating the error distribution introduced by coding. The simplest is to assume that the value is a sum of several independent discrete Gaussian random variables. This estimation is easily performed and fairly accurate. A second approach is to compute the error distribution accurately (to sufficient precision) by computer. We should note that the error distribution is determined by the linear code employed. We now rely on known results on lattice codes to provide a good estimate of the size of the noise introduced by coding.

We assume that the error vectors \({\mathbf {e}}\) introduced by the coding technique are discrete Gaussian, and that their sum is discrete Gaussian as well, just as in previous research. As the error is distributed symmetrically, we estimate the value \(\mathrm {E}[||\mathbf {e}|| ^2]\) to bound the effect of the error, where \(\mathbf {e}\) is distributed uniformly on the integer points inside the fundamental region \(\mathcal {V}\) of the lattice generated by Construction A.

We denote the normalized second moment of the lattice generated from an optimal [*N*, *k*] linear code as \(G(\varLambda _{N,k})\).

**Table 3.** Numerical evaluations of \(1/G\).

| *q* | 631 | 631 | 631 | 2053 | 2053 | 2053 | 16,411 | 16,411 |
|---|---|---|---|---|---|---|---|---|
| code | [2,1] | [3,1] | [4,1] | [2,1] | [3,1] | [4,1] | [2,1] | [3,1] |
| \(\mathrm {E}[||\mathbf {e}||^2]\) | \(101.26^{\dag }\) | 1277.31 | 4951.53 | \(329.24^{\dag }\) | 6185.67 | 29107.73 | \(2631.99^{\dag }\) | 99166.25 |
| \(1/G\) | 12.46 | 12.71 | 12.80 | 12.47 | 12.65 | 12.78 | 12.47 | 12.62 |

We have numerically tested the smallest possible variance of errors introduced by coding, given several small sizes of *N*, *k* and *q* (e.g., [*N*, *k*] being [3, 1] or [2, 1], and *q* being 631, 2053 or 16411), and verified that the above estimation works (see Table 3, where the row \(1/G\) bounds \(1{/}G(\varLambda _{N,k})\)). We choose [*N*, 1] codes since, for the covering or MMSE property, a lower rate means worse performance.

It is folklore that the value *G* decreases as the dimension and length become larger, and all the cases listed in Table 3 obey this rule. Thus we believe that we may obtain even better performance when employing a more complicated code for a larger problem. Actually, the values without a \(\dag \) sign in Table 3 are computed using randomly chosen linear codes, and they still greatly outperform our estimation. This observation fits well with the theory that when the dimension *n* is large, a random linear code may act nearly optimally.
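For intuition, the quantities in Table 3 can be reproduced in miniature by exhaustive computation. The sketch below uses a toy [2, 1] code over \(\mathbb{Z}_{13}\) with an arbitrarily chosen generator (ours for illustration, not one of the optimal codes behind the table):

```python
import itertools

q = 13
gen = (1, 5)        # hypothetical generator of a [2,1] code over Z_13
codebook = [tuple(c * g % q for g in gen) for c in range(q)]

def centered(x):
    return x - q if x > q // 2 else x

def dist2(u, v):
    return sum(centered((x - y) % q) ** 2 for x, y in zip(u, v))

# Exhaustive E[||e||^2]: average squared distance from each of the q^N
# integer points to its nearest codeword; the centered metric handles the
# wraparound, i.e., quantization in the associated q-ary lattice.
total = sum(min(dist2(a, cw) for cw in codebook)
            for a in itertools.product(range(q), repeat=2))
E = total / q ** 2

# Figure of merit as in the 1/G row of Table 3: for an [N, k] code,
# 1/G = N * q^(2(N-k)/N) / E; here N = 2, k = 1.
one_over_G = 2 * q / E
```

Even at this tiny size, \(1/G\) comes out close to 12, the value for the integer lattice, consistent with the first column of Table 3.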

From Eq. (6) we know the variance of the error term from the coding part. Combining this with Eq. (5), we get an estimation of the variance of the total noise for the samples that we create after *t* modified BKW steps.

### 4.3 Decoding Method and Constraint

Here we discuss details of syndrome decoding and show that the additional cost is under control. Generally, we characterize the employed [*N*, *k*] linear code by a systematic generator matrix \(\mathbf {M} = \begin{bmatrix}\mathbf {I}\,\mathbf {F}'\end{bmatrix}_{k\times N}\). Thus, a corresponding parity-check matrix \(\mathbf {H} = \begin{bmatrix} \mathbf {F} '^{\tiny \text {T}}\, \mathbf {I}\end{bmatrix}_{(N-k) \times N}\) is directly obtained.

The syndrome decoding procedure is described as follows. (1) We construct a constant-time query table containing \(q^{N-k}\) items, in each of which we store the syndrome and its corresponding error vector with minimum Euclidean distance. (2) When the syndrome is computed, by checking the table, we locate its corresponding error vector and add them together, thereby yielding the desired nearest codeword.

We generalize the method in [22] to the non-binary case \(\mathbb {Z}_q\) for computing the syndrome efficiently. Starting by sorting the vectors \(\mathbf {a}_{I}\) by the first *k* entries, we then partition them accordingly; thus there are \(q^k\) partitions denoted \(\mathcal {P}_j\), for \(1\le j\le q^k\). We can read the syndrome from its last \(N-k\) entries directly if the vector \(\mathbf {a}_{I}\) belongs to the partition with the first *k* entries all zero. Then we operate inductively. If we know one syndrome, we can compute another one in the same partition within \(2(N - k)\) \(\mathbb {Z}_q\) operations, or compute one in a different partition whose first *k* entries with distance 1 from that in the known partition within \(3(N - k)\) \(\mathbb {Z}_q\) operations. Suppose we have \(m_{dec}\) vectors to decode here (generally, the value \(m_{dec}\) is larger than \(q^k\)), then the complexity of this part is bounded by \((N-k)(2 m_{dec} +q^k)<3m_{dec}(N-k)\). Since the cost of adding error vectors for the codewords is \(m_{dec}N\), we can give an upper bound for the decoding cost, which is roughly \(4m_{dec}N\).
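The syndrome-decoding table itself can be sketched as follows for a toy [3, 1] code over \(\mathbb{Z}_5\) (the \(\mathbf{F}'\) block and the sign convention in \(\mathbf{H}\) are our illustrative choices):

```python
import itertools

q, N, k = 5, 3, 1
Fp = [2, 3]                         # hypothetical F' block (k = 1 row)
G = [1] + Fp                        # systematic generator [I | F']
H = [[(-Fp[0]) % q, 1, 0],          # parity check; the sign convention
     [(-Fp[1]) % q, 0, 1]]          # here is ours, chosen so H * c^T = 0

def centered(x):
    return x - q if x > q // 2 else x

def syndrome(v):
    return tuple(sum(h[i] * v[i] for i in range(N)) % q for h in H)

# Constant-time query table: for each of the q^(N-k) syndromes, store a
# minimum-Euclidean-norm error vector with that syndrome.
table = {}
for e in itertools.product(range(q), repeat=N):
    s, n2 = syndrome(e), sum(centered(x) ** 2 for x in e)
    if s not in table or n2 < table[s][1]:
        table[s] = (e, n2)

def decode(v):
    """Nearest-codeword decoding: look up the error vector for v's
    syndrome and subtract it, yielding a codeword."""
    e, _ = table[syndrome(v)]
    return tuple((v[i] - e[i]) % q for i in range(N))
```

Every decoded output lies in the code, and codewords are fixed points of `decode`.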

**Concatenated Constructions.** The drawback of the previous decoding strategy is that a large table is required to be stored with size exponential in \(N-k\). On the other hand, there is an inherent memory constraint, i.e., \(\mathcal {O}\left( q^b\right) \), when the size *b* is fixed, which dominates the complexity of the BKW-type algorithm.

We make use of a narrow-sense concatenated code, defined as the direct sum of several smaller linear codes, to simplify the decoding procedure when the decoding table is too large. This technique is not favored in coding theory since it diminishes the decoding capability, but it works well for our purpose.

## 5 Algorithm Description

### 5.1 Gaussian Elimination

The goal of this step is to transform the distribution of the secret vector \(\mathbf{s}\) to be that of the error (cf. [4, 23] for similar ideas). We refer to the full version for details on deriving the complexity of this step.

### 5.2 Standard BKW Reductions

The previously described coded-BKW in Sect. 4 introduces noise that grows with each iteration, so it makes sense to start with a number of pure BKW reductions. We start by performing \(t_1\) standard BKW steps to balance the two noise parts, i.e., the noise increased by merging and the noise introduced by coding. This step zeroes out the bottom \(t_1 \cdot b\) entries. We now explain the details.

Given the output of the Gaussian elimination, i.e., \(\hat{\mathbf {z}}\) and \(\hat{\mathbf {A}}=(\mathbf{I}\ \mathbf {L}_0)\), we process only the non-systematic part of \(\hat{\mathbf {A}}\), denoted by \(\mathbf {L}_0\). As in other BKW procedures [7], in each step we sort the vectors by the last *b* unprocessed entries and thus divide the samples into at most \(\frac{q^b-1}{2}\) classes. Then, we merge (adding or subtracting) those in the same class to zero the considered *b* entries, forming new samples as the input to the next BKW step, \(\mathbf {L}_1, \mathbf {L}_2,\) etc.

### 5.3 Coded-BKW Reductions

In each coded-BKW step, an \([N_i, b]\) *q*-ary linear code is utilized. Here various rates are employed to equalize the error contribution per dimension. The code length \(N_i\) in the \((t_2-i+1)\)th coded-BKW step is a function of a preset variance value \(\sigma _{set}^2\), which is determined by the error level introduced by the codes utilized in the last phase, subspace hypothesis testing. We know that in the final error expression there are \(2^{t_2-i+1}\) error terms from the *i*-th coded-BKW step; thus, the codes are chosen so that the coding-noise variance of each step, scaled by its factor \(2^{t_2-i+1}\), stays balanced around \(\sigma _{set}^2\).

The number of samples forwarded from this phase is denoted *M*. Following Sect. 4.3, the decoding cost of step *i* is upper bounded by roughly \(4 m_{dec} N_i\), where \(m_{dec}\) is the number of vectors decoded in that step.

### 5.4 Partial Guessing

The previous step outputs samples with smaller dimension but higher noise variance. In order to deal with the remaining unknowns in the secret vector \(\hat{\mathbf{s}}\), we use a combination of testing all values by guessing and performing a hypothesis test using an FFT.

We exhaustively guess the \(n_{top}\) entries treated in this step, assuming each to have absolute value at most *d*; thus there are \((2d + 1)^{n_{top}}\) candidates. The complexity of this step is just that of updating the observed symbols for each candidate.

### 5.5 Subspace Hypothesis Testing

Here we generalize the subspace hypothesis testing technique first proposed in [22] to the \(\mathbb {Z}_q\) case, and then combine it with the Fast Fourier Transform to calculate the occurrences of different symbols in \(\mathbb {Z}_q\) efficiently. This information yields an optimal distinguisher at a small additional cost.

The calculation of the polynomial \(H_{\mathbf {y}}(X)\) can be accelerated by Fast Fourier Transform. Let \(\omega \) be a primitive *q*-th root of unity in the complex field \(\mathbb {C}\). We can interpolate the polynomial \(H_{\mathbf {y}}(X)\) if we know its *q* values at the *q* different points \((1, \omega , \omega ^2,\ldots , \omega ^{q-1})\) with complexity about \(\mathcal {O}\left( q\log _2(q)\right) \). Thus, the problem is transformed to a polynomial evaluation problem.

We first evaluate \(q^l\) polynomials \(h_{\mathbf {u}}(X)\) on *q* different points \((1, \omega , \omega ^2,\ldots , \omega ^{q-1})\) with the complexity \(\mathcal {O}\left( q^l \cdot q\log _2q\right) \). Then with these values stored, we can evaluate the polynomial \(H_{\mathbf {y}}(X)\) using *q* FFTs, each of which costs \(\mathcal {O}\left( q^l \log _2(q^l)\right) \).
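The FFT evaluation idea can be illustrated as follows. This is a simplified stand-in (our own construction): we evaluate a plain occurrence-counting polynomial at the *q*-th roots of unity, rather than the paper's \(H_{\mathbf {y}}(X)\), whose exact definition is not repeated here.

```python
import numpy as np

q = 17
rng = np.random.default_rng(1)
symbols = rng.integers(0, q, size=1000)     # observed symbols in Z_q

# Occurrence counts c_x are the coefficients of H(X) = sum_x c_x X^x.
counts = np.bincount(symbols, minlength=q).astype(float)

# One length-q FFT evaluates H at all q-th roots of unity at once,
# O(q log q) instead of the naive O(q^2).
omega = np.exp(-2j * np.pi / q)             # matches the np.fft.fft convention
evals = np.fft.fft(counts)                  # evals[j] = H(omega^j)
```

Conversely, the inverse transform interpolates the counts back from the *q* evaluations, which is the interpolation step described above.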

If the symbol occurrences are known, then we obtain the belief levels of all the candidates using a Neyman-Pearson test [15]. We choose the one with the highest rank and output it. This testing adds \(\mathcal {O}\left( q^{l+1}\right) \) \(\mathbb {Z}_q\)-operations. Similar to the LPN case [22], recovering the remaining information can be done by iteratively employing this procedure to solve smaller LWE instances, whose complexity is negligible compared to that of recovering the first part.

## 6 Analysis of the New Approach for BKW

We denote by *P*(*d*) the probability that the absolute value of one guessed symbol \(\hat{s}_i\) is smaller than *d*, where \(\hat{s}_i \mathop {\leftarrow }\limits ^{\text {\$}}\mathcal {X}_{\sigma }\). Here we obtain a lower bound of *P*(*d*) by ignoring the folding feature of the distribution as \(P(d)> {{\mathrm{erf}}}(\frac{d}{\sqrt{2}\sigma })\), where \({{\mathrm{erf}}}\) is the error function \({{\mathrm{erf}}}(x)=\frac{2}{\sqrt{\pi }}\int _0^xe^{-t^2}dt\).
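This lower bound is straightforward to evaluate; a small sketch (the function name and example parameters are ours):

```python
import math

def guess_coverage(d, sigma, n_top=1):
    """Lower bound on the probability that all n_top guessed symbols have
    absolute value at most d, ignoring the mod-q folding:
    P(d)^n_top > erf(d / (sqrt(2) * sigma))^n_top."""
    return math.erf(d / (math.sqrt(2.0) * sigma)) ** n_top

# With the choice d = 3*sigma used in Sect. 9, a single symbol is covered
# with probability > erf(3 / sqrt(2)), about 0.997.
p1 = guess_coverage(3 * 2.70, 2.70)
p5 = guess_coverage(3 * 2.70, 2.70, n_top=5)
```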

In the testing step, we preset a noise level \(\gamma ^2 \sigma ^2\sigma _{set}^2n_{tot}\) to be the variance of the noise introduced by coding, and then compute the required number of samples to perform a successful distinguishing. The process may fail if the size of the information subvector to be tested, denoted \(\hat{\mathbf {s}}_{test}\), is too large to distinguish. Thus we need a new notion, \(P_{test}\), to denote the probability that the Euclidean length of \(\hat{\mathbf {s}}_{test}\) is less than a preset value \(\gamma \sqrt{n_{tot}}\sigma \). Using the following lemma from [28], which is a tail bound on discrete Gaussians, we can upper bound the failure probability by \((\gamma e^{\frac{1-\gamma ^2}{2}})^{n_{tot}}\).

### **Lemma 1**

For any \(\gamma \ge 1\), \(\Pr [||\mathbf {v}|| > \gamma \sigma \sqrt{n}; \mathbf {v} \mathop {\leftarrow }\limits ^{\text {\$}}D_{\mathbb {Z}^n,\sigma }]< (\gamma e^{\frac{(1-\gamma ^2)}{2}})^n\).

Later we set the value \(\gamma \) to be 1.2. Then, the estimated success probability is larger than \(97.5\,\%\) in most of the applications. We summarize our findings in the following theorem.

### **Theorem 1**

**(The Complexity of Algorithm 1).** Let \((n, q, \sigma )\) be the parameters of the chosen LWE instance, and let \(d, t_1, t_2, b, l, n_{test}\) be algorithm parameters. The number of \(\mathbb {Z}_q\) operations required for a successful run of the new attack is \(C = \frac{C_0 + C_1 + C_2 + C_3 + C_4}{(P(d))^{n_{top}}\cdot P_{test}},\) where \(C_0, \ldots , C_4\) denote the costs of the five phases of the algorithm. The number of samples *M* for testing is set according to the distinguishing bound of Sect. 2.1, i.e., \(M = \mathcal {O}\left( \frac{\ln (Q)}{\varDelta (\mathcal {X}_{\sigma _{final}} \Vert U)}\right) \), where *Q* is the number of tested candidates, *U* is the uniform distribution in \(\mathbb {Z}_q\) and \(\sigma _{final}^2 = 2^{t_1 + t_2}\sigma ^2 + \gamma ^2 \sigma ^2 \sigma _{set}^2 n_{tot}.\) Thus, the number of calls to the LWE oracle is the number of samples needed to carry out all reduction steps while retaining *M* samples for the hypothesis test.

### *Proof*

The cost for one iteration is \(C_0 + C_1 + C_2 + C_3 + C_4\), which should be divided by its expected success probability \((P(d))^{n_{top}}\cdot P_{test}\).

## 7 A Variant of Coded-BKW for Binary-LWE

We can derive an efficient algorithm for binary-LWE by modifying certain steps accordingly. First, the distribution of the information vector is already of small size in \(\mathbb {Z}_q\); therefore we skip the Gaussian elimination step. In addition, since the prime *q* is relatively large, it is beneficial to replace the FFT hypothesis testing step by a simple step exhausting all combinations of the top \(n_{top}\) entries, which are uniformly chosen from the binary set \(\{0,1\}^{n_{top}}\). The variant is similar to Algorithm 1, so we omit it here and refer the interested reader to the full version for details.

## 8 Simulation

We have performed simulations to support our theoretical results. A simulation with parameters \(\left( q,\sigma ,\#samples\right) =\left( 2053,2.70,2^{25}\right) \) is shown in Fig. 1, plotting the number of eliminated rows vs. \(\log _2\) of the variance of the sample errors. Four standard 2-row BKW steps were used initially, followed by three iterations each of [3,2]-, [4,2]-, [5,2]- and [6,2]-coding steps. The dashed horizontal line shows the variance of the corresponding uniform distribution (variance roof) of the errors, setting an upper bound for the variance in simulations. The four curves show the performance of 2-step BKW (theoretical), theoretical coded-BKW (according to Sect. 6), coded-BKW simulation, and coded-BKW simulation when employing the unnatural selection heuristic (see [3]).

It is clear that coded-BKW significantly outperforms plain BKW. Furthermore, it can be seen that the developed theoretical estimations for coded-BKW very closely match actual simulation performance.

## 9 Summary of Results

We now present numerical results, shown in Tables 1 and 2, obtained by using the new algorithms to solve the LWE and binary-LWE problems for various parameter settings, including instances from Regev’s cryptosystem [31] and from Lindner and Peikert’s paper [25]. As in [17], we consider operations over \(\mathbb {C}\) to have the same complexity as operations in \(\mathbb {Z}_q\), and set \(C_{FFT}\) to 1, which is the best constant one can hope for in an FFT. We also set \(\gamma = 1.2\) and \(d = 3\sigma \).

As in [1], we apply the new method to the instances proposed for a somewhat homomorphic encryption scheme [2], which can be viewed as LWE instances after linearization. Our method yields substantial improvements in all cases; in particular, it solves an instance with \(n=153\) variables in the linearized system (targeting 128-bit security [1]) in about \(2^{119}\) bit operations, thereby breaking the claimed security level.

We present here additional information about the comparisons in Tables 1 and 2. First, only the new algorithms and the algorithm proposed in [17] are key-recovery attacks; all the others belong to the class of distinguishing attacks. Second, the counterpart proposed by Albrecht et al. [3] is the version without unnatural selection, since our algorithm can be improved by the same heuristic. Thus, we accelerate the BKW-type binary-LWE solver by a factor of almost \(2^{20}\) for the toy instance \(n=128\) in Regev’s parameter setting. Last, we adopt the estimation model of [1, 3], using data from the implementations in [11, 25, 26], to evaluate the performance of the lattice-reduction distinguisher when LWE is reduced to SIS. We refer the interested reader to these two papers for details.

When LWE is reduced to BDD, also called “Decode” in [25], Lindner and Peikert reported the running time of this attack on two LWE instances. Albrecht et al. [1] multiplied the reported time by the clock speed of the CPU used, compared the result with their BKW variant, and concluded that the BDD approach yields substantially lower complexity. Specifically, their estimate for this “Decode” approach on one of the two instances (with parameters (136, 2003, 5.19)) is about \(2^{91.4}\) \(\mathbb {Z}_q\) operations. Applying Algorithm 1 to this instance, we obtain a much better time complexity of about \(2^{80.6}\) operations over \(\mathbb {Z}_q\).

As Ring-LWE is a sub-problem of LWE, the new algorithm can be employed to attack some recent Ring-LWE-based cryptosystems [16, 21, 32]. We solve the underlying Ring-LWE (256, 7681, 4.51) and Ring-LWE (512, 12289, 4.86) instantiations in \(2^{123}\) and \(2^{225}\) bit-operations, respectively, thereby breaking the claimed 128-bit and 256-bit security levels.

## 10 Conclusion

We have proposed a new algorithm for solving the LWE problem that modifies the steps of the BKW algorithm using lattice codes. Our algorithm outperforms the previous BKW variants for all instantiations we considered, and also outperforms the lattice-reduction approaches beyond a certain instance size. To the best of our knowledge, it is the best LWE solver when the dimension *n* is large enough, which seems to cover the parameter choices for today’s and future security levels. Moreover, it drastically outperforms all other approaches on the binary-LWE problem.

## Footnotes

- 1.
It is also denoted \(\mathcal {X}_{\sigma }\), and we omit \(\sigma \) if there is no ambiguity.

- 2.
Divergence has a number of aliases in the literature: relative entropy, information divergence, Kullback-Leibler divergence, etc. We refer the interested reader to [15] for the rigorous definition. In this paper, the divergence \(\varDelta (\mathcal {X}_{\sigma } \Vert U)\) is computed numerically.

## References

- 1. Albrecht, M.R., Cid, C., Faugère, J.-C., Fitzpatrick, R., Perret, L.: On the complexity of the BKW algorithm on LWE. Des. Codes Cryptogr. **74**, 1–30 (2013)
- 2. Albrecht, M.R., Farshim, P., Faugère, J.-C., Perret, L.: Polly cracker, revisited. In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 179–196. Springer, Heidelberg (2011)
- 3. Albrecht, M.R., Faugère, J.-C., Fitzpatrick, R., Perret, L.: Lazy modulus switching for the BKW algorithm on LWE. In: Krawczyk, H. (ed.) PKC 2014. LNCS, vol. 8383, pp. 429–445. Springer, Heidelberg (2014)
- 4. Applebaum, B., Cash, D., Peikert, C., Sahai, A.: Fast cryptographic primitives and circular-secure encryption based on hard learning problems. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 595–618. Springer, Heidelberg (2009)
- 5. Arora, S., Ge, R.: New algorithms for learning in presence of errors. In: Aceto, L., Henzinger, M., Sgall, J. (eds.) ICALP 2011, Part I. LNCS, vol. 6755, pp. 403–415. Springer, Heidelberg (2011)
- 6. Baignères, T., Junod, P., Vaudenay, S.: How far can we go beyond linear cryptanalysis? In: Lee, P.J. (ed.) ASIACRYPT 2004. LNCS, vol. 3329, pp. 432–450. Springer, Heidelberg (2004)
- 7. Blum, A., Kalai, A., Wasserman, H.: Noise-tolerant learning, the parity problem, and the statistical query model. J. ACM **50**(4), 506–519 (2003)
- 8. Brakerski, Z.: Fully homomorphic encryption without modulus switching from classical GapSVP. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 868–886. Springer, Heidelberg (2012)
- 9. Brakerski, Z., Langlois, A., Peikert, C., Regev, O., Stehlé, D.: Classical hardness of learning with errors. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC 2013, pp. 575–584. ACM (2013)
- 10. Brakerski, Z., Vaikuntanathan, V.: Efficient fully homomorphic encryption from (standard) LWE. In: Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pp. 97–106. IEEE Computer Society (2011)
- 11. Chen, Y., Nguyen, P.Q.: BKZ 2.0: better lattice security estimates. In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 1–20. Springer, Heidelberg (2011)
- 12. Cohen, G., Honkala, I., Litsyn, S., Lobstein, A.: Covering Codes, vol. 54. Elsevier, Amsterdam (1997)
- 13. Conway, J., Sloane, N.: Voronoi regions of lattices, second moments of polytopes, and quantization. IEEE Trans. Inf. Theory **28**(2), 211–226 (1982)
- 14. Conway, J.H., Sloane, N.J.A., Bannai, E., Leech, J., Norton, S., Odlyzko, A., Parker, R., Queen, L., Venkov, B.: Sphere Packings, Lattices and Groups, vol. 3. Springer, New York (1993)
- 15. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (2012)
- 16. De Clercq, R., Roy, S.S., Vercauteren, F., Verbauwhede, I.: Efficient software implementation of Ring-LWE encryption. In: Design, Automation and Test in Europe (DATE 2015) (2015)
- 17. Duc, A., Tramèr, F., Vaudenay, S.: Better algorithms for LWE and LWR. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 173–202. Springer, Heidelberg (2015)
- 18. Erez, U., Litsyn, S., Zamir, R.: Lattices which are good for (almost) everything. IEEE Trans. Inf. Theory **51**(10), 3401–3416 (2005)
- 19. Gentry, C.: A fully homomorphic encryption scheme. Ph.D. thesis, Stanford University (2009)
- 20. Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: conceptually-simpler, asymptotically-faster, attribute-based. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013, Part I. LNCS, vol. 8042, pp. 75–92. Springer, Heidelberg (2013)
- 21. Göttert, N., Feller, T., Schneider, M., Buchmann, J., Huss, S.: On the design of hardware building blocks for modern lattice-based encryption schemes. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, pp. 512–529. Springer, Heidelberg (2012)
- 22. Guo, Q., Johansson, T., Löndahl, C.: Solving LPN using covering codes. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol. 8873, pp. 1–20. Springer, Heidelberg (2014)
- 23. Kirchner, P.: Improved generalized birthday attack. Cryptology ePrint Archive, Report 2011/377 (2011)
- 24. Levieil, É., Fouque, P.-A.: An improved LPN algorithm. In: De Prisco, R., Yung, M. (eds.) SCN 2006. LNCS, vol. 4116, pp. 348–359. Springer, Heidelberg (2006)
- 25. Lindner, R., Peikert, C.: Better key sizes (and attacks) for LWE-based encryption. In: Kiayias, A. (ed.) CT-RSA 2011. LNCS, vol. 6558, pp. 319–339. Springer, Heidelberg (2011)
- 26. Liu, M., Nguyen, P.Q.: Solving BDD by enumeration: an update. In: Dawson, E. (ed.) CT-RSA 2013. LNCS, vol. 7779, pp. 293–309. Springer, Heidelberg (2013)
- 27. Loeliger, H.A.: Averaging bounds for lattices and linear codes. IEEE Trans. Inf. Theory **43**(6), 1767–1773 (1997)
- 28. Lyubashevsky, V.: Lattice signatures without trapdoors. In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp. 738–755. Springer, Heidelberg (2012)
- 29. Micciancio, D., Regev, O.: Lattice-based cryptography. In: Bernstein, D.J., Buchmann, J., Dahmen, E. (eds.) Post-Quantum Cryptography, pp. 147–191. Springer, Heidelberg (2009)
- 30. Peikert, C.: Public-key cryptosystems from the worst-case shortest vector problem. In: Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, pp. 333–342. ACM (2009)
- 31. Regev, O.: On lattices, learning with errors, random linear codes, and cryptography. J. ACM **56**(6), 34:1–34:40 (2009)
- 32. Roy, S.S., Vercauteren, F., Mentens, N., Chen, D.D., Verbauwhede, I.: Compact Ring-LWE cryptoprocessor. In: Batina, L., Robshaw, M. (eds.) CHES 2014. LNCS, vol. 8731, pp. 371–391. Springer, Heidelberg (2014)
- 33. Selçuk, A.A.: On probability of success in linear and differential cryptanalysis. J. Cryptology **21**(1), 131–147 (2008)
- 34. Wagner, D.: A generalized birthday problem. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, p. 288. Springer, Heidelberg (2002)
- 35. Zamir, R., Feder, M.: On lattice quantization noise. IEEE Trans. Inf. Theory **42**(4), 1152–1159 (1996)