In this section, we present our main algorithm, prove its asymptotic complexity, and report practical results in dimension \(n=128\).
3.1 Rationale
A natural idea in order to distinguish between an instance of \(\mathsf {LWE}\) (or \(\mathsf {LPN}\)) and a uniform distribution is to select k samples that add up to zero, yielding a new sample of the form \((\mathbf {0},e)\). It is then enough to distinguish between e and a uniform variable. However, if \(\delta \) is the bias of the error in the original samples, the new error e has bias \(\delta ^{k}\), hence roughly \(\delta ^{-2k}\) samples are necessary to distinguish it from uniform. Thus it is crucial that k be as small as possible.
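As a minimal illustration of this distinguishing step (the code below, its parameter names and toy values are ours, not part of the algorithm description), one can estimate the empirical bias of the reduced errors and compare it against the fluctuations expected from uniform samples:

```python
import numpy as np

def estimate_bias(errors, q):
    """Empirical bias E[cos(2*pi*e/q)] of reduced samples (0, e).

    For genuine LWE/LPN samples the bias is about delta**k after the
    reduction, while for uniform samples it stays around 0 with
    fluctuations of order 1/sqrt(m); hence roughly delta**(-2k) samples
    suffice to tell the two apart.
    """
    return float(np.mean(np.cos(2 * np.pi * np.asarray(errors) / q)))

# Toy usage with hypothetical parameters.
q, delta, k = 3329, 0.8, 6
m = 10 * int(delta ** (-2 * k))                # a few times delta**(-2k) samples
sigma = q * np.sqrt(k * np.log(1 / delta)) / (np.pi * np.sqrt(2))
rng = np.random.default_rng(0)
reduced = rng.normal(0, sigma, size=m)         # Gaussian error with bias ~ delta**k
uniform = rng.integers(0, q, size=m)
print(estimate_bias(reduced, q), estimate_bias(uniform, q))
```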
The idea of the algorithm by Blum, Kalai and Wasserman (BKW) is to perform “blockwise” Gaussian elimination. The n coordinates are divided into k blocks of length \(b = n/k\). Then, samples that are equal on the first b coordinates are subtracted from one another to produce new samples that are zero on the first block. This process is iterated over each consecutive block. Eventually, samples of the form \((\mathbf {0},e)\) are obtained.
Each of these samples ultimately results from the addition of \(2^{k}\) starting samples, so k should be at most \(\mathcal {O}(\log (n))\) for the algorithm to make sense. On the other hand, \(\mathrm {\varOmega }(q^{b})\) samples are clearly required at each step in order to generate enough collisions on the b consecutive coordinates of a block. This naturally results in a complexity of roughly \(2^{(1 + o(1))n/\log (n)}\) in the original algorithm for \(\mathsf {LPN}\). This algorithm was later adapted to \(\mathsf {LWE}\) in [3], and then improved in [4].
The idea of the latter improvement is to use so-called “lazy modulus switching”. Instead of finding two vectors that are equal on a given block in order to generate a new vector that is zero on the block, one uses vectors that are merely close to each other. This may be seen as performing addition modulo p instead of q for some \(p < q\), by rounding every value \(x \in \mathbb {Z}_{q}\) to the element of \(\mathbb {Z}_{p}\) nearest to xp / q. Thus at each step of the algorithm, instead of generating vectors that are zero on each block, small vectors are produced. This introduces a new “rounding” error term, but essentially reduces the complexity from roughly \(q^{b}\) to \(p^{b}\). Balancing the new error term against this decrease in complexity results in a significant improvement.
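For concreteness, the rounding map underlying this modulus switching can be sketched as follows (the moduli and the vector below are arbitrary toy values):

```python
import numpy as np

def switch_modulus(a, q, p):
    """Round each coordinate x in Z_q to the element of Z_p nearest to x*p/q.

    Two vectors mapping to the same image differ, coordinate-wise, by at
    most about q/(2p) before rounding, so subtracting them yields a short
    vector instead of an exactly-zero one ("lazy" modulus switching).
    """
    return np.rint(np.asarray(a) * p / q).astype(int) % p

q, p = 4096, 16
a = np.array([3071, 10, 2048, 4095])
print(switch_modulus(a, q, p))   # [12  0  8  0]
```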
However, it may be observed that this rounding error is much more costly for the first few blocks than for the last ones. Indeed, samples produced after, say, one reduction step are bound to be added together \(2^{k-1}\) times to yield the final samples, resulting in a corresponding blowup of the rounding error. By contrast, later terms undergo fewer additions. Thus it makes sense to allow for progressively coarser approximations (i.e. a decreasing modulus) at each step. On the other hand, to maintain comparable data requirements for finding collisions on each block, the decrease in modulus is compensated by progressively longer blocks.
What we propose here is a more general view of the BKW algorithm that allows for this improvement, while giving a clear view of the different complexity costs incurred by the various choices of parameters. Balancing these terms is the key to finding an optimal complexity. We forego the “modulus switching” point of view entirely, while retaining its core ideas. The resulting algorithm generalizes several variants of BKW, and will later be applied in a variety of settings.
3.2 Quantization
The goal of quantization is to associate to each point of \(\mathbb {R}^k\) a center from a small set, such that the expected distance between a point and its center is small. We will then be able to produce small vectors by subtracting vectors associated to the same center.
Modulus switching amounts to a simple quantizer which rounds every coordinate to the nearest multiple of some constant. Our proven algorithm uses a similar quantizer, except the constant depends on the index of the coordinate.
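A minimal sketch of such a coordinate-wise quantizer with an index-dependent step is given below; the function names and interface are ours, for illustration only.

```python
import numpy as np
from collections import defaultdict

def center(a, D):
    """Round coordinate j of a to the nearest multiple of D[j].

    Two vectors sharing a center are each within D[j]/2 of that center in
    coordinate j, so their difference has coordinates of size at most D[j].
    """
    a = np.asarray(a, dtype=float)
    return tuple((np.rint(a / D) * D).astype(int))

def bucketize(vectors, D):
    """Group vectors by their center; pairs inside a bucket subtract to short vectors."""
    buckets = defaultdict(list)
    for v in vectors:
        buckets[center(v, D)].append(np.asarray(v))
    return buckets
```

Modulus switching corresponds to the special case where all the steps D[j] are equal to q/p.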
It is possible to decrease the average distance from a point to its center by a constant factor for large moduli [17], but doing so would complicate our proof without improving the leading term of the complexity. When the modulus is small, it might be worthwhile to use error-correcting codes as in [18].
3.3 Main Algorithm
Let us denote by \(\mathcal {L}_{0}\) the set of starting samples, and \(\mathcal {L}_i\) the sample list after i reduction steps. The numbers \(d_{0} = 0 \le d_{1} \le \dots \le d_{k} = n\) partition the n coordinates of sample vectors into k buckets. Let \(\mathbf {\mathrm {D}} = (D_{0},\dots ,D_{k-1})\) be the vector of quantization coefficients associated to each bucket.
In order to allow for a uniform presentation of the BKW algorithm, applicable to different settings, we do not assume a specific distribution on the secret. Instead, we assume there exists some known \(\mathbf {\mathrm {B}} = (B_{0},\dots ,B_{n-1})\) such that \(\sum _i (s_i/B_i)^2 \le n\). Note that this is in particular true if \(|s_{i}| \le B_{i}\). We shall see how to adapt this to the standard Gaussian case later on. Without loss of generality, \(\mathbf {\mathrm {B}}\) is non-increasing.
There are k phases in our reduction: in the i-th phase, the coordinates from \(d_i\) to \(d_{i+1}\) are reduced. We define \(m=|\mathcal {L}_0|\).
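As a rough illustration of one reduction phase (the naming and interface below are ours, a sketch rather than the paper's Reduce routine itself): quantize the current block with step \(D_i\), pair up samples that fall in the same bucket, and subtract them.

```python
import numpy as np
from collections import defaultdict

def reduce_phase(samples, lo, hi, D, q):
    """One phase of the reduction: make coordinates lo..hi-1 small.

    samples: list of (a, b) with a an integer vector mod q and b an integer
    mod q.  Samples whose block a[lo:hi], rounded to the nearest multiple of
    D, is identical are subtracted pairwise; the block of the difference,
    seen as a centred representative mod q, has coordinates of absolute
    value at most D, and at most half as many samples survive.
    """
    buckets = defaultdict(list)
    for a, b in samples:
        block = np.asarray(a[lo:hi], dtype=int)
        block = np.where(block > q // 2, block - q, block)       # centred representative
        buckets[tuple(np.rint(block / D).astype(int))].append((np.asarray(a), b))
    out = []
    for group in buckets.values():
        for (a1, b1), (a2, b2) in zip(group[::2], group[1::2]):  # disjoint pairs
            out.append((tuple((a1 - a2) % q), (b1 - b2) % q))
    return out
```

Solve would run this phase for \(i=0,\dots ,k-1\) on the coordinate ranges \([d_i,d_{i+1})\) with steps \(D_i\), and then feed the surviving samples to Distinguish.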
Lemma 5
Solve terminates in time \(\mathcal {O}(mn\log q)\).
Proof
The Reduce algorithm clearly runs in time \(\mathcal {O}(|\mathcal {L}|n \log q)\). Moreover, \(|\mathcal {L}_{i+1}|\le |\mathcal {L}_i|/2\) so that the total running time of Solve is \(\mathcal {O}(n\log q\sum _{i=0}^k m/2^i)=\mathcal {O}(mn\log q)\). \(\square \)
Lemma 6
Write \(\mathcal {L}'_{i}\) for the samples of \(\mathcal {L}_{i}\) where the first \(d_{i}\) coordinates of each sample vector have been truncated. Assume \(|s_j|D_{i}<0.23q\) for all \(d_{i} \le j < d_{i+1}\). If \(\mathcal {L}'_{i}\) is sampled according to the \(\mathsf {LWE}\) distribution of secret \(\mathbf {\mathrm {s}}\) and noise parameters \(\alpha \) and \(\epsilon \le 1\), then \(\mathcal {L}'_{i+1}\) is sampled according to the \(\mathsf {LWE}\) distribution of the truncated secret with parameters:
$$\begin{aligned} \alpha '^2=2\alpha ^2+4\pi ^2\sum _{j=d_i}^{d_{i+1}-1}(s_jD_{i}/q)^2\quad \text {and }\quad \epsilon '=3\epsilon . \end{aligned}$$
On the other hand, if \(D_i=1\), then \(\alpha '^2=2\alpha ^2\).
Proof
The independence of the output samples and the uniformity of their vectorial parts are clear. Let \((\mathbf {\mathrm {a}},b)\) be a sample obtained by subtracting two samples from \(\mathcal {L}_{i}\). For \(\mathbf {\mathrm {a'}}\) the vectorial part of a sample, define \(\epsilon (\mathbf {\mathrm {a'}})\) such that \(\mathbb {E}[\exp (2i\pi (\langle \mathbf {\mathrm {a'}},\mathbf {\mathrm {s}} \rangle -b')/q)|\mathbf {\mathrm {a'}}]=(1+\epsilon (\mathbf {\mathrm {a'}}))\exp (-\alpha ^2)\). By definition of LWE, \(|\epsilon (\mathbf {\mathrm {a'}})| \le \epsilon \), and by independence:
$$\begin{aligned} \mathbb {E}[\exp (2i\pi (\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {s}}\rangle -b)/q)|\mathbf {\mathrm {a}}]=\exp (-2\alpha ^2)\mathbb {E}_{\mathbf {\mathrm {a'}}-\mathbf {\mathrm {a''}}=\mathbf {\mathrm {a}}}[(1+\epsilon (\mathbf {\mathrm {a'}}))(1+\epsilon (\mathbf {\mathrm {a''}}))], \end{aligned}$$
with \(|\mathbb {E}_{\mathbf {\mathrm {a'}}-\mathbf {\mathrm {a''}}=\mathbf {\mathrm {a}}}[(1+\epsilon (\mathbf {\mathrm {a'}}))(1+\epsilon (\mathbf {\mathrm {a''}}))]-1|\le 3\epsilon \).
We have thus computed the noise corresponding to adding two samples of \(\mathcal {L}_{i}\). To get the noise for a sample from \(\mathcal {L}_{i+1}\), it remains to truncate the coordinates from \(d_{i}\) to \(d_{i+1}\). A straightforward induction on the coordinates shows that this noise is:
$$\begin{aligned} \exp (-2\alpha ^2)\mathbb {E}_{\mathbf {\mathrm {a'}}-\mathbf {\mathrm {a''}}=\mathbf {\mathrm {a}}}[(1+\epsilon (\mathbf {\mathrm {a'}}))(1+\epsilon (\mathbf {\mathrm {a''}}))]\prod _{j=d_{i}}^{d_{i+1}-1}\mathbb {E}[\exp (2i\pi \mathbf {a}_j\mathbf {s}_j/q)]. \end{aligned}$$
Indeed, if we denote by \(\mathbf {\mathrm {a}}^{(j)}\) the vector \(\mathbf {\mathrm {a}}\) where the first j coordinates are truncated and \(\alpha _j\) the noise parameter of \(\mathbf {\mathrm {a}}^{(j)}\), we have:
$$\begin{aligned}&|\mathbb {E}[\exp (2i\pi (\langle \mathbf {\mathrm {a}}^{(j+1)},\mathbf {\mathrm {s}}^{(j+1)} \rangle -b)/q)|\mathbf {\mathrm {a}}^{(j+1)}]-\exp (-\alpha _j^2)\mathbb {E}[\exp (2i\pi \mathbf {a}_j\mathbf {s}_j/q)]|\\ =\;&|\mathbb {E}[\exp (-2i\pi \mathbf {a}_j\mathbf {s}_j/q)(\exp (2i\pi (\langle \mathbf {\mathrm {a}}^{(j)},\mathbf {\mathrm {s}}^{(j)} \rangle -b)/q)-\exp (-\alpha _j^2))]|\\ \le \;&\epsilon ' \exp (-\alpha _j^2)\mathbb {E}[\exp (2i\pi \mathbf {a}_j\mathbf {s}_j/q)]. \end{aligned}$$
It remains to compute \(\mathbb {E}[\exp (2i\pi \mathbf {a}_j\mathbf {s}_j/q)]\) for \(d_{i} \le j<d_{i+1}\). Let \(D = D_{i}\). The distribution of \(\mathbf {a}_j\) is even, so \(\mathbb {E}[\exp (2i\pi \mathbf {a}_j \mathbf {s}_j/q)]\) is real. Furthermore, since \(|\mathbf {a}_j|\le D\),
$$\begin{aligned} \mathbb {E}[\exp (2i\pi \mathbf {a}_j \mathbf {s}_j/q)]\ge \cos (2\pi \mathbf {s}_jD/q) . \end{aligned}$$
Assuming \(|\mathbf {s}_j|D<0.23q\), simple function analysis shows that
$$\begin{aligned} \mathbb {E}[\exp (2i\pi \mathbf {a}_j \mathbf {s}_j/q)]\ge \exp (-4\pi ^2 \mathbf {s}_j^2D^2/q^2). \end{aligned}$$
On the other hand, if \(D_i=1\) then \(\mathbf {a}_j=0\) and \(\mathbb {E}[\exp (2i\pi \mathbf {a}_j \mathbf {s}_j/q)]=1\). \(\square \)
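As a numerical sanity check of the bias computed in Lemma 6 (the snippet below, its parameter values, and the way the leftover block coordinates are modelled are our own illustrative assumptions), one can compare the empirical bias of the new error against \(\exp (-\alpha '^2)\):

```python
import numpy as np

rng = np.random.default_rng(1)
q, alpha = 2053, 0.4
D, s_block = 8, np.array([3, -5, 2])            # toy quantization step and secret block
m = 200_000

# Errors of two samples from L_i, each of bias exp(-alpha^2).
sigma = q * alpha / (np.pi * np.sqrt(2))
e1, e2 = rng.normal(0, sigma, size=(2, m))
# Leftover block coordinates after quantization: even distribution, |a_j| <= D.
a = rng.integers(-D, D + 1, size=(m, len(s_block)))
e_new = e1 - e2 + a @ s_block

empirical = np.mean(np.cos(2 * np.pi * e_new / q))
alpha2_new = 2 * alpha**2 + 4 * np.pi**2 * np.sum((s_block * D / q) ** 2)
print(empirical, ">=", np.exp(-alpha2_new))     # empirical bias vs. the lemma's lower bound
```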
Finding optimal parameters for BKW amounts to balancing various costs: the baseline number of samples required so that the final list \(\mathcal {L}_{k}\) is non-empty, and the additional factor due to the need to distinguish the final error bias. This final bias itself comes both from the blowup of the original error bias by the BKW additions, and the “rounding errors” due to quantization. Balancing these costs essentially means solving a system.
For this purpose, it is convenient to set the overall target complexity as \(2^{n(x + o(1))}\) for some x to be determined. The following auxiliary lemma essentially gives optimal values for the parameters of Solve assuming a suitable value of x. The actual value of x will be decided later on.
Lemma 7
Pick some value x (dependent on \(\mathsf {LWE}\) parameters). Choose:
$$\begin{aligned} k&\le \bigg \lfloor \log \bigg (\frac{nx}{6\alpha ^2}\bigg ) \bigg \rfloor \quad&m&= n2^k2^{nx}\\ D_i&\le \frac{q\sqrt{x/6}}{\pi B_{d_i}2^{(k-i+1)/2}}\quad&d_{i+1}&= \min \bigg (d_{i} + \bigg \lfloor \frac{nx}{\log (1+q/D_i)} \bigg \rfloor , n\bigg ). \end{aligned}$$
Assume \(d_k = n\) and \(\epsilon \le 1/(\beta ^{2}x)^{\log 3}\), and for all i and \(d_i\le j < d_{i+1}\), \(|s_j|D_i<0.23q\). Solve runs in time \(\mathcal {O}(mn)\) with negligible failure probability.
Proof
Remark that for all i,
$$\begin{aligned} |\mathcal {L}_{i+1}|\ge (|\mathcal {L}_{i}|-(1+q/D_i)^{d_{i+1}-d_i})/2 \ge (|\mathcal {L}_i|-2^{nx})/2. \end{aligned}$$
Using induction, we then have \(|\mathcal {L}_i|\ge (|\mathcal {L}_0|+2^{nx})/2^i-2^{nx}\) so that \(|\mathcal {L}_k| \ge n2^{nx}\).
By induction and using the previous lemma, the input of Distinguish is sampled from a \(\mathsf {LWE}\) distribution with noise parameter:
$$\begin{aligned} \alpha '^2=2^k\alpha ^2+4\pi ^2\sum _{i=0}^{k-1}2^{k-i-1}\sum _{j=d_i}^{d_{i+1}-1}(s_jD_i/q)^2. \end{aligned}$$
By choice of k, the first term is smaller than nx/6. As for the second term, since B is non-increasing and by choice of \(D_{i}\), it is smaller than:
$$ 4\pi ^2\sum _{i=0}^{k-1}2^{k-i-1}\frac{x/6}{\pi ^22^{k-i+1}}\sum _{j=d_i}^{d_{i+1}-1}\Big (\frac{s_j}{B_j}\Big )^2 \le (x/6)\sum _{j=0}^{n-1}\Big (\frac{s_{j}}{B_{j}}\Big )^{2}\le nx/6. $$
Thus the real part of the bias is at least \(\exp (-nx/3)(1-3^k\epsilon ) \ge 2^{-nx/2}\), and hence by Theorem 2.2, Distinguish fails with negligible probability. \(\square \)
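To make these parameter choices concrete, the following sketch instantiates Lemma 7 numerically; the function name, the toy values of \(n,q,\alpha ,x\) and the constant vector \(\mathbf {\mathrm {B}}\) are illustrative assumptions, not values used in the paper.

```python
import numpy as np

def lemma7_parameters(n, q, alpha, B, x):
    """Instantiate the parameters of Lemma 7 (our naming; illustrative only).

    B is the non-increasing vector with sum (s_i/B_i)^2 <= n.  Returns k,
    the number of samples m, the quantization steps D_i and the cuts d_i.
    """
    k = int(np.floor(np.log2(n * x / (6 * alpha**2))))
    m = n * 2**k * 2**(n * x)
    d, D = [0], []
    for i in range(k):
        if d[i] == n:                         # every coordinate already covered
            D.append(1.0)
            d.append(n)
            continue
        Di = q * np.sqrt(x / 6) / (np.pi * B[d[i]] * 2 ** ((k - i + 1) / 2))
        D.append(Di)
        d.append(min(d[i] + int(n * x / np.log2(1 + q / Di)), n))
    return k, m, D, d

# Toy instantiation with hypothetical numbers; Lemma 7 additionally requires
# that d_k reaches n, which restricts the admissible values of x.
n, q, alpha, x = 64, 2**10, 0.5, 2.0
B = np.full(n, 4.0)
k, m, D, d = lemma7_parameters(n, q, alpha, B, x)
print(k, [round(Di, 2) for Di in D], d)
```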
Theorem 4
Assume that for all i, \(|s_i|\le B\), \(B\ge 2\), \(\max (\beta ,\log (q))=2^{o(n/\log n)}\), \(\beta =\omega (1)\), and \(\epsilon \le 1/\beta ^4\). Then Solve takes time \(2^{(n/2+o(n))/\ln (1+\log \beta /\log B)}\).
Proof
We apply Lemma 7, choosing
$$\begin{aligned} k=\lfloor \log (\beta ^2/(12\ln (1+\log \beta ))) \rfloor =(2-o(1))\log \beta \in \omega (1) \end{aligned}$$
and we set \(D_i=q/(Bk2^{(k-i)/2})\). It now remains to show that this choice of parameters satisfies the conditions of the lemma.
First, observe that \(BD_i/q\le 1/k=o(1)\) so the condition \(|s_j|D_i<0.23q\) is fulfilled. Then, \(d_k \ge n\), which amounts to:
$$\begin{aligned} \sum _{i=0}^{k-1} \frac{x}{(k-i)/2+\log \mathcal {O}(kB)} \ge 2x\ln (1+k/2/\log \mathcal {O}(kB)) \ge 1+k/n=1+o(1) \end{aligned}$$
If we have \(\log k=\omega (\log \log B)\) (so in particular \(k = \omega (\log B)\)), we get \(\ln (1+k/2/\log \mathcal {O}(kB))=(1+o(1))\ln (k)=(1+o(1))\ln (1+\log \beta /\log B)\).
Else, \(\log k=\mathcal {O}(\log \log B)=o(\log B)\) (since necessarily \(B = \omega (1)\) in this case), so we get \(\ln (1+k/2/\log \mathcal {O}(kB))=(1+o(1))\ln (1+\log \beta /\log B)\).
Thus choosing \(1/x=(2-o(1))\ln (1+\log \beta /\log B)\) fits both cases, and we have \(1/x\le 2\ln (1+\log \beta )\). Second, we have \(1/k=o(\sqrt{x})\) so \(D_i\), \(\epsilon \) and k are also sufficiently small and the lemma applies. Finally, note that the algorithm has complexity \(2^{\varOmega (n/\log n)}\), so a factor \(n2^k\log (q)\) is negligible. \(\square \)
This theorem can be improved when the given parameters yield some \(D_i<1\), since \(D_i=1\) already gives a lossless quantization.
Theorem 5
Assume that for all i, \(|s_i|\le B=n^{b+o(1)}\). Let \(\beta =n^{c}\) and \(q=n^d\) with \(d\ge b\) and \(c+b\ge d\). Assume \(\epsilon \le 1/\beta ^4\). Then Solve takes time \(2^{n/(2(c-d+b)/d+2\ln (d/b)-o(1))}\).
Proof
Once again we aim to apply Lemma 7, and choose k as above:
$$\begin{aligned} k=\log (\beta ^2/(12\ln (1+\log \beta )))=(2c-o(1))\log n \end{aligned}$$
If \(i<\lceil 2(c-d+b)\log n \rceil \), we take \(D_i=1\); otherwise we choose \(q/D_i=\varTheta (B2^{(k-i)/2})\). Satisfying \(d_{k} \ge n-1\) amounts to:
$$\begin{aligned}&2x(c-d+b)\log n/\log q+\sum _{i=\lceil 2(c-d+b)\log n \rceil }^{k-1} \frac{x}{(k-i)/2+\log \mathcal {O}(B)} \\ \ge \;&2x(c-d+b)/d+2x\ln ((k-2(c-d+b)\log n+2\log B)/2/\log \mathcal {O}(B)) \\ \ge \;&1+k/n=1+o(1) \end{aligned}$$
So that we can choose \(1/x=2(c-d+b)/d+2\ln (d/b)-o(1)\). \(\square \)
Corollary 1
Given a \(\mathsf {LWE}\) problem with \(q=n^d\), Gaussian errors with \(\beta =n^c\), \(c>1/2\) and \(\epsilon \le n^{-4c}\), we can find a solution in \(2^{n/(1/d+2\ln (d/(1/2+d-c))-o(1))}\) time.
Proof
Apply Theorem 1: with probability 2/3, the secret is now bounded by \(B=\mathcal {O}(q\sqrt{n}/\beta \sqrt{\log n})\). The previous theorem gives the complexity of an algorithm recovering the secret, using \(b=1/2-c+d\), which works with probability \(2/3-2^{-\varOmega (n)}\). Repeating n times with different samples, the correct secret is output at least \(n/2+1\) times, except with negligible probability. By returning the most frequent secret, the probability of failure is therefore negligible. \(\square \)
In particular, if \(c \le d\), it is possible to quantumly approximate lattice problems within factor \(\mathcal {O}(n^{c+1/2})\) [34]. Setting \(c=d\), the complexity is \(2^{n/(1/c+2\ln (2c)-o(1))}\), so the constant in the exponent slowly converges to 0 as c goes to infinity.
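For illustration, the leading constant in the exponent for \(c=d\), namely \(1/(1/c+2\ln (2c))\), can be tabulated by a trivial numerical evaluation of the formula above:

```python
import math

# Exponent constant of Corollary 1 with c = d: complexity 2^{n/(1/c + 2*ln(2c) - o(1))}.
for c in (1, 2, 4, 8, 16, 64):
    const = 1 / (1 / c + 2 * math.log(2 * c))
    print(f"c = {c:3d}: leading constant ~ {const:.3f}")
```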
A simple \(\mathsf {BKW}\) using the bias would have a complexity of \(2^{(d/c)n+o(n)}\), while the analysis of [4] or [3] only conjectures \(2^{(d/(c-1/2))n+o(n)}\) for \(c>1/2\). In [4], the authors incorrectly claim a complexity of \(2^{cn+o(n)}\) when \(c=d\), because the blowup in the error is not explicitly computed.
Finally, if we want to solve the \(\mathsf {LWE}\) problem for several different secrets sharing the same vectorial parts of the samples, the attack can be made much faster by working with a larger final bias, since the Reduce part then needs to be called only once.
3.4 Experimentation
We have implemented our algorithm in order to test its efficiency in practice, as well as that of the practical improvements described in the appendix of the full version. We have chosen dimension \(n = 128\), modulus \(q = n^{2}\), binary secret, and Gaussian errors with noise parameter \(\alpha = 1/(\sqrt{n/\pi }\log ^2 n)\). The previous best result for these parameters, using a \(\mathsf {BKW}\) algorithm with lazy modulus switching, claims a time complexity of \(2^{74}\) with \(2^{60}\) samples [4].
Using our improved algorithm, we were able to recover the secret using \(m = 2^{28}\) samples within 13 hours on a single PC equipped with a 16-core Intel Xeon. The computation time was mostly devoted to computing \(9\cdot 10^{13}\) norms, in 16-bit fixed point using SIMD instructions.
In the appendix of the full version, we compare the different techniques for solving the LWE problem when the number of samples is large or small. For comparison, we were able to solve the same problem in two minutes using BKZ with block size 40 followed by an enumeration.