1 Introduction

With the upcoming NIST initiative for recommending quantum-secure public key cryptosystems [1], it becomes even more urgent to properly select cryptographic key sizes with a well-defined security level, both classically and quantumly. The cryptographic community therefore has to establish, for the most prominent hardness problems, e.g. in the areas of codes, lattices, multivariate equations and isogenies, predictions for solving cryptographic instances with security levels of 128 bits and above.

The choice of key sizes has naturally been a tradeoff between efficiency and security. On the one hand, one would like to choose small parameters that allow for efficient implementations. On the other hand, one is usually quite conservative in estimating which parameters can be broken within say \(2^{128}\) steps. While giving conservative security estimates is in general good, we believe that this practice is often disproportionate in cryptographic research.

For instance, when selecting the best algorithm, cryptographers usually ignore memory consumption completely. Quite often, the best time complexity T is only achieved with a memory consumption as large as T. An example with such a huge memory requirement is the Blum-Kalai-Wasserman (BKW) algorithm [7] for solving LPN. When implementing an algorithm in practice, however, memory consumption is the main limiting factor. While performing \(2^{60}\) steps is doable even on smallish computing clusters in a reasonable amount of time, obtaining \(2^{60}\) bits of RAM is clearly out of reach. If one has to rely on additional hard disk space, the running time increases drastically.

An Internet investigation shows that nowadays the largest supercomputers have at most 1.6 PB \(< 2^{54}\) bits of RAM. Adding some safety margin, it seems fair to say that any algorithm with memory consumption larger than \(2^{60}\) bits cannot be instantiated in practice. In the course of the paper we will also consider a higher safety margin of \(2^{80}\) bits.

Hence, there is a need for algorithms for post-quantum problems that can be instantiated with small memory, so that they can be run on medium-size instances for an accurate extrapolation to cryptographic key sizes. For the selection of key sizes, one might safely restrict oneself to algorithms that do not exceed a certain amount of memory, e.g. \(2^{60}\) bits. Belaïd et al. [5] considered a related model in which an attacker has limited LPN samples and memory. However, we do not want to limit the number of LPN samples.

Ideally, we would design algorithms whose running time benefits from any fixed amount of memory. Let us assume that we have M bits of RAM on our computing facility. The main research question is then what optimal running time can be achieved when (fully) using this amount.

Our goal is to answer this question for Learning Parity with Noise (LPN). LPN is the basis for many code-based constructions and can be seen as a special instance of Learning with Errors (LWE) [29]. In the LPN problem, one has to learn a secret \(\mathbf s \in \mathbb {F}_2^k\) using access to an oracle that provides samples of the form \((\mathbf a_i, b_i)\), where \(\mathbf a_i\) is drawn uniformly at random from \(\mathbb {F}_2^k\), and \(b_i = \langle \mathbf a_i, \mathbf s \rangle + e_i\) for some error \(e_i \in \{0,1\}\) with \(\Pr [e_i=1] = \tau \). Hence, LPN is a two-parameter problem with dimension k and error rate \(\tau \in [0,\frac{1}{2})\).

Naturally, the problem becomes harder with increasing k and \(\tau \). For \(\tau =0\) we can easily draw k samples with linearly independent \(\mathbf a_i\) and solve for \(\mathbf s\) via Gaussian elimination. This algorithm can be generalized to any \(\tau \in [0, \frac{1}{2})\) in a simple way: draw k samples in each iteration, compute a candidate \(\mathbf s'\), and test whether \(\mathbf s = \mathbf s'\). Notice that \(\mathbf s = \mathbf s'\) iff in this iteration all samples are error-free.

This algorithm, which we call Gauss, seems to be somewhat folklore. To the best of our knowledge it was first used in 2008 by Carrijo et al. [11], and has been analyzed e.g. in Bogos et al. [8]. The benefits of Gauss are that it consumes only small memory and performs well for small noise \(\tau \), e.g. for the currently required choice of \(\tau =\frac{1}{\sqrt{k}}\) in the public key encryption schemes of Alekhnovich [3], Damgård, Park [12], Döttling, Müller-Quade, Anderson [13] and Duc, Vaudenay [14].

For constant noise \(\tau \), as used e.g. in the HB family of protocols [16, 20, 21] and their extensions [19, 23], currently the best known algorithm is BKW, due to Blum, Kalai and Wasserman [7], with running time, memory consumption and sample complexity \(2^{\mathcal{O}(k/\log k)}\). BKW has been widely studied in the cryptographic literature and there are several practical improvements due to Fossorier et al. [15], Levieil, Fouque [24], Lyubashevsky [25], Guo, Johansson, Löndahl [18] and Zhang, Jiao, Mingsheng [30]. While BKW offers the best running time for large \(\tau \), it cannot be implemented even for medium-size LPN parameters due to its huge memory consumption. Without any experimental results, however, predicting security levels is an error-prone process. This also led to some discussion about the accuracy of predictions [9].

Gauss and BKW are the starting point of our paper. We revisit both in Sect. 3, where we analyze them asymptotically and show that BKW has a very bad dependency on \(\tau \) with a running time of \(2^{\mathcal{O}(k/\log (\frac{k}{\tau }))}\). So even for \(\tau \) as small as \(\tau = \frac{1}{k}\), the running time remains \(2^{\mathcal{O}(k/\log k)}\).

Another drawback of Gauss and BKW is their large sample complexity, i.e. the number of calls to an LPN oracle, which is for both algorithms as large as their running time. Since the LPN oracle is by definition classical, this prevents any possible speed-ups by quantum search techniques, e.g. by Grover search [17].

Therefore, we will first reduce the number of samples to a pool of only \(n=\text {poly}(k)\) samples. Out of these n samples, we look, similarly to Gauss, for a set of k error-free samples. The resulting algorithm Pooled Gauss (Sect. 4) has the same time and memory complexity as Gauss, while consuming far fewer samples. This immediately gives rise to a quantum version, for which we save a square root in the run time via Grover search.

Another benefit of having small sample complexity is that we can add a preprocessing step that reduces the dimension of our LPN instances via intensive use of the LPN oracle. The resulting algorithm that we call Well-Pooled Gauss (Sect. 5.1) offers a significantly reduced time complexity.

In a nutshell, Well-Pooled Gauss has a simple preprocessing step that decreases the LPN dimension, and then a decoding step via Gaussian elimination. The preprocessing step can be improved by more sophisticated dimension reduction methods such as BKW. This comes at the cost of using some memory, but we can control the required memory by adjusting the amount of dimension reduction. Altogether, this results in Algorithm Hybrid (Sect. 5.3) that for any given memory M first reduces the dimension with full memory use, and second runs Gaussian elimination on the dimension-reduced, and thus easier, LPN instance. Another nice feature of Hybrid is that its preprocessing step makes it easy to include many of the recent BKW optimizations [18, 24, 30].

Moreover, we are also able to improve on the decoding step by replacing Gaussian elimination with more sophisticated information set decoding algorithms, like Ball-Collision Decoding of Bernstein, Lange, Peters [6], MMT of May, Meurer, Thomae [26], BJMM of Becker et al. [4] or May-Ozerov [27]. For our purpose of decoding LPN instances, it turns out that the MMT algorithm tuned to the LPN setting performs best. The resulting algorithm that we call Well-Pooled MMT is studied in Sect. 5.4.

Table 1 provides a more detailed overview of our algorithms and results. For ease of exposition, in Table 1 we omit all small error terms in run times, like \((1+o(1))\)-factors or \(\tilde{\mathcal{O}}\)-notation.

Table 1. Overview of our results, \(f(\tau ):=\log \left( \frac{1}{1-\tau }\right) \)

2 Preliminaries and the LPN Problem

2.1 Preliminaries

Let us first fix some notation. With \(\log \) we denote the binary logarithm. For a positive integer \(n\in \mathbb {N}\) we define \([n]:=\{1,2,\ldots , n\}\). Let M be a set and \(k\in \mathbb {N}\). Then \(\binom{M}{k}\) is the set of all subsets of M of size k. In particular, \(\binom{[n]}{k}\) is the set of size-k subsets of \(\{1,\ldots ,n\}\).

Let \(A\in \mathbb {F}_2^{n\times k}\), \(\mathbf b\in \mathbb {F}_2^n\) and \(I=\{i_1,\ldots , i_\ell \}\subseteq [n]\). Then \(A_I\) consists of the rows of A indexed by I and \(\mathbf b_I\) consists of the entries of \(\mathbf b\) indexed by I.

Let \(\mathbf v=(v_1,\ldots , v_n)\in \mathbb {F}_2^n\). Then we call \( wt (\mathbf v):=\Vert \mathbf v\Vert _1=|\{i\in [n]\;|\; v_i\not =0\}|\) the Hamming weight (or just weight) of \(\mathbf v\). A linear code \(\mathcal {C}\) is a subspace of \(\mathbb {F}_2^n\). If \(\dim (\mathcal {C})=k\) and \(d:=\min \limits _{0\not = c\in \mathcal {C}}\{ wt (c)\}\), then we call \(\mathcal {C}\) an \([n,k,d]\) code.

This implies \(\mathcal {C}=\text {im}(G)\) for some matrix \(G\in \mathbb {F}_2^{n\times k}\) with rank k. We call G a generator matrix of \(\mathcal {C}\). For a random matrix G we call \(\mathcal {C}=\text {im}(G)\) a random linear code.

For a finite set M we denote the uniform distribution on M by \(\mathcal {U}(M)\). Moreover, we denote by \(\text {Ber}_\tau \) the Bernoulli distribution with parameter \(\tau \), i.e., \(e \sim \text {Ber}_\tau \) means that we draw a 0–1 valued random variable e with \(\Pr [e=1] = \tau \).

The binomial distribution is denoted by \(\text {Bin}_{n,p}\) and can be seen as the sum of n independent, identically distributed \(\text {Ber}_{p}\) variables. If \(X\sim \text {Bin}_{n,p}\), we have \( Pr \left[ X=i\right] =\binom{n}{i} p^i(1-p)^{n-i}\). In the course of the paper we will deal with the question: Given \(p=p(k)\) and \(N=N(k)\), how large does the number of Bernoulli trials \(n=n(k)\) have to be such that \( Pr \left[ X\ge N\right] \ge 1-\text {negl}(k)\)? Here k is a security parameter and \(\text {negl}(k)=o(\frac{1}{\text {poly}(k)})=k^{-\omega (1)}\). We call probabilities of the form \(1-\text {negl}(k)\) overwhelming.

For example, setting \(n=\frac{N}{p}\) only yields \( Pr \left[ X\ge N\right] \ge \frac{1}{2}\), so n has to be larger than that. How much larger it has to be is answered by

Lemma 1

Let \(X\sim \text {Bin}_{n,p>0}\) and \(0\le N \le np\). If \(n=\varTheta \left( \frac{N+\log ^2 k}{p}\right) \), then we have \( Pr \left[ X\ge N\right] \ge 1-k^{-\omega (1)}\).

Proof

The Chernoff bounds give us \( Pr \left[ X\ge N\right] \ge 1-e^{-\frac{(np-N)^2}{2np}}\). If we set \(n\ge \frac{N+\log ^2 k+\sqrt{N\cdot \log ^2 k+\log ^4 k}}{p}\), for example \(n=\frac{2N+\log ^2 k}{p}\), we get

$$1-e^{-\frac{(np-N)^2}{2np}}=1-e^{-\frac{(N+\log ^2 k)^2}{4N+2\log ^2 k}}\ge 1-e^{-\frac{N+\log ^2 k}{4}}\ge 1-e^{-\frac{\log ^2 k}{4}}=1-k^{-\omega (1)}.$$

    \(\square \)

For \(N=1\), Lemma 1 gives us the number of Bernoulli trials we need until we get a success with overwhelming probability, i.e. \(\varTheta (\frac{\log ^2 k}{p})\) trials suffice. One can see that this is only slightly more than \(\frac{1}{p}\), the expected value of a geometrically distributed random variable with parameter p.
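As a quick sanity check of Lemma 1, the following Python snippet (our own illustrative code, assuming numpy; parameters chosen purely for demonstration) empirically estimates \( Pr \left[ X\ge N\right] \) for \(n=\frac{2N+\log ^2 k}{p}\).

```python
import numpy as np

def lemma1_success_rate(k, p, N, trials=10_000, seed=0):
    """Empirically estimate Pr[Bin(n, p) >= N] for n = (2N + log^2 k) / p."""
    rng = np.random.default_rng(seed)
    n = int(np.ceil((2 * N + np.log2(k) ** 2) / p))
    return n, float(np.mean(rng.binomial(n, p, size=trials) >= N))

# Illustration: k = 128, p = 0.1, N = 50 gives n = 1490 and a rate very close to 1.
print(lemma1_success_rate(k=128, p=0.1, N=50))
```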

2.2 The LPN Problem

Let us now formally define the LPN problem.

Definition 1

In the \( LPN _{k,\tau }\) problem, for some secret \(\mathbf s \in \mathbb {F}_2^k\) and error parameter \(\tau \in [0,\frac{1}{2})\) we are given access to an oracle that provides samples of the form \((\mathbf a_i,\; b_i = \langle \mathbf a_i, \mathbf s \rangle + e_i) \in \mathbb {F}_2^k \times \mathbb {F}_2\),

where \(\mathbf a_i \sim \mathcal {U}(\mathbb {F}_2^k)\) and \(e_i \sim \text {Ber}_\tau \), independently. Our goal is to recover \(\mathbf s\).

We call \(b_i\) the corresponding label of \(\mathbf a_i\).

Notation: Upon asking m queries, we write \((A, \mathbf b) \leftarrow LPN _{k,\tau }^m\) meaning that

\(A\mathbf s = \mathbf b + \mathbf e\), where the \(i^{th}\) row of \(A \in \mathbb {F}_2^{m \times k}\) and the \(i^{th}\) entry of \(\mathbf b \in \mathbb {F}_2^m\) represent the \(i^{th}\) sample.
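The following minimal Python sketch (our own illustrative code using numpy; all names are hypothetical) implements such an \( LPN _{k,\tau }^m\) oracle.

```python
import numpy as np

rng = np.random.default_rng()

def lpn_oracle(s, tau, m):
    """Return m samples (A, b) with A*s = b + e over F_2, where e_i ~ Ber_tau."""
    k = len(s)
    A = rng.integers(0, 2, size=(m, k))        # rows a_i drawn uniformly from F_2^k
    e = (rng.random(m) < tau).astype(int)      # Bernoulli(tau) error vector
    b = (A @ s + e) % 2                        # labels b_i = <a_i, s> + e_i
    return A, b

# Example usage with k = 64 and tau = 1/8:
s = rng.integers(0, 2, size=64)
A, b = lpn_oracle(s, 1/8, m=100)
```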

Remark 1

We say that an algorithm A with overwhelming probability solves \( LPN _{k,\tau }\) in running time T, if it both terminates within time T and outputs the correct \(\mathbf s\) with probability \(1- \text {negl}(k)\). This means that A might not terminate in time T or that A might output an incorrect \(\mathbf s'\), but we bound both events by some negligible function in k. Notice that our notion is stronger than just expected running time \(T_e\), where the real running time might significantly deviate from \(T_e\) with even constant probability.

The error-free case \(\text {LPN}_{k,0}\) can be easily solved by obtaining k samples \((\mathbf a_i, b_i)\) with linearly independent \(\mathbf a_i\), and computing via Gaussian elimination

$$\begin{aligned} \mathbf s = A^{-1} \mathbf b. \end{aligned}$$
(1)

However, in case of errors we obtain \(\mathbf s = A^{-1} \mathbf b + A^{-1} \mathbf e\), with an accumulated error of \(A^{-1} \mathbf e\), where \( wt (A^{-1} \mathbf e)\) is usually large. In other words, Gaussian elimination lets the error grow too fast by adding together too many samples.
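For the error-free case, a minimal sketch of Gaussian elimination over \(\mathbb {F}_2\) (our own illustrative Python, reusing the numpy-based oracle above) recovers \(\mathbf s = A^{-1}\mathbf b\) as in Eq. (1).

```python
import numpy as np

def solve_f2(A, b):
    """Solve A x = b over F_2 by Gaussian elimination; return x, or None if A has rank < k."""
    A, b = A.copy() % 2, b.copy() % 2
    m, k = A.shape
    row = 0
    for col in range(k):
        pivot = next((r for r in range(row, m) if A[r, col] == 1), None)
        if pivot is None:
            return None                        # column without pivot: A not invertible
        A[[row, pivot]], b[[row, pivot]] = A[[pivot, row]], b[[pivot, row]]
        for r in range(m):
            if r != row and A[r, col] == 1:    # eliminate column col in all other rows
                A[r] = (A[r] + A[row]) % 2
                b[r] = (b[r] + b[row]) % 2
        row += 1
    return b[:k]                               # the first k rows now form the identity

# For tau = 0, k samples with linearly independent a_i suffice: solve_f2(A, b) == s.
```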

The error growth can be made precise in terms of the number n of additions via the following lemma, usually called Piling-up Lemma in the cryptographic literature.

Lemma 2

(Piling-up Lemma). Let \(e_i\sim \text {Ber}_\tau \), \(i=1,\ldots , n\), be independent and identically distributed. Then we have \(\sum _{i=1}^{n}e_i\sim \text {Ber}_{\frac{1}{2}-\frac{1}{2}(1-2\tau )^n}\).

Proof

\(n=1\) is immediate. Induction over n yields

$$\begin{aligned} Pr \left[ \sum _{i=1}^n e_i=1\right] &= Pr \left[ \sum _{i=1}^{n-1} e_i=0\right] \cdot Pr [e_n=1]+ Pr \left[ \sum _{i=1}^{n-1} e_i=1\right] \cdot Pr [e_n=0]\\ &= \left( \frac{1}{2}+\frac{1}{2}(1-2\tau )^{n-1}\right) \tau +\left( \frac{1}{2}-\frac{1}{2}(1-2\tau )^{n-1}\right) (1-\tau )\\ &= \frac{1}{2}-\frac{1}{2}(1-2\tau )^n.\quad \square \end{aligned}$$
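For intuition, a tiny helper (our own illustrative Python) shows how quickly the error rate of summed samples approaches \(\frac{1}{2}\) according to Lemma 2.

```python
def piled_error(tau, n):
    """Error rate of a sum of n independent Ber_tau variables (Lemma 2)."""
    return 0.5 - 0.5 * (1 - 2 * tau) ** n

# Summing 2^c samples, as c iterations of block-wise elimination do, pushes the
# error rate towards 1/2 very fast, e.g. for tau = 1/8:
for c in range(6):
    print(2 ** c, piled_error(1/8, 2 ** c))   # 0.125, 0.21875, 0.3417..., ...
```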

3 Revisiting Previous Work

3.1 The BKW Algorithm

Blum, Kalai and Wasserman [7] proposed a variant of Gaussian elimination, called the BKW algorithm, that eliminates whole blocks of coordinates instead of single coordinates. This results in far fewer additions of samples, thus controlling the error, at the cost of requiring far more initial LPN samples to perform the eliminations.

The following high-level description of BKW eliminates blocks of size d in each of its \(c-1\) iterations, resulting in vectors that are sums of \(2^{c-1}\) original samples. We describe only how to compute the first bit of \(\mathbf s\); the other bits are analogous.

[Algorithm 1: BKW]
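Since the algorithm listing is not reproduced here, the following Python sketch (our own simplification; it only illustrates the mechanics, not the exact bookkeeping of [7]) shows a single BKW iteration that zeroes one block of d coordinates.

```python
import numpy as np

def bkw_iteration(A, b, lo, hi):
    """One BKW step: within each bucket of samples agreeing on columns [lo, hi),
    add every sample to the bucket's representative, zeroing that block.
    At most 2^(hi - lo) samples (the representatives) are lost."""
    reps, new_A, new_b = {}, [], []
    for a_i, b_i in zip(A, b):
        key = tuple(a_i[lo:hi])
        if key in reps:
            a_r, b_r = reps[key]
            new_A.append((a_i + a_r) % 2)   # block [lo, hi) becomes all-zero
            new_b.append((b_i + b_r) % 2)   # errors add up (Piling-up Lemma)
        else:
            reps[key] = (a_i, b_i)          # first sample with this block pattern
    return np.array(new_A), np.array(new_b)
```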

Blum, Kalai and Wasserman show that, for constant \(\tau \), instantiating their algorithm with blocks of size roughly \(d = \frac{k}{\log k}\) and \(c=\log k\) iterations while using \(N= 2^{\mathcal{O}(k/\log k)}\) samples results in running time and memory complexity also \(2^{\mathcal{O}(k/\log k)}\).

Since for concrete cryptographic instantiations we are also interested in the dependence on \(\tau \) and the constant hidden in the \(\mathcal{O}\)-notation, we give a slightly more detailed analysis in the following.

Theorem 1

BKW solves \(\text {LPN}_{k,\tau }\) for \(\tau >0\) with overwhelming success probability in time, memory and sample complexity \(2^{\frac{k}{\log \left( \frac{k}{\tau }\right) }(1+o(1))}.\)

Proof

By our choice in BKW we initially start with \(N:=\big (c-1+\frac{\log ^2 k}{(1-2\tau )^{2^c}}+\log ^2 k\big )2^d\) samples. Every foreach loop reduces the number of samples by at most \(2^d\), resulting in at least \(\big (\frac{\log ^2 k}{(1-2\tau )^{2^c}}+\log ^2 k\big )2^d\) samples after loop termination.

Let \(\mathbf u_1=(1,0,0, \ldots , 0)\) be the first unit vector. Among the remaining samples there will be at least \(r=\frac{\log ^2 k}{(1-2\tau )^{2^c}}\) samples of the form \((\mathbf u_1, \mathbf s_1+e)\) for some error \(e \in \{0,1\}\) with overwhelming probability according to Lemma 1. Since our r remaining samples are generated as a sum of \(2^{c-1}\) initial samples, the Piling-up lemma (Lemma 2) yields \(e\sim \text {Ber}_{\frac{1}{2}-\frac{1}{2}(1-2\tau )^{2^{c-1}}}\).

Hence, e has a bias of \(\bar{b}=\frac{1}{2}(1-2\tau )^{2^{c-1}}\). An easy Chernoff bound argument shows that having \(\bar{b}^{-2}\) samples is sufficient to obtain \(s_1\) with constant success probability by majority vote. Since our number r is larger than \(\bar{b}^{-2}\) by a factor of \(\frac{\log ^2 k}{4}\), we even obtain \(s_1\) with overwhelming success probability. By repeating this process for all bits \(s_1, \ldots , s_k\) a union bound shows that we lose a factor of at most k in the success probability, meaning that we can recover \(\mathbf s\) with overwhelming success probability.

The algorithm’s run time and memory consumption are (up to polynomial factors) dominated by its sample complexity, which by our choice of c and d is

$$N = \big (c-1+\frac{\log ^2 k}{(1-2\tau )^{2^c}}+\log ^2 k\big )2^d = 2^{\mathcal{O}(k^{1-\varepsilon }) + \frac{1}{1-\varepsilon } \cdot \frac{k}{\log (\frac{k}{\tau })}} = 2^{\frac{k}{\log (\frac{k}{\tau })}(1+o(1))}.$$

    \(\square \)

We would like to point out that in Theorem 1 the running time \(2^{k/\log (\frac{k}{\tau })(1+o(1))}\) decreases only very slowly with \(\tau \). Notice that even for \(\tau \) as small as \(\varTheta (\frac{1}{k} )\) we still obtain a running time of \(2^{\frac{1}{2}k/\log k(1+o(1))}\), while \(\text {LPN}_{k, \mathcal{O}(\frac{1}{k} )}\) clearly can be solved in polynomial time via correcting \(\mathcal{O}(1)\) errors and running Gaussian elimination.

3.2 Gauss

The following simple Algorithm 2, which we call Gauss, is the most natural extension of Gaussian elimination from Sect. 2.2: one repeatedly samples k linearly independent \(\mathbf a_i\) until they are all error-free.

In each iteration of Gauss we simply assume error-freeness and compute a candidate secret key \(\mathbf s' = A^{-1} \mathbf b\) as in Eq. (1). We take fresh samples to test the hypothesis that we were indeed in the error-free case and hence \(\mathbf s' = \mathbf s\).

Notice that we are in the error-free case with probability \((1-\tau )^k\). Hence, Algorithm 2 has up to polynomial factors expected running time \((\frac{1}{1-\tau })^k\), provided that \(\texttt {Test}\) can be carried out in polynomial time. Thus in comparison to BKW in Sect. 3.1, we obtain a much better dependence on \(\tau \). For instance for \(\tau = \mathcal{O}(\frac{1}{k})\), we obtain polynomial running time, as one would expect.

[Algorithm 2: Gauss]
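A minimal sketch of the Gauss loop (our own illustrative Python; oracle(m) is assumed to return m fresh LPN samples, e.g. a closure over lpn_oracle above, and test is a suitably parameterized instance of the hypothesis test Test described next):

```python
def gauss(oracle, k, test):
    """Repeat: draw k fresh samples, solve assuming error-freeness, test the candidate."""
    while True:
        A, b = oracle(k)                   # fresh batch of k LPN samples
        s_cand = solve_f2(A, b)            # None if the a_i are linearly dependent
        if s_cand is not None and test(s_cand):
            return s_cand
```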

Basically, our algorithm \(\texttt {Test}\) uses sufficiently many fresh LPN samples \((A, \mathbf b) \leftarrow LPN _{k,\tau }^m\) to decide whether \( wt (A\mathbf s' + \mathbf b)\) follows \(\text {Bin}_{m, \tau }\) or rather \(\text {Bin}_{m, \frac{1}{2}}\), by checking whether the weight is close to \(\tau m\) or to \(\frac{m}{2}\), respectively.

We have designed Test in a flexible way that allows us to control the two-sided error probabilities \(\Pr [ \texttt {Test}~\text {rejects} \mid \mathbf s' = \mathbf s]\) for rejecting the right candidate and \(\Pr [{\texttt {Test}}~~\text {accepts} \mid \mathbf s' \not = \mathbf s]\) for accepting an incorrect \(\mathbf s'\) via two parameters \(\alpha , \beta \). Throughout this paper, we will tune these parameters \(\alpha , \beta \) to guarantee that all subsequent algorithms have overwhelming success probability \(1 - \text {negl}(k)\).

[Algorithm 3: Test]

Notice that by our definition of m in Test even an exponentially small choice of \(\alpha = \beta = \frac{1}{2^k}\) leads to only \(m = \varTheta \left( \frac{k}{(\frac{1}{2}-\tau )^2}\right) \) samples, which is linear in k and quadratic in \(\left( \frac{1}{2} - \tau \right) ^{-1}\). Thus, our hypothesis test can be carried out efficiently even for exponentially small error probabilities.
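Using the sample count m and threshold c derived in Lemma 3 below, Test can be sketched as follows (our own illustrative Python; oracle is again an assumed helper that returns fresh samples).

```python
import math

def test(s_cand, oracle, tau, alpha, beta):
    """Accept s_cand iff wt(A*s_cand + b) on m fresh samples stays below the threshold c."""
    m = math.ceil(((math.sqrt(1.5 * math.log(1 / alpha)) +
                    math.sqrt(math.log(1 / beta))) / (0.5 - tau)) ** 2)
    c = tau * m + math.sqrt(3 * (0.5 - tau) * math.log(1 / alpha) * m)
    A, b = oracle(m)                          # m fresh LPN samples
    weight = int(((A @ s_cand + b) % 2).sum())
    return weight <= c
```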

Lemma 3 (Hypothesis Testing)

For any \(\alpha , \beta \in (0,1]\), Test accepts the correct LPN secret \(\mathbf s\) with probability at least \(1- \alpha \), and rejects incorrect \(\mathbf s'\) with probability at least \(1 - \beta \), using m samples in time and space \(\varTheta (mk)\).

Proof

Inputting the correct \(\mathbf s\) to Test implies that \( wt (A \mathbf s'+\mathbf b)\sim \text {Bin}_{m,\tau }\). In this case we have

$$\begin{aligned} Pr \left[ wt (A \mathbf s'+\mathbf b)\ge c\right] &\mathop {\le }\limits ^{\text {Chernoff}} \exp \left( -\frac{1}{3} \cdot \min \left( \frac{c}{\tau m}-1,\left( \frac{c}{\tau m}-1\right) ^2\right) \cdot \tau m\right) \\ &\;\;\le \exp \left( -\frac{1}{3}\cdot \frac{\tau }{\frac{1}{2}-\tau }\left( \frac{c}{\tau m}-1\right) ^2\cdot \tau m\right) \mathop {=}\limits ^{!}\alpha . \end{aligned}$$

We need that the last term is equal to \(\alpha \), which leads to the threshold weight of

$$c:=\tau m + \sqrt{3(\frac{1}{2}-\tau )\ln \left( \frac{1}{\alpha }\right) m},$$

as defined in Test. If \(\mathbf s'\not = \mathbf s\), then \( wt (A \mathbf s'+\mathbf b)\sim \text {Bin}_{m,\frac{1}{2}}\). We want to upper bound the acceptance probability in this case.

$$\begin{aligned} Pr \left[ wt (A \mathbf s'+\mathbf b)\le c\right] \, \mathop {\le }\limits ^{\text {Chernoff}} \, \exp \left( -\frac{1}{2} \cdot \left( 1-\frac{2c}{m}\right) ^2\cdot \frac{m}{2}\right) \mathop {=}\limits ^{!}\beta \end{aligned}$$

Using the c from above, the last equation holds, if

$${}m:=\left( \frac{\sqrt{\frac{3}{2}\ln \left( \frac{1}{\alpha }\right) }+\sqrt{\ln \left( \frac{1}{\beta }\right) }}{\frac{1}{2}-\tau }\right) ^2.\quad \square $$

Remark 2

As defined, Test takes m fresh samples on every invocation for achieving independence. However, for efficiency reasons we will in practice use the same m samples for Test on every invocation. Our experiments confirm that the introduced dependencies do not noticeably affect the algorithms’ performance and success probability.

Now that we are equipped with an efficient hypothesis test, we can carry out the analysis of Gauss. For ease of notation, we use soft-Theta notation \(\tilde{\varTheta }\) for running times to suppress factors that are polynomial in k.

Theorem 2

Gauss solves \(\text {LPN}_{k,\tau }\) with overwhelming success probability in time and sample complexity \(\tilde{\varTheta }\left( \frac{1}{(1-\tau )^k}\right) \) using \(\varTheta (k^2)\) memory.

Proof

We already noted that the outer repeat loop of Gauss takes an expected number of \(\frac{1}{(1-\tau )^k}\) iterations to produce a batch of k error-free LPN samples. In particular, Lemma 1 tells us that we will find an error-free batch after at most \(\frac{\log ^2 k}{(1-\tau )^k}\) trials with overwhelming probability.

The inner loop is executed an expected number of \(\mathcal{O}(1)\) times until \(A\in \text {GL}_k(\mathbb {F}_2)\). Here again, after at most \(\mathcal {O}\left( \log ^2 k\right) \) iterations it is ensured that we get an invertible A with overwhelming probability. This already proves the upper bound on the running time.

Since we only have to store k samples for A of length \(\varTheta (k)\) each, our memory consumption is \(\varTheta (k^2)\). In Test we do not necessarily have to store our \(m=\varTheta (k)\) samples, since we can process them on the fly. However, in practice (see Remark 2) it is useful to reserve another \(\varTheta (k^2)\) memory cells for them.

Considering the success probability, Gauss solves \( LPN _{k,\tau }\) when it rejects all false candidates \(\mathbf s'\), and accepts the secret key \(\mathbf s\) (if it appears). The first event happens by Lemma 3 with probability at least \(1- \beta \) for each incorrect candidate, where \(\beta = (\frac{1-\tau }{2})^k\) by our choice in Gauss. The second event happens by Lemma 3 with probability at least \(1- \alpha = 1-2^{-k}\).

Let X be a random variable for the number of iterations of the outer loop until we are for the first time in the error-free case. Then

$$\begin{aligned} Pr [\text {Success}] &= \sum _{i=1}^{\infty } Pr [\text {Success}\; | \; X=i]\cdot Pr [X=i]\\ &\ge \sum _{i=1}^{\infty }\left( 1-\beta \right) ^{i-1}\left( 1-\alpha \right) \cdot \left( 1-(1-\tau )^k\right) ^{i-1}(1-\tau )^k\\ &= \frac{\left( 1-\alpha \right) (1-\tau )^k}{1-\left( 1-\beta \right) \left( 1-(1-\tau )^k\right) }\\ &\ge \frac{(1-\alpha )(1-\tau )^k}{\beta +(1-\tau )^k} = 1-\text {negl}(k).\quad \square \end{aligned}$$

Notice that Gauss’ sample complexity is as large as its running time by Theorem 2. We will show in the following section that the sample complexity can be decreased to \(\text {poly}(k)\) without affecting the run time. This will be the starting point for further improvements.

4 LPN and Its Relation to Decoding

Let us slightly modify the Gauss algorithm from Sect. 3.2. Instead of taking in each iteration a fresh batch of k LPN samples, we initially fix a large enough pool of n samples. Then in each iteration we take k out of our pool of n samples, with linearly independent \(\mathbf a_i\). This results in the following Algorithm 4 that we call Pooled Gauss.

[Algorithm 4: Pooled Gauss]

Before we analyze Pooled Gauss, we want to clarify its connection to the decoding of random linear codes. Notice that we fix a sample matrix \(A \in \mathbb {F}_2^{n \times k}\) with uniformly random entries. A can be considered a generator matrix of some random linear [n, k] code \(\mathcal C\), namely the column span of A. The secret \(\mathbf s \in \mathbb {F}_2^k\) is a message and the label vector \(\mathbf b \in \mathbb {F}_2^n\) is an erroneous encoding of \(\mathbf s\) with some error vector \(\mathbf e \in \mathbb {F}_2^n\) having components \(e_i\sim \text {Ber}_\tau \). Thus, decoding \(\mathbf b\) to the original message \(\mathbf s\) solves \( LPN _{k,\tau }\).

Decoding such a codeword \(\mathbf b\) can be done by finding an error-free index set as in Pooled Gauss. In coding theory language, such an error-free index set is called an information set. Thus, our Pooled Gauss algorithm is in this language an information set decoding algorithm, namely it resembles the well-known algorithm of Prange [28] from 1962. One should notice however that as opposed to the decoding scenario, we can fix the length n of \(\mathcal C\) ourselves.
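A compact sketch of Pooled Gauss (our own illustrative Python, reusing solve_f2, an oracle closure and a test closure as above; the pool size \(n=\text {poly}(k)\) is a parameter):

```python
import numpy as np

def pooled_gauss(oracle, k, n, test):
    """Fix a pool of n samples once, then repeatedly try random size-k index sets I."""
    A_pool, b_pool = oracle(n)                     # the LPN oracle is queried only once
    rng = np.random.default_rng()
    while True:
        I = rng.choice(n, size=k, replace=False)   # candidate information set
        s_cand = solve_f2(A_pool[I], b_pool[I])    # None if A_I is singular
        if s_cand is not None and test(s_cand):
            return s_cand
```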

Theorem 3

Pooled Gauss solves \(\text {LPN}_{k,\tau }\) with overwhelming success probability in time \(\tilde{\varTheta }\left( \frac{1}{(1-\tau )^k}\right) \) using \(\tilde{\varTheta }(k^2)\) samples and \(\tilde{\varTheta }(k^3)\) memory.

Proof

Pooled Gauss’ run time follows with the same reasoning as for Gauss’ running time. The outer loop will with overwhelming probability be executed at most \(\frac{\log ^2 k}{(1-\tau )^k}\) times, and all other parts can be performed in time \(\mathcal {O}(k^3)\). The sample complexity follows by our choice of n in Pooled Gauss. Storing n samples requires \(\tilde{\varTheta }(k^3)\) memory.

For the success probability, we first notice that the probability of drawing k linearly independent vectors out of a pool even as small as \(n'=2k\) without replacement can easily be lower-bounded by \(\frac{1}{4}\). As we will see below, the pool in our algorithm is even bigger than that. Therefore, by our choice of n and similar to the reasoning in the proof of Theorem 2, the inner loop of Pooled Gauss will always find an invertible \(A_I\) with overwhelming probability. So we condition our further analysis on this event.

Let Y be the number of error-free samples in the pool of n vectors. In expectation, we have \(\mathbb {E}[Y] = (1-\tau )n\). Using a Chernoff bound, we can show that Y falls below a \((1-\frac{1}{k})\)-fraction of its expectation only with small probability:

$$ Pr [Y\ge (1-\frac{1}{k})(1-\tau )n]\ge 1-e^{-\frac{(1-\tau )n}{2k^2}}.$$

By our choice of \(n= \omega \left( k^2\log k \right) \) the right hand side is \(1-k^{-\omega (1)}\), which is overwhelming.

We call any pool with at least \((1-\frac{1}{k})(1-\tau )n\) error-free samples good. Conditioned on the event G that our pool is good, we draw a batch of k error-free samples with probability

$$\begin{aligned} p &\ge \prod _{i=0}^{k-1}\frac{(1-\frac{1}{k})(1-\tau )n-i}{n} \ge \left( \frac{(1-\frac{1}{k}) (1-\tau )n-k}{n}\right) ^k\\ &= (1-\tfrac{1}{k})^k(1-\tau )^k\left( 1-\frac{k}{n(1-\frac{1}{k})(1-\tau )}\right) ^k =\varOmega \left( (1-\tau )^k\right) . \end{aligned}$$

Now, following the same arguments with p instead of \((1-\tau )^k\) as in Theorem 2 gives us an overwhelming probability of success.     \(\square \)

4.1 Low-Noise LPN

Some interesting cryptographic applications require that the LPN error \(e_i \sim \text {Ber}_{\tau }\) has an error rate \(\tau =\tau (k)\) depending on k. E.g. public key encryption seems to require some \(\tau (k)\) as small as \(\frac{1}{\sqrt{k}}\).

As a corollary from Theorem 3, we obtain that for any \(\tau (k)\) that approaches 0 for \(k \rightarrow \infty \), our Pooled Gauss algorithm runs – up to polynomial factors – in time \(e^{\tau (k)k(1+o(1))}\). This implies that for \(\tau (k) = o(\frac{1}{\log k})\) the run time of Pooled Gauss asymptotically outperforms the run time of BKW from Theorem 1.

Corollary 1

(Low Noise). Let \(\tau (k)\rightarrow 0\) for \(k\rightarrow \infty \). Pooled Gauss solves \(\text {LPN}_{k,\tau (k)}\) with overwhelming success probability in time \(\tilde{\varTheta }\left( e^{\tau k(1+o(1))}\right) \) using \(\tilde{\varTheta }(k^2)\) samples and \(\tilde{\varTheta }(k^3)\) memory.

Proof

The run time statement follows by observing that

$${}\left( \frac{1}{1-\tau }\right) ^k=\left( \frac{1}{(1-\tau )^\frac{1}{\tau }}\right) ^{\tau k}=\left( \frac{1}{\frac{1}{e}-o(1)}\right) ^{\tau k}=\left( e+o(1)\right) ^{\tau k}=e^{\tau k(1+o(1))}.\quad \square $$

For small noise \(\tau (k) = \mathcal{O}(\frac{1}{\sqrt{k}})\), i.e. a case that covers the mentioned encryption application above, we can also remove the error term \((1+o(1))\) in the exponent, meaning that Pooled Gauss achieves – up to polynomial factors – run time \(e^{\tau (k)k}\).

Corollary 2

(Really Low Noise). Let \(\tau (k)=\frac{1}{k^c}\) for \(c\ge \frac{1}{2}\). Pooled Gauss solves \(\text {LPN}_{k,\tau }\) with overwhelming success probability in time \(\tilde{\varTheta }\left( e^{k^{1-c}}\right) \), using \(\tilde{\varTheta }(k^2)\) samples and \(\tilde{\varTheta }(k^3)\) memory.

Proof

Since \(\ln (\frac{1}{1-x})=x+\frac{x^2}{2}+\mathcal {O}\left( x^3\right) \) for \(x\in [-1,1)\) we get

$$\left( \frac{1}{1-\frac{1}{k^c}}\right) ^k=e^{\ln \left( \frac{1}{1-\frac{1}{k^c}}\right) k}=e^{k^{1-c}+\frac{k^{1-2c}}{2}+\mathcal {O}\left( k^{1-3c}\right) }. $$

We see that for \(c\ge \frac{1}{2}\) the last term is in \(\mathcal {O}\left( e^{k^{1-c}}\right) \), and for \(c<\frac{1}{2}\) it is not.     \(\square \)

4.2 Quantum Pooled Gauss

In a nutshell, Pooled Gauss runs until it finds an error-free batch of k LPN samples from a pool of n samples. The expected number of error-free samples in such a pool is \((1-\tau )n\). Hence, we search for an index set I in a total search space of size \(\binom{n}{k}\), in which we expect \(\binom{(1-\tau )n}{k}\) good index sets. Therefore, we expect

$$ T = \frac{\binom{n}{k}}{\binom{(1-\tau )n}{k}} $$

iterations of Pooled Gauss until we hit an error-free batch. It is not hard to show that T equals, up to a polynomial factor, the run time from Theorem 3.
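As a quick numeric illustration (our own Python using math.comb), T indeed tracks \(\left( \frac{1}{1-\tau }\right) ^k\) up to polynomial factors.

```python
from math import comb, log2

def expected_iterations(n, k, tau):
    """T = C(n, k) / C((1 - tau) n, k): expected number of Pooled Gauss iterations."""
    return comb(n, k) / comb(int((1 - tau) * n), k)

k, tau = 100, 1/8
n = k ** 2                                  # pool of poly(k) samples
print(log2(expected_iterations(n, k, tau)), # exponent of T, roughly 19.4 ...
      k * log2(1 / (1 - tau)))              # ... versus k*log(1/(1-tau)), roughly 19.3
```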

The event of hitting an error-free batch can be modeled by the function \(f:\binom{[n]}{k}\rightarrow \{0,1\}\) that takes value \(f(I)=1\) iff I is an index set of k error-free LPN samples. More formally, we can define

$$\begin{aligned} f:\binom{[n]}{k}\rightarrow \{0,1\},\quad I\mapsto \begin{cases} 1_{A_I^{-1}\mathbf b_I=\mathbf s}, & A_I\in \text {GL}(\mathbb {F}_2^k)\\ 0, & A_I\notin \text {GL}(\mathbb {F}_2^k) \end{cases}. \end{aligned}$$
(2)

Here, the characteristic function \(1_{A_I^{-1}\mathbf b_I=\mathbf s}\) takes value 1 iff we compute the correct secret key \(\mathbf s\), which is equivalent to I being an index set of k error-free LPN samples. In our algorithm Pooled Gauss the evaluation of \(1_{A_I^{-1}\mathbf b_I=\mathbf s}\) is done by Test, which may err with negligible probability. But assume for a moment that we have a perfect instantiation of f.

Using f, the task of Pooled Gauss is to find an index set \(I^*\) among all index sets from \(\binom{[n]}{k}\) such that \(f(I^*)=1\), which can be done classically in expected time

$$T = \frac{\binom{n}{k}}{|f^{-1}(1)|}.$$

We can now speed up Pooled Gauss quantumly by applying Boyer et al.’s [10] version of Grover search [17], which results in run time \(\sqrt{T}\). It is worth pointing out that Boyer et al.’s algorithm works even in our case, where we do not know the number \(|f^{-1}(1)|\) of error-free index sets. All that the algorithm requires is oracle access to the function f, and we show that this oracle access can be perfectly simulated by Test. This results in Algorithm 5 that we call Quantum Pooled Gauss.

[Algorithm 5: Quantum Pooled Gauss]

Theorem 4

Quantum Pooled Gauss quantumly solves \(\text {LPN}_{k,\tau }\) with overwhelming probability in time \(\tilde{\varTheta }\left( \left( \frac{1}{1-\tau }\right) ^{\frac{k}{2}}\right) \), using \(\tilde{\varTheta }(k^2)\) queries and \(\tilde{\varTheta }(k^3)\) memory.

Proof

According to [10], Grover succeeds with overwhelming success probability. Hence, the proof of Theorem 3 essentially carries over to the quantum setting.

However, it remains to show that we can safely replace oracle access to the function f as defined in Eq. (2) by Test, which in turn defines some function

$$\begin{aligned}\tilde{f}:\binom{[n]}{k}\rightarrow \{0,1\},\quad I\mapsto \begin{cases} \texttt {Test}(A_I^{-1}\mathbf b_I, \tau , \binom{n}{k}^{-2},\binom{n}{k}^{-2}), & A_I\in \text {GL}(\mathbb {F}_2^k)\\ 0, & A_I\notin \text {GL}(\mathbb {F}_2^k) \end{cases}.\end{aligned}$$

We will show that by our choice of \(\alpha = \beta = \binom{n}{k}^{-2}\) with overwhelming probability \(f(I) = \tilde{f}(I)\) for all I, i.e. we perfectly simulate f. Let us define a random variable X that counts the number of inputs in which f and \(\tilde{f}\) disagree, i.e.

$$X:=\left| \{I\in \tbinom{[n]}{k} \;|\; \tilde{f}(I)\not =f(I)\}\right| .$$

Notice that \(X\sim \text {Bin}_{\binom{n}{k}, \le \alpha }\), since by Lemma 3 Test errs with probability (at most) \(\alpha =\beta \) for all of the \(\binom{n}{k}\) sets I. We obtain

$$ Pr [\tilde{f}=f]= Pr [X=0]\ge \left( 1-\frac{1}{\binom{n}{k}^2}\right) ^{\binom{n}{k}}\ge 1-\frac{1}{\binom{n}{k}},$$

where we use Bernoulli’s inequality for the last step. Since we chose \(n = \omega (k^2)\), we have \(\binom{n}{k}\ge \left( \frac{n}{k}\right) ^k = \omega (k^k)\). This implies \( Pr [\tilde{f}=f]=1 - \text {negl}(k)\), as required.     \(\square \)

Remark 3

Notice that our slight modification from Gauss to Pooled Gauss enables the use of quantum techniques. While both algorithms Gauss and Pooled Gauss achieve the same running time T, Gauss also requires (roughly) T samples. But any algorithm with sample complexity T has automatically run time lower bound \(\varOmega (T)\), since our LPN oracle is by Definition 1 classical and each oracle access costs \(\varOmega (1)\).

So while there is good motivation to reduce the number of samples, it is somewhat unsatisfactory from a cryptanalyst’s point of view to make only limited use of an LPN oracle by restricting to a polynomial number of samples. In the next section, we will show how more extensive queries give rise to a better suited pool of vectors that will further speed up Pooled Gauss.

5 Decoding LPN with Preprocessing

Our idea is to add some preprocessing to Pooled Gauss that produces LPN samples with \(\mathbf a_i\) of smaller dimension \(k'\) by zeroing some columns in the A-matrix. This may come at the cost of slightly increasing the noise parameter \(\tau \). This idea gives rise to the following meta algorithm Dim-Decode.

[Algorithm 6: Dim-Decode]

In the following, we give different instantiations of Dim-Decode. We start by looking at techniques for the Modify step for dimension reduction.

5.1 Improvements Using only Polynomial Memory

Our first simple technique is to keep only those LPN samples \((\mathbf a, b)\) that have zeros in the last \(k-k'\) coordinates of \(\mathbf a\). We will balance the running time for steps Modify and Decode by choosing \(k'\) accordingly. This results in Algorithm 7 that we call Well-Pooled Gauss.
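As a sketch of this Modify step (our own illustrative Python), one simply queries the oracle until enough samples with an all-zero tail have been collected; the expected cost per kept sample is \(2^{k-k'}\) oracle calls.

```python
import numpy as np

def zero_tail_pool(oracle, k, k_prime, n):
    """Collect n samples whose last k - k' coordinates are zero (rejection sampling)."""
    A_pool, b_pool = [], []
    while len(A_pool) < n:
        A, b = oracle(1024)                        # query the oracle in small batches
        keep = ~A[:, k_prime:].any(axis=1)         # rows whose tail is all-zero
        A_pool.extend(A[keep, :k_prime])           # only the first k' coordinates matter
        b_pool.extend(b[keep])
    return np.array(A_pool[:n]), np.array(b_pool[:n])
```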

Notice that by our choice of \(k'\), Well-Pooled Gauss reduces the dimension to a \(\frac{1}{1+\log (\frac{1}{1-\tau })}\)-fraction of k. Since \(\tau \in [0,\frac{1}{2})\) we have

$$\frac{1}{1+\log (\frac{1}{1-\tau })} \in (\frac{1}{2},1],$$

meaning that \(k' \ge \frac{k}{2}\) or in other words that Pooled Gauss in its first run recovers at least the first half of the bits of \(\mathbf s\), and in its second run the remaining half.

[Algorithm 7: Well-Pooled Gauss]

Hence the run time of Pooled Gauss’ first application dominates the run time of its second application. Since Pooled Gauss’ run time depends exponentially on the dimension, we can gain up to a square root in the running time when we reduce the dimension down to \(\frac{k}{2}\).
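A small calculator (our own Python) makes the balancing of the two steps concrete: with \(k' = \frac{k}{1+\log (\frac{1}{1-\tau })}\), the decoding exponent \(k'\log (\frac{1}{1-\tau })\) equals the preprocessing exponent \(k-k'\).

```python
from math import log2

def well_pooled_gauss_exponents(k, tau):
    """Balanced dimension k' and the (equal) exponents of the Decode and Modify steps."""
    k_prime = k / (1 + log2(1 / (1 - tau)))
    decode = k_prime * log2(1 / (1 - tau))   # cost exponent of Pooled Gauss in dimension k'
    modify = k - k_prime                     # cost exponent of collecting zero-tail samples
    return k_prime, decode, modify

print(well_pooled_gauss_exponents(512, 1/8))   # roughly (429.3, 82.7, 82.7)
```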

Theorem 5 (Well-Pooled Gauss)

Well-Pooled Gauss solves \(\text {LPN}_{k,\tau }\) with overwhelming probability in time and query complexity

$$\tilde{\varTheta }\left( \left( \frac{1}{(1-\tau )^k}\right) ^\frac{1}{1+\log \left( \frac{1}{1-\tau }\right) }\right) $$

using \(\tilde{\varTheta }(k^3)\) memory.

Proof

Pooled Gauss’ first application runs in time \(T:=\tilde{\varTheta }\left( \frac{1}{(1-\tau )^{k'}}\right) \), which is the claimed total running time. Furthermore, by Lemma 1, with overwhelming probability the run time of the Modify step for finding \(n+m\) samples whose last \(k-k'\) coordinates are zero is bounded by

$$\begin{aligned} 2^{k-k'}\left( n+m+\log ^2 k\right) &= \tilde{\varTheta }\left( 2^{k-k'}\right) = \tilde{\varTheta } \left( 2^{\left( 1-\frac{1}{1+\log (\frac{1}{1-\tau })}\right) k} \right) \\ &= \tilde{\varTheta } \left( 2^{\frac{\log \left( \frac{1}{1-\tau }\right) }{1+\log \left( \frac{1}{1-\tau }\right) }k} \right) = T. \end{aligned}$$

    \(\square \)

5.2 Quantum Improvements with Polynomial Memory

In the quantum version of Well-Pooled Gauss, called Quantum Well-Pooled Gauss, we simply replace in Algorithm 7 the Pooled Gauss procedure by its quantum version. Notice that by Remark 3 we cannot provide a quantum version of our Modify step. So we cannot expect to gain another full square root by going to the quantum version of Well-Pooled Gauss.

The following theorem shows that in Quantum Well-Pooled Gauss one should take the parameter choice \(k' := \frac{2}{2+ \log \left( \frac{1}{1-\tau }\right) }k > \frac{2}{3} k\). The Quantum Pooled Gauss routine then runs in time

$$T:=\tilde{\varTheta }\left( \left( \frac{1}{1-\tau }\right) ^{\frac{k'}{2}}\right) .$$

Hence in comparison with Well-Pooled Gauss (Theorem 5) we gain at most an additional factor of \(\frac{2}{3}\) in the exponent.

Theorem 6 (Quantum Well-Pooled Gauss)

Quantum Well-Pooled Gauss quantumly solves \(\text {LPN}_{k,\tau }\) with overwhelming probability in time and query complexity

$$\tilde{\varTheta }\left( \left( \frac{1}{(1-\tau )^k}\right) ^\frac{1}{2+\log \left( \frac{1}{1-\tau }\right) }\right) ,$$

using \(\tilde{\varTheta }(k^3)\) space.

Proof

Analogous to the proof of Well-Pooled Gauss’ run time (Theorem 5), in its quantum version the run time of the first Quantum Pooled Gauss routine is \(T = \tilde{\varTheta }((\frac{1}{1-\tau })^{\frac{k'}{2}})\), which is the claimed total run time. T dominates the run time of the second call for Quantum Pooled Gauss, since \(k' > \frac{2}{3} k\). T also upper bounds the run time of the Modify step, since by Lemma 1 with overwhelming probability for obtaining \(n+m\) of the desired form it takes at most time

$$ 2^{k-k'}\left( n+m+\log ^2 k\right) = \tilde{\varTheta }(2^{k-k'}) = \tilde{\varTheta }\left( 2^{\frac{\log \left( \frac{1}{1-\tau }\right) }{2+\log \left( \frac{1}{1-\tau }\right) } k}\right) = T. $$

    \(\square \)

5.3 Using Memory – Building a Bridge Towards BKW

Up to now, for instantiating Algorithm 6 in Well-Pooled Gauss we made only somewhat naive use of our LPN oracle by storing only those vectors in the stream of all oracle answers that were already dimension-reduced. The reason for this was our restriction to polynomial memory consumption in order to achieve highly efficient algorithms in practice.

Ideally, we could tune our LPN algorithm to the memory that is available on our machine. Let us say we have memory M and we are looking for the fastest algorithm that uses at most M memory cells. In this scenario, we are free to store more LPN samples and to add them until they provide dimension-reduced samples, at the cost of a growing error \(\tau ' > \tau \) determined by the Piling-up Lemma (Lemma 2).

Notice that this is exactly the strategy of BKW. But as opposed to the BKW algorithm, which basically reduces the dimension k all the way down to 1, we allow – due to our constrained memory M – only a limited dimension reduction down to some \(k'\). Since afterwards we will in Algorithm 6 resort again to a Decoding step, the optimal choice of \(k'\) is not determined by the size of M alone, but also by the growth of the error \(\tau '\) that our decoding procedure can handle.

More precisely, the following Algorithm 8, called Hybrid, in a first step uses the naive strategy of Well-Pooled Gauss to decrease the dimension by \(k_1\), while leaving \(\tau \) unchanged. Then in a second step it uses c BKW iterations on blocks of size d to further reduce the dimension by \(k_2\), thereby increasing the error to \(\tau '\), which grows double-exponentially in c.

[Algorithm 8: Hybrid]

Hybrid instantiated with only polynomial memory leaves out all BKW iterations and boils down to Well-Pooled Gauss. Hybrid instantiated with sufficient memory achieves BKW’s time complexity \(2^{k/\log (\frac{k}{\tau })(1+o(1))}\). Thus, Hybrid provides a perfect interpolation between both algorithms.

Theorem 7 (Hybrid)

Using \(\tilde{\mathcal{O}}(M)\) space, Hybrid solves \( LPN _{k,\tau }\) with overwhelming probability in time and query complexity \(\tilde{\mathcal{O}}(M\cdot 2^{k_1})\), where \(k_1\) is as defined in Hybrid.

For \(M=\text {poly}(k)\) we get the same time, memory and sample complexity as in Well-Pooled Gauss (Theorem 5).

Choosing \(M=2^{k/\log (\frac{k}{\tau })(1+o(1))}\) gives us the complexities of BKW (Theorem 1).

Proof

For the Decoding step, we need \(m=\tilde{\varTheta }\left( \frac{1}{(\frac{1}{2}-\tau ')^2}\right) =\tilde{\varTheta }\left( \frac{1}{(1-2\tau )^{2^{c+1}}}\right) \) samples for the hypothesis test and \(n=\text {poly}(k)\) samples for the pool at the end. That’s why we initially need to feed the BKW step \(N=m+n+c2^d=\tilde{\varTheta }\left( 2^{\max \left( \frac{k_2}{c},2^c\log (\frac{1}{(1-2\tau )^2})\right) }\right) \) samples. As in Well-Pooled Gauss we can create these samples in time \(\tilde{\varTheta }(N\cdot 2^{k_1})\). Therefore with overwhelming probability the Modify and Decoding steps in Hybrid take time \( \tilde{\varTheta }\left( N\cdot 2^{k_1}+N+\left( \frac{1}{1-\tau '}\right) ^{k'} m\right) \)

$$=\tilde{\varTheta }\left( 2^{k_1+\max \left( \frac{k_2}{c},2^c\log (\frac{1}{(1-2\tau )^2})\right) }+2^{\log \left( \frac{1}{1-\tau '}\right) (k-k_1-k_2)+2^c\log \left( \frac{1}{(1-2\tau )^2}\right) }\right) ,$$

using \(\tilde{\varTheta }\left( N\cdot 2^{k_1}\right) \) samples and \(\tilde{\varTheta }\left( N\right) \) memory. Our choice of \(k_1\) in Hybrid balances both run time summands. Our choice of \(k_2\) then gives us the stated time, sample and memory complexity. Notice that the Complete step takes less time, queries and memory, since here we solve smaller \(\text {LPN}\) instances.

It remains to show that Hybrid contains as special cases Well-Pooled Gauss and BKW. For some \(\varepsilon >0\), let us define similar to Hybrid

$$c(k,\tau ,M):=\max \left( 0,\log \left( \frac{1}{\tau ^{1-\varepsilon }k^\varepsilon }\log \left( M\right) \log \left( \frac{k}{\tau }\right) \right) \right) .$$

For the choice \(M=\text {poly}(k)\) we get \(c=o(1)\) and \(k_2=o(1)\), which means that we do not perform any BKW steps. Thus, Hybrid is identical to Well-Pooled Gauss. For the choice \(M=2^{(1+o(1))k/\log \left( \frac{k}{\tau }\right) }\) we obtain \(c=(1-\varepsilon )\log (\frac{k}{\tau })+o(1)\), \(k_2=k-o(1)\) and \(k_1=o(1)\), giving us the complexities of BKW from Theorem 1.     \(\square \)

5.4 Using Memory – Advanced Decoding Algorithms

In the previous Sect. 5.3 we provided a time-memory tradeoff for Algorithm 6 Dim-Decode by looking at the Modify step only. In this section, we will focus on the Decode step. So far, we only used the quite naive Gauss decoding procedure, which resembles Prange’s information set decoding algorithm [28]. However, within the last years there has been significant progress in information set decoding, starting with Ball-Collision Decoding [6], then followed by a series of papers, MMT [26], BJMM [4] and May-Ozerov [27], using the so-called representation technique.

In principle, one can freely choose the preferred decoding procedure in Decode. Our starting point was a simplified version of BJMM, but on optimizing the BJMM parameters we found that for the LPN instances under consideration (see Sect. 6) MMT actually performs best. Since BJMM asymptotically offers a better run time than MMT, the situation should, however, change for (very) large LPN dimension k. We also did not consider the Nearest Neighbor algorithm of May-Ozerov, since its large polynomial run time factor currently prevents speedups in the parameter ranges that we consider.

Instantiating Dim-Decode with MMT decoding results in Algorithm 9, called Well-Pooled MMT.

[Algorithm 9: Well-Pooled MMT]

Unfortunately, [26] does not provide a closed run time formula for MMT. Therefore, we cannot give a theorem for Well-Pooled MMT that provides a precise bound for the time complexity as a function of \(k, \tau \) as in the previous sections. However, we are able to optimize the run time of Well-Pooled MMT for every fixed \(\tau \) as a function of k.

Let us conjecture that Well-Pooled MMT’s time complexity is \(2^{c(\tau )k}\). Recall from Theorem 5 that Well-Pooled Gauss’s time complexity is

$$ \tilde{\varTheta }\bigg (2^{\frac{\log \left( \frac{1}{1-\tau }\right) }{1+\log \left( \frac{1}{1-\tau }\right) } k}\bigg ) = \tilde{\varTheta }\left( 2^{c'(\tau ) k} \right) . $$

We can plot both functions \(c(\tau )\) and \(c'(\tau )\) as a function of \(\tau \). The following graph in Fig. 1 visualizes that the run time exponent \(c(\tau )\) of Well-Pooled MMT is smaller than the run time exponent \(c'(\tau )\) of Well-Pooled Gauss for every \(\tau \in [0, \frac{1}{2})\), as one would expect. The largest gap appears at \(\tau = \frac{1}{4}\), where Well-Pooled MMT achieves time \(2^{0.282k}\), whereas Well-Pooled Gauss requires time \(2^{0.293k}\).

Fig. 1. Well-Pooled Gauss and Well-Pooled MMT: run time exponents \(c'(\tau )\) and \(c(\tau )\) as functions of \(\tau \)

Remark 4

Since we do not know how to express the running time of MMT as a closed formula, a function that approximates \(c(\tau )\) reasonably well is

$$\frac{\log \left( \frac{1}{1-\frac{7}{6}\tau }\right) }{\log \left( \frac{12}{5}\right) +\log \left( \frac{1}{1-\frac{7}{6}\tau }\right) }.$$

It is worth noticing that MMT, as opposed to Prange, consumes exponential memory. However, the memory consumption is still quite moderate. For the LPN instances that we considered the memory never exceeded \(2^{\frac{c(\tau )k}{2}}\).
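To illustrate, a small Python computation (our own) evaluates the Well-Pooled Gauss exponent \(c'(\tau )\) from Theorem 5 and the above approximation of \(c(\tau )\) at \(\tau =\frac{1}{4}\), reproducing the values quoted for Fig. 1.

```python
from math import log2

def c_gauss(tau):
    """Well-Pooled Gauss run time exponent c'(tau) from Theorem 5."""
    x = log2(1 / (1 - tau))
    return x / (1 + x)

def c_mmt_approx(tau):
    """Approximation of the Well-Pooled MMT exponent c(tau) from Remark 4."""
    y = log2(1 / (1 - 7 * tau / 6))
    return y / (log2(12 / 5) + y)

print(c_gauss(1/4), c_mmt_approx(1/4))   # ~0.293 and ~0.283 (the text quotes 0.293 and 0.282)
```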

Quantumly we can also speed up our Decode step. Naively, i.e. just using Grover search, MMT collapses to Quantum Gauss. But using quantum random walks, one obtains a Quantum Well-Pooled MMT algorithm, as recently proposed by Kachigar and Tillich [22], that slightly outperforms Quantum Gauss.

6 Classical and Quantum Bit Security Estimates

In LPN cryptanalysis, it is current practice to give estimated bit complexities, that is, the binary logarithm of the running time, for newly developed algorithms in table form. In order to allow some comparison with existing work and to give an impression of how our algorithms might perform for LPN parameters of cryptographic interest, we also give such tables in the following. However, at the same time we want to express a clear warning that all LPN tables should be taken with care, since the given bit security levels might over- or underestimate the real performance in practice.

Therefore, we also provide experimental results in Sect. 7 on medium-size LPN parameters, which we extrapolate to cryptographic size by our asymptotic formulas. In our opinion, this is the only reliable way of predicting key sizes with good accuracy. Nevertheless, experimental results might not be possible even for medium-size parameters in some cases, e.g. when an algorithm consumes large memory or when we predict quantum running times. Hence, performance tables are unavoidable in these cases.

Since we care about practical memory consumption in this work, we enforce upper limits on the available memory for the considered algorithms. We first consider \(2^{60}\) bits, since that is more memory than the biggest RAM we are aware of. Second, we consider a bound of \(2^{80}\) bits for an extra safety margin.

For these upper bounds on the memory M, we compute the bit complexities of our algorithms and compare them to the Coded-BKW [18] table from Bogos et  al. [8].

The formulas we used for computing the bit complexities are available at https://github.com/Memphisd/LPN-decoded. In these formulas, we tried to take any polynomial factors into account. As in [8], the cost of the Complete step is not taken into account in the tables, but the cost of Complete would not significantly change the data.

Table 2. Coded-BKW [8], \(M=2^{60}\)
Table 3. Coded-BKW [8], \(M=2^{80}\)
Table 4. WP MMT or Hybrid, \(M=2^{60}\)
Table 5. WP MMT or Hybrid, \(M=2^{80}\)
Table 6. Quantum Hybrid, \(M=2^{60}\)
Table 7. Quantum Hybrid, \(M=2^{80}\)

One can see in Table 2 that for the chosen LPN parameters only 7 parameter sets can be solved with Coded-BKW within the memory limit of \(2^{60}\). Using instead \(2^{80}\) bits of memory as in Table 3, Coded-BKW can be used for a larger range of instances, as one would expect.

Let us compare this to Tables 4 and 5, where we used either Hybrid, marked as bold face entries, or Well-Pooled MMT (WP MMT). For small memory, Hybrid collapses to Well-Pooled Gauss; however, Well-Pooled MMT outperforms Well-Pooled Gauss for the given instances. Interestingly, most instances – especially those of cryptographic interest – are optimally solved via Well-Pooled MMT, that is with pure decoding, without using the given memory and always with less than \(2^{30}\) bits of memory.

Note that according to our predictions, 80-bit security on classical computers for \(\text {LPN}_{k, \frac{1}{\sqrt{k}}}\) can only be achieved for \(k \ge 2048\). This makes current applications of LPN for encryption quite inefficient.

Coded-BKW shows its strength for errors around \(\frac{1}{8}\). However, the same techniques could be used in the Modify step of our Hybrid algorithm, which would result in a similar running time.

Tables 6 and 7 finally state the quantum bit security levels when taking Quantum Well-Pooled Gauss inside Hybrid. Here again, the bold marked entries are those where the optimization suggests using BKW steps. In comparison to the classical case, these are even fewer instances. We see that the prominent cryptographic choice \(\text {LPN}_{512, \frac{1}{8}}\) offers only 69-bit security on quantum computers.

NIST’s Post-Quantum Call. NIST [1] asks for classical security levels of 128, 192 and 256 bits and quantum security levels of 64, 80 and 128 bits. Tables 8, 9 and 10 give the minimal k that fulfills these levels for \(\tau \) taking the values \(\frac{1}{\sqrt{k}}\), \(\frac{1}{8}\) and \(\frac{1}{4}\), respectively. The memory is constrained to \(M=2^{80}\) bits.

Table 8. \(\tau =\frac{1}{\sqrt{k}}\)
Table 9. \(\tau =\frac{1}{8}\)
Table 10. \(\tau =\frac{1}{4}\)

7 Experiments

All our implementations are available via

https://github.com/Memphisd/LPN-decoded.

Our experiments were done on a server with four 16-core processors, allowing a parallelization over 64 threads, using 256 GB \(< 2^{41}\) bits of RAM.

We executed Well-Pooled Gauss and Well-Pooled MMT for \(\tau = \frac{1}{8}, \frac{1}{4}\) and various k. In order to get reliable run times for \(\tau =\frac{1}{8}\) and \(k < 170\), respectively \(\tau = \frac{1}{4}\) and \(k< 100\), we averaged the run time over 30 instances. For larger k, we solved only a single instance.

The results for \(\tau = \frac{1}{8}\) and \(\tau = \frac{1}{4}\) are shown in Figs. 2 and 3, respectively. Here we plot the logarithm of the running time in msec as a function of k. Hence negative values mean that it takes only a fraction of a msec to solve the instance. For Well-Pooled Gauss we plotted as a comparison the asymptotic line with slope

$$ \frac{\log _2(\frac{1}{1-\tau })}{1+\log _2(\frac{1}{1-\tau })}, $$

which follows from Theorem 5. For Well-Pooled MMT we numerically computed the slopes 0.177 and 0.381 for \(\tau = \frac{1}{8}\) and \(\tau = \frac{1}{4}\), respectively, similar to the computation in Fig. 1.

Fig. 2. Experimental results for \(\tau =\frac{1}{8}\)

Fig. 3. Experimental results for \(\tau =\frac{1}{4}\)

As can be seen in Figs. 2 and 3, the experiments take, as expected, slightly longer than the asymptotic prediction, since the asymptotics hide polynomial factors. But especially for Well-Pooled MMT, large values of k are close to the asymptotic line, which means that the asymptotics predict the running time quite accurately.

Since Well-Pooled MMT’s run time includes quite large polynomial factors, due to MMT, we expected that it would outperform Well-Pooled Gauss only for large values of k. To our surprise, the break-even point for both algorithms was only \(k =78\) for \(\tau = \frac{1}{8}\), and \(k=9\) for \(\tau = \frac{1}{4}\). Hence, for these error rates \(\tau \) one should always prefer Well-Pooled MMT over Well-Pooled Gauss, even for relatively small sizes of k.

Largest instances. The largest instances that we solved with Well-Pooled MMT were \(k=243, \tau = \frac{1}{8}\) and \(k=135, \tau = \frac{1}{4}\). Let us provide more details for these computations.

For \(k=243, \tau = \frac{1}{8}\), we first computed in almost 7 days a pool of samples \((\mathbf a_i, b_i)\) where the \(\mathbf a_i\) had their last 35 coordinates equal to zero. The resulting \(\text {LPN}_{208, \frac{1}{8}}\) was solved with MMT in 8 days, resulting in a total of 15 days. This recovers already 208 coordinates of \(\mathbf s\). The Complete step in Well-Pooled MMT that recovers the remaining 35 coordinates took less than a second.

For \(k=135, \tau = \frac{1}{4}\), the preprocessing step on again 35 coordinates took almost 6 days, the decoding step 8 days and Complete less than a second, resulting in a total of 14 days. We summarize the results in Table 11.

Table 11. Solved instances

Extrapolation to \(k=512\) . Let \(T(k, \tau )\) be the time to solve an \( LPN _{k,\tau }\) instance via Well-Pooled MMT as computed numerically in Fig. 1. Then it would take us a factor of \(\frac{T(512, \frac{1}{8})}{T(243, \frac{1}{8})} \approx 2^{41}\) longer to break an \(\text {LPN}_{512, \frac{1}{8}}\) than an \(\text {LPN}_{243, \frac{1}{8}}\) instance. For \(\tau = \frac{1}{4}\) we would even need an additional factor of \(\frac{T(512, \frac{1}{4})}{T(135, \frac{1}{4})} \approx 2^{113}\).

To make these kinds of statements more trustworthy, we should check whether the numerically computed running times resemble the experimental data. Therefore consider the following: in the experiments, it took about \(2^{6.4}\) times as long to solve \(\text {LPN}_{135, \frac{1}{4}}\) as it took to solve \(\text {LPN}_{113, \frac{1}{4}}\). Using our formulas, there is a gap of \(2^6\) between these instances, which is close to what we actually measured.

Hence, both \(k=512\) instances seem to provide sufficient classical security, but also recall from Sect. 6 that \(\text {LPN}_{512, \frac{1}{8}}\) offers only 69-bit quantum security.

Hybrid implementation. We solved \(\text {LPN}_{113, \frac{1}{4}}\) with Hybrid in less than 2 h; in comparison, Well-Pooled Gauss took around 2 days and Well-Pooled MMT still almost 4 h. We were also able to solve \(\text {LPN}_{135, \frac{1}{4}}\) using Hybrid in 5.69 days. In comparison, solving this instance with Well-Pooled MMT took 13.84 days.

Let us provide some more details of both computations, starting with \(\text {LPN}_{113, \frac{1}{4}}\). We first computed a pool of \(2^{33}\) samples with \(k_1=3\) bits fixed in 8 min. BKW then eliminated \(k_2=93\) bits in \(c=3\) iterations with block-size \(d=31\) in 56 min. This gave an \(\text {LPN}_{17, \frac{255}{512}}\) instance, which Gauss solved in 9 min.

Thus, we recovered the first 17 bits of \(\mathbf s\), which we then eliminated from our pool, resulting in an \(\text {LPN}_{93, \frac{1}{4}}\) instance. In a second iteration, BKW eliminated \(k_2=78\) bits in \(c=3\) iterations with block-size \(d=26\) in 3 min. The resulting \(\text {LPN}_{15, \frac{255}{512}}\) instance was solved by Gauss in 6 min.

After eliminating these further 15 bits from our pool, we are left with an \(\text {LPN}_{78, \frac{1}{4}}\) instance, which was directly solved by MMT in another 19 min. The remaining \(k_1=3\) bits were brute-forced in 125 msec.

Thus, in total it took us only 101 min to solve \(\text {LPN}_{113, \frac{1}{4}}\) with Hybrid.

For solving \(\text {LPN}_{135, \frac{1}{4}}\), we computed again a pool of \(2^{33}\) samples with \(k_1=10\) coordinates fixed in 1.47 days. BKW then eliminated \(k_2=99\) bits in \(c=3\) iterations taking 36 min, which resulted in \(2^{21.4}\) samples with \(k_1+k_2=109\) coordinates fixed.

This amount of samples is not yet sufficient to achieve a good success probability for solving the remaining \(\text {LPN}_{26, \frac{255}{512}}\) by Gauss. In order to increase the number of input samples to Gauss, we computed a new pool of \(2^{32}\) samples with \(k_1=10\) coordinates fixed in 0.74 days, and exchanged half of the samples of the bigger pool that we computed in the beginning. This altered pool of size \(2^{33}\) was then used as input to BKW to eliminate again \(k_2=99\) bits in \(c=3\) iterations in another 35 min. This allowed us to double the size of the input pool for Gauss while increasing the runtime only by a factor of \(\approx 1.5\).

After acquiring enough input samples, solving \(\text {LPN}_{26, \frac{255}{512}}\) by Gauss took 3.12 days. The remaining \(\text {LPN}_{109, \frac{1}{4}}\) instance could be solved by another iteration and subsequent execution of MMT in less than 7 h. In total, solving \(\text {LPN}_{135, \frac{1}{4}}\) took us 5.69 days with Hybrid.