Solving LPN Using Covering Codes

We present a new algorithm for solving the LPN problem. The algorithm has a similar form as some previous methods, but includes a new key step that makes use of approximations of random words to a nearest codeword in a linear code. It outperforms previous methods for many parameter choices. In particular, we can now solve the (512, 1/8) LPN instance with complexity less than 2^80 operations in expectation, indicating that cryptographic schemes like HB variants and LPN-C should increase their parameter size for 80-bit security.


Introduction
In recent years of modern cryptography, much effort has been devoted to finding efficient and secure low-cost cryptographic primitives targeting applications in very constrained hardware environments (such as RFID tags and low-power devices). Many proposals rely on the hardness assumption of Learning Parity with Noise (LPN), a fundamental problem in learning theory, which recently has also gained a lot of attention within the cryptographic community. The LPN problem is well studied, and it is intimately related to the problem of decoding random linear codes, which is one of the most important problems in coding theory. Being a supposedly hard problem, the LPN problem is a good candidate for post-quantum cryptography, where other classically hard problems such as factoring and the discrete log problem fall short. The inherent properties of LPN also make it ideal for lightweight cryptography.*

* This paper is an extended version of [16] (https://doi.org/10.1007/978-3-662-45611-8_1). It was solicited by the Editors-in-Chief as the best paper from ASIACRYPT 2014, based on the recommendation of the program committee.
The LPN problem can be informally stated as follows. We have an LPN oracle denoted Π_LPN that returns pairs of the form (g, ⟨x, g⟩ + e), where x is an unknown but fixed binary vector, g is a binary vector of the same length sampled from a uniform distribution, e is a noise bit from a Bernoulli distribution, and ⟨x, g⟩ denotes the scalar product of the vectors x and g. The (search) LPN problem is to find the secret vector x given a fixed number of samples (oracle queries) from Π_LPN.
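As a concrete illustration, the oracle can be simulated in a few lines of Python. The function name `lpn_oracle` and its interface are our own choices for this sketch, not part of the paper.

```python
import random

def lpn_oracle(x, eta, rng=random):
    """One LPN sample: a uniform vector g and the noisy scalar product <x, g> + e."""
    k = len(x)
    g = [rng.getrandbits(1) for _ in range(k)]
    e = 1 if rng.random() < eta else 0          # e ~ Ber_eta
    z = (sum(xi & gi for xi, gi in zip(x, g)) + e) % 2
    return g, z
```

Repeated queries yield samples (g, z) where z equals ⟨x, g⟩ with probability 1 − η.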
The first time the LPN problem was employed in a cryptographic construction was in the Hopper-Blum (HB) identification protocol [21]. HB is a minimalistic protocol that is secure in a passive attack model. Aiming to secure the HB scheme also in an active attack model, Juels and Weis [22], and Katz and Shin [23] proposed a modified scheme. The modified scheme, which was given the name HB+, extends HB with one extra round. It was later shown by Gilbert et al. [15] that the HB+ protocol is vulnerable to active attacks, in particular man-in-the-middle attacks, where the adversary is allowed to intercept and attack an ongoing authentication session to learn the secret. Gilbert et al. [13] subsequently proposed a variant of the Hopper-Blum protocol called HB#. Apart from repairing the protocol, the designers of HB# introduced a more efficient key representation using a variant of LPN called Toeplitz-LPN.
Gilbert et al. [14] proposed a way to use LPN in encryption of messages, which resulted in the cryptosystem LPN-C. Kiltz et al. [24] and Dodis et al. [10] showed how to construct message authentication codes (MACs) using LPN. The existence of MACs allows one to construct identification schemes that are provably secure against active attacks. The most recent contribution to LPN-based constructions is a two-round identification protocol called Lapin, proposed by Heyse et al. [20], and an LPN-based encryption scheme called Helen, proposed by Duc and Vaudenay [11]. The Lapin protocol is based on an LPN variant called Ring-LPN, where the samples are elements of a polynomial ring.
The two major threats against LPN-based cryptographic constructions are generic algorithms that decode random linear codes (information-set decoding (ISD)) and variants of the BKW algorithm, originally proposed by Blum et al. [3]. Being the asymptotically most efficient approach, the BKW algorithm employs an iterated collision procedure on the queries. In each iteration, colliding entries are summed to produce a new entry with a smaller dependency on the information bits but with an increased noise level. Once the dependency on sufficiently many information bits is removed, the remaining bits are exhaustively searched to find the secret. Although the collision procedure is the main reason for the efficiency of the BKW algorithm, it leads to a requirement of an immense number of queries compared to ISD. Notably, for some cases, e.g., when the noise is very low, ISD yields the most efficient attack.
Levieil and Fouque [29] proposed to use the fast Walsh-Hadamard transform in the BKW algorithm when searching for the secret. In an unpublished paper, Kirchner [25] suggested transforming the problem into systematic form, where each information (key) bit then appears as an observed symbol perturbed by noise. This requires the adversary to exhaust only the biased noise variables rather than the key bits. When the error rate is low, the noise-variable search space is very small, and this technique decreases the attack complexity. Building on the work by Kirchner [25], Bernstein and Lange [4] showed that the ring structure of Ring-LPN can be exploited in matrix inversion, further reducing the complexity of attacks on, for example, Lapin. None of the known algorithms manage to break the 80-bit security of Lapin, nor do they break the parameters proposed in [29], which were suggested as design parameters of LPN-C [14] for 80-bit security.

Contribution
In this paper, we propose a new algorithm for solving the LPN problem based on [4,25].
We employ a new technique that we call subspace distinguishing, which exploits coding theory to decrease the dimension of the secret. The trade-off is a small increase in the sample noise. Our novel algorithm performs favorably in comparison to the state-of-the-art algorithms and affects the security of HB variants, Lapin, and LPN-C. As an example, we attack the common (512, 1/8)-instance of LPN and question its 80-bit security barrier. A comparison of the complexity of different algorithms is shown in Table 1 (see footnote 2).
Let us explain the main idea of the paper in an informal way. Each step of the BKW algorithm removes the influence of b secret bits by colliding subvectors, at the cost of increasing the noise. So we can model a single step as reducing an LPN problem of dimension n and bias ε to an LPN problem of dimension n − b and bias ε². The new main idea is that one can remove more secret bits by colliding subvectors (linear combinations of secret bits) that are close in Hamming distance, but not necessarily equal. This leaves a few secret bits in each expression, but as the secret bits are biased, they can be considered as an additional noise term. Such a step reduces an LPN problem of dimension n and bias ε to an LPN problem of dimension n − B, where B is much larger than b, and the new bias is slightly smaller than ε². It is shown that LPN solvers that use this new approach in the last step achieve improved performance.

Subsequent Work
After the submission of this paper, a number of papers have appeared that further refine and improve upon this work. We mention the work of [5,6,32], and [12].

Footnote 2: The Bernstein-Lange algorithm was originally proposed for Ring-LPN, but with a slight modification [4] one can also apply it to LPN instances. This modified algorithm shares several initial steps (i.e., the Gaussian elimination and the collision procedure) with the new algorithm, so we use the same implementation of these steps when computing their complexity, for a fair comparison.
To be specific, Bogos, Tramèr, and Vaudenay [5] presented a unified framework for studying the existing LPN algorithms, together with a tight theoretical bound for analyzing the data complexity using Hoeffding bounds. Later, Zhang et al. [32] proposed a new method to analyze the bias introduced by the concatenation of several perfect codes, where the average bias, rather than the bias conditioned on certain keys, is employed. Bogos and Vaudenay [6] further clarified the underlying heuristic approximation and generalized the average-bias analysis, considering concrete code constructions using concatenations of perfect and quasi-perfect codes. Note that, firstly, searching for large decodable linear codes with good covering properties can be treated as a pre-computation task, and secondly, the analysis using the average bias can produce a lower complexity estimate; this has been verified in our experiments, where our bias estimation conditioned on key patterns matches the experimental data but is slightly conservative. In a recent paper [12], the idea of combining BKW and ISD was further investigated by Esser, Kübler, and May.

Organization
The organization of the paper is as follows. In Sect. 2, we give some preliminaries and introduce the LPN problem in detail. Moreover, in Sect. 3 we give a short description of the BKW algorithm. We briefly describe the general idea of our new attack in Sect. 4 and more formally in Sect. 5. In Sect. 6, we analyze its complexity. The results when the algorithm is applied on various LPN-based cryptosystems are given in Sect. 7, which is followed by a section showing the experimental results. In Sect. 9, we describe some aspects of the covering-coding technique. Section 10 concludes the paper.

The LPN Problem
We now give a more thorough description of the LPN problem. Let Ber_η be the Bernoulli distribution and let X ∼ Ber_η be a random variable with alphabet X = {0, 1}. The bias ε of X is given by Pr[X = 0] = 1/2(1 + ε), i.e., ε = 1 − 2η. Let k be a security parameter, and let x be a binary vector of length k. We define the Hamming weight of a vector v as the number of its nonzero elements, denoted by w_H(v), and let B_2(n, w) denote the Hamming ball containing all elements of F_2^n whose Hamming weight is no larger than w.
Definition (LPN oracle). An LPN oracle Π_LPN for an unknown vector x ∈ F_2^k with error probability η ∈ (0, 1/2) returns pairs of the form (g, ⟨x, g⟩ + e), where g is drawn uniformly from F_2^k and e ← Ber_η. Here, ⟨x, g⟩ denotes the scalar product of the vectors x and g.
We also write ⟨x, g⟩ as x · gᵀ, where gᵀ is the transpose of the row vector g. We receive a number n of noisy versions of scalar products of x from the oracle Π_LPN, and our task is to recover x.
Let y be a vector of length n, and let y_i = ⟨x, g_i⟩. For known random vectors g_1, g_2, …, g_n, we can easily reconstruct an unknown x from y using linear algebra. In the LPN problem, however, we instead receive noisy versions of y_i, i = 1, 2, …, n. Writing the noise in position i as e_i, we obtain z_i = y_i + e_i = ⟨x, g_i⟩ + e_i. In matrix form, the same relation is written as z = xG + e, where z = (z_1 z_2 ··· z_n), e = (e_1 e_2 ··· e_n), and G is the k × n matrix whose columns are g_1ᵀ, g_2ᵀ, …, g_nᵀ. This shows that the LPN problem is simply a decoding problem, where G is a random k × n generator matrix, x is the information vector, and z is the received vector after transmission of a codeword over the binary symmetric channel with error probability η.
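The matrix view z = xG + e translates directly into code. The following pure-Python sketch (function name ours) generates an instance and makes the decoding interpretation explicit.

```python
import random

def lpn_matrix_instance(x, n, eta, rng=random):
    """Generate (G, z) with z = xG + e over F_2, where G is a k x n matrix
    of uniformly random columns and e has i.i.d. Ber_eta entries."""
    k = len(x)
    G = [[rng.getrandbits(1) for _ in range(n)] for _ in range(k)]
    y = [sum(x[r] & G[r][c] for r in range(k)) % 2 for c in range(n)]  # y = xG
    e = [1 if rng.random() < eta else 0 for _ in range(n)]
    z = [(yi + ei) % 2 for yi, ei in zip(y, e)]
    return G, z
```

With η = 0 the instance degenerates to ordinary linear algebra; the noise is what makes recovering x hard.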

Piling-Up Lemma
We recall the piling-up lemma, which is frequently used in the analysis of the LPN problem.

Lemma 1 (Piling-up lemma). Let X_1, X_2, …, X_n be independent binary random variables with Pr[X_i = 0] = 1/2(1 + ε_i) for 1 ≤ i ≤ n. Then, Pr[X_1 + X_2 + ··· + X_n = 0] = 1/2(1 + ∏_{i=1}^{n} ε_i).
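The lemma can be sanity-checked by exhaustive computation over all bit patterns; the helper names below are ours.

```python
from itertools import product

def xor_zero_prob(etas):
    """Exact Pr[X_1 + ... + X_n = 0] for independent X_i ~ Ber_eta_i."""
    p = 0.0
    for bits in product((0, 1), repeat=len(etas)):
        if sum(bits) % 2 == 0:
            pr = 1.0
            for b, eta in zip(bits, etas):
                pr *= eta if b else (1 - eta)
            p += pr
    return p

def piling_up(etas):
    """The lemma's prediction: 1/2 (1 + prod eps_i), with eps_i = 1 - 2 eta_i."""
    prod = 1.0
    for eta in etas:
        prod *= (1 - 2 * eta)
    return 0.5 * (1 + prod)
```

For η = 1/8 and two variables, both routines give 1/2(1 + (3/4)²) = 25/32.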

Complexity Estimates
The computational complexity of a given algorithm can be stated in many different ways. First, we may choose between giving asymptotic expressions and giving more explicit complexity estimates. For example, the BKW algorithm for solving LPN in dimension k is sub-exponential.
In this paper, we are primarily interested in explicit complexity estimates, and we will thus try to estimate the number of operations required by an algorithm. We follow a long tradition of counting the number of "simple" bit operations. This includes reading a bit from memory, and the model places no restrictions on memory size. Clearly, this model does not match an estimate of the number of clock cycles on some CPU. In general, we expect the number of clock cycles to be lower, since word-oriented instructions can perform many bit operations in a single instruction.

The BKW Algorithm
The BKW algorithm, as proposed by Blum et al. [3], is an algorithm that solves the LPN problem in sub-exponential time, requiring 2^{O(k/log k)} queries and time. To achieve this, the algorithm uses an iterative sort-and-match procedure on the columns of the query matrix G, which iteratively reduces the dimension of G.
1. Reduction phase. Initially, one searches for all combinations of two columns in G that add to zero in the last b entries. Define a filtering function φ_M : F_2^k → F_2^b that maps a vector to its last b entries. Assume that one finds two columns g_{i_0}ᵀ, g_{i_1}ᵀ such that g_{i_0} + g_{i_1} = (∗ ∗ ··· ∗ 0 0 ··· 0), where ∗ means any value, i.e., they belong to the same partition (or equivalence class) and fulfill φ_M(g_{i_0}) = φ_M(g_{i_1}). Then, a new vector g_1^{(1)} = g_{i_0} + g_{i_1} is computed. Let y_1^{(1)} = ⟨x, g_1^{(1)}⟩. An observed symbol corresponding to this new column is formed as z_1^{(1)} = z_{i_0} + z_{i_1} = y_1^{(1)} + e_1^{(1)}, where e_1^{(1)} = e_{i_0} + e_{i_1}. It can be verified that Pr[e_1^{(1)} = 0] = 1/2(1 + ε²). The algorithm proceeds by adding the same element, say g_{i_0}, to the other elements in the partition, forming z_2^{(1)}, z_3^{(1)}, and so forth. The resulting columns are stored in a matrix G_1. If n is the number of columns in G, then the number of columns in G_1 will be n − 2^b. Note that the last b entries of every column in G_1 are all zero. In connection to this matrix, the vector of observed symbols is z^{(1)} = (z_1^{(1)} z_2^{(1)} ··· z_{n−2^b}^{(1)}), where Pr[z_i^{(1)} = y_i^{(1)}] = 1/2(1 + ε²). We now iterate the same procedure (with a new φ function), picking one column and then adding it to another suitable column in G_i, giving a sum with an additional b entries being zero and forming the columns of G_{i+1}. Repeating the same procedure an additional t − 1 times reduces the number of unknown variables to k − b·t in the remaining problem. For each iteration, the noise level is squared; by the piling-up lemma (Lemma 1), the sum of the 2^t noise variables entering one final sample satisfies Pr[e_{i_1} + e_{i_2} + ··· + e_{i_{2^t}} = 0] = 1/2(1 + ε^{2^t}). Hence, the bias decreases quickly to low levels as t increases. Therefore, we want to keep t as small as possible.
2. Solving phase. In the final step, the BKW algorithm looks for a column vector in G_t such that only the first bit of the vector is nonzero. If the algorithm finds such a vector, then that sample constitutes a very noisy observation of the first bit x_1 of x.
The algorithm stores the observation and repeats the reduction-phase procedure with new samples from the oracle, until sufficiently many observations of the secret bit x_1 have been obtained. Then, it uses a majority decision to determine x_1. The whole procedure is given in Algorithm 1.
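A single reduction step of the procedure above can be sketched as follows. This is a simplified, unoptimized illustration; the function name and the list-based sample representation are ours.

```python
from collections import defaultdict

def bkw_reduce(samples, b):
    """One BKW reduction step: partition samples (g, z) by the last b entries of g,
    pick one representative per partition, and add it to the rest.

    The sum zeroes the last b entries, so they are dropped; each output noise
    bit is a sum of two input noise bits, squaring the bias (piling-up lemma)."""
    parts = defaultdict(list)
    for g, z in samples:
        parts[tuple(g[-b:])].append((g, z))
    out = []
    for members in parts.values():
        g0, z0 = members[0]                       # representative, then discarded
        for g, z in members[1:]:
            new_g = [(a + c) % 2 for a, c in zip(g, g0)]
            out.append((new_g[:-b], (z + z0) % 2))  # last b entries are now zero
    return out
```

Each call loses one sample per non-empty partition, matching the n − 2^b count in the text.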

LF1 and LF2 Variants
The BKW algorithm is a powerful theoretical construction, and because the algorithm operates solely on independent samples, it is possible to provide rigorous analysis using probabilistic arguments without heuristic assumptions. However, the provability comes at a quite high expense: the algorithm discards a lot of samples that could be used in solving the problem. This was first pointed out by Levieil and Fouque in [29]. They suggested that all samples should be kept after the reduction, not only the ones having weight 1. Instead of determining the secret bit by bit using majority decision, the whole (k − t·b)-bit secret may be determined using the Walsh transform. The authors suggested two methods, LF1 and LF2. The methods are essentially the same, but differ in how the columns to be merged are chosen in the reduction phase.

Algorithm 1 (BKW):
  repeat
    (Reduction phase)
    Query the oracle for n queries of the form (g, z);
    Create a query matrix G = (g_1ᵀ g_2ᵀ ··· g_nᵀ) and an observed vector z = (z_1 z_2 ··· z_n);
    Partition the columns by their last b entries, giving a set of partitions S;
    for each partition P ∈ S do
      Pick a random (g', z') ∈ P and remove it from P;
      Replace all remaining elements (g, z) ∈ P with (g + g', z + z');
    (Solving phase)
    Find a column vector in G_t such that only its first bit is nonzero and the remaining positions are all zero; then, the observed value z_i is also an observation of x_1;
  until sufficiently many observations have been obtained;
  Determine the secret bit x_1 by majority decision;
  return x_1

– LF1 picks a column in each partition and adds it to the remaining samples in the same partition (entries having the same last b entries). This is identical to how the described BKW algorithm operates in its merging steps. The number of samples is reduced by 2^b after each merge operation; hence, after a series of t merges, the number of samples is about r(t) = n − t·2^b. The algorithm uses the fast Walsh-Hadamard transform to determine the remaining secret of dimension k − t·b.
Thus, no samples are discarded and, in contrast to BKW, the algorithm does not query the oracle a multiple number of times; therefore, a factor 2^b is lost in terms of query complexity. The LF1 method was subsequently adopted by Bernstein and Lange in [4].
– The other method, LF2, computes all pairs within the same partition. It produces more samples at the cost of increased dependency, thereby gaining more efficiency in practice.
Given that there are on average n/2^b samples in one partition, we expect around 2^b · C(n/2^b, 2) possible samples at the end of one merge step in LF2, where C(·, 2) counts pairs, or, more generally, r*(t) = 2^b · C(r*(t−1)/2^b, 2) samples after t merging steps, with r*(0) = n. The number of samples is preserved when setting n = 3·2^b, and this setting is verified by an implementation in [29]. Like LF1, a fast Walsh-Hadamard transform (FWHT) is used to determine the secret. Combined with a less conservative use of samples, LF2 is expected to be at least as efficient as LF1 in practice. In particular, LF2 has a great advantage when the attacker has restricted access to the oracle.

Fig. 1. An illustration of t merging steps and the sample count after each step, for BKW/LF1 (r(t)) and LF2 (r*(t)).
We have illustrated the different methods in Fig. 1.
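The sample-count recursions for LF1 and LF2 can be evaluated numerically; the function names below are ours.

```python
def lf1_samples(n, b, t):
    """Samples left after t LF1 merges: each merge consumes one representative
    in each of the 2^b partitions."""
    return n - t * (1 << b)

def lf2_samples(n, b, t):
    """Expected samples after t LF2 merges: all pairs within each partition,
    i.e. r*(t) = 2^b * C(r*(t-1)/2^b, 2)."""
    r = float(n)
    for _ in range(t):
        per_part = r / (1 << b)
        r = (1 << b) * per_part * (per_part - 1) / 2
    return r
```

With n = 3·2^b, each partition holds three samples and C(3, 2) = 3 pairs, so LF2 keeps the sample count constant across merges, as claimed in the text.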

Essential Idea
In this section, we give a very basic description of the idea used in our new, more efficient algorithm for solving the LPN problem. A more detailed analysis is provided in later sections.
Assume that we have an initial LPN problem described by G = (g_1ᵀ g_2ᵀ ··· g_nᵀ) and z = xG + e, where z = (z_1 z_2 ··· z_n). As previously shown in [25] and [4], we may through Gaussian elimination transform G into systematic form. Assume that the first k columns are linearly independent and form the matrix D⁻¹. With a change of variables x̂ = xD⁻¹, we get an equivalent problem description with Ĝ = DG in systematic form, and we compute ẑ = z + (z_1 z_2 ··· z_k)Ĝ, which sets the first k positions of ẑ to zero.
In this situation, one may start performing a number of BKW steps on columns k + 1 to n, reducing the dimension k of the problem to something smaller. This results in a new problem instance where the noise in each position is larger, except for the first systematic positions. We may write the problem after performing t BKW steps in the form z' = x'G' + e', where now G' has dimension k' × m, with k' = k − bt and m the number of columns remaining after the t BKW steps.
Now we explain the basics of the new idea proposed in the paper. In a problem instance as above, we may look at the random variables y_i = x' · g_iᵀ. The bits in x' are mostly zero, but a few are set to one. Let us assume that c bits are set to one; furthermore, x' is fixed for all i. We usually assume that g_i is generated according to a uniform distribution. However, if every column g_i were biased, i.e., if every bit in a column position were zero with probability 1/2(1 + ε'), then we would observe that the variables y_i are biased, as y_i = [g_i]_{k_1} + [g_i]_{k_2} + ··· + [g_i]_{k_c}, where k_1, k_2, …, k_c are the bit positions where x' has value one (here [x]_y denotes bit y of the vector x). In fact, assuming that the variables [g_i]_{k_j} are independently distributed, the variables y_i will have bias (ε')^c.
So how do we get the columns to be biased in the general case? We could simply hope for some of them to be biased, but if we need to use a large number of columns, the bias would have to be small, giving a high complexity for an algorithm solving the problem. We propose instead to use a covering code to achieve something similar to what is described above. The vectors g_i are of length k', so we consider a code of length k' and some dimension l. Let us assume that a generator matrix of this code is denoted F. For each vector g_i, we now find the codeword in the code spanned by F that is closest (in the Hamming sense) to g_i. Assume that this codeword is denoted c_i. Then, we can write g_i = c_i + e_i', where e_i' is a vector with biased bits. It remains to examine exactly how biased the bits in e_i' will be, but assume for the moment that the bias is ε'. Going back to our previous expressions, we can write y_i = x' · g_iᵀ = x' · (c_i + e_i')ᵀ, and since c_i = u_i F for some u_i, we can write y_i = x'Fᵀ · u_iᵀ + x' · (e_i')ᵀ. We may introduce v = x'Fᵀ as a length-l vector of unknown bits (linear combinations of bits from x'), and again y_i = v · u_iᵀ + x' · (e_i')ᵀ. Since we have Pr[y_i = z_i] = 1/2(1 + ε^{2^t}), we get Pr[v · u_iᵀ = z_i] = 1/2(1 + ε^{2^t}·(ε')^c), where ε' is the bias determined by the expected distance between g_i and the closest codeword in the code we are using, and c is the number of positions in x' set to one. The last step in the new algorithm now selects the observed symbols z_1, z_2, …, z_m, and for each guess of the 2^l possible values of v, computes how many times v · u_iᵀ = z_i for i = 1, 2, …, m. As this step is similar to a correlation attack scenario, we know that it can be computed efficiently using the fast Walsh-Hadamard transform. After recovering v, it is an easy task to recover the remaining unknown bits of x'.
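The final distinguishing step, counting agreements v·u_iᵀ = z_i for all 2^l candidates v with a single Walsh-Hadamard transform, can be sketched as follows (a simplified illustration; the function names are ours).

```python
def fwht(a):
    """In-place fast Walsh-Hadamard transform of a list of length 2^l."""
    h, n = 1, len(a)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def best_v(us, zs, l):
    """Find v in F_2^l maximizing #{i : <v, u_i> = z_i} via one transform.

    Bucket (-1)^{z_i} at the integer index of u_i; after the transform,
    entry v holds sum_i (-1)^{z_i + <v, u_i>} = (#agreements - #disagreements)."""
    acc = [0] * (1 << l)
    for u, z in zip(us, zs):
        idx = sum(bit << j for j, bit in enumerate(u))
        acc[idx] += -1 if z else 1
    fwht(acc)
    return max(range(1 << l), key=lambda v: acc[v])
```

In the noiseless extreme every wrong candidate agrees on exactly half of the samples, so the correct v stands out maximally; noise shrinks that gap, which is why the bias governs the required number of samples.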

A Toy Example
In order to illustrate the ideas and convince the reader that the proposed algorithm can be more efficient than previously known methods, we consider an example. We assume an LPN instance of dimension k = 160, where we allow at most 2^24 received samples and at most around 2^24 vectors of length 160 to be stored in memory. Furthermore, the error probability is η = 0.1.
For this particular case, we propose the following algorithm. Note that for an intuitive explanation, we assume the number of required samples to be 1/ε_tot², where ε_tot is the total bias. A rigorous complexity analysis of the new algorithm will be presented later.
1. The first step is to compute the systematic form, Ĝ = DG and ẑ = z + (z_1 z_2 ··· z_160)Ĝ. Here, Ĝ has dimension 160 and ẑ has length at most 2^24.
2. In the second step, we perform t = 4 merging steps (using the BKW/LF1 approach), the first removing 22 bits and the remaining three each removing 21 bits. This results in a problem instance z' = x'G' + e', where ε = 1 − 2η = 0.8. Hence, the resulting problem has dimension 75 and the bias is ε^{2^t} = (0.8)^16.
3. In the third step, we select a suitable code of length 75. In this example, we choose a block code which is a direct sum of 25 [3,1,3] repetition codes (see footnote 5), i.e., the dimension is 25. We map every vector g_i to the nearest codeword by simply selecting chunks of three consecutive bits and replacing them by either 000 or 111. With probability 3/4, we will change one position, and with probability 1/4, we will not have to change any position. In total, we expect to change (3/4 · 1 + 1/4 · 0) · 25 = 18.75 positions. The expected weight of the length-75 vector e_i' is thus 75/4, so the expected bias is ε' = 1/2. As Pr[x_i' = 1] = 0.1, the expected number of nonzero positions in x' is 7.5. Assuming we have only c = 6 nonzero positions, we get Pr[v · u_iᵀ = z_i] = 1/2(1 + (0.8)^16 · (1/2)^6).
4. In the last step, we run through the 2^25 values of v and for each of them compute how often v · u_iᵀ = z_i for i = 1, …, 3·2^21. Again, since we use the fast Walsh-Hadamard transform, the cost of this step is not much more than 2^25 operations.
5. The above four-step procedure forms one iteration of our solving algorithm, and we need to repeat it a few times. The expected number of repetitions depends on the success probability of one iteration. For this particular repetition code, there are bad events that make the distinguisher fail: when two of the errors in x' fall into the same concatenation, the bias is zero, and if there are three errors in the same concatenation, the bias is negative. To conclude, we can distinguish successfully if there are no more than 6 ones in x' and each of them falls into a distinct concatenation, i.e., if the overall bias is at least 2^{−11.15}. The success probability is the probability of this event (see footnote 6).
In comparison with other algorithms, the best approach we can find is that of Kirchner [25] and of Bernstein and Lange [4], where one can do up to 5 merging steps. Removing 21 bits in each step leaves 55 remaining bits.
Using the fast Walsh-Hadamard transform with 0.8^{−64} = 2^{20.6} samples, we can include another 21 bits in this step, but there are still 34 remaining variables that need to be guessed. Overall, the simple algorithm sketched above outperforms the best previous algorithm using optimal parameter values (see footnote 7).

Footnote 5: In the sequel, we denote this code construction a concatenated repetition code. For this [75, 25, 3] linear code, the covering radius is 25, but we can see from this example that what matters is the average weight of the error vector, which is much smaller than 25.

Footnote 6: This explains why we need a more rigorous analysis. If we were to assume that the noise variables in the error vector are independent, the success probability would be about 0.37. This estimate is too optimistic, since if two of the errors in x' fall into the same concatenation, the resulting zero bias totally ruins the statistical distinguishing procedure. We use a more accurate estimation in (12), which is further illustrated in Example 1.
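The bias bookkeeping of the toy example is easy to verify with a few lines of arithmetic (variable names ours).

```python
from math import log2

eta = 0.1
eps = 1 - 2 * eta                  # initial bias: 0.8
t = 4
eps_bkw = eps ** (2 ** t)          # bias after four merging steps: 0.8^16
eps_code = 0.5                     # expected bias of a [3,1,3] repetition-code error bit
c = 6                              # assumed number of ones in x'
eps_tot = eps_bkw * eps_code ** c  # total bias, about 2^-11.15
samples_needed = 1 / eps_tot ** 2  # heuristic sample count 1/eps_tot^2, about 2^22.3
```

The heuristic sample count 2^22.3 fits comfortably within the 2^24 samples the example allows.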

Simulation
We have verified in simulation that the proposed algorithm works in practice, both in the LF1 and LF2 settings, using the rate R = 1/3 concatenated repetition code.

Algorithm Description
Having introduced the key idea in a simplistic manner, we now formalize it by stating a new five-step LPN solving algorithm (see Algorithm 2) in detail. Its first three steps combine several well-known techniques for this problem: changing the distribution of the secret vector [25], sorting and merging to make the dimension of the samples shorter [3], and partial secret guessing [4]. The efficiency improvement comes from a novel idea introduced in the last two subsections: if we employ a linear covering code and rearrange samples according to their nearest codewords, then the columns of the matrix, after subtraction of their corresponding codewords, yield the sparse vectors desired in the distinguishing process. We later propose a new distinguishing technique, subspace hypothesis testing, to remove the influence of the codeword part using the fast Walsh-Hadamard transform. The algorithm consists of five steps, each described in a separate subsection. These steps are graphically illustrated in Figs. 2 and 3.

Footnote 7: Adopting the same method to implement their overlapping steps, for the (160, 1/10) LPN instance, the Bernstein-Lange algorithm and the new algorithm cost 2^{39.43} and 2^{35.50} bit operations, respectively. Thus, the latter offers an improvement by a factor of roughly 16 for solving this small-scale instance.

Gaussian Elimination
Recall that our LPN problem is given by z = xG + e, where z and G are known. We can apply an arbitrary column permutation π without changing the problem (but we change the error locations). A transformed problem is π(z) = xπ(G) + π(e). This means that we can repeat the algorithm many times using different permutations, which very much resembles the operation of information-set decoding algorithms.
Continuing, we multiply by a suitable k × k matrix D to bring the matrix G to systematic form, Ĝ = DG. The problem remains the same, except that the unknowns are now given by the vector x̂ = xD⁻¹. This is just a change of variables. As a second step, we also add the codeword (z_1 z_2 ··· z_k)Ĝ to our known vector z, resulting in a received vector starting with k zero entries. Altogether, this corresponds to the change ẑ = z + (z_1 z_2 ··· z_k)Ĝ, with the new unknown x̂ = xD⁻¹ + (z_1 z_2 ··· z_k). Our initial problem has been transformed, and the problem is now written as ẑ = x̂Ĝ + e, where Ĝ is in systematic form. Note that these transformations do not affect the noise level; we still have a single noise variable added in every position.

Fig. 3. After the columns have been merged t times, we have a matrix as shown above. In the upper part, we perform the partial secret guessing. The remaining part will be projected (with distortion) into a smaller space of dimension l using a covering code.

Time-Memory Trade-Off
A schoolbook implementation of the above Gaussian elimination procedure requires about (1/2)·n·k² bit operations; we propose, however, to reduce its complexity by using a more sophisticated time-memory trade-off technique. We store intermediate results in tables and then derive the final result by adding several items from the tables together. The detailed description is as follows. For a fixed s, divide the matrix D into a = ⌈k/s⌉ parts, i.e., D = (D_1 D_2 ··· D_a), where D_i is a sub-matrix with s columns (except possibly the last matrix D_a). Then store all possible values of D_i xᵀ for x ∈ F_2^s in tables indexed by i, where 1 ≤ i ≤ a. For a vector g = (g_1 g_2 ··· g_a), the transformed vector is Dgᵀ = D_1 g_1ᵀ + D_2 g_2ᵀ + ··· + D_a g_aᵀ, where each D_i g_iᵀ can be read directly from the table. The cost of constructing the tables is about O(2^s), which is negligible if the memory used in the later merge step is much larger. Furthermore, for each column, the transformation costs no more than k·a bit operations; so this step requires about n·k·a bit operations in total if 2^s is much smaller than n.
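The table-based transform can be sketched as follows. This is a simplified illustration using dictionaries rather than bit-packed tables, and the function names are ours.

```python
from itertools import product

def build_tables(D, s):
    """Precompute D_i x^T for all x in F_2^s, where D is split into
    column blocks D_1, ..., D_a of width at most s."""
    k = len(D)                                   # number of rows
    blocks = [(c, min(c + s, len(D[0]))) for c in range(0, len(D[0]), s)]
    tables = []
    for lo, hi in blocks:
        tab = {}
        for x in product((0, 1), repeat=hi - lo):
            tab[x] = [sum(D[r][lo + j] & x[j] for j in range(hi - lo)) % 2
                      for r in range(k)]
        tables.append((lo, hi, tab))
    return tables

def transform(tables, g):
    """Compute D g^T by adding one precomputed table entry per block,
    instead of performing the full matrix-vector product."""
    k = len(tables[0][2][next(iter(tables[0][2]))])
    out = [0] * k
    for lo, hi, tab in tables:
        part = tab[tuple(g[lo:hi])]
        out = [(a + b) % 2 for a, b in zip(out, part)]
    return out
```

Per column, the work drops from k² bit operations to about k·a table additions, at the price of storing a tables of 2^s entries each.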

A Minor Improvement
One observation is that only the distribution of the first k' = k − t·b entries of the secret vector affects the later steps. In other words, we just need to make the first k' entries biased. Thus, we can skip the Gaussian elimination processing of the bottom t·b rows of G. More formally, let the first k columns of G form an invertible matrix G_0, and choose D accordingly. Then, the first k' columns of DG are of the form (I 0)ᵀ, i.e., the identity on the top k' rows and zero below.
Using the space-time trade-off technique, the complexity can be computed as before, with a smaller number of blocks: compared with Eq. (16), we reduce the value a from ⌈k/s⌉ to ⌈k'/s⌉, where k' = k − t·b.

Merging Columns
This next step consists of merging columns. The input to this step is ẑ and Ĝ. We write Ĝ = (I L_0) and process only the matrix L_0. As the length of L_0 is typically much larger than the systematic part of Ĝ, this is roughly no restriction at all. We then use the sort-and-match technique from the BKW algorithm, operating on the matrix L_0. This process gives us a sequence of matrices denoted L_0, L_1, L_2, …, L_t. Let us denote the number of columns of L_i by r(i), with r(0) = r*(0) = n − k. Adopting the LF1-type technique, every step operating on the columns reduces the number of samples by 2^b, yielding r(t) = n − k − t·2^b. Using the setting of LF2, the number of samples is r*(t) = 2^b · C(r*(t−1)/2^b, 2). The expression for r*(t) does not appear in [29], but it can be found in [5]. We see that if m is equal to 3·2^b, the number of samples is preserved during the reductions. Implementations suggest that this setting has no visible effect on the success of the algorithm, so we adopt it. Apart from the process of creating the L_i matrices, we need to update the received vector in a similar fashion. A simple way is to put ẑ as a first row in the representation of Ĝ. This procedure ends with a matrix (I L_t), where every column of L_t has its last t·b entries all zero. By discarding the last t·b rows, we obtain a matrix of dimension k − t·b that can be written as G' = (I L_t'), with a corresponding received vector z' = (0 z_1' z_2' ··· z_m'). The first k' = k − t·b positions are affected only by a single noise variable, so we can write z' = x'G' + (ẽ_1 ẽ_2 ··· ẽ_{k'} ẽ_1' ẽ_2' ··· ẽ_m') for some unknown length-k' vector x' (obtained by removing the bottom t·b bits of x̂), where ẽ_i' = Σ_{j∈T_i} e_j and T_i contains the positions that have been added up to form the (k' + i)th column of G'. By the piling-up lemma, the bias of ẽ_i' becomes ε^{2^t}. We denote the complexity of this step by C_2; it is the same in both the LF1 and LF2 settings.

Partial Secret Guessing
The previous procedure outputs G' with dimension k' = k − t · b and m columns. We now divide x' into two parts, x' = (x_1, x_2), where x_1 is of length k''. In this step, we simply guess all vectors x_2 ∈ B_2(k' − k'', w_0), i.e., all vectors of weight at most w_0, and update the observed vector z' accordingly. This transforms the problem into a new, smaller LPN problem of dimension k'' with the same number of samples. Firstly, note that this only works if w_H(x_2) ≤ w_0, and we denote this probability by P(w_0, k' − k''). Secondly, we need to be able to distinguish a correct guess from incorrect ones, and this is the task of the remaining steps. We denote the complexity of this step by C_3.
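The guessing step enumerates all vectors of Hamming weight at most w_0 over the guessed positions; a small illustrative sketch (the function name and representation are ours):

```python
# Enumerate every binary vector of a given length with Hamming weight at
# most w0 -- the candidate set for the partial-secret-guessing step.
from itertools import combinations

def low_weight_vectors(length, w0):
    """Yield all binary vectors (as tuples) of weight <= w0."""
    for w in range(w0 + 1):
        for support in combinations(range(length), w):
            v = [0] * length
            for i in support:
                v[i] = 1
            yield tuple(v)

# The number of guesses is sum_{i=0}^{w0} C(length, i).
guesses = list(low_weight_vectors(4, 2))
assert len(guesses) == 1 + 4 + 6
```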

Covering-Coding Method
In this step, we use a [k'', l] linear code C with covering radius d_C to group the columns. That is, we rewrite g_i = c_i + e_i, where c_i is the nearest codeword in C and w_H(e_i) ≤ d_C. The employed linear code is characterized by a systematic generator matrix F, with a corresponding parity-check matrix H. There are several ways to select a code. An efficient way of realizing the above grouping idea is a table-based syndrome-decoding technique. The procedure is as follows: 1. We construct a constant-time query table containing 2^{k''−l} items, each storing a syndrome and its corresponding minimum-weight error vector. 2. We compute the syndrome Hg_i^T and look up the corresponding error vector e_i in the table; adding e_i to g_i yields the nearest codeword c_i.
The remaining task is to calculate the syndromes efficiently. We sort the vectors g_i according to their first l bits, where 0 ≤ i ≤ m, and group them into 2^l partitions denoted P_j for 1 ≤ j ≤ 2^l. Starting from the partition P_1, whose first l bits are all zero, we can derive each syndrome by reading its last k'' − l bits without any additional computational cost. If we know one syndrome in P_j, we can compute another syndrome in the same partition within 2(k'' − l) bit operations, and one in a different partition whose first l-bit vector has Hamming distance 1 from that of P_j within 3(k'' − l) bit operations. We denote the complexity of this step by C_4. Notice that the selected linear code determines the syndrome table, which can be precomputed with complexity O(k'' · 2^{k''−l}). For some instances, building the full syndrome table may dominate the overall complexity, i.e., when k'' · 2^{k''−l} becomes too large. In that case, we use a code concatenation to reduce the size of the syndrome table, thereby making this cost negligible compared with the total attacking complexity.
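The table-based syndrome decoding above can be illustrated at toy scale. The brute-force construction below (using a [7,4] Hamming code as the example, chosen purely for illustration) enumerates error patterns by increasing weight, so the first pattern stored for each syndrome has minimum weight; the paper's precomputation is of course more efficient.

```python
# Brute-force syndrome table for a small [n, k] binary code given by a
# parity-check matrix H (each row a bitmask over the n positions): map
# every syndrome to a minimum-weight error pattern.
def syndrome(H, v):
    """Syndrome of vector v (a bitmask) under parity-check rows H."""
    return tuple(bin(row & v).count("1") % 2 for row in H)

def build_syndrome_table(H, n):
    table = {}
    # Enumerate error patterns in order of increasing Hamming weight so
    # the first pattern stored per syndrome is minimum-weight.
    for e in sorted(range(1 << n), key=lambda v: bin(v).count("1")):
        table.setdefault(syndrome(H, e), e)
    return table

# One common parity-check matrix for the [7,4] Hamming code (radius 1).
H = [0b1010101, 0b0110011, 0b0001111]
table = build_syndrome_table(H, 7)
assert len(table) == 8                                   # 2^(n-k) syndromes
assert max(bin(e).count("1") for e in table.values()) == 1
```

Decoding a received vector g then costs one syndrome computation and one table lookup: c = g XOR table[syndrome(H, g)].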
We split the search space into two (or several) separate spaces by using a concatenated code construction. As an example, let C be a concatenation of two [k''/2, l/2] linear codes. The syndrome tables can then be built in O(k'' · 2^{k''/2 − l/2}) time and memory. We assume that the two codes are identical; both contribute to the final noise, and the decoding complexity changes accordingly.

Subspace Hypothesis Testing
In the subspace hypothesis testing step, we group the (processed) samples (g_i, z_i) into sets L(c_i) according to their nearest codewords and define the function f(c_i) = Σ_{(g_i, z_i) ∈ L(c_i)} (−1)^{z_i}. The employed systematic linear code C describes a bijection between the linear space F_2^l and the set of all codewords in F_2^{k''}; moreover, due to the systematic form, the corresponding information vector appears explicitly in the first l bits of each codeword. We can thus define a new function g(u) = f(c_i), where u represents the first l bits of c_i and exhausts all points in F_2^l. The Walsh transform of g is defined as G(v) = Σ_{u ∈ F_2^l} g(u) (−1)^{⟨v, u⟩}. Here, we exhaust all candidates v ∈ F_2^l by computing the Walsh transform. The following lemma explains why we can perform hypothesis testing on the subspace F_2^l.
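Evaluating G(v) for all 2^l candidates at once is exactly what the fast Walsh-Hadamard transform does, in l · 2^l additions. A standard textbook implementation, not tied to the paper's data layout, is:

```python
# In-place fast Walsh-Hadamard transform of a list whose length is a
# power of two; entry v of the result is sum_u a[u] * (-1)^<v,u>.
def fwht(a):
    h, n = 1, len(a)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y   # butterfly step
        h *= 2
    return a

# The transform of a delta function at u = 0 is the all-ones row.
a = [0] * 8
a[0] = 1
assert fwht(a) == [1] * 8
```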

Lemma 2. There exists a unique vector v ∈ F_2^l such that ⟨v, u⟩ = ⟨x_1, c_i⟩, where c_i = uF.
Proof. As c_i = uF, we obtain ⟨x_1, c_i⟩ = ⟨x_1, uF⟩ = ⟨x_1 F^T, u⟩. Thus, we construct the vector v = x_1 F^T, which fulfills the requirement. On the other hand, the uniqueness is obvious.
Before going deeper into the details of the attack, we now illustrate how the subspace hypothesis test is performed. Consider the following.
Rewrite g_i as the sum of a codeword c_i = u_i F and a discrepancy e_i. As a next step, we can separate the discrepancy e_i from u_i F, which yields z'_i = ⟨x_1, u_i F⟩ + ⟨x_1, e_i⟩ + ẽ_i. We now see that the dimension of the problem has been reduced from k'' to l. Since w_H(e_i) ≤ d_C and w_H(x_1) ≈ η · k'', the contribution from ⟨x_1, e_i⟩ is small. Note that e_i is the error from the covering-code procedure; it does not include the error from the oracle or the merging procedure. Recall that the sequence received from the oracle is z_i = y_i + e_i, which after merging the columns of G becomes z'_i = y'_i + ẽ_i. All things considered (all sources of error piled onto the sequence), the tested bit is ẽ_i + ⟨x_1, e_i⟩. Given a candidate v, G(v) is the difference between the number of predicted 0's and the number of predicted 1's for the bit ẽ_i + ⟨x_1, e_i⟩. Assume that ⟨x_1, e_i⟩ contributes a noise with bias no smaller than ε_set. If v is the correct guess, the bit is Bernoulli distributed with a bias of at least ε^{2^t} · ε_set; otherwise, it is considered random. Thus, the best candidate v_opt is the one that maximizes the absolute value of G(v), i.e., v_opt = arg max_{v ∈ F_2^l} |G(v)|, and we need approximately the number of samples given in (39)^9 to distinguish these two cases.
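As a rough illustration of the sample requirement (the exact expression is the one in (39)), a common rule of thumb for distinguishing a bias eps over an l-bit candidate space is on the order of l/eps^2 samples. The constant 4 ln 2 below is our assumption for illustration, not necessarily the paper's exact value:

```python
# Back-of-the-envelope sample count for picking out one biased candidate
# among 2^l hypotheses; the constant c = 4*ln(2) is a common rule of
# thumb, used here only for illustration.
import math

def samples_needed(eps, l, c=4 * math.log(2)):
    """Roughly c * l / eps^2 samples."""
    return math.ceil(c * l / eps ** 2)

m = samples_needed(eps=2 ** -10, l=60)
assert m > (1 / (2 ** -10)) ** 2    # more than 1/eps^2 samples are needed
```

The dominant feature is the 1/eps^2 factor: squaring the bias (as each BKW step does via the piling-up lemma) quadruples the required number of samples.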
Note that a false positive can be recognized without much cost. If the distinguisher fails, we then choose another permutation to run the algorithm again. The procedure will continue until we find the secret vector x.
We use the fast Walsh-Hadamard transform technique to accelerate the distinguishing step. As the hypothesis testing runs for every guess of x_2, we denote the overall complexity of this step by C_5.

Analysis
In the previous section, we indicated the complexity of each step. We now put it together in a single complexity estimate. We first formulate the probability P(w, j) of having at most w errors in j positions, which follows a binomial distribution, i.e., P(w, j) = Σ_{i=0}^{w} C(j, i) η^i (1 − η)^{j−i}. The complexity consists of three parts:
- Inner complexity: the complexity of each step of the algorithm, i.e., C_1 + C_2 + C_3 + C_4 + C_5. These steps are performed in every iteration.
- Guessing: the probability of making a correct guess on the weight of x_2, i.e., P(w_0, k' − k'').
- Testing: the probability that the constraint on the bias level introduced by coding (i.e., no smaller than ε_set) is fulfilled, denoted by P_test.
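The binomial tail P(w, j) can be evaluated directly; a minimal sketch (eta denotes the noise rate):

```python
# P(w, j): probability of at most w errors among j positions, each
# independently in error with probability eta -- a binomial tail.
from math import comb

def P(w, j, eta):
    return sum(comb(j, i) * eta ** i * (1 - eta) ** (j - i)
               for i in range(w + 1))

assert abs(P(2, 2, 0.5) - 1.0) < 1e-12        # at most 2 errors of 2 is certain
assert abs(P(0, 3, 0.25) - 0.75 ** 3) < 1e-12 # zero errors in 3 positions
```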
The success probability in one iteration is P(w_0, k' − k'') · P_test. The presented algorithm is of the Las Vegas type, and in each iteration, the complexity accumulates step by step. Hence, we obtain the following theorem: the expected complexity, as a function of the algorithm parameters (a, b, t, k'', l, w_0, ε_set), equals the cost of one iteration divided by the success probability, under the condition (45) that sufficiently many samples are available, where m = n − t · 2^b in the LF1 setting and m = n = 3 · 2^b in the LF2 setting.^10 Proof. The complexity of one iteration is given by C_1 + C_2 + C_3 + C_4 + C_5. The expected number of iterations is the inverse of P_guess · P_test. Substituting the formulas into the above completes the proof. Condition (45) ensures that we have enough samples to determine the correct guess with high probability.
The remaining part is to calculate the value of P test , which is determined by the employed code.

Bias from a Single Perfect Code
If we use a length-k'' perfect code^11 with covering radius d_C, the bias in e_i is determined by the following proposition.^12 Proposition 1 (Bias from covering code [31]). If the covering code C has an optimal covering radius, then the probability Pr[⟨x_1, e_i⟩ = 1 | w_H(x_1) = c] is given by (Σ_{i=0}^{d_C} Σ_{odd j ≤ i} C(c, j) C(k'' − c, i − j)) / (Σ_{i=0}^{d_C} C(k'', i)), where k'' is the dimension of x_1 and d_C is the covering radius. Thus, the bias ε(c) conditioned on the weight c of x_1 is ε(c) = 1 − 2 · Pr[⟨x_1, e_i⟩ = 1 | w_H(x_1) = c]. The bias function ε(c) is monotonically decreasing. If we preset a bias level ε_set, all possible x_1 with weight no more than c_0 will be distinguished successfully, where c_0 = max{c | ε(c) ≥ ε_set}. We can then present a lower bound on P_test, i.e., P_test = P(c_0, k'').
^11 In the sequel, we assume that when the code length is relatively large, it is reasonable to approximate a perfect code by a random linear code. We replace the covering radius by the sphere-covering bound to estimate the expected distance d, i.e., d is the smallest integer such that Σ_{i=0}^{d} C(k'', i) ≥ 2^{k''−l}. We give more explanation in Sect. 9.
^12 We would like to thank Sonia Bogos and Serge Vaudenay for pointing out this accurate bias computation.
Note that this estimate lower bounds the success probability; in practice it is higher, since the distinguisher still succeeds with some probability even if the bias introduced by coding falls below the preset level. We can also use the list-decoding idea to increase the success probability by keeping a small list of candidates.
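The sphere-covering estimate of the expected distance d (the smallest d with Σ_{i≤d} C(k'', i) ≥ 2^{k''−l}) is easy to compute directly; a small sketch, with the [23,12] Golay code as a sanity check (it is perfect, so the bound is attained with equality):

```python
# Sphere-covering estimate of the covering radius of a random [k, l]
# binary code: the smallest d with sum_{i=0}^{d} C(k, i) >= 2^(k - l).
from math import comb

def sphere_covering_radius(k, l):
    target = 2 ** (k - l)
    total = 0
    for d in range(k + 1):
        total += comb(k, d)
        if total >= target:
            return d
    return k

assert sphere_covering_radius(23, 12) == 3   # [23,12] Golay: radius 3
assert sphere_covering_radius(7, 4) == 1     # [7,4] Hamming: radius 1
```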

The Concatenated Construction
Until now, we have only considered using a single code for the covering-code part. In some cases, performing syndrome decoding may be too expensive for the optimal parameters; to overcome this, we use a concatenated code construction. As an example, we illustrate the complexity estimation for the concatenation of two codes, which is the optimal code construction for solving several LPN instances.
As in the previous case, we set an explicit lower bound ε ≥ ε_set on the bias introduced by the covering-code part, which is attained only by a certain set E_set of (good) error patterns in the secret. For a concatenation of two codes, the vector x_1 is divided into two halves, and the noise ⟨x_1, e_i⟩ splits into the sum of two independent contributions. This implies that the bias is ε = ε_1 · ε_2, where ε_1 (ε_2) is the bias introduced by the first (second) code and can be computed by Proposition 1. We then determine all the (good) error patterns E_set in the secret such that ε ≥ ε_set and write the success probability P_test accordingly. As discussed before in Sect. 6.1, we can expect the algorithm to work slightly better in practice.
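A hedged sketch of the good-pattern bookkeeping for two concatenated codes: the bias of the coding noise multiplies, so a pair of per-half secret weights (w1, w2) is "good" when the product of the two per-half biases clears the preset level. The bias function used below is a toy placeholder; the paper computes it via Proposition 1.

```python
# Enumerate the "good" weight pairs (w1, w2) for a concatenation of two
# codes: those whose combined bias eps1(w1) * eps2(w2) >= eps_set.
# The function `eps` is a placeholder for the Proposition 1 computation.
def good_patterns(eps, max_w, eps_set):
    return [(w1, w2)
            for w1 in range(max_w + 1)
            for w2 in range(max_w + 1)
            if eps(w1) * eps(w2) >= eps_set]

eps = lambda w: 2.0 ** -w            # toy monotonically decreasing bias
pats = good_patterns(eps, max_w=3, eps_set=2.0 ** -2)
assert (0, 0) in pats and (3, 3) not in pats
assert all(eps(a) * eps(b) >= 0.25 for a, b in pats)
```

P_test is then the total probability that the secret's per-half weights land in this good set.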
The complexity C_4 changes in the concatenated code case, where we denote it by C'_4, and the pre-computation of the syndrome tables has a lower complexity since the codes are smaller and can be treated separately. Since the pre-computation complexity O(k'' · 2^{k''/2 − l/2}) must be less than^13 or match the total attacking complexity, the lowered time complexity allows looser constraints on the algorithm parameters. Apart from these differences, the complexity expression is the same as for the non-concatenated construction.
It is straightforward to extend the above analysis to a concatenation of multiple linear codes. As before, we preset a lower bound ε_set for the bias and derive a formula to estimate the probability of all the good error patterns in the secret. This type of analysis was actually carried out in the toy example from Sect. 4.1.
Example 1. In the toy example from Sect. 4.1, we concatenate 25 [3,1] repetition codes C_i, for 1 ≤ i ≤ 25. For each code C_i, the corresponding bias is related to the Hamming weight w_{C_i} of the associated subvector of the secret (as shown in Table 2). In Sect. 4.1, we set the bound for the bias ε_set to 2^−6 and then obtain the success probability^14 in (12).

Results
We now present numerical results of the new algorithm attacking three key LPN instances, as shown in Table 3. All aim at 80-bit security. The first instance, with parameter (512, 1/8), is widely accepted in various LPN-based cryptosystems (e.g., HB+ [22], HB# [13], LPN-C [14]) after the suggestion from Levieil and Fouque [29]; the second, with increased length (532, 1/8), is adopted as the parameter of the irreducible Ring-LPN instance employed in Lapin [20]; and the last is a new design parameter^15 we recommend for future use. The attacking details for the different protocols are given later. We note that the new algorithm is significant not only for the above applications but also for some LPN-based cryptosystems without explicit parameter settings (e.g., [10,24]).

HB +
Levieil and Fouque [29] proposed an active attack on HB+ by choosing the random vector a from the reader to be 0. To achieve 80-bit security, they suggested adjusting the lengths of the secret keys to 80 and 512, respectively, instead of both being 224. Its security is based on the assumption that the LPN instance with parameter (512, 1/8) can resist attacks within 2^80 bit operations. But we solve this instance in 2^79.64 bit operations, indicating that the old parameters are insufficient to achieve 80-bit security.
^14 When calculating the success probability in (12), we ignore the probability that a nonzero even number of concatenations have w_{C_i} = 3, since these events are very rare.
^15 This instance requires 2^81 bits of memory using the new algorithm and could withstand all existing attacks at the security level of 2^80 bit operations.

LPN-C and HB #
Using similar structures, Gilbert et al. proposed two different cryptosystems, one for authentication (HB#) and the other for encryption (LPN-C). By setting the random vector from the reader and the message vector both to 0, we obtain an active attack on the HB# authentication protocol and a chosen-plaintext attack on LPN-C, respectively. As the protocols come in both secure versions (random-HB# and LPN-C) and efficient versions (HB# and Toeplitz LPN-C), we analyze them separately.

Using Toeplitz Matrices
A Toeplitz matrix is a matrix in which each descending diagonal from left to right is constant. Thus, when employing a Toeplitz matrix as the secret, if we attack its first column successfully, then only one bit in its second column remains unknown. The problem is therefore transformed into solving a new LPN instance with parameter (1, 1/8). We then deduce the third column, the fourth column, and so forth. The typical settings for the number of columns (denoted by m) are 441 for HB# and 80 (or 160) for Toeplitz LPN-C. In either case, the cost of determining all columns other than the first is bounded by 2^40, negligible compared with that of attacking one (512, 1/8) LPN instance. Therefore, to achieve 80-bit security, these efficient versions that use Toeplitz matrices should use a larger LPN instance.

Random Matrix Case
If the secret matrix is chosen entirely at random, there is no simple connection between different columns to exploit. One strategy is to attack column by column, yielding an algorithm whose complexity is that of attacking a (512, 1/8) LPN instance multiplied by the number of columns. That is, if m = 441, the overall complexity is about 2^88.4. We may slightly improve the attack by exploiting the fact that the different columns share the same random vector in each round.

Lapin with an Irreducible Polynomial
Heyse et al. [20] use a (532, 1/8) Ring-LPN instance with an irreducible polynomial^16 to achieve 80-bit security. We show here that this parameter setting is not secure enough for Lapin to thwart attacks at the level of 2^80. Although the new attack on a (532, 1/8) LPN instance requires approximately 2^82 bit operations, larger than 2^80, there are two key issues to consider:
- Ring-LPN is believed to be no harder than the standard LPN problem. For the instance in Lapin using a quotient ring modulo the irreducible polynomial x^532 + x + 1, it may be possible to optimize the procedure by further exploiting the ring structure, resulting in a more efficient attack than the generic one.
- The definition of bit complexity poorly characterizes the actual computational hardness, as a computer can parallelize many bit operations in one clock cycle. We believe a better definition would be a vectorized version, i.e., defining the "atomic" operation as the addition or multiplication of two 64-bit (or 128-bit) vectors. This refined definition is a counterpart of that for the Advanced Encryption Standard (AES), where 80-bit security means performing 2^80 AES encryptions, not just bit operations. If we adopt this vectorized security definition, the considered Lapin instance is far from achieving 80-bit security.
We suggest to increase the size of the employed irreducible polynomial in Lapin for 80-bit security.

Experiments
We show the experimental results in this section, using a [46, 24] linear code, a concatenation of two binary [23, 12] Golay codes,^17 for the subspace hypothesis testing procedure.

Validation of Success Rates
Starting with 2^25.6 LPN samples, we run two groups of simulations, with k equal to 142 and 166, respectively. The noise rate η is varied to achieve a reasonable success probability. We perform 4 BKW steps of size 24 for the former and include one more step for the latter. Moreover, we stick to LF2-type reduction steps for better performance.
The comparison between the simulation results and their theoretical counterparts is shown in Table 4. The simulated values are obtained by running about 200 trials for each LPN instance. Meanwhile, as we always keep about 2^25.6 samples after each reduction step, the number of samples for the statistical testing procedure is also approximately 2^25.6. We see from Table 4 that the adopted theoretical estimation is conservative, as discussed in Sect. 6.1, since the simulation results are almost always better than the theoretical ones. On the other hand, the theoretical predictions are fairly close to our experimental results. This understanding is further consolidated in Fig. 4, which plots the success probability for fine-grained choices of the noise rate η with more accurate simulated probabilities, i.e., 1000 trials for each LPN instance.
^16 The Lapin instantiation with a reducible polynomial designed for 80-bit security has been broken within about 2^71 bit operations in [17].
^17 Binary [23, 12] Golay codes are perfect codes with optimal covering properties. The concatenation of two Golay codes produces a larger linear code with fairly good covering properties and efficient decoding. Moreover, the implementation of Golay codes is simple and well studied.

The Largest Instance
We solve the (136, 1/4) LPN instance in 12 h on average, using one thread of a server with an Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30 GHz and 256 GB of RAM. This instance is slightly larger than the (135, 1/4) one reported in [12], which required 13.84 days of 64-thread parallel computation using the Well-Pooled MMT algorithm and 5.69 days using the Hybrid algorithm,^18 on a server with 256 GB of RAM. Though it is tricky to compare implementations of different types of algorithms, our results support the claim that the BKW variants are more efficient when the noise rate η is high.
In the implementation, we ask the LPN oracle for around 2^31.6 samples and then perform three LF2-type BKW steps of size 30. After this step, we have zeroed out 90 positions and perform the subspace hypothesis testing on the remaining 46 positions, employing a concatenation of two [23, 12] Golay codes. We run 12 trials in approximately 48 h and succeed 4 times.

More on the Covering-Coding Method
In this section, we describe more aspects of the covering-coding technique, which is the most novel and essential step in the new algorithm.

Sphere-Covering Bound
We use the sphere-covering bound to estimate the bias contributed by the new technique, for two reasons. Firstly, there is a well-known conjecture [7] in coding theory that the covering density approaches 1 asymptotically as the code length goes to infinity. Thus, it is sensible to assume that the linear code has a good covering radius when the code length is relatively large. Secondly, as seen in the previous example, the desired key feature is a linear code with a low average error weight, which is smaller than its covering radius. From this perspective, the sphere-covering bound gives a good estimate.

Attacking Public-Key Cryptography
We know various decodable covering codes that could be employed in the new algorithm, e.g., table-based syndrome-decodable linear codes, concatenated codes built on Hamming codes, Golay codes, repetition codes, etc. For the cryptographic schemes targeted in this paper, i.e., HB variants, LPN-C, and Lapin with an irreducible polynomial, the first three are efficient, but in the realm of public-key cryptography (e.g., the schemes proposed by Alekhnovich [2], Damgård and Park [9], and Duc and Vaudenay [11]), the situation changes. The security of these systems is based on LPN instances with huge secret length (tens of thousands of bits) and extremely low error probability (less than half a percent). Due to the competitive average weight of the error vector shown by the previous example in Sect. 4.1, the concatenation of repetition codes with much lower rate seems more applicable: with low-rate codes, we remove more bits when using the covering-coding method.

Alternative Collision Procedure
Although the covering-coding method is employed only once in the new algorithm, we could derive numerous variants, among which one may find a more efficient attack. For example, we could replace several steps in the later stage of the collision procedure by adding together two vectors that decode to the same codeword. This alternative technique is similar to the one invented by Lamberger et al. [27,28] for finding near-collisions of hash functions. By this procedure, we could eliminate more bits in one step at the cost of increasing the error rate; this is a trade-off, and the concrete parameter settings should be analyzed more thoroughly. Indeed, with the help of this alternative collision idea, a series of recent papers [1,18,19,26] has greatly reduced the complexity of solving the LWE problem, the q-ary counterpart of LPN, both asymptotically and concretely. But we failed to find better attacks when applying this idea to the LPN instances of cryptographic interest in the proposed authentication protocols and LPN-C, since the noise rates are high. We believe this idea could be useful when the noise is relatively small, and we leave it as an interesting direction for future research.

Conclusions
In this paper, we have described a new algorithm for solving the LPN problem that employs an approximation technique using covering codes, together with a subspace hypothesis testing technique, to determine the values of linear combinations of the secret bits. Complexity estimates show that the algorithm beats all previous approaches, and in particular, we can present academic attacks on instances of LPN that have been suggested in different cryptographic primitives.
There are a few obvious improvements to this new technique, one being the use of strong distinguishers and another the use of more powerful constructions of good codes. There are also various modified versions that need further investigation. One such idea, described in Sect. 9.3, is to use the new technique inside a BKW step, thereby removing more bits in each step at the expense of introducing another contribution to the bias. An interesting open problem is whether these ideas can improve the asymptotic behavior of the BKW algorithm.