The new algorithm we propose uses the same structure as the BKW algorithm. The new idea is to replace the BKW step by a more advanced step that can remove more positions in the treated vectors, at the expense of leaving an additional noise term.
We introduce some additional notation. For an index set I, we write \(\mathbf {v}_I\) to denote the vector formed by the entries of \(\mathbf {v}\) indexed by I. For example, \(\mathbf {v}_{[1,\ldots ,n]}\) denotes the vector containing the first n entries of \(\mathbf {v}\).
4.1 A New BKW Step
Recall the BKW step: it takes a large number of vectors \(\mathbf{a}_i\) and tries to collide them in a set of positions determined by an index set I. This part of the vector \(\mathbf{a}\) is written as \(\mathbf{a}_I\). The size of the collision set (\(\frac{q^b-1}{2}\)) and the number of available vectors have to be of the same order, which essentially determines the complexity of the BKW algorithm, since the number of steps we can perform is limited by the variance of the noise.
We propose to do the BKW step in a different manner. Assuming that we are considering step i in the BKW process, we fix a q-ary linear code with parameters \((N_i,b)\), called \(\mathcal {C}_i\). The code gives rise to a lattice code. Now, for any given vector \(\mathbf {a}_I\) as input to this BKW step, we approximate the vector by one of the codewords in the code \(\mathcal {C}_i\).
We rewrite \(\mathbf {a}_I\) into two parts, the codeword part \(\mathbf {c}_I \in \mathcal {C}_i\) and an error part \(\mathbf {e}_I\in \mathbb {Z}_q^{N_i}\), i.e.,
$$\begin{aligned} \mathbf {a}_I = \mathbf {c}_I + \mathbf {e}_I. \end{aligned}$$
(4)
Clearly, we desire the error part to be as small as possible, so we adopt a decoding procedure to find the nearest codeword in the chosen code \(\mathcal {C}_i\) using the Euclidean metric. Here we use syndrome decoding with a large precomputed syndrome table; the details are discussed in Sect. 4.3.
Each vector \(\mathbf {a}_I\) is then sorted according to which codeword it was mapped to. Altogether, there are \(q^b\) possible codewords. Finally, we generate new vectors for the next BKW step by subtracting vectors mapped to the same codeword (or by adding vectors whose codeword parts sum to the zero codeword).
The inner product \(\left\langle \mathbf {s}_I,\mathbf {a}_I\right\rangle \) is equal to
$$\begin{aligned} \left\langle \mathbf {s}_I,\mathbf {a}_I\right\rangle =\left\langle \mathbf {s}_I,\mathbf {c}_I\right\rangle +\left\langle \mathbf {s}_I,\mathbf {e}_I\right\rangle . \end{aligned}$$
By subtracting two vectors mapped to the same codeword, we cancel the first term on the right-hand side and are left with the noise. The latter term is referred to as the error term introduced by coding.
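For illustration, the following Python sketch shows one such step on toy parameters. All names are hypothetical, and the decoding routine is treated as a black box that returns the nearest codeword of \(\mathcal {C}_i\); a table-based realization is sketched in Sect. 4.3.

```python
# Illustrative sketch of one coded BKW step (toy parameters, hypothetical names).
# decode(a_I) is assumed to return the nearest codeword of the chosen [N_i, b]
# code C_i under the Euclidean metric, as a hashable tuple.
from collections import defaultdict

def coded_bkw_step(samples, index_set, decode, q):
    """samples: list of (a, z) pairs with a a tuple over Z_q and z in Z_q.
    Returns new samples in which the codeword parts on index_set cancel,
    leaving only the coding noise on those positions."""
    buckets = defaultdict(list)
    for a, z in samples:
        a_I = tuple(a[i] for i in index_set)
        c_I = decode(a_I)                 # nearest codeword in C_i
        buckets[c_I].append((a, z))

    new_samples = []
    for group in buckets.values():
        a0, z0 = group[0]
        for a1, z1 in group[1:]:          # subtract pairs mapped to the same codeword
            a_new = tuple((x - y) % q for x, y in zip(a0, a1))
            z_new = (z0 - z1) % q
            new_samples.append((a_new, z_new))
    return new_samples
```

In the actual algorithm the positions in I are then discarded; their residual effect is the difference of the two error terms \(\left\langle \mathbf {s}_I,\mathbf {e}_I\right\rangle \) introduced by coding.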
Let us examine the samples we have received after t BKW steps of this kind. In step i we have removed \(N_i\) positions, so in total we have now removed \(\sum _{i=1}^t N_i\) positions (\(N_i\ge b\)). The received samples are created by summing \(2^t\) original samples, so after guessing the remaining symbols of the secret vector and adjusting for their contribution, a received symbol z can be written as a sum of noise variables,
$$\begin{aligned} z=\sum _{j=1}^{2^t} e_{i_j}+ \sum _{i=1}^n s_i(E_i^{(1)}+E_i^{(2)}+\cdots + E_i^{(t)}), \end{aligned}$$
(5)
where \(E_i^{(h)}=\sum _{j=1}^{2^{t-h+1}}\hat{e}^{(h)}_{i_j}\) and \(\hat{e}^{(h)}_{i_j}\) is the coding noise introduced in step h of the modified BKW algorithm. Note that on one position i, at most one error term \(E_i^{(h)}\) is non-zero.
We observe that noise introduced in an early step is amplified exponentially by the remaining steps, so the procedure uses a sequence of codes with decreasing rate. In this way the error introduced in the early steps is kept small, and the error introduced in later steps, which is amplified less, is allowed to grow.
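To make this explicit, assume (heuristically) that the individual coding errors \(\hat{e}^{(h)}_{i_j}\) in a fixed position are independent, each with variance \(\sigma _h^2\), where \(\sigma _h\) is a symbol we introduce here for the per-position error of the code used in step h. Then
$$\begin{aligned} \mathrm {Var}\left( E_i^{(h)}\right) =\mathrm {Var}\left( \sum _{j=1}^{2^{t-h+1}}\hat{e}^{(h)}_{i_j}\right) = 2^{t-h+1}\sigma _h^2, \end{aligned}$$
so the noise of step h enters the final samples amplified by a factor \(2^{t-h+1}\), which is largest for the earliest steps.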
4.2 Analyzing the Error Distribution
There are many approaches to estimating the error distribution introduced by coding. The simplest is to assume that the error is a sum of several independent discrete Gaussian random variables; this estimate is easy to compute and fairly accurate. A second approach is to compute the error distribution exactly (to sufficient precision) by computer. Note that the error distribution is determined by the linear code employed. We now rely on known results on lattice codes to obtain a good estimate of the size of the noise introduced by coding.
As in previous research, we assume that the error vector \(\mathbf {e}\) introduced by the coding technique is discrete Gaussian, and that sums of such errors are discrete Gaussian as well. Since the error is symmetrically distributed, we estimate the value \(\mathrm {E}[||\mathbf {e}||^2]\) to bound its effect, where \(\mathbf {e}\) is the error vector distributed uniformly on the integer points inside the fundamental region \(\mathcal {V}\) of the lattice generated by Construction A.
The decoding problem thus transforms into an MMSE quantization problem over the corresponding lattice. For simplicity of analysis, we change the hypothesis and assume that the error vector \(\mathbf {e}\) is distributed uniformly and continuously on \(\mathcal {V}\). We can then use the theory of lattice codes to obtain a fairly accurate estimate of the value \(\frac{1}{N}\mathrm {E}[||\mathbf {e}||^2]\), which is exactly the second moment \(\sigma ^2\) of the lattice. As given in Eq. (2), we can write it as
$$\sigma ^2 = G(\varLambda ) \cdot Vol(\mathcal {V})^{\frac{2}{N}}.$$
In our scheme we employ several linear codes with different rates, but we try to make the contribution of every dimension equal. Given a linear code, we generate a lattice \(\varLambda \) by Construction A. We denote by \(G(\varLambda _{N,k})\) the minimum possible value of \(G(\varLambda )\) over all lattices \(\varLambda \subset \mathbb {Z}^N\) generated by Construction A from an [N, k] linear code.
Clearly, \(G(\varLambda _{N,k})\) is no less than \(G(\varLambda _N)\); it is therefore lower bounded by \(\frac{1}{2\pi e}\), and this bound can be achieved asymptotically. For the lattice \(\mathbb {Z}^N\), i.e., the one obtained from a trivial linear code without redundancy, the normalized second moment is \(\frac{1}{12}\). Therefore, the value \(G(\varLambda _{N,k})\) satisfies
$$\frac{1}{2\pi e}<G(\varLambda _{N,k})\le \frac{1}{12}.$$
We set \(G(\varLambda _{N,k})\) to \(\frac{1}{12}\), which is certainly a pessimistic estimate. Since the lattice is built from a linear code by Construction A, the volume of \(\mathcal {V}\) is \(q^{N-k}\). Thus, we can approximate \(\sigma \) by
$$\begin{aligned} \sigma \approx q^{1-k/N}\cdot \sqrt{G(\varLambda _{N,k})} = \frac{q^{1-k/N}}{\sqrt{12}}. \end{aligned}$$
(6)
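As a quick illustration of Eq. (6), the small Python helper below (a hypothetical name, not part of the algorithm itself) evaluates this estimate for given parameters.

```python
import math

def coding_noise_sigma(q, N, k, G=1.0 / 12):
    """Estimated per-position standard deviation of the coding error,
    following Eq. (6): sigma ~ q^(1 - k/N) * sqrt(G), with the pessimistic
    choice G = 1/12 (the normalized second moment of Z^N)."""
    return q ** (1.0 - k / N) * math.sqrt(G)

# Example: a [3, 1] code over Z_q with q = 631 (one of the parameter sets
# evaluated in Table 3) gives sigma ~ q^(2/3) / sqrt(12).
print(coding_noise_sigma(631, 3, 1))
```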
Table 3. Numerical evaluations on 1/G
We have numerically computed the smallest possible variance of the error introduced by coding for several small choices of N, k and q (e.g., [N, k] equal to [3, 1] or [2, 1], and q equal to 631, 2053 or 16411) and verified that the above estimation holds (see Table 3, where 1/G bounds \(1{/}G(\varLambda _{N,k})\)). We choose [N, 1] codes since, for the covering or MMSE property, a lower rate means worse performance.
It is folklore that the value of G decreases as the dimension and length become larger, and all the cases listed in Table 3 obey this rule. Thus we expect even better performance when employing a more sophisticated code for a larger problem. In fact, the values without a \(\dag \) sign in Table 3 are computed using randomly chosen linear codes, and they still outperform our estimate considerably. This observation agrees with the theory that, when the dimension n is large, a random linear code behaves nearly optimally.
From Eq. (6) we know the variance of the error term from the coding part. Combining this with Eq. (5), we obtain an estimate of the variance of the total noise in the samples created after t modified BKW steps.
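The sketch below makes this combination concrete under the independence heuristics above, assuming that step h uses an \([N_h, b]\) code, that the original noise has standard deviation \(\sigma \), and that the secret entries are zero-mean with standard deviation \(\sigma _s\) (equal to \(\sigma \) if the secret follows the noise distribution). It reuses the hypothetical coding_noise_sigma helper from the previous sketch.

```python
def total_noise_variance(q, sigma, t, codes, sigma_s=None):
    """Heuristic estimate of Var(z) in Eq. (5) after t coded BKW steps.

    codes   -- list of (N_h, b) code parameters, one entry per step
    sigma   -- standard deviation of the original LWE noise
    sigma_s -- standard deviation of the secret entries (defaults to sigma)
    All noise terms are assumed independent; the per-position coding noise
    is taken from Eq. (6) via coding_noise_sigma()."""
    if sigma_s is None:
        sigma_s = sigma
    var = 2 ** t * sigma ** 2                      # the 2^t original noise terms
    for h, (N_h, b) in enumerate(codes, start=1):  # coding noise from step h,
        sigma_h = coding_noise_sigma(q, N_h, b)    # amplified by 2^(t-h+1) and
        var += sigma_s ** 2 * N_h * 2 ** (t - h + 1) * sigma_h ** 2  # spread over N_h positions
    return var
```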
4.3 Decoding Method and Constraint
Here we discuss the details of syndrome decoding and show that the additional cost is under control. We characterize the employed [N, k] linear code by a systematic generator matrix \(\mathbf {M} = \begin{bmatrix}\mathbf {I}\;\mathbf {F}'\end{bmatrix}_{k\times N}\). Thus, a corresponding parity-check matrix \(\mathbf {H} = \begin{bmatrix} -\mathbf {F}'^{\mathrm {T}}\;\mathbf {I}\end{bmatrix}_{(N-k) \times N}\) is directly obtained.
The syndrome decoding procedure is as follows. (1) We construct a constant-time query table containing \(q^{N-k}\) items, each storing a syndrome together with its corresponding error vector of minimum Euclidean norm. (2) For a vector to be decoded, we compute its syndrome, look up the corresponding error vector in the table, and subtract it from the vector, thereby yielding the desired nearest codeword.
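The following sketch (illustrative names, brute-force table construction that is feasible only for toy parameters) shows one way to realize this table-based decoder for a small [N, k] code over \(\mathbb {Z}_q\); a real implementation would compute syndromes incrementally as described next.

```python
import itertools
import numpy as np

def build_syndrome_table(H, q):
    """Brute-force syndrome table for a small [N, k] code over Z_q with
    parity-check matrix H (a numpy array): for every syndrome, store an
    error vector of minimum Euclidean norm, with entries interpreted as
    centred representatives.  The table has q^(N-k) entries."""
    N = H.shape[1]
    def norm2(e):
        centred = [(x + q // 2) % q - q // 2 for x in e]
        return sum(c * c for c in centred)
    table = {}
    for e in itertools.product(range(q), repeat=N):
        syn = tuple(H.dot(e) % q)
        if syn not in table or norm2(e) < norm2(table[syn]):
            table[syn] = e
    return table

def decode_to_codeword(a, H, table, q):
    """Map a to the nearest codeword: look up the coset-leader error for
    the syndrome of a and subtract it."""
    syn = tuple(H.dot(a) % q)
    e = np.array(table[syn])
    return tuple((np.array(a) - e) % q)
```

After fixing H, the table and q (e.g., via functools.partial), this routine can serve as the black-box decode callback assumed in the sketch of Sect. 4.1.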
We generalize the method in [22] to the non-binary case \(\mathbb {Z}_q\) for computing the syndrome efficiently. We start by sorting the vectors \(\mathbf {a}_{I}\) by their first k entries and partition them accordingly; there are thus \(q^k\) partitions, denoted \(\mathcal {P}_j\) for \(1\le j\le q^k\). If a vector \(\mathbf {a}_{I}\) belongs to the partition whose first k entries are all zero, we can read its syndrome directly from its last \(N-k\) entries. We then proceed inductively: given one syndrome, we can compute another one in the same partition within \(2(N - k)\) \(\mathbb {Z}_q\) operations, or one in a different partition whose first k entries are at distance 1 from those of the known partition within \(3(N - k)\) \(\mathbb {Z}_q\) operations. Suppose we have \(m_{dec}\) vectors to decode (generally, \(m_{dec}\) is larger than \(q^k\)); then the complexity of this part is bounded by \((N-k)(2 m_{dec} +q^k)<3m_{dec}(N-k)\). Since the cost of subtracting the error vectors to obtain the codewords is \(m_{dec}N\), the overall decoding cost is bounded by roughly \(4m_{dec}N\).
Concatenated Constructions. The drawback of the previous decoding strategy is that a table whose size is exponential in \(N-k\) must be stored. On the other hand, when the size b is fixed there is an inherent memory constraint of \(\mathcal {O}\left( q^b\right) \), which dominates the complexity of the BKW-type algorithm.
When the decoding table would otherwise be too large, we make use of a concatenated code in the narrow sense, defined as the direct sum of several smaller linear codes, to simplify the decoding procedure. This technique is not favored in coding theory since it diminishes the decoding capability, but it works well for our purpose.
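Under this construction a codeword is simply the concatenation of codewords of the component codes, so decoding decomposes block by block. The hedged sketch below (hypothetical name, reusing decode_to_codeword from the previous sketch) illustrates this; each sub-code needs only its own, much smaller, syndrome table.

```python
def decode_concatenated(a, components, q):
    """Decode a direct sum of smaller linear codes block by block.

    components -- list of (H_i, table_i, N_i) triples, one per sub-code,
                  with the sub-block lengths N_i summing to len(a)."""
    codeword, pos = [], 0
    for H_i, table_i, N_i in components:
        block = a[pos:pos + N_i]                       # decode each sub-block independently
        codeword.extend(decode_to_codeword(block, H_i, table_i, q))
        pos += N_i
    return tuple(codeword)
```

The price is a larger coding error than an unstructured [N, k] code of the same overall rate would give, which is exactly the reduced decoding capability mentioned above.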