1 Introduction

With the practical feasibility of a sufficiently large quantum computer becoming more probable by the day, the threat posed by Shor’s algorithm [47] to number-theory-based cryptosystems grows as well. To address this threat, NIST began a standardization process for post-quantum cryptography in 2016. The fourth round of this process started in July 2022 when, in the Key Encapsulation Mechanism category, four candidates advanced. Among them, the Classic McEliece [2] and BIKE [3] cryptosystems are two solutions based on error-correcting codes. Their security relies on the intractability of the binary syndrome decoding problem (\(\textsf{SDP}\)) [6]. The \(\textsf{SDP}\) is the core hard problem of several cryptographic constructions, e.g., the FSB hash function [4], the SYND stream cipher [28] or the Stern identification scheme [51]. Given a parity-check matrix \(\varvec{H}\) of a binary linear code, a binary syndrome vector \(\varvec{s}^{*}\) and an integer t, the \(\textsf{SDP}\) asks for a solution of fixed Hamming weight (\(\text {HW}(\varvec{x})=t\)) to the linear system \(\varvec{H}\varvec{x}= \varvec{s}^{*}\). There are three main techniques for solving the \(\textsf{SDP}\): statistical decoding [14, 18, 27, 34, 43], information set decoding (ISD) [5, 9, 11, 12, 22, 23, 26, 36, 37, 39, 40, 46, 50] and generalized-inverse-based decoding [52]. Information Set Decoding was originally proposed by Prange in 1962 [46], and has since been incrementally refined by Lee and Brickell [36], Stern [30, 50] and, more recently, by May, Meurer and Thomae [39] and by Becker, Joux, May and Meurer [5]. The complexity of the ISD method has been used to tune the parameters of the cryptosystems [24] according to the required security levels.

1.1 Integer syndrome decoding

One recent line of work considers modified versions of the \(\textsf{SDP}\) for which additional information is available, for instance via side-channel analysis on implementations of the aforementioned cryptosystems. In [33], the authors study the case where parts of the error, or only their Hamming weight, are known. The case where the integer syndrome \(\varvec{s}\) is available instead of the binary one, as if the matrix-vector multiplication had been performed in the integer ring instead of the binary finite field, is considered in [20]. One method to obtain the integer syndrome is a laser fault injection attack, as presented in [15]. The problem one has to solve in this case is the integer syndrome decoding problem, referred to as \(\mathbb {N}-\textsf{SDP}\), where the input is the parity-check matrix \(\varvec{H}\), the integer syndrome vector \(\varvec{s}\) and the weight t of the solution. The same question is raised, namely whether \(\varvec{H}\varvec{x}= \varvec{s}\) admits a solution of weight t. This problem can be tackled by means of Integer Linear Programming [15] or probabilistic methods [25]. Another method of obtaining an integer syndrome, much more feasible and realistic than laser fault injection, is side-channel analysis [16].

Due to physical factors, the integer entries of the syndrome might not be perfectly accurate. Hence, in the resulting problem, the \(\mathbb {N}-\textsf{SDP}\) in the presence of noise, we are given a noisy integer syndrome \(\widetilde{\varvec{s}}=\varvec{s}+\epsilon \), where \(\epsilon \) models the noise as a vector of random variables. The solution proposed in [16] uses a combination of ISD techniques and the score decoder from [25]. In [16] the performance of the algorithm was evaluated by simulations only; no theoretical evidence exists for the performance of the ISD-score decoder. On top of that, the performance certainly depends on the distribution of the noise, which calls for a deeper investigation into the side-channel part.

Recently, the leakage model (Hamming weight or distance) as well as various noise distributions were analyzed in [29]. It was shown that if an attacker has access to more than the final result (or an approximation of it), then ISD-score decoders can be enhanced or even outperformed. In particular, the attacker model in [29] assumes that intermediate estimates of the syndrome computations are leaked. This extra information is used to locally correlate the leakage with the solution, making the resulting attacks more powerful.

1.2 Related work

Learning with errors and hints Code-based cryptosystems are not the only ones vulnerable to such attacks. Similar results were obtained in the context of lattice-based cryptosystems by Bootle et al. [10]. The BLISS cryptosystem was cryptanalysed by means of similar hybrid attacks, where side-channel analysis revealed an integer version of the Learning With Errors problem (ILWE). The ILWE problem is the lattice-based equivalent of the \(\mathbb {N}-\textsf{SDP}\). However, ILWE was solved with another technique that does not seem to work for the \(\mathbb {N}-\textsf{SDP}\). Nevertheless, it shows that such scenarios extend beyond code-based cryptography.

Quantitative group testing Quantitative Group Testing (QGT) is an active field of research, lately boosted by the COVID-19 epidemic. In QGT we are given a large population out of which some individuals suffer from a disease, and the goal is to identify the infected individuals. Possible applications of QGT range from bio-informatics [13], traffic monitoring [54] and confidential data transfer [1, 19] to machine learning [38, 55]. The \(\mathbb {N}-\textsf{SDP}\) can also be seen as a QGT in the presence of noise. As we shall demonstrate, the algorithm we propose here solves a noisy QGT instance by adapting and improving (using coding theory tools, such as ISD techniques) a recent solution to the classical QGT [25]. We compare our findings with the results from [25] in two ways.

  • In the noiseless setting (the algorithm we propose can be applied to a zero noise distribution) we obtain less restrictive conditions on the parameters for attaining a high success probability. Also, fewer syndrome entries are required in our case for finding the solution of the \(\mathbb {N}-\textsf{SDP}\).

  • In the noisy setting, by adapting the proofs in [25], we derive a condition on the parameters for successfully retrieving a solution. However, we show that these conditions are more restrictive than ours, hence shrinking the set of admissible parameters even further.

1.3 Contributions

In this article, we analyze in detail the ISD-score decoder algorithm for the noisy \(\mathbb {N}-\textsf{SDP}\) problem and provide the following contributions.

Noise model We focus on a binomial noise model; more precisely, the vector \(\varvec{\epsilon }\) is such that each \({\epsilon }_{i}\sim -d+\mathcal {B}(2d,\frac{1}{2})\) (a binomial centered at zero). One of the arguments for this choice is the noise description from [16, 29]. Due to implementation restrictions (width of the representation), the observed noise differs for different widths (parameter w in [29]). The first type of error in the estimation of the integer syndrome comes from the accuracy of the side-channel distinguisher. Since the accuracy corresponds to the probability of a correct guess, any wrong guess of the side-channel distinguisher will, with high probability, lead to an overestimation of the exact values. The second type of error mentioned in [16, 29] is the double-cancellation, which refers to the errors made when approximating the Hamming distance between two vectors by the Hamming weights of these vectors. Thirdly, during the computations the errors become dependent, inducing a propagation phenomenon. Here, we experimentally consider all these factors and show that, when the word size increases, the binomial noise model is a proper theoretical model for all the considered instances.

We simulated realistic side-channel noise for all cryptographic parameters and noticed the following. Our theoretical model, i.e., the binomial noise, fits all real cryptographic scenarios, with d ranging from linear in t in the worst case (high \(\sigma \) values for extreme real cases) down to constant (for small and more realistic \(\sigma \) values). We show that all real scenarios lead to \(d<\frac{t}{4}\) which, as we shall see later, is an acceptable noise level for practical purposes.

Performance of the ISD-score decoder We demonstrate that the ISD-score decoder finds a solution to the \(\mathbb {N}-\textsf{SDP}\) in the presence of noise with high probability, as long as the weight is sub-linear in n. Letting n, k, t, d be the \(\mathbb {N}-\textsf{SDP}\) parameters, \(\delta \) a small constant (usually less than 3) and W(x) the Lambert W function, we demonstrate the following.

Theorem

Let \(I=\left[ \sqrt{\frac{t+2d}{n-k}W\left( \frac{n-t}{n-k-t+\delta +1} \frac{e\sqrt{2}}{\pi }\right) ^2},1-\sqrt{\frac{t+2d-1}{n-k}W\left( \frac{t}{\delta +1} \frac{2e}{\pi }\right) ^2}\right] \) and \(\epsilon _i\sim -d+\mathcal {B}(2d,\frac{1}{2})\). If I is non-empty, then there is a value \(\beta \in I\) such that the ISD-score decoder succeeds in finding a solution with probability at least

$$\begin{aligned} \left( 1- \frac{e(n-t)}{\sqrt{2}\pi \beta (n-k-t+\delta +1)}\sqrt{\frac{t+2d}{n-k}}e^{-\frac{(n-k)\beta ^2}{2(t+2d)}}\right) \left( 1- \frac{et}{\pi (1-\beta )(\delta +1)}\sqrt{\frac{t+2d-1}{n-k}}e^{-\frac{(n-k)(1-\beta )^2}{2(t+2d-1)}}\right) . \end{aligned}$$

To reach our goal we partially build our proof on the techniques used in [25]. We incorporate the noise models into these techniques and, by using sharper inequalities, determine a much clearer condition for having a high probability of success. For a more readable variant of our result we also propose a slightly weaker version. We thus demonstrate the following.

Proposition

Let \(I_{\beta }=\left[ \sqrt{\frac{{2(t+2d)}}{{n-k}}\ln \frac{n-t}{n-k-t+\delta +1}}, 1-\sqrt{\frac{{2(t+2d-1)}}{{n-k}}\ln \frac{t}{\delta +1}}\right] \) and \(\epsilon _i\sim -d+\mathcal {B}(2d,\frac{1}{2})\). If \(I_{\beta }\not =\emptyset \) then the probability of success of the ISD-score decoder is at least

$$\begin{aligned} \left( 1-\frac{e}{2\pi }\frac{1}{\sqrt{\ln \frac{n - t}{n - k - t + \delta + 1}}}\right) \left( 1-\frac{e}{\sqrt{2}\pi }\frac{1}{\sqrt{\ln {\frac{t}{\delta +1}}}}\right) . \end{aligned}$$
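Both the Lambert-W interval I and the simplified interval \(I_{\beta }\) are straightforward to evaluate numerically. Below is a minimal sketch in Python with SciPy (the software stack used for the experiments of Section 5); the function name is ours, and the parameter values in the example call are illustrative stand-ins for a BIKE-like instance.

```python
import numpy as np
from scipy.special import lambertw  # first real branch W_0 by default

def interval_and_bound(n, k, t, d, delta):
    """Interval I from the Theorem and the closed-form bound of the Proposition."""
    W = lambda z: np.real(lambertw(z))
    # Lambert-W endpoints (Theorem)
    lo_W = np.sqrt((t + 2 * d) / (n - k)) * W((n - t) / (n - k - t + delta + 1) * np.e * np.sqrt(2) / np.pi)
    hi_W = 1 - np.sqrt((t + 2 * d - 1) / (n - k)) * W(t / (delta + 1) * 2 * np.e / np.pi)
    # simplified endpoints I_beta and success-probability bound (Proposition)
    lo_s = np.sqrt(2 * (t + 2 * d) / (n - k) * np.log((n - t) / (n - k - t + delta + 1)))
    hi_s = 1 - np.sqrt(2 * (t + 2 * d - 1) / (n - k) * np.log(t / (delta + 1)))
    p = (1 - np.e / (2 * np.pi) / np.sqrt(np.log((n - t) / (n - k - t + delta + 1)))) \
      * (1 - np.e / (np.sqrt(2) * np.pi) / np.sqrt(np.log(t / (delta + 1))))
    return (lo_W, hi_W), (lo_s, hi_s), p

# illustrative BIKE-like values; d = t/8 rounded, delta = 1
print(interval_and_bound(n=24646, k=12323, t=134, d=16, delta=1))
```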

The technical details of our proofs also provide theoretical and numerical evidence of the gain compared to [25]. In particular, for all the cryptographic parameters of BIKE and Classic McEliece, our analysis gives theoretical evidence of a high success probability while, when using the results from [25], some parameter sets fall outside this regime.

Information theoretic bounds

Next, we demonstrate that our algorithm can retrieve solutions of weight \(t\le O\left( \frac{n-k}{\ln (n-k)}\right) \), where n is the length of the code and k the dimension. We also analyze the noise level tolerated by the \(\mathbb {N}-\textsf{SDP}.\) We prove that the ISD-score decoder can tolerate noise levels that are linear in the weight of the solution t.

Another consequence of our approach is that, when the noise is zero and the ISD part is ignored (in which case the ISD-score decoder boils down to the algorithm proposed in [25]), the conditions we obtain on the range of parameters, namely on t, are less restrictive than those from [25]. In addition, the techniques used in our proofs allowed us to obtain a sharper lower bound on the number of syndrome entries, i.e., the number of rows of the parity-check matrix, required to find a solution, known as the information-theoretic bound.

Simulations

We have demonstrated the performance of the \(\mathbb {N}-\textsf{SDP}\) in the presence of noise for different cryptographic parameters. To be more exact, we have chosen two code-based candidates in the NIST standardization process, Classic McEliece and BIKE. For both candidates we have considered increasing noise levels from \(\mathcal {B}(\frac{t}{4},\frac{1}{2})\) to \(\mathcal {B}(t,\frac{1}{2}).\) For Classic McEliece the parameter t slightly exceeds the theoretical maximum (\(t=n/\log _2(n)\)). Still, in simulations the ISD-score decoder finds the solution to the \(\mathbb {N}-\textsf{SDP}\) with noise using the optimization parameter \(\delta =3\) and all the syndrome entries. In the case of BIKE, where \(t=\mathcal {O}(\sqrt{n})\), with \(\delta \le 3\) a fraction of only 0.15 of the syndrome entries suffices for the ISD-score decoder to find the solution.

On another track, we compare the ISD-score decoder with the ILP solution proposed in [15, 20]. In the noiseless setting there is a significant gap between the two solutions in terms of efficiency, by which we mean both timing and the ratio of syndrome entries required. On both counts our algorithm outperforms the ILP solutions (interior point or simplex). In the noisy setting the difference is even more significant: the ILP fails to find a solution even for the smallest noise considered. Moreover, our algorithm benefits from the ISD part, since for \(\delta \ge 3\) it can be modified to continue with a generic ISD, whereas the ILP does not possess such a feature.

We insist on the cryptographic context since it represents the origin of the underlying problem. The algorithm presented here was applied to practical instances. Although the public codes of both BIKE and Classic McEliece are not random codes, they are indistinguishable from random codes. Here we do not necessarily insist on whether these instances are distinguishable or not, but on the fact that the public matrix entries follow a Bernoulli distribution. There are many constructions where the public codes can be distinguished from random codes but whose parity-check matrix is nevertheless statistically "close" to a Bernoulli matrix (e.g., the Niederreiter variant based on Reed-Muller codes [49], or on polar codes [48]). To support our model we have tested two hypotheses using statistical tests: i) the public parity-check matrices of BIKE and Classic McEliece are distributed as Bernoulli matrices; ii) their entries are independent. We noticed no significant deviation, our hypotheses being validated with a p-value greater than or equal to 0.999 for all cryptographic parameter sets.

Table 1 Attacker model and algorithms for variants of SDP

Summary of results compared to other models

Let us briefly state how our results compare to other models such as those from Table 1.

  • Integer and noisy integer syndrome: the noisy scenario is much more realistic than perfect integer syndrome entries, as shown in several articles [16, 29]. Hence, compared to [15, 16, 20], our results have a more practical applicability. Also, the noise model analyzed here comes from more realistic scenarios than in other works such as [16]. Indeed, by considering the side-channel leakage model in addition to the double cancellation, we converge towards a real-life scenario. In the noiseless scenario, our method outperforms both the Quantitative Group Testing analysis [25] and the ILP decoder [15, 20].

  • Parts of solution leakage model: This model is not yet realistic, as the assumption of knowing the exact value of some entries of the solution seems for the moment out of reach. Moreover, the algorithms proposed in [33] are exponential in the Hamming weight of the solution and do not tolerate any noise. Let us emphasize that our method can incorporate this leakage model as well. Indeed, we can easily reduce the dimension of our problem by simply updating the syndrome with the solution entries equal to 1 and puncturing the parity-check matrix on the positions of the solution equal to 0.

  • Intermediate values: This particular scenario requires one not only to obtain the noisy integer entries but also to have access to additional information, such as the intermediate values involved in the computations. The T-test score decoder from [29] performs better in simulations than our method; however, no theoretical evidence was presented for it.

Short note

A 5-page short version of this article was presented at the Information Theory Workshop (ITW) 2022 [21]. We extend this short version by providing the following contributions.

  • We give full proofs of the results in [21] with additional comments. We extend the results by providing sharper statements, quantifying the error probability exactly, e.g., Corollary 2 and Theorem 2 from [21]. On top of that, the technical details of the proofs reflect the gain of our method compared to [25].

  • We provide detailed numerical evidence, obtained by simulation, for the approximation of the real noise model. We illustrate that our theoretical noise model provides a good approximation of the real noise for the cryptographic parameters considered in the NIST standardization process.

  • Compared to the conference version [21], here we compare other similar methods for solving the \(\mathbb {N}-\textsf{SDP}\) with the ISD-score decoder. Indeed, we complete the analysis by including the ILP solvers in the comparison, from the point of view of both success rate and computation time.

1.4 Outline of the article

In Section 2, we introduce the \(\textsf{SDP}\) and its variants, the \(\mathbb {N}-\textsf{SDP}\) and the \(\mathbb {N}-\textsf{SDP}\) in the presence of noise. We also recall the cryptographic context where these problems occur. Section 3 begins by recalling the score decoder proposed in [16]. Then, it analyzes the distribution of the discriminant function for the \(\mathbb {N}-\textsf{SDP}\) in the presence of noise. The section ends with the description of the ISD-score decoder. Next, we analyze the success probability of the ISD-score decoder in Section 4. The theoretical results from this part are compared with numerical values from our implementation of the algorithm in Section 5. The section also draws a parallel between the efficiency of the ISD-score decoder and other methods such as ILP. Finally, we conclude the article in Section 6.

2 Preliminaries

2.1 Definitions and notations

Let us begin by fixing the necessary notations. A finite field is denoted by \(\mathbb {F}\), and the ring of integers by \(\mathbb {Z}\). The base of the natural logarithm is denoted by e, and \(\ln \) denotes the natural logarithm. We write \(\mathbb {N}_{n}^*=\{1,\dots ,n\}\) and \(\mathbb {Z}_{-n,n}=\{-n,\dots ,0,\dots ,n\}\). Matrices and vectors are written in bold capital and bold small letters, respectively. We also use \(\text {HW}(\varvec{c})\) to denote the Hamming weight of the vector \(\varvec{c}\), i.e., the number of non-zero positions of \(\varvec{c}.\)

For \(p\in [0,1]\) and \(n\in \mathbb {N}^*\), a random variable X that follows a distribution depending on p and n is denoted \(X\sim \mathcal {D}\); in particular, \(X\sim \mathcal {B}er(p)\) for the Bernoulli distribution and \(X\sim \mathcal {B}(n,p)\) for the binomial distribution.

We denote by W(x) the Lambert W function [32], which is the inverse of the function \(x=ye^y.\) In other words, for any positive real number x the solution to the previous equation is \(y=W(x)\) (to be more precise, we only use the first real branch of W, usually denoted \(W_0(x)\) [17]). We will also require the asymptotic expansion near \(x=\infty \), which is

$$\begin{aligned} W(x)=\ln x-\ln \ln x+ O\left( \frac{\ln \ln x}{\ln x}\right) . \end{aligned}$$

Error correcting codes Let n and k be two positive integers such that \(k\le n\). An [n, k] linear code is a sub-vector space of dimension k of the vector space \(\mathbb {F}^n\). A code can be specified either by its generator matrix \(\varvec{G}\in \mathbb {F}^{k \times n}\) (a basis for the code), or by its parity-check matrix \(\varvec{H}\in \mathbb {F}^{(n-k)\times n}\) (a basis for the dual code). A code \(\mathcal {C}\) is in standard form if its generator matrix is \(\varvec{G}=\left( \varvec{I}_k\mid \varvec{T}\right) .\) The minimum distance (or minimum Hamming distance) of a code \(\mathcal {C}\) is the minimum of \(\text {HW}(\varvec{v})\) over all \(\varvec{v}\in \mathcal {C},\varvec{v}\ne \varvec{0}\).

One of the main features of linear codes is their ability to decode noisy information/data. Several general decoding strategies exist, the syndrome decoding problem being one of them.

Definition 1

(Binary syndrome decoding problem \(\textsf{SDP}\))

  • Inputs: \(\varvec{H}\in \mathbb {F}_2^{(n-k)\times n}\), \(\varvec{s}^{*}\in \mathbb {F}_2^{n-k}\), \(t\in \mathbb {N}^{*}\).

  • Output: \(\varvec{x}\in \mathbb {F}_2^n\) such that \(\varvec{H}\varvec{x}=\varvec{s}^{*}\), and \(\text {HW}(\varvec{x})= t\).

This problem is NP-complete [6] and, as we shall quickly see, it constitutes the building block of code-based solutions for post-quantum cryptography.

2.2 The Niederreiter encryption framework

Both Classic McEliece [2] and BIKE [3] are based on the Niederreiter encryption scheme [42]. The key generation, encryption and decryption functions of the Niederreiter cryptosystem are given in Algorithms 1, 2 and 3, respectively.

Algorithm 1: Niederreiter key generation

Algorithm 2: Niederreiter encryption

Table 2 (n, k, t) parameters for Classic McEliece and BIKE

To practically instantiate the schemes one must choose a family of error correcting codes, e.g., binary Goppa codes for Classic McEliece, that possesses strong security arguments. One of the required features is indistinguishability from random codes, which is the case for all submitted proposals. Such a requirement has a theoretical implication (semantic security arguments) and a practical one: we can set parameters as if we were dealing with random codes. If this is the case, then breaking the confidentiality of Niederreiter-like schemes reduces to solving the \(\textsf{SDP}\) for a random-like code. Hence, the sets of (n, k, t) parameters defined in [2] and [3] (see Table 2) are given with respect to the work factor of the best algorithm for solving the \(\textsf{SDP}\). There are two different types of algorithms for solving the \(\textsf{SDP}\): statistical decoding [18, 34, 41, 44] and Information Set Decoding (ISD) [7, 8, 11, 22, 36, 37, 39, 40, 45, 50].

Algorithm 3: Niederreiter decryption

Let us briefly recall the ideas behind the ISD techniques, e.g., the Prange variant.

  1. Randomly permute the columns of \(\varvec{H}\) (let \(\varvec{P}\) be the permutation matrix)

  2. Compute the standard form of \(\varvec{H}^{*}=\varvec{H}\varvec{P}\), i.e.,

    $$\begin{aligned} \varvec{Q}\varvec{H}^{*}=\varvec{Q}\varvec{H}\varvec{P}=\begin{pmatrix} \varvec{T}&\varvec{I}_{n-k} \end{pmatrix} \end{aligned}$$
    (1)
  3. If \(\text {HW}(\varvec{Q}\varvec{s}^{*})\le t\) then return \(\varvec{P}\begin{pmatrix} \varvec{0}_{k}\\ \varvec{Q}\varvec{s}^{*} \end{pmatrix}\); else go to Step 1

Since \( \varvec{H}\varvec{x}= \varvec{s}^{*}\) we can see that

$$\varvec{Q}\varvec{H}\varvec{P}\underbrace{\varvec{P}^{-1}\varvec{x}}_{\varvec{x}^{*}} = \begin{pmatrix} \varvec{T}&\varvec{I}_{n-k} \end{pmatrix} \begin{pmatrix} \varvec{x}^{*}_1\\ \varvec{x}^{*}_2 \end{pmatrix}= \varvec{Q}\varvec{s}^{*},$$

which yields

$$\begin{aligned} \varvec{T}\varvec{x}^{*}_1+\varvec{x}^{*}_2=\varvec{Q}\varvec{s}^{*}. \end{aligned}$$

Now, if \(\varvec{x}^{*}_1=\varvec{0}\) we deduce \(\varvec{x}^{*}_2=\varvec{Q}\varvec{s}^{*}\) and thus \(\varvec{x}^{*}=\begin{pmatrix} \varvec{0}\\ \varvec{Q}\varvec{s}^{*} \end{pmatrix}\) is a valid solution to the SDP. Prange’s algorithm samples permutations until the vector \(\varvec{x}^{*}_1\) equals zero, or equivalently until an information set is found. Variants of ISD offer time optimizations by allowing different relaxations of the weight condition on \(\varvec{x}^{*}_1.\)
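For concreteness, the following is a minimal, unoptimized Python sketch of plain Prange decoding (the helper names are ours); instead of computing a reduced echelon form of a rank-deficient block, it simply resamples the permutation whenever the selected \((n-k)\)-column block is singular.

```python
import numpy as np

rng = np.random.default_rng(0)

def gf2_inv(A):
    """Invert a square 0/1 matrix over GF(2); return None if it is singular."""
    m = A.shape[0]
    M = np.concatenate([A % 2, np.eye(m, dtype=np.uint8)], axis=1).astype(np.uint8)
    for col in range(m):
        pivots = np.nonzero(M[col:, col])[0]
        if pivots.size == 0:
            return None                       # singular: no pivot in this column
        piv = pivots[0] + col
        M[[col, piv]] = M[[piv, col]]         # swap the pivot row into place
        rows = np.nonzero(M[:, col])[0]
        rows = rows[rows != col]
        M[rows] ^= M[col]                     # eliminate the column everywhere else
    return M[:, m:]

def prange(H, s_star, t, max_iter=100_000):
    """Plain Prange ISD: resample column permutations until HW(Q s*) <= t."""
    n_k, n = H.shape
    for _ in range(max_iter):
        perm = rng.permutation(n)
        Q = gf2_inv(H[:, perm[-n_k:]])        # try to invert the last n-k columns
        if Q is None:
            continue                          # not an information set; retry
        y = (Q.astype(int) @ s_star) % 2
        if y.sum() <= t:
            x = np.zeros(n, dtype=np.uint8)
            x[perm[-n_k:]] = y                # undo the permutation
            return x
    return None

# toy instance: weight-3 error on a random [30, 10] code
n, k, t = 30, 10, 3
H = rng.integers(0, 2, size=(n - k, n), dtype=np.uint8)
x = np.zeros(n, dtype=np.uint8)
x[rng.choice(n, t, replace=False)] = 1
print(prange(H, (H.astype(int) @ x) % 2, t))
```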

2.3 Integer version of the syndrome decoding problem

Recent message recovery attacks target the encryption step, where the ciphertext is obtained from the multiplication of the public parity-check matrix \(\varvec{H}_{\text {pub}}\) by the secret error vector \(\varvec{x}.\) Hence, in [15, 16, 29] the matrix-vector multiplication is targeted as the leakage point (line 3 in Algorithm 2). The physical scenario reveals the possibility of retrieving extra information during the multiplications. More exactly, it was shown that it is possible either to change the instruction code in the Flash memory and thus set it to ADD instead of XOR [15], or to recover by side-channel measurements an approximation of the real/natural value of \(\varvec{s}^{*}\) [16]. Both scenarios lead to a modified version of the binary \(\textsf{SDP}\).

Definition 2

(\(\mathbb {N}-\textsf{SDP}\))

  • Inputs: \(\varvec{H}\in \{0,1\}^{(n-k)\times n}\), \(\varvec{s}\in \mathbb {N}^{n-k}\), \(t\in \mathbb {N}^{*}\).

  • Output: \(\varvec{x}\in \{0,1\}^n\), such that \(\varvec{H}\varvec{x}=\varvec{s}\), and \(\text {HW}(\varvec{x})=t\).

As pointed out in [29], the integer value \(\varvec{s}\) is often difficult to obtain. More exactly, in [29] it was shown that there are different types of noise that interfere with the intermediate estimations, which leads to a noisy integer syndrome. To define the \(\mathbb {N}-\textsf{SDP}\) in the presence of noise as generally as possible, we model the noise \(\epsilon =(\epsilon _1,\dots ,\epsilon _{n-k})\) as a vector of random variables \(\epsilon _i\sim \mathcal {D}\), where \(\mathcal {D}\) is a discrete probability distribution. In the \(\mathbb {N}-\textsf{SDP}\) in the presence of noise, instead of having access to an instance of the \(\mathbb {N}-\textsf{SDP}\), i.e., \((\varvec{H},\varvec{s},t)\), we are given a noisy syndrome \(\widetilde{\varvec{s}}=\varvec{s}+\epsilon \) and the value \(\varvec{s}^{*}=\varvec{s}\pmod {2}\) (component-wise).

Definition 3

(\(\mathbb {N}-\textsf{SDP}\) in the presence of noise \(\epsilon \))

  • Inputs: \(\varvec{H}\in \{0,1\}^{(n-k)\times n}\), \(\widetilde{\varvec{s}}\in \mathbb {Z}^{n-k}\), \({\varvec{s}^{*}}\in \{0,1\}^{n-k}\), \(t\in \mathbb {N}^{*}\)

  • Output: \(\varvec{x}\in \{0,1\}^n\) such that \(\varvec{H}\varvec{x}=\varvec{s}^{*}\) and \(\text {HW}(\varvec{x})= t\), where \(\varvec{s}^{*}=\varvec{s}\bmod 2\) and \(\widetilde{\varvec{s}}=\varvec{s}+\epsilon \).

Remark that the \(\mathbb {N}-\textsf{SDP}\) in the presence of noise is the \(\textsf{SDP}\) with additional information. Under certain conditions, we hope that, given \((\varvec{H},\varvec{s}^{*},t,\widetilde{\varvec{s}})\), we can find \(\varvec{x}\), a solution to the \(\textsf{SDP}\). Also, when the noise is zero we recover the classic \(\mathbb {N}-\textsf{SDP}.\)
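To make Definition 3 concrete, here is a small sketch that generates a random noisy instance under the Bernoulli-matrix and centered-binomial-noise assumptions used throughout this article (all parameter values are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_nsdp_instance(n, k, t, d):
    """Random H ~ Ber(1/2), weight-t solution x, integer syndrome s = Hx over Z,
    noisy syndrome s~ = s + eps with eps_i ~ B(2d, 1/2) - d, and s* = s mod 2."""
    H = rng.integers(0, 2, size=(n - k, n))
    x = np.zeros(n, dtype=int)
    x[rng.choice(n, size=t, replace=False)] = 1
    s = H @ x                                    # integer syndrome (entries in 0..t)
    eps = rng.binomial(2 * d, 0.5, size=n - k) - d
    return H, x, s + eps, s % 2

H, x, s_tilde, s_star = noisy_nsdp_instance(n=1000, k=500, t=20, d=2)
```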

3 ISD-score decoder

The idea of assigning a score to each column was already used for the \(\mathbb {N}-\textsf{SDP}\) in [16]. The objective is to distinguish columns of \(\varvec{H}\) in the support of the solution vector from columns outside the support. We shall begin by defining a score decoder, as introduced in [25], that proved to be particularly discriminant in the context of the \(\mathbb {N}-\textsf{SDP}\). To better illustrate the features of the decoder in the presence of noise, we will express it as a function of the noiseless decoder. As we shall see, this approach allows us not only to derive a particularly simple relation between the two, but also to deduce conditions on the tolerated noise level.

Definition 4

Let \(\varvec{H}\in \{0,1\}^{(n-k)\times n}, \varvec{s}\in \mathbb {N}^{n-k}\) and \(t\in \mathbb {N}^{*}\) be the input of the \(\mathbb {N}-\textsf{SDP}\). Then define the score of a column:

$$\begin{aligned} \forall i\in \mathbb {N}_{n}^*\quad \psi _i(\varvec{s})=\sum _{\ell =1}^{n-k}\left( h_{\ell ,i}s_{\ell }+(1-h_{\ell ,i})(t-s_{\ell })\right) . \end{aligned}$$
(2)

For the \(\mathbb {N}-\textsf{SDP}\) in the presence of noise we shall use \(\psi _i(\widetilde{\varvec{s}})\). The next result, rephrased from [25], expresses the capability of the score decoder to distinguish columns in the support of the solution vector from columns outside the support.

Theorem 1

Let \(\varvec{H}\in \{0,1\}^{(n-k)\times n}\) be a random matrix, with distribution given by \(h_{j,i}\sim \mathcal {B}er(\frac{1}{2})\) and \(\varvec{s}\in \mathbb {N}^{n-k}\) such that \(\exists \; \varvec{x}\in \{0,1\}^{n}\) with \(\text {HW}(\varvec{x})=t\) satisfying \(\varvec{H}\varvec{x}=\varvec{s}\). Then

$$\begin{aligned} \psi _i(\varvec{s})\sim \left\{ \begin{array}{lc} \mathcal {B}((n-k)t,\frac{1}{2})&{}, i\not \in \textsf{Supp}(\varvec{x}) \\ \mathcal {B}((n-k)(t-1),\frac{1}{2})+n-k&{}, i\in \textsf{Supp}(\varvec{x}) \end{array} \right. \end{aligned}$$

It follows directly from Theorem 1 that \(\mathbb {E}(\psi _i(\varvec{s}))=(n-k)t/{2}\) for \(i\not \in \textsf{Supp}(\varvec{x})\) and \(\mathbb {E}(\psi _i(\varvec{s}))=(n-k)t/{2}+(n-k)/2\) for \(i\in \textsf{Supp}(\varvec{x}).\)

The difference in the average values shows that \(\psi \) can serve as a distinguisher between positions in the support of the vector \(\varvec{x}\) and positions outside it. In addition, the variance also differs, a fact that will be used in the tail bounds. The score computation itself is cheap, as the sketch below illustrates; we then consider the noisy version of the problem in the next section.
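Computed for all columns at once, the score (2) is a single matrix product, \(\psi = \varvec{H}^{T}\varvec{s} + (\varvec{1}-\varvec{H})^{T}(t-\varvec{s})\); the following minimal sketch (illustrative parameters, noiseless case) also checks empirically the mean gap of \((n-k)/2\) between support and non-support columns implied by Theorem 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, t = 1000, 500, 20
H = rng.integers(0, 2, size=(n - k, n))
x = np.zeros(n, dtype=int); x[rng.choice(n, t, replace=False)] = 1
s = H @ x

# psi_i(s) = sum_l h_{l,i} s_l + (1 - h_{l,i}) (t - s_l), for all i at once
psi = H.T @ s + (1 - H).T @ (t - s)

gap = psi[x == 1].mean() - psi[x == 0].mean()
print(gap, (n - k) / 2)   # the two values should be close (Theorem 1)
```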

3.1 Score decoder in the presence of noise

As in [16], we make some assumptions about the noise considered here: the \(\epsilon _i\) are independent and identically distributed random variables, the noise does not depend on the distribution of the entries of \(\varvec{H}\), and the distribution \(\mathcal {D}\) is symmetric.

Proposition 1

([16]) For \(j\in \mathbb {N}_{n-k}^*\) let \(\epsilon _j\) be independent and identically distributed discrete random variables following a symmetric distribution over the set \(\mathbb {Z}_{-d,d}\), such that \(\epsilon _j\) and \(h_{i,j}\) are independent.

Then

$$\begin{aligned} \textsf{Prob}\left( \psi _i(\widetilde{\varvec{s}})-\psi _i(\varvec{s})=\alpha \right) =\textsf{Prob}\left( \sum _{j=1}^{n-k}\epsilon _j=\alpha \right) . \end{aligned}$$

Proof

Let \(Y_{\ell ,i}=(2h_{\ell ,i}-1)\epsilon _\ell .\) Then we have

$$\begin{aligned} \psi _i(\widetilde{\varvec{s}})&=\sum _{\ell =1}^{n-k}\left( h_{\ell ,i}\widetilde{s_{\ell }}+(1-h_{\ell ,i})(t-\widetilde{s_{\ell }})\right) \\&=\sum _{\ell =1}^{n-k}\left( h_{\ell ,i}({s_{\ell }}+\epsilon _\ell )+(1-h_{\ell ,i})(t-{s_{\ell }}-\epsilon _\ell )\right) \\&=\psi _i(\varvec{s})+\sum _{\ell =1}^{n-k}\underbrace{\left( h_{\ell ,i}\epsilon _\ell -(1-h_{\ell ,i})\epsilon _\ell \right) }_{Y_{\ell ,i}}\end{aligned}$$

For any fixed value of \(\ell \in \mathbb {N}_{n-k}^*\) we have \(\textsf{Prob}(Y_{\ell ,i}=\alpha _\ell )=\textsf{Prob}(\epsilon _{\ell }=\alpha _\ell )\) for any \(\alpha _\ell \in \mathbb {Z}_{-d,d}\) (using the symmetry property and the independence of \(h_{\ell ,i}\) and \(\epsilon _{\ell }\)). Hence \(Y_{\ell ,i}\) follows the same distribution as \(\epsilon _{\ell }.\) Thus, \(\psi _i(\widetilde{\varvec{s}})-\psi _i(\varvec{s})\in \mathbb {Z}_{-(n-k)d,(n-k)d}\) with probability distribution \(\textsf{Prob}(\psi _i(\widetilde{\varvec{s}})-\psi _i(\varvec{s})=\alpha )=\textsf{Prob}\left( \sum _{j=1}^{n-k}\epsilon _j=\alpha \right) \).

Keeping the difference \(\psi _i(\widetilde{\varvec{s}})-\psi _i(\varvec{s})\) as small as possible amounts to controlling the sum of the \(\epsilon _j.\) The variance of \(\epsilon _j\) plays a crucial role in the distinguishing capacity of \(\psi \).

Proposition 2

For any \(j\in \mathbb {N}_{n-k}^{*}\) let \(\epsilon _j\) be a discrete random variable satisfying the conditions from Proposition 1 and let \(\sigma ^2=Var(\epsilon _j)\). Let g(n, k, t) be a function of the parameters of the \(\mathbb {N}-\textsf{SDP}\). Then for any \(\alpha >\sigma \sqrt{(n-k) g(n,k,t)}\)

$$\begin{aligned} \textsf{Prob}(\psi _i(\widetilde{\varvec{s}})-\psi _i(\varvec{s})\ge \alpha )\le \dfrac{1}{g(n,k,t)}. \end{aligned}$$
(3)

Proof

Use Chebyshev’s inequality for the sum of \(\epsilon _j\) and the linearity of the variance.

The case of centered binomial noise

Corollary 1

Let \(d\in \mathbb {N}\) and \(\epsilon _i\sim -d+\mathcal {B}(2d,\frac{1}{2}).\) Then

  • for \(i\not \in \textsf{Supp}(\varvec{x})\)

    $$\begin{aligned} \psi _i(\widetilde{\varvec{s}})\sim -d(n-k)+\mathcal {B}\left( (n-k)(t+2d),\frac{1}{2}\right) ; \end{aligned}$$
  • for \(i\in \textsf{Supp}(\varvec{x})\)

    $$\begin{aligned} \psi _i(\widetilde{\varvec{s}})\sim -(d-1)(n-k)+\mathcal {B}\left( (n-k)(t-1+2d),\frac{1}{2}\right) . \end{aligned}$$

Moreover, \(\mathbb {E}(\psi _i(\widetilde{\varvec{s}}))=\mathbb {E}(\psi _i(\varvec{s}))\) and \(Var(\psi _i(\widetilde{\varvec{s}}))=Var(\psi _i(\varvec{s}))+(n-k)d/2\).

To maintain the capability to distinguish between positions inside the support and positions outside the support, the noise parameter d from \(\mathcal {B}(2d,\frac{1}{2})\) should be restricted.

Corollary 2

Let \(\epsilon _i\sim -d+\mathcal {B}(2d,\frac{1}{2})\) and let g(n, k, t) be an unbounded function of t, n, k. Then we have

$$\begin{aligned} \textsf{Prob}\left( \left| \psi _i(\widetilde{\varvec{s}})-\psi _i(\varvec{s})\right| \le \sqrt{\frac{d(n-k)g(n,k,t)}{2}}\right) \ge 1-\frac{2}{g(n,k,t)}. \end{aligned}$$

Moreover, for any \(d\le \frac{n-k}{8g(n,k,t)}\), the function \(\psi (\widetilde{\varvec{s}})\) distinguishes positions in \(\textsf{Supp}(\varvec{x})\) from positions outside \(\textsf{Supp}(\varvec{x}).\)

The key idea of the distinguisher is that for \(i\not \in \textsf{Supp}(\varvec{x})\) the upper limit of the confidence interval is smaller than the lower limit of the confidence interval for \(i\in \textsf{Supp}(\varvec{x}).\) More exactly, to distinguish with probability at least \(1-\frac{1}{g(n,k,t)}\) one needs to have

$$\begin{aligned} \frac{(n-k)t}{2}+\sqrt{\frac{d(n-k) g(n,k,t) }{2}}\le \frac{(n-k)t}{2}+\frac{n-k}{2}-\sqrt{\frac{d(n-k)g(n,k,t)}{2}}, \end{aligned}$$
(4)

which yields \(d\le \frac{n-k}{8g(n,k,t)}.\) In particular, we can set \(g(n,k,t)=\ln \ln t\) or \(g(n,k,t)=\ln \ln n\), depending on the desired speed of convergence. Figure 1 shows the distribution of the \(\psi _i\) values for different levels of noise, ranging from \(d=0\), i.e., the noiseless setting, to a very high noise of \(\mathcal {B}(2t,\frac{1}{2})\). Notice that the distinguishing capability is much higher for the BIKE parameters, as shown in Fig. 1a, than for the Classic McEliece parameters, as shown in Fig. 1b.
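As a quick numerical illustration, the tolerated noise level \(d\le \frac{n-k}{8g(n,k,t)}\) is easy to tabulate; the sketch below uses \(g=\ln \ln n\), and the (n, k) pairs are illustrative stand-ins for the code sizes of Table 2.

```python
import math

# Largest tolerated d (Corollary 2) with g = ln ln n, for illustrative (n, k) pairs
for n, k in [(3488, 2720), (8192, 6528), (24646, 12323)]:
    g = math.log(math.log(n))
    print(n, k, (n - k) / (8 * g))
```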

Bernoulli noise

Proposition 3

Let \(\epsilon _i\sim \mathcal {B}er(\frac{1}{2})\). Then \(\psi _i(\widetilde{\varvec{s}})\) is a random variable that follows the distribution

$$ \psi _i(\widetilde{\varvec{s}})\sim \left\{ \begin{array}{lc} \mathcal {B}((n-k)(t+2),\frac{1}{2})-(n-k)&{}, i\not \in \textsf{Supp}(\varvec{x}) \\ \mathcal {B}((n-k)(t+1),\frac{1}{2})&{}, i\in \textsf{Supp}(\varvec{x}) \end{array}\right. $$

Moreover, \(\mathbb {E}(\psi _i(\widetilde{\varvec{s}}))=\mathbb {E}(\psi _i(\varvec{s}))\) and \(Var(\psi _i(\widetilde{\varvec{s}}))=Var(\psi _i(\varvec{s}))+(n-k)/2\).

Notice that, in the case of Bernoulli noise, the behavior is equivalent to that of a centered binomial noise with \(d=1\). Indeed, the result in Proposition 3 is equivalent to the one given in Corollary 1 with \(d=1\).

Fig. 1: Distribution of \(\psi _i\) for \(\epsilon \sim -d+\mathcal {B}(2d,\frac{1}{2})\)

3.2 Combining ISD and score decoder

The idea in [16] was to boost the distinguishing capability of the score decoder with ISD-like techniques. To this end, the score decoder is integrated in the "permutation" step of the ISD method. Indeed, this method starts by performing a permutation on the columns of \(\varvec{H}\) that will hopefully rearrange the solution in a useful way. In the original ISD methods, permutations are sampled randomly until a "good" one is obtained. Thanks to the extra information provided by \(\varvec{s}\) or \(\widetilde{\varvec{s}}\), the function \(\psi \) allows one to construct a permutation which is by no means random. Indeed, we have seen that \(\psi \), by its nature, allows one to distinguish between positions in the support of \(\varvec{x}\) and positions outside it. Hence, the resulting permutation is hopefully a "good" one. As pointed out in [16], sorting the list of values \(\psi _i(\widetilde{\varvec{s}})\) in descending order is equivalent to generating a permutation \(\varvec{\Pi }\). Algorithm 4 finds a solution to the \(\mathbb {N}-\textsf{SDP}\) in the presence of noise as long as \(\varvec{\Pi }\) is "good" enough.

Algorithm 4: Prange Score Decoder

The procedure rref\((\varvec{H}\varvec{\Pi })\), which stands for "reduced row echelon form", is equivalent to performing a partial Gaussian elimination over \(\mathbb F_2\). Indeed, there is an \((n-k)\times (n-k)\) non-singular matrix \(\varvec{A}^{*}\) such that \(\varvec{A}^{*}\varvec{H}\varvec{\Pi }=\left[ \begin{bmatrix} \varvec{I_r}\\ \varvec{0_{n-k-r,r}} \end{bmatrix}\mathbin \Vert \varvec{B}^{*}\right] \), where \(\varvec{H}\varvec{\Pi }=[\varvec{A}\mathbin \Vert \varvec{B}]\) with \(\varvec{A}\) an \((n-k) \times r\) matrix satisfying \(\varvec{A}^{*}\varvec{A}=\begin{bmatrix} \varvec{I_r}\\ \varvec{0_{n-k-r,r}} \end{bmatrix}\), and \(\varvec{B}^{*}=\varvec{A}^{*}\varvec{B}\).

In the case of a full rank matrix \(\varvec{A}\) we have \(\varvec{A}^{*}\varvec{A}=\varvec{I_{n-k}}\). From the description of the algorithm above, the following result can be deduced.

Proposition 4

([16]) The Prange Score Decoder outputs a valid solution as long as there exists at least one set \(L\subset \mathbb {N}_{n}^{*}\setminus \textsf{Supp}(\varvec{x})\) with \(\#L\ge n-r\) such that \(\min \{\psi _i(\widetilde{\varvec{s}}),i\in \textsf{Supp}(\varvec{x})\}>\max \{\psi _i(\widetilde{\varvec{s}}),i \in L\}\).

The overall time complexity of Prange Score Decoder is \(\mathcal {O}((n-k)^3)\), since it is dominated by the partial Gaussian elimination, i.e. the computation of \(\varvec{A}^*\).
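A compact way to see the decoder at work, without the Gaussian-elimination machinery, is to test the success condition directly: sort the columns by decreasing score and count how many support positions land among the first \(n-k\). A sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, t, d = 1000, 500, 20, 2
H = rng.integers(0, 2, size=(n - k, n))
x = np.zeros(n, dtype=int); x[rng.choice(n, t, replace=False)] = 1
eps = rng.binomial(2 * d, 0.5, size=n - k) - d
s_tilde = H @ x + eps                       # noisy integer syndrome

psi = H.T @ s_tilde + (1 - H).T @ (t - s_tilde)
perm = np.argsort(-psi)                     # columns sorted by decreasing score

in_front = x[perm[:n - k]].sum()
print(in_front, "of", t, "support positions in the first n-k columns")
# Prange-score succeeds when in_front == t; the delta-ISD variant tolerates t - delta
```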

Since the permutation \(\varvec{\Pi }\) might not move all the positions in the support of \(\varvec{x}\) into the first \(n-k\) positions, more powerful ISD methods may be used, e.g., Lee-Brickell [36], Stern [50] or Dumer [22]. The idea is to allow \(\delta \) positions from \(\textsf{Supp}(\varvec{x})\) outside the first \(n-k\) positions. This is equivalent to extending the Prange Score Decoder so that it covers error vectors with a more general pattern. The Lee-Brickell Score Decoder, where \(\delta \) positions are searched exhaustively, is thus proposed in [16] as a possible solution.

Algorithm 5: Lee-Brickell Score Decoder ([16])

When the Lee-Brickell variant is used and \(\delta =\mathcal {O}(1), k=\mathcal {O}(n)\), the work factor of the resulting algorithm becomes polynomial in n.

Proposition 5

The \(\delta \)-ISD-score decoder outputs a valid solution as long as there are at most \(\delta \) indices \(i\in \textsf{Supp}(\varvec{x})\) with values \(\psi _i(\widetilde{\varvec{s}})<\psi _j(\widetilde{\varvec{s}})\) with j in a set \(J\subset \mathbb {N}_n\) of cardinality \(n-k\).

4 Success probability of the ISD-score decoder

The following result gives a condition on the parameters for having a high probability of success for the ISD-score decoder on the \(\mathbb {N}-\textsf{SDP}\) in the presence of noise.

Theorem 2

Let \(I=\left[ \sqrt{\frac{t+2d}{n-k}W\left( \frac{n-t}{n-k-t+\delta +1} \frac{e\sqrt{2}}{\pi }\right) ^2},1-\sqrt{\frac{t+2d-1}{n-k}W\left( \frac{t}{\delta +1} \frac{2e}{\pi }\right) ^2}\right] \) and \(\epsilon _i\sim -d+\mathcal {B}(2d,\frac{1}{2})\). If I is non-empty, then there is a value \(\beta \in I\) such that the ISD-score decoder succeeds in finding a valid solution with probability at least

$$ \left( 1- \frac{e(n-t)}{\sqrt{2}\pi \beta (n-k-t+\delta +1)}\sqrt{\frac{t+2d}{n-k}}e^{-\frac{(n-k)\beta ^2}{2(t+2d)}}\right) \left( 1- \frac{et}{\pi (1-\beta )(\delta +1)}\sqrt{\frac{t+2d-1}{n-k}}e^{-\frac{(n-k)(1-\beta )^2}{2(t+2d-1)}}\right) . $$

4.1 Technicalities of Theorem 2

To prove this theorem we proceed in three steps. More precisely, we first give an estimation of the tails of the distributions \(\psi _i(\widetilde{\varvec{s}})\), then we insert these results into a generic upper bound on the success probability of the ISD-score decoder, and finally we study the range of parameters for which our conditions are valid.

4.1.1 Tail bounds on the distribution

Firstly we have the following result on the distribution of \(\psi \) in the noiseless scenario.

Theorem 3

Let \(\beta \in (0,1)\) and \(B_{\beta }=\frac{(n-k)t}{2}+\frac{\beta (n-k)}{2}\). Then we have for \(i\not \in \textsf{Supp}(\varvec{x})\)

$$\begin{aligned} \textsf{Prob}\left( \psi _i(\varvec{s})\ge B_{\beta }\right) \le \dfrac{e}{\sqrt{2}\pi \beta }\sqrt{\frac{t}{n-k}}e^{-\frac{n-k}{2t}\beta ^2}, \end{aligned}$$
(5)

for \(i\in \textsf{Supp}(\varvec{x})\)

$$\begin{aligned} \textsf{Prob}\left( \psi _i(\varvec{s})\le B_{\beta }\right) \le \dfrac{e}{\pi (1-\beta )}\sqrt{\frac{t-1}{n-k}}e^{-\frac{n-k}{2(t-1)}(1-\beta )^2}. \end{aligned}$$
(6)

Moving forward, in the case of a binomial noise we have

Theorem 4

Let \(\epsilon _i\sim -d+\mathcal {B}(2d,\frac{1}{2})\), \(\beta \in (0,1)\) and \(B_{\beta }\) as previously defined. Then we have for \(i\not \in \textsf{Supp}(\varvec{x})\)

$$\begin{aligned} \textsf{Prob}\left( \psi _i(\widetilde{\varvec{s}})\ge B_{\beta }\right) \le \dfrac{e}{\sqrt{2}\pi \beta }\sqrt{\frac{t+2d}{n-k}}e^{-\frac{(n-k)\beta ^2}{2(t+2d)}}, \end{aligned}$$
(7)

for \(i\in \textsf{Supp}(\varvec{x})\)

$$\begin{aligned} \textsf{Prob}\left( \psi _i(\widetilde{\varvec{s}})\le B_{\beta }\right) \le \dfrac{e}{\pi (1-\beta )}\sqrt{\frac{t+2d-1}{n-k}}e^{-\frac{(n-k)(1-\beta )^2}{2(t+2d-1)}}. \end{aligned}$$
(8)

The proof of the two theorems above is given in the Appendix. Let us denote the upper bounds (7) and (8) in Theorem 4 by \(\textrm{Ub}_{\textsf{Supp}(\varvec{x})^c}(n,k,t,d,\beta )\) and \(\textrm{Ub}_{\textsf{Supp}(\varvec{x})}(n,k,t,d,\beta )\), respectively.

4.1.2 A general bound on the success probability using tail estimations

A general theorem regarding the success probability of the ISD-score decoder can be stated. For that, we suppose that the distribution of \(\psi _i(\widetilde{\varvec{s}})\) for \(i\in \textsf{Supp}(\varvec{x})\) differs from that for \(i\not \in \textsf{Supp}(\varvec{x})\), e.g., it is at least shifted. If not, it is obvious that the ISD-score decoder cannot retrieve a valid solution with high probability.

Theorem 5

Let \(\psi _i(\widetilde{\varvec{s}})\) be random variables and let f(n, k, t, d, B), g(n, k, t, d, B) be two functions such that

$$\begin{aligned} \textsf{Prob}(\psi _i(\widetilde{\varvec{s}})\le B){} & {} \le e^{-f(n,k,t,d,B)}\quad , i\in \textsf{Supp}(\varvec{x})\end{aligned}$$
(9)
$$\begin{aligned} \textsf{Prob}(\psi _i(\widetilde{\varvec{s}})\ge B){} & {} \le e^{-g(n,k,t,d,B)}\quad , i\not \in \textsf{Supp}(\varvec{x}) \end{aligned}$$
(10)

The ISD-score decoder finds the solution if \(\exists B^{*}\) such that

  • \(0\le 1-\frac{t}{\delta +1}e^{-f(n,k,t,d,B^*)}\le 1\),

  • \(0\le 1-\frac{n-t}{n-k-t+\delta +1}e^{-g(n,k,t,d,B^*)}\le 1\),

  • \(\frac{t}{\delta +1}e^{-f(n,k,t,d,B^{*})}+\frac{n-t}{n-k-t+\delta +1}e^{-g(n,k,t,d,B^{*})}\) is close to zero.

Typically, the theorem gives a sufficient condition for a high probability of success. Indeed, if one finds a value \(B_{\beta }\) for which the lower bound tends to 1, then the score function achieves its goal, namely to distinguish positions in the support of \(\varvec{x}\) from those outside it. The proof of this result is given in the Appendix.

Combining the tail bounds on the distribution of \(\psi _i(\widetilde{\varvec{s}})\) with the condition on \(\beta ^*\) for having a high probability of success yields the following result. Denote

$$ \textrm{Lb}_{\textsf{Supp}(\varvec{x})^c}=1- \frac{e(n-t)}{\sqrt{2}\pi \beta (n-k-t+\delta +1)}\sqrt{\frac{t+2d}{n-k}}e^{-\frac{(n-k)\beta ^2}{2(t+2d)}}, $$
$$ \textrm{Lb}_{\textsf{Supp}(\varvec{x})}=1- \frac{et}{\pi (1-\beta )(\delta +1)}\sqrt{\frac{t+2d-1}{n-k}}e^{-\frac{(n-k)(1-\beta )^2}{2(t+2d-1)}}. $$

Proposition 6

Let \(\epsilon _i\sim -d+\mathcal {B}(2d,\frac{1}{2})\). If \(\exists \beta ^{*}\in (0,1)\) such that \(\textrm{Lb}_{\textsf{Supp}(\varvec{x})},\textrm{Lb}_{\textsf{Supp}(\varvec{x})^c} \in [0,1]\), then the probability that the ISD-score decoder succeeds in finding a valid solution is at least \(\textrm{Lb}_{\textsf{Supp}(\varvec{x})}\textrm{Lb}_{\textsf{Supp}(\varvec{x})^c}.\)

Corollary 3

When \(d=0\) and \(\delta =0\) the condition on \(\beta ^{*}\) simplifies to

  • \(0\le \frac{et}{\pi (1-\beta )}\sqrt{\frac{t}{n-k}}e^{-\frac{(n-k)(1-\beta )^2}{2t}}\le 1\),

  • \(0\le \frac{e(n-t)}{(\sqrt{2}\pi \beta )(n-k-t)}\sqrt{\frac{t}{n-k}}e^{-\frac{(n-k)\beta ^2}{2t}}\le 1\),

  • \(\frac{et}{\pi (1-\beta )}\sqrt{\frac{t}{n-k}}e^{-\frac{(n-k)(1-\beta )^2}{2t}} + \frac{e(n-t)}{(\sqrt{2}\pi \beta )(n-k-t)}\sqrt{\frac{t}{n-k}}e^{-\frac{(n-k)\beta ^2}{2t}}\) is close to zero.

To fairly compare with state-of-the-art techniques, such as the algorithm in [25], which is only valid in the noiseless scenario, we adapted the conditions from [25] to the noise model considered here. This gives two similar functions of \(\beta \), namely \(1- \frac{n-t}{n-k-t}e^{-\frac{(n-k)\beta ^2}{2(t+2d)}}\) and \(1-te^{-\frac{(n-k)(1-\beta )^2}{2(t+2d-1)}}.\) In Fig. 2, we plot the modified functions from [25] (dashed lines) and \(\textrm{Lb}_{\textsf{Supp}(\varvec{x})},\textrm{Lb}_{\textsf{Supp}(\varvec{x})^c}\) (solid lines).

Fig. 2: Valid \(\beta \) interval from the bounds in [25] (dashed lines) and the proposed ones (solid lines)

In dark green and light green we represent the valid interval/region for the adapted functions from [25] and for our functions, respectively. Notice that for all parameter sets and all noise levels considered here, our functions offer a larger interval. This implies that for some sets of parameters, e.g., in Fig. 2d, the interval is empty w.r.t. the conditions in [25], while w.r.t. our conditions it is not.

4.1.3 Range of valid parameters

Here, we determine conditions on the parameters such that the requirements of Proposition 6 are satisfied. We begin by establishing the existence of \(\beta ^{*}\). Recall that W(x) denotes the Lambert W function.

Proposition 7

For any \(\beta \ge \sqrt{\frac{t+2d}{n-k}W\left( \frac{n-t}{n-k-t+\delta +1} \frac{e}{\sqrt{2}\pi }\right) ^2}\) we have that \(\frac{n-t}{n-k-t+\delta +1}\textrm{Ub}_{\textsf{Supp}(\varvec{x})^c}(n,k,t,d,\beta )\le 1\), and for any \(\beta \le 1-\sqrt{\frac{t+2d-1}{n-k}W\left( \frac{t}{\delta +1} \frac{e}{\pi }\right) ^2}\) we have that \(\frac{t}{\delta +1}\textrm{Ub}_{\textsf{Supp}(\varvec{x})}(n,k,t,d,\beta )\le 1.\)

Having both functions positive and strictly smaller than 1 at the same time can be achieved as long as the interval defined by the two extreme points in the previous proposition is non-empty, i.e., \(\sqrt{\frac{t+2d}{n-k}W\left( \frac{n-t}{n-k-t+\delta +1} \frac{e}{\sqrt{2}\pi }\right) ^2}\le 1-\sqrt{\frac{t+2d-1}{n-k}W\left( \frac{t}{\delta +1} \frac{e}{\pi }\right) ^2}.\)

To give a more tangible meaning to our result, we can approximate the value of the Lambert W function by \(W(m)=\ln m-\ln \ln m+\frac{\ln \ln m}{\ln m}\) as m tends to infinity. Using only the first term, we define \(I_{\beta }=\left[ \sqrt{\frac{{2(t+2d)}}{{n-k}}\ln \frac{n-t}{n-k-t+\delta +1}}, 1-\sqrt{\frac{{2(t+2d-1)}}{{n-k}}\ln \frac{t}{\delta +1}}\right] .\) Hence, we deduce the following result.

Proposition 8

If \(I_{\beta }\not =\emptyset \) then the probability of success of the ISD-score decoder is at least

$$\begin{aligned} \left( 1-\frac{e}{2\pi }\frac{1}{\sqrt{\ln \frac{n - t}{n - k - t + \delta + 1}}}\right) \left( 1-\frac{e}{\sqrt{2}\pi }\frac{1}{\sqrt{\ln {\frac{t}{\delta +1}}}}\right) . \end{aligned}$$

Typically, our result gives a sub-interval where the conditions are safely satisfied. When simulations are to be performed, one could solve the inequalities in order to determine a more accurate interval. However, using more terms in the expansion of W(m) yields the following.

Corollary 4

Let \(f_{n,k,t,\delta }=\frac{n-t}{n-k-t+\delta +1}\) and \(f_{t,\delta }^{*}=\frac{t}{\delta +1}.\) The extreme points of the interval where the first two conditions in Theorem 5 are satisfied converge to \(\sqrt{\frac{t+2d}{n-k}\left( 2\ln f_{n,k,t,\delta }-\ln 2\ln f_{n,k,t,\delta } +\frac{\ln 2\ln f_{n,k,t,\delta }}{2\ln f_{n,k,t,\delta }}\right) }\) and \(1-\sqrt{\frac{t+2d-1}{n-k}\left( 2\ln f_{t,\delta }^{*}-\ln 2\ln f_{t,\delta }^{*} +\frac{\ln 2\ln f_{t,\delta }^{*}}{2\ln f_{t,\delta }^{*}}\right) }\).

4.2 Information-theoretic bounds

4.2.1 Bounding the value of t

To see how large the weight t of the error can be while still having a non-empty interval, the following rough estimate can be used.

Theorem 6

(Upper bound on t) Let \(k\le n-t+\delta +1-(n-t)(\delta +1)/t\) and \(d=ct/2\). Then \(I_{\beta }\not =\emptyset \) as long as we have

$$\begin{aligned} t\le \frac{n-k}{8(1+c)W\left( \frac{n-k}{8(1+c)(\delta +1)}\right) } \end{aligned}$$
(11)

Moreover, when \(n\rightarrow \infty \), we have that \(t\le \mathcal {O}\left( \frac{n-k}{\ln (n-k)}\right) \).

Using a first term approximation for the Lambert W function near infinity, we obtain a threshold on t. More exactly this value can be approximated by \( \frac{n-k}{8(1+c)\ln \frac{n-k}{8(1+c)(\delta +1)}}\).

Now, recall that we have determined a preliminary condition on d such that the \(\psi \) function can distinguish between positions in the support of the solution and positions outside it. This condition was \(d\le \frac{n-k}{8\ln \ln (n-k)}\). Taking a slightly smaller noise level, e.g., \(d=\frac{n-k}{8\ln (n-k)}\le \frac{n-k}{8\ln \ln (n-k)}\), validates the choice in the hypothesis \(d=ct/2\), since by Theorem 6 we have \(t\le \mathcal {O}\left( \frac{n-k}{\ln (n-k)}\right) \). Taking into account this condition and the hypothesis of Theorem 6, i.e., \(d=ct/2\), we deduce the following upper bound on t

$$\begin{aligned} d=\frac{ct}{2}\le \frac{n-k}{8\ln t}~~\Rightarrow ~~ t\ln t\le \frac{n-k}{4c}. \end{aligned}$$
(12)

This improves the constant term, giving \(\displaystyle t\le ~\frac{n-k}{4cW(\frac{n-k}{4c})}\).
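These thresholds are easy to tabulate numerically; the sketch below evaluates bounds (11), (13) and (12) (the values \(c=1\), \(\delta =3\) and \(n-k=1664\) in the example call are illustrative).

```python
import numpy as np
from scipy.special import lambertw

def t_max(n_k, c=1.0, delta=3):
    """Evaluate the three upper bounds on t; bound (12) requires c > 0."""
    W = lambda z: np.real(lambertw(z))
    b11 = n_k / (8 * (1 + c) * W(n_k / (8 * (1 + c) * (delta + 1))))              # bound (11)
    b13 = n_k / (12 * (1 + c) * W((n_k / (4 * (1 + c) * (delta + 1))) ** (2 / 3) / 3))  # bound (13)
    b12 = n_k / (4 * c * W(n_k / (4 * c)))                                        # bound (12)
    return b11, b13, b12

print(t_max(n_k=1664))
```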

Remark 1

When we consider the conditions from Proposition 7 we can deduce a similar, but stronger condition on t. Indeed under the same assumption on k we have

$$\begin{aligned} t\le \frac{n-k}{12(1+c)W\left( \frac{1}{3}\left( \frac{n-k}{4(1+c)(\delta +1)}\right) ^{\frac{2}{3}}\right) } \end{aligned}$$
(13)

Asymptotically we obtain the same behavior; however, the constant factors matter when numerical simulations are performed (Table 3).

Table 3 Theoretical upper bounds on t

4.2.2 Bounding the required ratio of syndrome entries

The existence of a value \(\gamma \) such that the ISD-score decoder succeeds in finding a solution using fewer syndrome entries can be deduced. It suffices to replace \((n-k)\) by \(\gamma (n-k)\), where \(\gamma \in (0,1]\) represents the fraction of syndrome entries required to achieve a high success probability. This value can be deduced from Theorem 6. Typically, given a number of rows \(n-k\), the maximum value of t for which the success probability is close enough to 1 also determines the minimum number of required rows. More exactly, for fixed values of t and \(n-k\), we can compute \(\gamma (n-k)\), the value for which t satisfies \(8t(1+c)\ln \frac{t}{\delta +1}=\gamma (n-k)\). By Theorem 6, with only \(\gamma (n-k)\) rows, one can recover a solution of weight at most t with high probability. Formally, the following holds.

Corollary 5

Let \(d=ct/2\) where c is a constant. Then the minimum quantity of information required by the ISD-score decoder to find a valid solution is \(4(1+c)t\ln \frac{t}{\delta +1}\). Moreover, in the noiseless scenario, the minimum quantity of information becomes \( 4t\ln \frac{t}{\delta +1}\).

Consequently, we deduce that one could improve the constant term, however not below \(2(1+c)t\ln \frac{t}{\delta +1}\).
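In code, the minimum fraction of syndrome rows reads as follows (a sketch; the parameter values are illustrative, and the returned numbers are fractions of the \(n-k\) available rows, which may exceed 1 when t is above the threshold of Theorem 6).

```python
import math

def min_fraction(n, k, t, c=0.0, delta=3):
    """Corollary 5: proven requirement 4(1+c) t ln(t/(delta+1)) rows, and the
    best constant one could hope for, 2(1+c) t ln(t/(delta+1)) rows."""
    base = t * math.log(t / (delta + 1)) / (n - k)
    return 4 * (1 + c) * base, 2 * (1 + c) * base

print(min_fraction(n=8192, k=6528, t=128))   # noiseless illustrative example
```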

On a parallel track, when we analyze the condition on \(n-k\) from Theorem 6, we find that \(n-k\ge t+(\delta +1)n/t-2(\delta +1).\) Equivalently, this leads to a minimum value of \(n-k\) of order \(\mathcal {O}\left( \max (t,\frac{n}{t})\right) .\) Hence, there is a universal condition on the fraction of syndrome entries required to solve the problem, regardless of the relation between d and t. Indeed, choosing the particular relation \(d=ct/2\) allows us to determine the maximum value of t, but this value depends on d. Typically, all these variables are linked together. That is why we can express the tolerated noise level as a function of the syndrome entries and the weight t, or the minimum number of syndrome entries as a function of the maximum decodable weight t, which in turn depends on the noise level.

5 Experimental results

The following experiments were carried out on a standard laptop with an 8-core processor running at 1.6 GHz and 32 GB of RAM. The ILP solver we used is provided by the Scipy Python package [53] through the scipy.optimize.linprog function. The score decoder is implemented using the Numpy Python package [31] for matrix computations.
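For reference, here is a minimal sketch of how such an instance can be passed to scipy.optimize.linprog. The exact formulation used in [15, 20] may differ; the constraint set below (\(\varvec{H}\varvec{x}=\varvec{s}\) over the integers, \(\sum _i x_i=t\), \(x_i\in \{0,1\}\)) is our reading of it, and the integrality flag (HiGHS backend) is one way to force a 0/1 solution, whereas an LP relaxation would be solved with the interior-point or simplex methods mentioned above.

```python
import numpy as np
from scipy.optimize import linprog

def ilp_decode(H, s, t):
    """Feasibility ILP: find x in {0,1}^n with Hx = s (over Z) and sum(x) = t."""
    n = H.shape[1]
    A_eq = np.vstack([H, np.ones((1, n))])     # syndrome equations + weight equation
    b_eq = np.concatenate([s, [t]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * n, integrality=np.ones(n), method="highs")
    return np.rint(res.x).astype(int) if res.success else None
```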

Fig. 3: Simulated noise distribution

5.1 Noise model

We have simulated the noise model as per [29]. More exactly, we handle here two types of errors, coming from the accuracy of the side-channel distinguisher and from double-cancellation. Thus, the noise consists of two parts, \(\text {HW}(\varvec{b}_{i,j})+\mathcal {N}(0,\sigma ^2)\), where the Hamming weight \(\text {HW}\) depends on the width of the representation (8, 32 or 64-bit values) and the noise variance \(\sigma ^2\) affects the accuracy of the side-channel distinguisher. The accuracy a of the distinguisher is approximated using the 3-\(\sigma \) rule, \(a\simeq {{\,\textrm{erf}\,}}\left( \frac{1}{2\sqrt{2} \sigma }\right) \), where \({{\,\textrm{erf}\,}}\) is the Gauss error function [56]. We have also taken the parity of \(\varvec{s}^{*}\) into account to correct wrongly estimated values of \(\widetilde{\varvec{s}}.\) The noise model for \(\sigma =0.25\), \(\sigma =0.5\) and two parameter sets of the Classic McEliece KEM is illustrated in Fig. 3. The plotted distribution is a truncated, shifted distribution since i) one out of two values equals 0 (correction with respect to the parity of the binary syndrome) and ii) it is shifted depending on the width of the representation and the value of \(\sigma \).
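The following sketch is our rough rendition of this simulation for a single syndrome entry; the function name, the per-block rounding step and the parity correction are assumptions based on our reading of [16, 29], not a faithful reproduction of their code.

```python
import numpy as np

rng = np.random.default_rng(4)

def noisy_entry(h_row, x, w=32, sigma=0.5):
    """One syndrome entry: the true count is accumulated over blocks of w bits
    (assumes w divides n); each block weight is read through N(0, sigma^2)
    noise and rounded (assumption), then the parity of s* corrects the sum."""
    bits = h_row & x                              # bits whose Hamming weight leaks
    blocks = bits.reshape(-1, w).sum(axis=1)      # true per-block Hamming weights
    est = np.rint(blocks + rng.normal(0, sigma, size=blocks.shape)).sum()
    s = bits.sum()                                # exact integer syndrome entry
    if est % 2 != s % 2:                          # parity correction via s* = s mod 2
        est += 1
    return int(est), int(s)

n, t, w = 1024, 32, 32
x = np.zeros(n, dtype=int); x[rng.choice(n, t, replace=False)] = 1
h = rng.integers(0, 2, size=n)
print(noisy_entry(h, x, w=w, sigma=0.5))
```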

Fig. 4: Binomial model for simulated noise (\(\sigma =0.5\)) in the case of Classic McEliece with parameters \(n=8192,t=128\)

To determine the closest distribution to the simulated noise, we first plotted in Fig. 4a a plausible interval of values d in which the most probable \(\mathcal {B}(2d,0.5)-d\) could lie. Then we computed the Euclidean distance between the simulated noise distribution and \(\mathcal {B}(2d,0.5)-d\) for \(d=5,\dots ,100.\) The sequence of distances decreases from \(d=5\) to \(d=40\), where the minimum is reached (the distance for \(d=40\) equals 0.004), and then increases.
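The fit itself is short to code with scipy.stats.binom; below is a sketch of the distance minimization (the stand-in data in the example call replaces the simulated side-channel noise).

```python
import numpy as np
from scipy.stats import binom

def best_d(noise_samples, d_range=range(5, 101)):
    """Closest centered binomial B(2d, 1/2) - d to an empirical noise sample,
    in Euclidean distance between the two probability vectors."""
    best = None
    for d in d_range:
        lo = min(int(noise_samples.min()), -d)
        hi = max(int(noise_samples.max()), d)
        support = np.arange(lo, hi + 1)
        emp = np.array([(noise_samples == v).mean() for v in support])
        pmf = binom.pmf(support + d, 2 * d, 0.5)   # zero outside [-d, d]
        dist = np.linalg.norm(emp - pmf)
        if best is None or dist < best[1]:
            best = (d, dist)
    return best

# stand-in data: the fit should recover d = 40, cf. the minimum reported above
samples = np.random.default_rng(5).binomial(80, 0.5, size=100_000) - 40
print(best_d(samples))
```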

Table 4 Closest binomial distributions to simulated noise. Each value represents the best value for the parameter d in \(\mathcal {B}(2d,0.5)-d\)

In Table 4, we computed the "best" (closest w.r.t. the Euclidean distance) binomial distributions for the simulated noise. Light green signifies values of d close to t/8, while dark colors indicate large values of d, typically larger than 3t. As expected, the value of d increases with \(\sigma \) and decreases with w. The first parameter (\(\sigma \)) induces errors in the estimation of the intermediate weights, which obviously affects the parameter d negatively. This can be seen in Table 4a, where we computed the values for \(\sigma \) up to 0.75 (an extremely noisy setting). In real situations [16] the largest values do not exceed \(\sigma =0.2.\) The second parameter (w) has the converse influence on d: the larger the width of the registers, the lower the noise level. This mainly comes from the fact that when the registers are large there are fewer blocks n/w on which the noise accumulates. Hence, a smaller influence on the intermediate estimated values comes from this side. Keeping all this in mind, we see that light green predominates in the small-\(\sigma \), large-w region. For the realistic scenarios in Table 4b, the first two lines of each parameter set show that \(d<t/4\) for all w.

5.2 Success probability and ratio of syndrome entries

The following experiments look at the number of syndrome entries required to bring \(t-\delta \) ones into the first \(n-k\) positions, as dictated by the ISD method. Results are shown in Fig. 5 for both the Classic McEliece and BIKE cryptosystems. Let us explain the meaning of the plots when they are read horizontally. One way to read them is as the weight of solutions retrieved by the ISD-score decoder with probability 1. The green stripe represents the region corresponding to possible values of \(\delta \). The value of \(\delta \) for the \([t-\delta , t]\) interval is lower for the BIKE cryptosystem since it comes with much larger values of n, making the exhaustive search for the correct permutation much more costly. Conversely, we allow \(\delta =3\) in the case of Classic McEliece since the values of n are smaller. For example, when \(n=8192\) and the noise level equals t, we can hope to retrieve solutions of weight at most 122 (which is smaller than the proposed parameters), while for the same length and noise smaller than t/2 we can retrieve any solution of weight at most 128 using the ISD-score decoder with \(\delta =3\), or equivalently solutions of weight 125 using the Prange-score decoder. To summarize, except for the case \(n=8192\) with noise levels strictly greater than t/2, all the plots suggest that the ISD-score decoder is able to retrieve, with high probability, a valid solution of weight t in the presence of noise.

Fig. 5: Number of ones in the first \(n-k\) positions for some of the Classic McEliece and BIKE parameter sets and different levels of centered binomial noise

We can also read the plots vertically. This gives the ratio of syndrome entries required to find a solution of given weight with high probability. The abscissa of the points of intersection between the curves and the green stripe gives the minimum percentage of syndrome entries required by the ISD-score decoder to successfully retrieve a valid solution of weight t. For the BIKE cryptosystem, the ratio of syndrome entries required to bring at least \(t-1\) ones into the first \(n-k\) positions ranges from 4.75% to 6.5%. For the Classic McEliece cryptosystem, the ratio of syndrome entries required to bring at least \(t-3\) ones into the first \(n-k\) positions ranges from 48% to 62%. We have also computed the best theoretical lower bound we could hope for, i.e., the percentage of syndrome entries should be at least \(\frac{2(1+c)t}{n-k}\ln \frac{t}{\delta +1}\). Comparing the experimental results in Fig. 5 with Table 5, we observe that the theoretical values are around 10% smaller than the experimental ones.

Table 5 Theoretical lower bound on the ratio of syndrome entries necessary for the ISD-score decoder

To verify that the public parity-check matrix is close to a Bernoulli matrix, we check the bias of its coefficients. We use a \(\chi ^2\) test with the null hypothesis that the distribution of 1's and 0's is uniform. For all parameter sets, we obtain a p-value higher than 0.999, allowing us to accept the null hypothesis. We also check that the distribution of transitions between consecutive coefficients is uniform, again using a \(\chi ^2\) test with the uniform distribution as the null hypothesis. We obtain a p-value higher than 0.999, allowing us to accept the null hypothesis.
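Both tests are standard; here is a sketch with scipy.stats.chisquare, where the transition test counts the four bit pairs 00, 01, 10, 11 along each row (the matrix in the example call is a random stand-in for the public parity-check matrix).

```python
import numpy as np
from scipy.stats import chisquare

def bernoulli_tests(H):
    """Chi-squared tests: (i) balance of 0/1 entries, (ii) uniformity of the
    four transitions between consecutive entries within a row."""
    bits = H.ravel()
    p_bits = chisquare(np.bincount(bits, minlength=2)).pvalue
    pairs = 2 * H[:, :-1] + H[:, 1:]               # encode each transition as 0..3
    p_trans = chisquare(np.bincount(pairs.ravel(), minlength=4)).pvalue
    return p_bits, p_trans

H = np.random.default_rng(6).integers(0, 2, size=(500, 1000))
print(bernoulli_tests(H))   # both p-values should be large for a Bernoulli matrix
```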

5.3 ILP solver and ISD-score decoder

Percentage of required entries To compare the ILP solver with the ISD-score decoder we used the parameters of the Classic McEliece proposal. We considered only Classic McEliece because the execution time of the ILP solver for the smallest parameters of BIKE exceeded tens of minutes for a single instance of the \(\mathbb {N}-\textsf{SDP}\). Obtaining solid statistical evidence of the performance of the ILP solver for BIKE in a reasonable time would require a much more optimized implementation of the solver, which is not the main purpose of this article. The results for the ILP solver in the noiseless scenario are given in Fig. 6a. The success rate is computed for ten evenly spaced ratios ranging from 1 to 100%.

Fig. 6: ILP and ISD-score decoder performance for \(\mathbb {N}-\textsf{SDP}\)

We observe that the behavior is the same for all parameter sets. When considering 30% of the syndrome entries, the ILP solver failed to recover the error vector ten times out of ten. Conversely, when considering 40% of the syndrome entries, the ILP solver succeeded in recovering the error vector ten times out of ten. Hence, the main drawback of the ILP solver, compared to the ISD-score decoder, is that it cannot be used when only a small percentage of the syndrome entries is known.

Noisy setting In a noisy setting, the difference between the ILP solver and the ISD-score decoder is even more dramatic. Indeed, the ILP solver either succeeds in finding a valid solution, with t ones in the first t positions, or it fails. Conversely, the ISD-score decoder succeeds if \(t-\delta \) ones are in the first \((n-k)\) positions, providing a much larger margin in the noisy setting.

Finally, the permutation returned by the ISD-score decoder is always better than a random permutation. Therefore, one can always resort to exhaustive search afterwards.

Computation time When comparing the time required by the two algorithms to retrieve a valid solution, we notice a significant gap. From Fig. 6b we can see that the ISD-score decoder takes less than 0.1 s, while the ILP takes at least 10 s for any of the Classic McEliece parameter sets. Broadly speaking, the ILP solver is three orders of magnitude slower than the ISD-score decoder.

6 Conclusion

This article evaluated the efficiency of the score decoder for integer syndrome decoding in the presence of noise. We proved that, even in the presence of noise, this decoder is indeed able to successfully bring \(t-\delta \) ones into the first \(n-k\) positions, as required by ISD-based methods. We then experimentally validated this capability considering the parameter sets of two post-quantum cryptosystems, Classic McEliece and BIKE. Future work could investigate other types of noise, improve the efficiency of the decoder, or consider other types of distributions. For example, LDPC or MDPC parity-check matrices offer interesting results in simulations, a fact that opens the following question: what is the influence of the matrix sparsity on the success probability?