
1 Introduction

Since Shor’s discovery  [27] of a polynomial-time quantum algorithm for factoring integers and solving discrete logarithms, there has been a substantial amount of research on quantum computers. If large-scale quantum computers are ever built, they will be able to break many of the public-key cryptosystems currently in use. This would gravely undermine the integrity and confidentiality of our current communications infrastructure on the Internet and elsewhere.

In response, the National Institute of Standards and Technology (NIST) initiated a process  [1] to solicit, evaluate, and standardize one or more quantum-resistant, public-key cryptographic algorithms. This process began in late 2017 with 69 submissions from around the world of post-quantum key-establishment mechanisms or KEMs (resp. public-key encryption schemes or PKEs), and digital signature algorithms. In early 2019, the list of candidates was cut from 69 to 26 (17 of which are PKEs or KEMs), and the 2nd Round of the competition began  [2]. The conclusion of Round 2 is now rapidly approaching.

LEDAcrypt [16] is one of the 17 remaining candidates for standardization as a post-quantum PKE or KEM scheme. It builds on the seminal works of McEliece [20] in 1978 and Niederreiter [23] in 1986, which rest on the NP-complete problem of decoding an arbitrary linear binary code [5]. More precisely, LEDAcrypt is composed of a PKE scheme based on McEliece but instantiated with a particular class of codes (called QC-LDPC codes) and a KEM in the variant style of Niederreiter. The specific origins of LEDAcrypt – the idea of using QC-LDPC codes with the McEliece paradigm – date back a dozen years to [15].

At a very high level, the private key of LEDAcrypt is a pair of binary matrices H and Q, where H is a sparse, quasi-cyclic, parity-check matrix of dimension \(p\times p\cdot n_0\) for a given QC-LDPC code and where Q is a random, sparse, quasi-cyclic matrix of dimension \(p\cdot n_0\times p\cdot n_0\). Here p is a moderately large prime and \(n_0\) is a small constant. The intermediate matrix \(L = [L_0 | ... | L_{n_0-1}] = H\cdot Q\) is formed by matrix multiplication. The public key M is then constructed from L by multiplying each of the \(L_i\) by \(L_{n_0-1}^{-1}.\) Given this key pair, information can be encoded into codeword vectors, then perturbed by random error-vectors of low Hamming weight.
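To make this key structure concrete, the following is a minimal, illustrative sketch with toy parameters far below the real ones; it is not the reference implementation, the assignment of the block weights \(m_0, m_1\) within Q is simplified for illustration, and it stops short of the inversion \(L_{n_0-1}^{-1}\) used to form M:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_sparse_poly(p, weight):
    """Binary coefficient vector of length p with `weight` ones."""
    v = np.zeros(p, dtype=np.uint8)
    v[rng.choice(p, size=weight, replace=False)] = 1
    return v

def poly_mul(a, b):
    """Product in F_2[x]/(x^p + 1), computed as a cyclic convolution mod 2."""
    c = np.zeros(len(a), dtype=np.uint8)
    for i in np.nonzero(a)[0]:
        c ^= np.roll(b, i)              # the term a_i = 1 contributes x^i * b
    return c

# Toy parameters; the real scheme uses e.g. p = 14939, d_v = 11, m0 = 4, m1 = 3.
p, n0, d_v, m0, m1 = 101, 2, 3, 2, 1

H = [random_sparse_poly(p, d_v) for _ in range(n0)]                  # H_0, H_1
Q = [[random_sparse_poly(p, m0 if i == j else m1) for j in range(n0)]
     for i in range(n0)]                                             # Q_{i,j}

# L_i = sum_j H_j * Q_{j,i}; the public key would be M_i = L_i * L_{n0-1}^{-1}.
L = [np.bitwise_xor.reduce([poly_mul(H[j], Q[j][i]) for j in range(n0)])
     for i in range(n0)]
print("wt(L_i):", [int(v.sum()) for v in L])     # sparse: at most d_v*(m0+m1)
```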

Security essentially relies on the assumption that it is difficult to recover the originally-encoded information from the perturbed codeword unless a party possesses the factorization of the public key as H and Q. To recover such matrices (or, equivalently, their product) one must find low-weight codewords in the public code (or in its dual) which, again, is a well-known NP-complete problem [5]. State-of-the-art algorithms to solve this problem are known as Information Set Decoding (ISD), and their expected computational complexity is indeed used as a design criterion for LEDAcrypt parameters.

The LEDAcrypt submission package in the 2nd Round of NIST’s PQC process provides a careful description of the algorithm’s history and specific design, a variety of concrete parameter sets tailored to NIST’s security levels (claiming approximately 128-bit, 192-bit, and 256-bit security, under either IND-CPA or IND-CCA attacks), and a reference implementation in code.

1.1 Our Results

In this work, we provide a novel, concrete cryptanalysis of LEDAcrypt. Note that, in the LEDAcrypt design procedure, the time complexity of ISD algorithms is derived by assuming that the target codewords are uniformly distributed over the set of all n-tuples of fixed weight. However, as we show in Sect. 3, this assumption does not hold for LEDAcrypt schemes, since it is possible to identify many families of secret keys, i.e., matrices H and Q, for which the rows of \(L=HQ\) (which represent low-weight codewords in the dual code) are characterized by a strong bias in the distribution of set bits. We call such keys weak since, intuitively, in such a case an ISD algorithm can be strongly improved by taking into account the precise structure of the target codeword. As direct evidence, in Sect. 4 we consider a moderately-sized, very weak class of keys, which can be recovered with substantially less computational effort than expected. This is a major, practical break of the LEDAcrypt cryptosystem, which is encapsulated in the following theorem.

Theorem 1.1 (Section 4)

There is an algorithm that costs the same as \(2^{49.22}\) AES-256 operations and recovers 1 in \(2^{47.72}\) of LEDAcrypt’s Category 5 (i.e. claimed 256-bit-secure) ephemeral/IND-CPA keys.

Similarly, there is an algorithm that costs the same as \(2^{57.50}\) AES-256 operations and recovers 1 in \(2^{51.59}\) of LEDAcrypt’s Category 5 (i.e. claimed 256-bit-secure) long-term/IND-CCA keys.

While most key-recovery algorithms can exchange computational time spent vs. fraction of the key space recovered, this trade-off will generally be 1-to-1 against a secure cryptosystem. (In particular this trade-off is 1-to-1 for the AES cryptosystem which is used to define the NIST security strength categories for LEDAcrypt’s parameter sets.) However, we note in the above that both \(49.22+47.72 = 96.94 \ll 256\) and \(57.50+51.59 = 109.09 \ll 256,\) making this attack quite significant. Additionally, we note that this class of very weak keys is present in every parameter set of LEDAcrypt.

While the existence of classes of imperfect keys is a serious concern, one might ask:

Is it possible to identify such keys during KeyGen, reject them, and thereby save the scheme’s design?

We are able to answer this in the negative.

Indeed, as we demonstrate in Sect. 3, the bias in the distribution of set bits in L, which is at the basis of our attack, is intrinsic to the scheme’s design. Our results clearly show that the existence of weaker-than-expected keys in LEDAcrypt is fundamental to the system’s formulation and cannot be avoided without a major re-design of the cryptosystem.

Finally, we apply our new attack ideas to key recovery without restricting attention to any weak-key notion. Here we analyze the asymptotic complexity of attacking all LEDAcrypt keys.

Theorem 1.2 (Section 5)

The asymptotic complexity of ISD using an appropriate choice of structured information sets, when attacking all LEDAcrypt keys in the worst case, is \(\exp (\tilde{O}(p^{\frac{1}{4}})).\)

This gives a significant asymptotic speed-up over running ISD with uniformly random information sets, which costs \(\exp (\tilde{O}(p^{\frac{1}{2}})).\) We note that simply enumerating all possible values of H and Q actually leads to an attack running in time \(\exp (\tilde{O}(p^{\frac{1}{4}})),\) and indeed similar attacks were considered in LEDAcrypt’s submission documents for the NIST PQC process. However, this type of attack had worse concrete complexity than ordinary ISD with uniformly random information sets for all of the 2nd Round parameter sets.

1.2 Technical Overview of Our New Attacks

Basic Approach: Exploiting the Product Structure. The typical approach to recovering keys for LEDAcrypt-like schemes is to use ordinary ISD algorithms, a class of techniques which can be used to search for low-weight codewords in an arbitrary code. Generally speaking, these algorithms symbolically consider a row of an unknown binary matrix corresponding to the secret key of the scheme. From this row, they choose a set of bit positions uniformly at random in the hope that these bits will (mostly) be zero. If the guess is correct and, additionally, the chosen set is an information set (i.e., a set of positions on which any two distinct codewords differ in at least one position), then the key will be recovered by a linear algebra computation. If (at least) one of the two requirements on the set is not met, then the procedure resets and guesses again.
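For intuition on the cost of this guess-and-check loop: the expected number of iterations is the reciprocal of the per-iteration success probability. A small, hedged calculator, assuming the simplest Prange-style success condition that the chosen set misses the support entirely (sizes below are merely in the ballpark of LEDAcrypt's 128-bit, \(n_0=2\) parameters):

```python
from math import comb, log2

def miss_probability(n, k, w):
    """Probability that a uniformly chosen size-k information set is
    disjoint from the support of a fixed weight-w codeword."""
    return comb(n - w, k) / comb(n, k)

# n = code length, k = information set size, w = target codeword weight.
n, k, w = 2 * 14939, 14939, 2 * (4 + 3) * 11
print(f"expected iterations ~= 2^{-log2(miss_probability(n, k, w)):.2f}")
```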

For our attacks, intuitively, we will choose the information set in a non-uniform manner in order to increase the probability that the support of HQ,  i.e. the non-zero coefficients of HQ,  is (mostly) contained in the complement of the information set. At a high level, we will guess two sets of polynomials \(H'_0, ..., H'_{n_0-1}\) and \(Q'_{0,0}, ..., Q'_{n_0-1,n_0-1},\) then (interpreting the polynomials as \(p\times p\) circulant matrices) group them into quasi-cyclic matrices \(H'\) and \(Q'.\) These matrices will be structured analogously to H and Q, but with non-negative coefficients defined over \(\mathbb {Z}[x]/\langle x^p-1 \rangle \) rather than \(\mathbb {F}_2[x]/\langle x^p+1 \rangle \). The hope is that the support of \(H'Q'\) will (mostly) contain the support of HQ. It should be noted that a sufficient condition for this to be the case is that the support of \(H'\) contains the support of H and the support of \(Q'\) contains the support of Q. Assuming the Hamming weight of \(H'Q'\) (interpreted as a coefficient vector) is chosen to be approximately W, then the information set can be chosen as the complement of the support of \(H'Q'\) and properly passed to an ISD subroutine in place of a uniform guess.

Observe that the probability that the supports of \(H'\) and \(Q'\) contain the supports of H and Q,  respectively, is maximized by making the Hamming weight of \(H'\) and \(Q'\) as large as possible while still limiting the Hamming weight of \(H'Q'\) to W. An initial intuition is that this can be done by choosing the 1-coefficients of the polynomials \(H'_0, ..., H'_{n_0-1}\) and \(Q'_{0,0}, ..., Q'_{n_0-1,n_0-1}\) to be in a single, consecutive chunk. For example, by choosing the Hamming weight of the polynomials (before multiplication) as some value \(B \ll W,\) we can take \(H'_0 = x^a + x^{a+1} + ... + x^{a+B-1}\) and \(Q'_{0,0} = x^c + x^{c+1}+ ... + x^{c+B-1}.\)

Note that the polynomials \(H'_0\) and \(Q'_{0,0}\) (chosen with consecutive 1-coefficients as above) have Hamming weight B, while their product only has Hamming weight \(2B-1\). In contrast, uniformly chosen polynomials with Hamming weight B would be expected to have a product with Hamming weight much closer to \(\min (B^2,p)\). That is, for a fixed weight W required of \(H'Q'\) by the ISD subroutine, we can guess around W/2 positions at once in \(H'\) and \(Q'\) respectively instead of something closer to \(\sqrt{W}\) as would be given by a truly uniform choice of information set. As a result, each individual guess of \(H'\) and \(Q'\) that is “close” to this outline of our intuition will be more rewarding for searching the keyspace than the “typical” case of uniformly guessing information sets.
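A quick numerical illustration of this weight gap, computing product supports in \(\mathbb {Z}[x]/\langle x^p-1 \rangle \) at the support level (toy parameters and our own helper names):

```python
import numpy as np

rng = np.random.default_rng(1)
p, B = 4001, 30

def cyclic_product_support(supp_a, supp_b, p):
    """Support of the product of two 0/1 polynomials in Z[x]/(x^p - 1):
    all pairwise sums of exponents, reduced mod p."""
    return {(i + j) % p for i in supp_a for j in supp_b}

run = lambda start: {(start + i) % p for i in range(B)}   # B consecutive ones
consec = cyclic_product_support(run(5), run(123), p)
print("consecutive supports:", len(consec))               # 2B - 1 = 59

rand_a = set(rng.choice(p, B, replace=False).tolist())
rand_b = set(rng.choice(p, B, replace=False).tolist())
print("random supports:     ", len(cyclic_product_support(rand_a, rand_b, p)))
# close to B^2 = 900 (a few collisions aside), i.e. near min(B^2, p)
```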

This constitutes the core intuition for our attacks against LEDAcrypt, but additional considerations are required in order to make the attacks practically effective (particularly when concrete parameters are considered). We enumerate a few of these observations next.

Different Ring Representations. The idea of choosing the polynomials within \(H'\) and \(Q'\) with consecutive nonzero coefficients makes each iteration of an information set decoding algorithm using such an \(H'\) and \(Q'\) much more effective than an iteration with a random information set. However, there is only a limited number of successful information sets of this form. We can vastly increase our range of options by observing that the ring \(\mathbb {F}_2[x]/\langle x^p+1 \rangle \) has \(p-1\) isomorphic representations which can be mapped to one another by the isomorphism \(f(x)\rightarrow f(x^{\alpha })\). This allows us many more equally efficient choices of the information set: rather than restricting our choices to polynomials \(H'_0\) and \(Q'_{0,0}\) with consecutive ones in the standard ring representation, we have the freedom to choose them with consecutive ones in any ring representation (provided the same representation is used for \(H'_0\) and \(Q'_{0,0}\)).
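On supports, the map \(f(x)\rightarrow f(x^{\alpha })\) simply sends each exponent i to \(\alpha i \bmod p\), and it commutes with multiplication. A small self-contained check, with toy p and \(\alpha\) and our own helper names:

```python
from collections import Counter

p, alpha = 101, 7        # any alpha in {1, ..., p-1} works, since p is prime

def rep_map(supp, alpha, p):
    """f(x) -> f(x^alpha) acts on supports as i -> alpha * i mod p."""
    return {(alpha * i) % p for i in supp}

def pmul(sa, sb, p):
    """Support of a*b in F_2[x]/(x^p + 1): exponent sums with odd multiplicity."""
    counts = Counter((i + j) % p for i in sa for j in sb)
    return {e for e, c in counts.items() if c % 2 == 1}

a, b = {3, 4, 5, 6}, {40, 41, 42}     # consecutive ones in this representation
assert rep_map(pmul(a, b, p), alpha, p) == pmul(rep_map(a, alpha, p),
                                                rep_map(b, alpha, p), p)
print(sorted(rep_map(a, alpha, p)))   # a stride-7 progression: [21, 28, 35, 42]
```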

Equivalent Keys. For each public key of LEDAcrypt, there exist many choices of private keys that produce the same public key. In particular, the same public key \(M=(L_{n_0-1})^{-1}L\) produced by the private key

$$\begin{aligned} H=[H_0, H_1, \cdots , H_{n_0-1}], \end{aligned}$$
$$ Q = \begin{bmatrix} Q_{0,0} & Q_{0,1} & \cdots & Q_{0,n_0-1} \\ Q_{1,0} & Q_{1,1} & \cdots & Q_{1,n_0-1} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{n_0-1, 0} & Q_{n_0-1,1} & \cdots & Q_{n_0-1,n_0-1} \end{bmatrix}; $$

would also be produced by any private key of the form

$$\begin{aligned} H'=[x^{a_0}H_0, x^{a_1}H_1, \cdots , x^{a_{n_0-1}}H_{n_0-1}], \end{aligned}$$
$$ Q' = \begin{bmatrix} x^{b-a_0}Q_{0,0} & x^{b-a_0}Q_{0,1} & \cdots & x^{b-a_0}Q_{0,n_0-1} \\ x^{b-a_1}Q_{1,0} & x^{b-a_1}Q_{1,1} & \cdots & x^{b-a_1}Q_{1,n_0-1} \\ \vdots & \vdots & \ddots & \vdots \\ x^{b-a_{n_0-1}}Q_{n_0-1, 0} & x^{b-a_{n_0-1}}Q_{n_0-1,1} & \cdots & x^{b-a_{n_0-1}}Q_{n_0-1,n_0-1} \end{bmatrix}; $$

for any integers \(0\le a_i,b < p\), \(i\in \{0, \ldots , n_0-1\}\). These \(p^{n_0+1}\) equivalent keys improve the success probability of key recovery attacks, as detailed in the following sections.
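The following is a small, self-contained check of this equivalence at the support level for \(n_0=2\) (toy values; pmul/padd/shift are our own helpers for products, sums, and monomial shifts in \(\mathbb {F}_2[x]/\langle x^p+1 \rangle \)): the shifted key produces \(x^b L\), so \(M=L_{1}^{-1}L\) is unchanged.

```python
from collections import Counter

p = 101

def pmul(sa, sb):   # support of the product in F_2[x]/(x^p + 1)
    c = Counter((i + j) % p for i in sa for j in sb)
    return frozenset(e for e, n in c.items() if n % 2)

def padd(sa, sb):   # support of the sum (XOR of supports)
    return frozenset(set(sa) ^ set(sb))

shift = lambda s, k: frozenset((i + k) % p for i in s)   # multiply by x^k

# A toy n0 = 2 secret key, given as supports of H_0, H_1 and Q_{i,j}.
H = [frozenset({1, 5, 44}), frozenset({0, 7, 90})]
Q = [[frozenset({2, 30}), frozenset({61})],
     [frozenset({11}), frozenset({9, 70})]]

a0, a1, b = 17, 52, 33   # arbitrary shifts
L  = [padd(pmul(H[0], Q[0][j]), pmul(H[1], Q[1][j])) for j in range(2)]
L2 = [padd(pmul(shift(H[0], a0), shift(Q[0][j], b - a0)),
           pmul(shift(H[1], a1), shift(Q[1][j], b - a1))) for j in range(2)]
assert all(L2[j] == shift(L[j], b) for j in range(2))
```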

Different Degree Constraints for \(H'\) and \(Q'\). While we have so far described \(H'\) and \(Q'\) as having the same Hamming weight B,  this does not necessarily need to be the case. In fact, there are many, equivalent choices of \(H'\) and \(Q'\) which produce the same product \(H'Q'\) based on this observation. For example, the product of

$$\begin{aligned}&H'_0 = x^a + x^{a+1} + ... + x^{a+B-1}\\&Q'_{0,0} = x^c + x^{c+1}+ ... + x^{c+B-1} \end{aligned}$$

is identical to the product of

$$\begin{aligned}&H'_0 = x^a + x^{a+1} + ... + x^{a+B-1-\delta }\\&Q'_{0,0} = x^c + x^{c+1}+ ... + x^{c+B-1+\delta } \end{aligned}$$

for any integer \(-B< \delta < B.\) More generally, this relationship (that if \(H'\) shrinks and \(Q'\) proportionally grows, or vice versa, then the product \(H'Q'\) is the same) is independently true for any set of \(\{H'_i, Q'_{i, 0}, ..., Q'_{i, n_0-1}\}\) for \(i\in \{0, ..., n_0-1\}.\)
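For building the information set, what matters is the support of \(H'Q'\). A quick check that the support of the product of two consecutive runs depends only on the sum of their lengths (over the integers the coefficient values change from a triangle to a trapezoid as \(\delta\) varies, but the support does not):

```python
p, B, delta = 4001, 25, 9
a, c = 100, 777

run = lambda start, length: {(start + i) % p for i in range(length)}
psupp = lambda sa, sb: {(i + j) % p for i in sa for j in sb}

s1 = psupp(run(a, B), run(c, B))                    # lengths B and B
s2 = psupp(run(a, B - delta), run(c, B + delta))    # lengths B-delta, B+delta
assert s1 == s2 == run(a + c, 2 * B - 1)            # same support either way
```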

Attacks for \(n_0 = 2\) Imply Similar-Cost Attacks for \(n_0 > 2\). Our attacks are more easily described (and more effective) in the case \(n_0 = 2.\) In this case, we apply ISD to find low-weight codewords in the row space of the public key \([M_0\;|\; M_1]\) to recover a viable secret key for the system. Naively extending this approach for the case \(n_0 > 2\) to the entire public key \([M_0\;|\;...\;|\; M_{n_0-1}]\) requires constraints on the support of \(n_0 + n_0^2\) polynomials (\(n_0\) polynomials corresponding to \(H'\) and \(n_0^2\) polynomials corresponding to \(Q'\)), so the overall work in the attack would increase quadratically as \(n_0\) grows. However, even in the case that \(n_0 > 2,\) we observe that it is sufficient to find low-weight codewords in the row space of only \([M_0\;|\;M_1]\) in order to recover a working key, implying that the attack only needs to consider \(3n_0\) polynomials \(H_i, Q_{j,0}, Q_{k,1}.\) So, increasing \(n_0\) will make all of our attacks less effective, but not substantially so. More importantly, any attack against \(n_0 = 2\) parameters immediately implies a similar-cost attack against parameters with \(n_0 > 2.\) Therefore, we focus on the case of \(n_0 = 2\) in the remainder of this work.

A Continuum of Progressively Less Weak Keys. The attacker can recover keys with the highest probability per iteration of ISD by using a very structured pattern for \(L'.\) As we will see in Sect. 4, in this pattern both \(L'_0\) and \(L'_1\) will have a single contiguous stretch of nonzero coefficients in some ring representation. The result is a practical attack, but one which is only capable of recovering weak keys representing something like 1 in \(2^{40}\) or 1 in \(2^{50}\) private keys.

However, if the attacker is willing to use a more complicated pattern for the information set, using different ring representations for different blocks of \(H'\) and \(Q'\), and possibly having multiple separate stretches of consecutive nonzero coefficients in each block, then the attacker will not recover keys with as high a probability per iteration, but the attack will extend to a broader class of slightly less weak keys. This may for example lead to a somewhat less practical attack that recovers 1 in \(2^{30}\) keys, but still much faster than would be expected given the claimed security strength of the parameter set in question.

We do not analyze the multitude of possible cases here, but we show they must necessarily exist in Sect. 3 by demonstrating that bias is intrinsically present throughout the LEDAcrypt key space.

Improvements to Average-Case Key Recovery. In Sect. 5 we will take the continuum of progressively weaker keys to its logical extreme. We show that the attacks in this paper are asymptotically stronger than the standard attacks not just for weak keys, but for all keys.

As we move away from the simpler information set patterns used on the weakest keys, the analysis becomes more difficult. To fully quantify the impact of our attack on average keys would require extensive case analysis of all scenarios that might lead to a successful key recovery given a particular distribution of information sets used by the attacker, which we leave for future work.

1.3 Related Work

The main attack strategies against cryptosystems based on QC-LDPC codes are known as information set decoding (ISD) algorithms. These algorithms are also applicable to a variety of other code-based cryptosystems, including the NIST 2nd Round candidates BIKE [22], HQC [8], Classic McEliece [9], and NTS-KEM [17]. Initiated by Prange [25] in 1962, these algorithms have since experienced substantial improvements over the years [4, 7, 12, 13, 18, 19, 28]. ISD algorithms can also be used to find low-weight codewords in a given, arbitrary code. The main approach of ISD is to guess a set of positions where such codewords contain a very low number of set symbols; when this set is actually an information set, linear algebra computations yield the desired codeword (see [3], Appendix A.3). ISD time complexity is estimated as the product of the expected number of required information set guesses and the cost of testing each set. Advanced ISD algorithms improve Prange’s basic idea by reducing the average number of required guesses, at the cost of increasing the time complexity of the testing phase. Quantum ISD algorithms use Grover’s algorithm [10] to quadratically accelerate the guessing phase. A quantum version of Prange’s algorithm [6] was presented in 2010, while quantum versions of more advanced ISD algorithms were presented in 2017 [11].

In the case of QC-MDPC and QC-LDPC codes, ISD key recovery attacks can get a speed-up which is polynomial in the size of the circulant blocks [26]. This gain is due to the fact that there is more than one sparse vector in the row space of the parity-check matrix, and no modification to the standard ISD algorithms is required to obtain this speed-up. Another example of gains due to the QC structure is that of [14] which, however, only works when the circulant size has a power of 2 among its factors (which is not the case we consider here).

ISD can generally be described as a technique for finding low Hamming-weight codewords in a linear code. Most ISD algorithms are designed to assume that the low-weight codewords are random aside from their sparsity. However, in some cryptosystems that can be cryptanalyzed using ISD, these short codewords are not random in this respect, and modified versions of ISD have been used to break these schemes [21, 24]. Our paper can be seen as a continuation of this line of work, since unlike the other 2nd Round NIST candidates where ISD is cryptanalytically relevant, the sparse codewords which lead to a key recovery of LEDAcrypt are not simply random sparse vectors, but have additional structure due to the product structure of LEDAcrypt’s private key.

2 Preliminaries

2.1 Notation

Throughout this work, we denote the finite field with 2 elements by \(\mathbb {F}_2.\) We denote the Hamming weight of a vector a (or a polynomial a, viewed in terms of its coefficient vector) as \(\mathrm {wt}(a).\) For a polynomial a we use the representation \(a = \sum _{i=0}^{p-1}a_ix^i\), and call \(a_i\) its i-th coefficient. We denote the support – i.e. the non-zero coordinates – of a vector (or polynomial) a by \(\mathrm {S}(a).\) Similarly, we define the antisupport of a, denoted \(\bar{\mathrm S}(a)\), as the set of positions i such that \(a_i = 0\). Given a polynomial a and a set J, we denote as \(a|_J\) the set of coefficients of a that are indexed by J. Given \(\pi \), a permutation of \(\{0,\cdots ,n-1\}\), we represent it as the ordered set of integers \(\{\ell _0,\cdots ,\ell _{n-1}\}\), such that \(\pi \) places \(\ell _i\) in position i. For a length-n vector a, \(\pi (a)\) denotes the action of \(\pi \) on a, i.e., the vector whose i-th entry is \(a_{\ell _i}\). For a probability distribution \( \mathcal {D}\), we write \(X\sim \mathcal {D}\) if X is distributed according to \(\mathcal {D}\).
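For readers following along in code, this notation translates directly; a minimal sketch of these operators (our own helper names):

```python
import numpy as np

def wt(a):                 # Hamming weight of a coefficient vector
    return int(np.count_nonzero(a))

def supp(a):               # S(a): indices of nonzero coefficients
    return set(np.nonzero(a)[0].tolist())

def antisupp(a):           # antisupport: indices of zero coefficients
    return set(range(len(a))) - supp(a)

def restrict(a, J):        # a|_J: coefficients of a indexed by J
    return [a[j] for j in sorted(J)]

def permute(a, pi):        # pi(a): the i-th entry of the result is a[pi[i]]
    return [a[l] for l in pi]
```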

2.2 Parameters

The parameter sets of LEDAcrypt that we explicitly consider in this work are shown in Table 1 (although similar forms of our results hold for all parameter sets). We refer the reader to [3], Appendix A.1 for further technical details of the construction.

Table 1. LEDAcrypt parameter sets that we consider in this paper.

3 Existence of Weak Keys in LEDAcrypt

As we have explained in Sect. 1.3, key recovery attacks against cryptosystems based on codes with sparse parity-check matrices can be performed by searching for low-weight codewords, either in the code or in its dual. For instance, such codewords in the dual correspond, with overwhelming probability, to the rows of the secret parity-check matrix, of weight \(\omega \ll n\), where n denotes the code length. The most efficient way to solve this problem is to use ISD algorithms. To analyze the efficiency of such attacks, weight-\(\omega \) codewords are normally modeled as independent random variables, sampled according to the uniform distribution over n-tuples of weight \(\omega \), which we denote as \(\mathcal {U}_\omega \). At each ISD iteration, the algorithm succeeds if the intersection between the chosen set T and the support of (at least) one of these codewords satisfies some properties. Regardless of the considered ISD variant, this intersection has to be small.

Let \(\epsilon \) be the probability that a single ISD iteration can actually recover a specific codeword of the desired weight. When the code contains M codewords of weight \(\omega \), the probability that a single ISD iteration can recover any of these codewords is \(1-(1-\epsilon )^M\) which, if \(\epsilon M\ll 1\), can be approximated as \(\epsilon M\). This ISD speed-up normally applies in the case of QC codes, where M corresponds to the number of rows of the parity-check matrix (that is, \(M=n-k\)).

In this section we show that the product structure in LEDAcrypt yields a strong bias in the distribution of set symbols in the rows of the secret parity-check matrix \(L=HQ\). As a consequence, the assumption on the uniform distribution of the target codewords no longer holds, and this opens the door to dramatic improvements in ISD algorithms. To provide evidence of this claim we analyze, without loss of generality, a simplified situation. We focus on the case \(n_0 = 2\), and consider the success probability of ISD algorithms when applied to LEDAcrypt schemes to search for a row of the secret L (say, the first row), with weight \(\omega = 2d_v(m_0+m_1)\).

In this case we expect to have the usual speed-up deriving from the presence of multiple low-weight codewords. However, quantifying this speed-up is not straightforward and requires cumbersome computations, since it also depends on the particular choice of the set chosen in ISD. Thus, to keep the description as general as possible and easy to follow, in this section we only focus on a single row of L. Exact computations for these quantities are performed in Sects. 4 and 5. Furthermore, we only consider the probability that a chosen set T does not overlap with the support of the target codeword. With this choice, we essentially capture the essence of all ISD algorithms. An analysis of a specific variant, with optimized parameters and requirements on the chosen set, might significantly improve the results of this section, which nevertheless already have significant implications for the security of LEDAcrypt schemes.

Let \(T\subseteq \{0,\cdots ,n-1\}\) be a set of cardinality k: for \(a\sim \mathcal {U}_\omega \), we have

$$\mathrm {Pr}\left[ \left. T\cap \mathrm S(a)=\varnothing \right| a\sim \mathcal {U}_\omega \right] = \frac{\left( {\begin{array}{c}n-\omega \\ k\end{array}}\right) }{\left( {\begin{array}{c}n\\ k\end{array}}\right) }.$$

Note that this probability does not depend on the particular choice of T, but just on its size. When a purely random QC-MDPC code is used, as in BIKE [22], the first row of the secret parity-check matrix is well modeled as a random sample from \(\mathcal {U}_\omega \). The previous probability can also be described as the ratio between the number of n-tuples of weight \(\omega \) whose support is disjoint from T, and the number of all possible samples from \(\mathcal {U}_\omega \); in schemes such as BIKE, this also corresponds to the probability that a secret key satisfies the requirement on an arbitrary set T.

As we show in the remainder of this section, in LEDAcrypt such a fraction can actually be made significantly larger when T is properly chosen. To each such choice, we can then associate a family of weak keys, that is, secret keys for which the corresponding first row of L does not overlap with T. We formally define the notion of weak keys in the following.

Definition 3.1

Let \(\mathcal {K}\) be the public key space of LEDAcrypt with parameters \(n_0, p, d_v, m_0, m_1\). Let \(T\subseteq \{0, \cdots , n_0p-1\}\) be a set of cardinality \(n-k=p\), and let \(\mathcal {W} \subseteq \mathcal {K}\) be the set of all public keys corresponding to secret keys \(sk = (H,Q)\) such that the first row of the corresponding \(L=HQ\) has support that is disjoint from T. Finally, we define \(\omega = n_0(m_0+m_1)d_v\) and \(\mathcal {U}_{\omega }\) as the uniform distribution over \((n_0p)\)-tuples of weight \(\omega \). Then, we say that \(\mathcal {W}\) is a set of weak keys if

$$\begin{aligned} \mathrm {Pr}\left[ pk\in \mathcal {W}\mid (sk,pk)\leftarrow \textsf {KeyGen}()\right] \gg \mathrm {Pr}\left[ T\cap \mathrm S(a)=\varnothing \mid a\sim \mathcal {U}_\omega \right] = \frac{\left( {\begin{array}{c}n_0p-\omega \\ p\end{array}}\right) }{\left( {\begin{array}{c}n_0p\\ p\end{array}}\right) }. \end{aligned}$$

Roughly speaking, we have a family of weak keys when, for a specific choice of set, the number of keys meeting the requirement on the support is significantly larger than what we would have in the uniform case. Indeed, for all such keys, there will be a strong bias in the matrix L, since null positions can be guessed with high probability; as we describe in Sects. 4 and 5, this fact opens the door to strong attacks against very large portions of the key space.
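For reference, the right-hand side of the inequality in Definition 3.1 can be computed exactly via the telescoping product \(\binom{n_0p-\omega }{p}/\binom{n_0p}{p} = \prod _{i=0}^{\omega -1}\frac{n_0p-p-i}{n_0p-i}\); the following snippet reproduces the \(2^{-154.57}\) and \(2^{-286.80}\) baselines quoted later in the captions of Tables 2 and 3:

```python
from math import log2

def log2_uniform_prob(p, d_v, m0, m1, n0=2):
    """log2 of C(n0*p - w, p) / C(n0*p, p) for w = n0*(m0 + m1)*d_v,
    computed via the telescoping product to avoid huge binomials."""
    w, n = n0 * (m0 + m1) * d_v, n0 * p
    return sum(log2((n - p - i) / (n - i)) for i in range(w))

print(f"{log2_uniform_prob(14939, 11, 4, 3):.2f}")   # -154.57 (128-bit set)
print(f"{log2_uniform_prob(36877, 11, 7, 6):.2f}")   # -286.80 (256-bit set)
```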

3.1 Preliminary Considerations on Sparse Polynomial Multiplication

We now recall some basic facts about polynomial multiplication in the rings \(\mathbb {F}_2[x]/\langle x^p + 1 \rangle \) and \(\mathbb {Z}[x]/\langle x^p - 1 \rangle \), which will be useful for our treatment. Let \(a,b\in \mathbb {F}_2[x]/\langle x^p + 1 \rangle \) and \(c = ab\); we then have

$$\begin{aligned} c_i = \bigoplus _{z=0}^{p-1}a_z b_{z'},\quad z' = i-z \mod p, \end{aligned}$$

where the operator \(\bigoplus \) highlights the fact that the sum is performed over \(\mathbb {F}_2\). Taking antisupports into account, we can rewrite the previous equation as

$$\begin{aligned} c_i = \bigoplus _{{\begin{matrix}z\not \in \bar{\mathrm {S}}(a)\\ z'=i-z \mod p,\ z'\not \in \bar{\mathrm {S}}(b)\end{matrix}}}a_zb_{z'}. \end{aligned}$$
(1)

Let N(a, b, i) denote the set of terms that contribute to the sum in Eq. (1), i.e.

$$\begin{aligned} N(a,b,i) = \left\{ z\ \text {s.t.}\ z\not \in \bar{\mathrm {S}}(a)\ \text {and}\ i-z \mod p\not \in \bar{\mathrm {S}}(b) \right\} . \end{aligned}$$

We now denote by \(\tilde{a}\) and \(\tilde{b}\) the polynomials obtained by lifting a and b to \(\mathbb {Z}[x]/\langle x^p - 1 \rangle \), i.e., by mapping the coefficients of a and b into \(\{0,1\}\subset \mathbb {Z}\). Let \(\tilde{c} = \tilde{a} \tilde{b}\): we straightforwardly have that \(c\equiv \tilde{c} \mod 2\), \(|N(a,b,i)| = \tilde{c}_i\) and \(\sum _{i=0}^{p-1}\tilde{c}_i = \mathrm {wt}(a) \cdot \mathrm {wt}(b)\). Let \(a'\in \mathbb {Z}[x]/\langle x^p-1\rangle \) with coefficients in \(\{0,1\}\), such that \(\mathrm {S}(a')\supseteq \mathrm {S}(a)\), i.e., such that its support contains that of a (or, in other words, such that its antisupport is contained in that of a); an analogous definition holds for \(b'\). We can write \(a' =\tilde{a} + s_a\), where \(s_a\in \mathbb {Z}[x]/\langle x^p - 1 \rangle \) is the polynomial whose i-th coefficient is equal to 0 if \(a'_i = a_i\), and equal to 1 otherwise; with analogous notation, we write \(b' =\tilde{b} + s_b\). Then

$$\begin{aligned} c' = a' b' = (\tilde{a} + s_a)(\tilde{b} + s_b) = \tilde{a} \tilde{b} + s_a\tilde{b} + s_b \tilde{a} + s_a s_b = \tilde{c} + s_a\tilde{b} + s_b \tilde{a} + s_a s_b. \end{aligned}$$

Since \(s_a\tilde{b}\), \(s_b \tilde{a}\) and \(s_a s_b\) have all non-negative coefficients, we have

$$\begin{aligned} c_i' \ge \tilde{c}_i = |N(a,b,i)|\ge 0,\quad \forall i\in \{0,\cdots ,p-1\}. \end{aligned}$$
(2)

We now derive some properties that link the coefficients of \(c'\) to those of c; as we show, knowing portions of the antisupports of a and b is enough to gather information about the coefficients in their product.

Lemma 3.2

Let \(a,b\in \mathbb {F}_2[x]/\langle x^p + 1 \rangle \), and \(J_a,J_b\subseteq \{0 , \cdots , p-1\}\) such that \(J_a\supseteq \mathrm {S}(a)\) and \(J_b\supseteq \mathrm {S}(b)\). Let \(a', b'\in \mathbb {Z}[x]/\langle x^p - 1 \rangle \) be the polynomials whose coefficients are null, except for those indexed by \(J_a\) and \(J_b\), respectively, which are set to 1. Let \(c=ab\in \mathbb {F}_2[x]/\langle x^p + 1 \rangle \) and \(c' = a' b'\in \mathbb {Z}[x]/\langle x^p - 1 \rangle \); then

$$c'_i = 0\implies c_i=0.$$

Proof

The result immediately follows from (2) by considering that if \( c'_i=0\) then necessarily \(|N(a,b,i)|= 0\) and, subsequently, \(c_i = 0\).    \(\square \)
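A quick randomized sanity check of Lemma 3.2, lifting to \(\mathbb {Z}[x]/\langle x^p-1 \rangle \) and verifying that null coefficients of \(c'\) force null coefficients of c (toy sizes, our own helpers):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 257

def sparse(weight):
    v = np.zeros(p, dtype=np.int64)
    v[rng.choice(p, size=weight, replace=False)] = 1
    return v

def cyc_mul(a, b):
    """Product in Z[x]/(x^p - 1), with non-negative integer coefficients."""
    c = np.zeros(p, dtype=np.int64)
    for i in np.nonzero(a)[0]:
        c += a[i] * np.roll(b, i)
    return c

a, b = sparse(5), sparse(7)
c = cyc_mul(a, b) % 2            # equals the product a*b over F_2[x]/(x^p + 1)

# Containers a', b' in {0,1} with S(a') containing S(a), same for b:
a_prime = np.maximum(a, sparse(20))
b_prime = np.maximum(b, sparse(20))
c_prime = cyc_mul(a_prime, b_prime)

assert np.all(c[c_prime == 0] == 0)   # Lemma 3.2: c'_i = 0 forces c_i = 0
print("checked", int((c_prime == 0).sum()), "null positions of c'")
```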

When the weight of \(c=ab\) is maximum, i.e., equal to \(\mathrm {wt}(a)\cdot \mathrm {wt}(b)\), the probability that a coefficient \(c_i\) is null can be related to the coefficient \(c'_i\); analogously, we can also derive the probability that several bits are simultaneously null. These relations are formalized in the following lemma.

Lemma 3.3

Let \(a,b\in \mathbb {F}_2[x]/\langle x^p + 1 \rangle \), with respective weights \(\omega _a\) and \(\omega _b\), such that \(\omega = \omega _a\omega _b\le p\), and such that \(c=ab\) has weight \(\omega \). Let \(J_a,J_b\subseteq \{0 , \cdots , p-1\}\) be such that \(J_a\supseteq \mathrm {S}(a)\) and \(J_b\supseteq \mathrm {S}(b)\). Let \(a', b'\in \mathbb {Z}[x]/\langle x^p - 1 \rangle \) be the polynomials whose coefficients are null, except for those indexed by \(J_a\) and \(J_b\), respectively, which are set to 1; finally, let \(M= |J_a|\cdot |J_b|\).

i) Let \(c'_i\) be the i-th coefficient of \(c' = a' b'\); then

   $$\begin{aligned} \mathrm {Pr}\left[ c_i = 0\mid c_i'\right] = \gamma (M,\omega , c_i') = \bigg (1+\omega \cdot \frac{c_i'}{M + 1 - \omega - c_i'}\bigg )^{-1}. \end{aligned}$$

ii) For \(V = \{v_0,\cdots , v_{t-1}\}\subseteq \{0 , \cdots , p-1\}\), we have

   $$\begin{aligned} \mathrm {Pr}\left[ \mathrm {wt}(c|_V)=0\mid c'\right] = \zeta (V, c', \omega ) = \prod _{ \ell = 0}^{t-1}\gamma \big (M-{\textstyle \sum _{j=0}^{\ell -1} c_{v_j}'},\omega , c_{v_\ell }'\big ). \end{aligned}$$

Proof

The results follow from a combinatorial argument. See [3], Appendix B.3 for details.

   \(\square \)

3.2 Identifying Families of Weak Keys

We are now ready to use the results presented in the previous section to describe how families of weak keys as in Definition 3.1 can be identified in LEDAcrypt. We base our strategy on the results of Lemmas 3.2 and 3.3. Briefly, we guess “containers” for each polynomial in the secret key, i.e., polynomials over \(\mathbb {Z}[x]/\langle x^p - 1 \rangle \) whose support contains that of the corresponding polynomial in \(\mathbb {F}_2[x]/\langle x^p + 1 \rangle \). We then combine these containers to find positions that, with high probability, do not point at set coefficients in the polynomials of \(L=HQ\). Assuming that the initial choice of the containers is right, we can then use the results of Lemmas 3.2 and 3.3 to determine such positions. For the sake of simplicity, and without loss of generality, we describe our ideas for the practical case of \(n_0=2\).

Operatively, to build a set T defining a candidate set of weak keys, we rely on the following procedure.

1. Consider sets \(J_{H_i}\) such that \(J_{H_i}\supseteq \mathrm {S}(H_i)\), for \(i=0,1\); the cardinality of \(J_{H_i}\) is denoted as \(B_{H_i}\). Analogously, define sets \(J_{Q_{i,j}}\), for \(i=0,1\) and \(j=0,1\), with cardinalities \(B_{Q_{i,j}}\).

2. To each set \(J_{H_i}\), associate a polynomial \(H'_i\in \mathbb {Z}[x]/\langle x^p-1 \rangle \), taking values in \(\{0,1\}\) and whose support corresponds to \(J_{H_i}\); analogously, construct polynomials \(Q'_{i,j}\) from the sets \(J_{Q_{i,j}}\). Compute

   $$L'_{i,j} = H'_j Q'_{j,i}\in \mathbb {Z}[x]/\langle x^p - 1 \rangle ,\quad (i,j)\in \{0,1\}^2.$$

3. Compute

   $$\begin{aligned} L'_i = L'_{i,0} + L'_{i,1} = H'_{0}Q'_{0,i} + H'_{1}Q'_{1,i}\in \mathbb {Z}[x]/\langle x^p - 1 \rangle . \end{aligned}$$

   Let \(\pi _i\), with \(i=0,1\), be a permutation such that the coefficients of \(\pi _i\big (L'_i\big )\) are in non-decreasing order. Group the first \(\left\lfloor \frac{p}{2}\right\rfloor \) entries of \(\pi _0\) in a set \(T_0\), and the first \(\left\lceil \frac{p}{2}\right\rceil \) ones of \(\pi _1\) in a set \(T_1\). Define T as \(T=T_0\cup \{\left. p+\ell \right| \ell \in T_1\}.\)

A visual representation of the above constructive method to search for weak keys is described in [3], Appendix C.
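A compact sketch of the three steps, at the level of integer coefficient vectors (toy parameters; the container choices below are arbitrary placeholder runs, not optimized Type I/II sets):

```python
import numpy as np

p = 1009   # toy size; the real scheme uses p = 14939 or larger

def indicator(J):
    v = np.zeros(p, dtype=np.int64)
    v[list(J)] = 1
    return v

def cyc_mul(a, b):                       # product in Z[x]/(x^p - 1)
    c = np.zeros(p, dtype=np.int64)
    for i in np.nonzero(a)[0]:
        c += a[i] * np.roll(b, i)
    return c

# Step 1: containers J_{H_i}, J_{Q_{i,j}} (arbitrary runs, for illustration).
B_H, B_Q = 60, 40
J_H = [indicator(range(B_H)) for _ in range(2)]
J_Q = [[indicator(range(100, 100 + B_Q)) for _ in range(2)] for _ in range(2)]

# Step 2: container products L'_{i,j} = H'_j * Q'_{j,i} over the integers.
Lp = [[cyc_mul(J_H[j], J_Q[j][i]) for j in range(2)] for i in range(2)]

# Step 3: L'_i = L'_{i,0} + L'_{i,1}; take the positions of the smallest
# coefficients of each as T_0 and T_1, then T = T_0 union {p + l : l in T_1}.
L0, L1 = Lp[0][0] + Lp[0][1], Lp[1][0] + Lp[1][1]
T0 = set(np.argsort(L0, kind="stable")[: p // 2].tolist())
T1 = set(np.argsort(L1, kind="stable")[: (p + 1) // 2].tolist())
T = T0 | {p + l for l in T1}
print(len(T), "positions chosen for T")
```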

Essentially, our proposed procedure to find families of weak keys starts from the sets \(J_{H_i}\) and \(J_{Q_{i,j}}\), which we think of as “containers” for the secret key, i.e., sets containing the support of the corresponding polynomials in the secret key. Their products yield polynomials \(L'_{i,j}\), which are containers for the products \(H_jQ_{j,i}\). Because of the maximum weight requirement in LEDAcrypt key generation, each \(L'_{i,j}\) matches the hypothesis required by Lemma 3.3: the lowest entries in \(L'_{i,j}\) correspond to the coefficients that, with the highest probability, are null in \(H_jQ_{j,i}\). We remark that, because of Lemma 3.2, a null coefficient in \(L'_{i,j}\) means that the corresponding coefficient in \(H_jQ_{j,i}\) must be null. Finally, we need to combine the coefficients of the polynomials \(L'_{i,j}\) to identify positions that are very likely to be null in each \(L_i\). The approach we consider consists of choosing the positions that correspond to coefficients with minimum values in the sums \(L'_{i,0}+L'_{i,1}\). This simple criterion is likely not optimal, but it avoids cumbersome notation and computations; furthermore, as we show next, it already detects significantly large families of weak keys.

The number of secret keys that meet the requirements on T, i.e., keys leading to polynomials \(L_0\) and \(L_1\) that do not overlap with the chosen sets \(T_0\) and \(T_1\), respectively, clearly depends on the particular choice for the containers. In the remainder of this section, we describe how such a quantity can be estimated. For the sake of simplicity, we analyze the case in which the starting sets for the containers have constant size, i.e., \(B_{H_i} = B_H\) and \(B_{Q_{i,j}} = B_Q\), for all i and j; furthermore, we choose \(J_{H_{0}} = J_{H_{1}}\), \(J_{Q_{0,0}} = J_{Q_{1,1}}\) and \(J_{Q_{1,0}} = J_{Q_{0,1}}\).

First of all, let \(\mathcal {J}\) be the set of secret keys whose polynomials are contained in the sets \(J_{H_i}\) and \(J_{Q_{i,j}}\); the cardinality of this set can be estimated as

$$\begin{aligned} |\mathcal {J}| = \eta \bigg (\left( {\begin{array}{c}B_H\\ d_v\end{array}}\right) \left( {\begin{array}{c}B_Q\\ m_0\end{array}}\right) \left( {\begin{array}{c}B_Q\\ m_1\end{array}}\right) \bigg )^2, \end{aligned}$$

where \(\eta \) is the acceptance ratio in key generation, i.e., the probability that a random choice of matrices H and Q leads to a matrix L with full weight.

We now estimate the number of keys in \(\mathcal {J}\) that produce polynomials \(L_0\) and \(L_1\) corresponding to a correct choice for \(T_0\) and \(T_1\), i.e., such that their supports are disjoint with \(T_0\) and \(T_1\), respectively. For each product \(H_iQ_{i,j}\), we know i) that it has full weight, not larger than p, and ii) that sets \(J_{H_i}\), \(J_{Q_{i,j}}\) are containers for \(H_i\) and \(Q_{i,j}\), respectively. Then, Lemma 3.3 can be used to estimate the portion of valid keys. For instance, we consider the polynomial \(L_0 = H_0Q_{0,0} + H_1Q_{1,0}\): the coefficients that are indexed by \(T_0\) will be null when both the supports of \(H_0Q_{0,0}\) and \(H_1Q_{1,0}\) are disjoint with \(T_0\). If we neglect the fact that these two products are actually correlated (because of the full weight requirement on \(L_0\)), then the probability that \(L_0\) does not overlap with \(T_0\), which we denote as \( \mathrm {Pr}\left[ \texttt {null}(T_0)\right] \), is obtained as

$$\begin{aligned} \mathrm {Pr}\left[ \texttt {null}(T_0)\right] = \zeta \big (T_0,L'_{0,0},m_0d_v\big )\cdot \zeta \big (T_0,L'_{0,1},m_1d_v\big ), \end{aligned}$$

where \(\zeta \) is defined in Lemma 3.3. The above quantity can then be used to estimate the fraction of keys in \(\mathcal {J}\) for which the support of \(L_0\) does not overlap with \(T_0\); we remark that, as highlighted by the above formula, this quantity strongly depends on the choices of \(J_{H_0}\), \(J_{H_1}\), \(J_{Q_{0,0}}\), \(J_{Q_{1,0}}\).

With the same reasoning, and with analogous notation, we compute \( \mathrm {Pr}\left[ \texttt {null}(T_1)\right] \); because of the simplifying restrictions on \(J_{Q_{i,j}}\), this probability is equal to \( \mathrm {Pr}\left[ \texttt {null}(T_0)\right] \).

Then, if we neglect the correlation between \(L_0\) and \(L_1\) (since \(H_0\) and \(H_1\) are involved in the computation of both polynomials), the probability that a random key from \(\mathcal {J}\) is associated to a valid L, i.e., that it leads to polynomials \(L_0\) and \(L_1\) that respectively do not overlap with \(T_0\) and \(T_1\), can be estimated as

$$\begin{aligned} \mathrm {Pr}\left[ \texttt {null}(T)\right]&= \mathrm {Pr}\left[ \texttt {null}(T_0)\right] \cdot \mathrm {Pr}\left[ \texttt {null}(T_1)\right] \\&= \big (\mathrm {Pr}\left[ \texttt {null}(T_0)\right] \big )^2 \\&= \bigg (\zeta \big (T_0,L'_{0,0},m_0d_v\big )\cdot \zeta \big (T_0,L'_{0,1},m_1d_v\big )\bigg )^2. \end{aligned}$$

Thus we conclude that the number of keys whose polynomials are contained by the chosen sets, and such that the corresponding L does not overlap with T, can be estimated as \( |\mathcal {J}|\cdot \mathrm {Pr}[\texttt {null}(T)]\).

Then, for the set of secret keys where T does not intercept the first row of L, which we denote with \(\mathcal {W}\), we have

$$\begin{aligned} |\mathcal {W}|\ge |\mathcal {J}|\cdot \mathrm {Pr}[\texttt {null}(T)]. \end{aligned}$$
(3)

The inequality comes from the fact that the right-hand side of the above formula only counts keys with polynomials contained in the initially chosen sets; even if this property is not satisfied, it may still happen that the resulting L does not overlap with T (thus, we are underestimating the cardinality of \(\mathcal {W}\)).

3.3 Results

In this section we provide practical examples of choices for the containing sets, leading to actual families of weak keys. To this end, we need to define clear criteria for how the sets \(J_{H_i}\) and \(J_{Q_{i,j}}\) can be selected. For the sake of simplicity, we restrict our attention to the cases \(J_{H_0} = J_{H_1} = J_H\) and \(J_{Q_{0,0}} = J_{Q_{0,1}} = J_{Q_{1,0}} = J_{Q_{1,1}} = J_Q\). We consider two different strategies to pick these sets.

  • Type I: for \(\delta \in \{0 , \cdots , p-1\}\) and \(t \in \{1,\cdots ,p-1\}\), we choose

    $$\begin{aligned} J_{H} = \left\{ \ell t\mod p\left| 0\le \ell \le B_H-1 \right. \right\} ,\\ J_{Q} = \left\{ \delta +\ell t\mod p\left| 0\le \ell \le B_Q-1 \right. \right\} . \end{aligned}$$
  • Type II: we choose \(J_{H_0} = J_{H_1} = J_H\) as the union of disjoint sets formed by contiguous positions. An analogous choice is adopted for \(J_Q\).

To provide numerical evidence for our analysis, in Fig. 1 we compare the simulated values of \(\mathrm {Pr}[\texttt {null}(T)]\) with the ones obtained with the theoretical expression, for parameters of practical interest and for some Type I and II choices. The simulated probabilities have been obtained by generating random secret keys from \(\mathcal {J}\) and, as our results show, are well approximated by the theoretical expression. This shows that Eq. (3) provides a good estimate of the fraction of keys in \(\mathcal {J}\) that meet the requirement on the corresponding set T.

Tables 2 and 3 display results testing various weak key families of Types I and II, for two different LEDAcrypt parameter sets. According to the reasoning in the previous section, the values reported in the last column can be considered a rough (and likely conservative) estimate of the probability that a random key belongs to the corresponding set \(\mathcal {W}\). Our results show that the identified families of keys meet Definition 3.1, and so can actually be considered weak.

Fig. 1. Comparison between simulated and theoretical values for \(\mathrm {Pr}[\mathtt {null}]\), for \({p=14939}\), \(d_v=11\), \(m_0 = 4\), \(m_1 = 3\). The values reported in Figure (a) all refer to the case \(\delta =0\). In Figure (b), the blue curves correspond to the choice \({J_H=J_Q=\{0,\cdots ,1999\}\cup \{\mu ,\cdots ,\mu +1999\}}\), while the red curves correspond to \(J_H=\{0,\cdots ,2499\}\cup \{\mu ,\cdots ,\mu +2499\}\) and \(J_Q=\{0,\cdots ,3999\}\).

Table 2. Fraction of weak keys, for LEDAcrypt instances designed for 128-bit security, with parameters \(n_0 = 2\), \(p=14939\), \(d_v=11\), \(m_0 = 4\), \(m_1 = 3\), for which \(\eta \approx 0.7090\). For this parameter set, the probability of randomly guessing a null set of size p, in a vector of length 2p and weight \(2(m_0+m_1)d_v\), is \(2^{-154.57}\).
Table 3. Fraction of weak keys, for LEDAcrypt instances designed for 256-bit security, with parameters \(n_0 = 2\), \(p=36877\), \(d_v=11\), \(m_0 = 7\), \(m_1 = 6\), for which \(\eta \approx 0.614\). For this parameter set, the probability of randomly guessing a null set of size p, in a vector of length 2p and weight \(2(m_0+m_1)d_v\), is \(2^{-286.80}\).

Remark 1

The results shown in this section represent only qualitative evidence of the existence of families of weak keys in LEDAcrypt. There may exist many more families of weak keys with a completely different structure from the ones we have studied. Additionally, the parameters we have considered for Types I and II may not be optimal, but they already identify families of weak keys. In the next sections we provide a detailed analysis for families of keys of Types I and II, and furthermore specify the actual complexity of a full cryptanalysis exploiting such a key structure.

4 Explicit Attack on the Weakest Class of Keys

In the previous section we described how the product structure in LEDAcrypt leads to a highly biased distribution of set positions in L. As we have hinted, this property may be exploited to improve cryptanalysis techniques based on ISD algorithms. In this section, we present an attack against a class of weak keys in LEDAcrypt’s design. We begin by identifying what appears to be the weakest class of keys (though large enough in number to constitute a serious, practical problem for LEDAcrypt). It is easily seen that the class of keys we consider in this section corresponds to a particular case of Type I, introduced in Sect. 3.3. We proceed to provide a simple, single-iteration ISD algorithm to recover these keys, then analyze the fraction of all of LEDAcrypt’s keys that would be recovered by this attack. Afterward, we show how to extend the ISD algorithm to more than one iteration, so as to enlarge the set of keys recovered with a similar effort per key. We conclude by considering the effect of advanced ISD algorithms on the attack as well as the relationship between the rejection sampling step in LEDAcrypt’s \(\mathsf {KeyGen}\) and our restriction to attacking a subspace of the total key space.

4.1 Attacking an Example (sub)class of Ultra-Weak Keys

The simplest and, where it works, most powerful version of the attack dramatically speeds up ISD for a class of ultra-weak keys chosen under parameter sets where \(n_0=2\). One example (sub)class of ultra-weak keys is the set of keys where the polynomials \(L_0\) and \(L_1\) are of degree at most \(\frac{p}{2}\). Such keys can be found by a single iteration of a very simple ISD algorithm, which we describe as follows.

The attacker chooses the information set to consist of the last \(\frac{p-1}{2}\) columns of the first block of M and the last \(\frac{p+1}{2}\) columns of the second block. If the key being attacked is one of these weak keys, the attacker can correctly guess the top row of L as being identically zero within the information set and linearly solve for the nonzero linear combination of the rows of M meeting this condition. The cost of the attack is one iteration of an ISD algorithm.
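A minimal sketch of this single iteration over GF(2), assuming the public matrix M is available as a \(p\times 2p\) numpy 0/1 array (our own toy helpers, not the LEDAcrypt reference code): any nonzero vector u in the left null space of the columns of M indexed by the information set yields the candidate codeword uM.

```python
import numpy as np

def gf2_left_nullspace(A):
    """Basis of {u : uA = 0 over GF(2)}, by row-reducing [A | I]:
    rows whose A-part vanishes give left null space vectors."""
    m, n = A.shape
    aug = np.concatenate([A % 2, np.eye(m, dtype=np.uint8)], axis=1)
    row = 0
    for col in range(n):
        piv = next((r for r in range(row, m) if aug[r, col]), None)
        if piv is None:
            continue
        aug[[row, piv]] = aug[[piv, row]]
        for r in range(m):
            if r != row and aug[r, col]:
                aug[r] ^= aug[row]
        row += 1
    return [aug[r, n:] for r in range(row, m)]

def one_iteration(M, p):
    """One ISD iteration with the fixed information set formed by the last
    (p-1)//2 columns of block 0 and the last (p+1)//2 columns of block 1."""
    T = list(range((p + 1) // 2, p)) + list(range(p + p // 2, 2 * p))
    for u in gf2_left_nullspace(M[:, T]):
        cand = (u.astype(np.int64) @ M.astype(np.int64)) % 2  # candidate row of L
        if cand.any():
            return cand
    return None   # the key is not ultra-weak w.r.t. this information set
```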

A sufficient condition for this class of weak key to occur is for the polynomials \(H_0\), \(H_1\), \(Q_{0,0}\), \(Q_{0,1}\), \(Q_{1,0}\), and \(Q_{1,1}\) to have degree no more than \(\frac{p}{4}\). Since each of the \(2 m_0 + 2 m_1 +2 d_v\) nonzero coefficients of these polynomials has a \(\frac{1}{4}\) probability of being chosen with degree less than \(\frac{p}{4}\), these weak keys represent at least 1 part in \(4^{2 m_0 + 2 m_1 +2 d_v}\) of the key space.

4.2 Enumerating Ultra-Weak Keys for a Single Information Set

In fact, there are significantly more weak keys than this that can be recovered by the basic, one-iteration ISD algorithm using the information set described above. Intuitively, this is for two reasons:

1. Equivalent keys: There are \(p^2\) private keys, not of this same, basic form, which nonetheless produce the same public key.

2. Different degree constraints: The support of the top row of L will also fall entirely outside the information set if the degree of \(H_0\) is less than \(\frac{p}{4}-\delta \) and the degrees of \(Q_{0,0}\) and \(Q_{0,1}\) are both less than \(\frac{p}{4}+\delta \) for any \(\delta \in \mathbb {Z}\) such that \(-\frac{p}{4}< \delta < \frac{p}{4}\). Likewise for \(H_1\), \(Q_{1,0}\), and \(Q_{1,1}\), for a total of p keys.

Concretely, we derive the number of distinct private keys that are recovered by the one-iteration ISD algorithm in the following theorem.

Remark 2

There are p columns of each block of M. For the sake of simplicity, instead of referring to pairs of \(\frac{p-1}{2}\) and \(\frac{p+1}{2}\) columns, we instead use \(\frac{p}{2}\) for both cases. This has a negligible effect on our results.

Theorem 4.1

The number of distinct private keys that can be found in a single iteration of the decoding algorithm described above (where the information set is chosen to consist of the last \(\frac{p}{2}\) columns of each block of M) is

$$\begin{aligned} \begin{aligned}&p^3 \cdot \sum _{A_0 = d_v-1}^{\frac{p}{2}} \sum _{A_1 = d_v-1}^{\frac{p}{2}} \left( \left( {\begin{array}{c}A_0 - 1\\ d_v - 2\end{array}}\right) \left( {\begin{array}{c}A_1 - 1\\ d_v - 2\end{array}}\right) \right. \\&\cdot \left( \left( {\begin{array}{c}\frac{p}{2} - A_0 - 2\\ m_0 - 1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_0 - 1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 1\\ m_0\end{array}}\right) \right. \\&+ \left. \left( {\begin{array}{c}\frac{p}{2} - A_0 - 1\\ m_0\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_0 - 2\\ m_1-1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 1\\ m_0\end{array}}\right) \right. \\&+ \left. \left( {\begin{array}{c}\frac{p}{2} - A_0 - 1\\ m_0\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_0 - 1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 2\\ m_1-1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 1\\ m_0\end{array}}\right) \right. \\&+ \left. \left. \left( {\begin{array}{c}\frac{p}{2} - A_0 - 1\\ m_0\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_0 - 1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 2\\ m_0-1\end{array}}\right) \right) \right) \\&\cdot \left( 1 - O\left( \frac{m}{p}\right) \right) . \end{aligned} \end{aligned}$$
(4)

Proof

We count the number of ultra-weak keys as follows. By assumption, all nonzero bits in each block of an ultra-weak key are contained in some consecutive stretch of size \(\le \frac{p}{2}\). Thus these ultra-weak keys contain a stretch of at least \(\frac{p}{2}\) zero bits. This property applies directly to the polynomials \(H_0 Q_{0,0}+H_1Q_{1,0}\) and \(H_0 Q_{0,1}+H_1 Q_{1,1}\), and must also hold for \(H_0\) and \(H_1\). We index the number of ultra-weak keys according to the first nonzero coefficient of these polynomials after the stretch of zero bits in cyclic ordering.

We begin by considering HQ, though not requiring HQ to have full weight. We are using an information set consisting of the same columns for both \(H_0 Q_{0,0}+H_1Q_{1,0}\) and \(H_0 Q_{0,1}+H_1 Q_{1,1}\). Therefore we count according to the first nonzero bit of the sum \(H_0 Q_{0,0}+H_1Q_{1,0} + H_0 Q_{0,1}+H_1 Q_{1,1}\). Let l be the location of the first nonzero bit of this sum.

Let \(j_0, j_1\) be the locations of the first nonzero bit of \(H_0, H_1\), respectively. Suppose that the nonzero bits of \(H_0, H_1\) are located within a block of length \(A_0, A_1\), respectively.

By LEDAcrypt’s design, \(d_v \le A_i, i\in \{0,1\}\) and by assumption on the chosen information set, \(A_i \le \frac{p}{2}, i\in \{0,1\}\). Once \(j_0\) is fixed, there are \(\sum _{A_0=d_v-1}^{\frac{p}{2}}\left( {\begin{array}{c}A_0-1\\ d_v-2\end{array}}\right) \) ways to arrange the remaining bits of \(H_0\). Thus there are

$$\begin{aligned} \sum _{j_0=1}^{p-1}\sum _{A_0=d_v-1}^{\frac{p}{2}}\left( {\begin{array}{c}A_0-1\\ d_v-2\end{array}}\right) \sum _{j_1=1}^{p-1}\sum _{A_1=d_v-1}^{\frac{p}{2}}\left( {\begin{array}{c}A_1-1\\ d_v-2\end{array}}\right) \end{aligned}$$
(5)

many bit arrangements of \(H_0,H_1\).

Once \(j_0, j_1\) are fixed, there are four blocks of Q which may influence the location l. We count the arrangements in which only one block of Q influences l at a time.

If l is influenced by \(Q_{0,0}\), there are \(\left( {\begin{array}{c}\frac{p}{2}-A_0-2\\ m_0-1\end{array}}\right) \) ways the remaining bits of \(Q_{0,0}\) can fall, \(\left( {\begin{array}{c}\frac{p}{2}-A_0-1\\ m_1\end{array}}\right) \) arrangements of the bits of \(Q_{0,1}\), \(\left( {\begin{array}{c}\frac{p}{2}-A_1-1\\ m_1\end{array}}\right) \) arrangements of the bits of \(Q_{1,0}\), and \(\left( {\begin{array}{c}\frac{p}{2}-A_1-1\\ m_0\end{array}}\right) \) arrangements of the bits of \(Q_{1,1}\). If l is influenced by \(Q_{0,1}\), there are \(\left( {\begin{array}{c}\frac{p}{2}-A_0-1\\ m_0\end{array}}\right) \) arrangements of the bits of \(Q_{0,0}\), \(\left( {\begin{array}{c}\frac{p}{2}-A_0-2\\ m_1-1\end{array}}\right) \) ways the remaining bits of \(Q_{0,1}\) can fall, \(\left( {\begin{array}{c}\frac{p}{2}-A_1-1\\ m_1\end{array}}\right) \) arrangements of the bits of \(Q_{1,0}\), and \(\left( {\begin{array}{c}\frac{p}{2}-A_1-1\\ m_0\end{array}}\right) \) arrangements of the bits of \(Q_{1,1}\). Similar estimates hold for \(Q_{1,0}\) and \(Q_{1,1}\).

We sum over the locations l, considering each of the blocks of Q and their respective weights. The overall sum is then

$$\begin{aligned} \begin{aligned}&\sum _{j_0=0}^{p-1}\sum _{A_0=d_v-1}^{\frac{p}{2}}\left( {\begin{array}{c}A_0-1\\ d_v-2\end{array}}\right) \sum _{j_1=0}^{p-1} \sum _{A_1 = d_v-1}^{\frac{p}{2}} \left( {\begin{array}{c}A_1 - 1\\ d_v - 2\end{array}}\right) \\&\cdot \sum _{l=0}^{p-1} \left( \left( {\begin{array}{c}\frac{p}{2} - A_0 - 2\\ m_0 - 1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_0 - 1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 1\\ m_0\end{array}}\right) \right. \\&+ \left. \left( {\begin{array}{c}\frac{p}{2} - A_0 - 1\\ m_0\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_0 - 2\\ m_1-1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 1\\ m_0\end{array}}\right) \right. \\&+ \left. \left( {\begin{array}{c}\frac{p}{2} - A_0 - 1\\ m_0\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_0 - 1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 2\\ m_1-1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 1\\ m_0\end{array}}\right) \right. \\&+ \left. \left. \left( {\begin{array}{c}\frac{p}{2} - A_0 - 1\\ m_0\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_0 - 1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2} - A_1 - 2\\ m_0-1\end{array}}\right) \right) \right) \\&\cdot \left( 1 - O\left( \frac{m}{p}\right) \right) . \end{aligned} \end{aligned}$$
(6)

Failure to impose full weight requirements on HQ introduces double-counting. This occurs when more than one block of Q influences l, though the probability of this event will not exceed \(O(\frac{m}{p})\). The constant sums yield the factor of \(p^3\).    \(\square \)

We can now estimate the percentage of these recovered, ultra-weak keys out of all possible keys.

Theorem 4.2

Let \(m=m_0+m_1, x=\frac{A_0}{p}, y=\frac{A_1}{p}\). Out of \(\left( {\begin{array}{c}p\\ d_v\end{array}}\right) ^2\left( {\begin{array}{c}p\\ m_0\end{array}}\right) ^2\left( {\begin{array}{c}p\\ m_1\end{array}}\right) ^2\) possible keys, we estimate the percentage of ultra-weak keys found in a single iteration of the decoding algorithm above as

$$\begin{aligned} {d_v}^2(d_v-1)^2m \int _{x=0}^{\frac{1}{2}} \int _{y=0}^{\frac{1}{2}} (xy)^{d_v-2}\left( \left( \frac{1}{2}-x\right) \left( \frac{1}{2}-y\right) \right) ^{m}\left( \frac{1}{\frac{1}{2}-x}+\frac{1}{\frac{1}{2}-y}\right) \mathrm {d}y \mathrm {d}x. \end{aligned}$$

Proof

Note that lines 2–5 of (4) are approximately

$$\begin{aligned} \left( {\begin{array}{c}\frac{p}{2}-A_0\\ m_0\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2}-A_0\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2}-A_1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2}-A_1\\ m_0\end{array}}\right) \left( \frac{m_0+m_1}{\frac{p}{2}-A_1}+\frac{m_0+m_1}{\frac{p}{2}-A_0} \right) . \end{aligned}$$
(7)

For \(b,c\in \{0,1\}\),

$$\begin{aligned} \left( {\begin{array}{c}\frac{p}{2}-A_b\\ m_c\end{array}}\right) \approx \left( {\begin{array}{c}p\\ m_c\end{array}}\right) \left( \frac{1}{2}-\frac{A_b}{p} \right) ^{m_c} \end{aligned}$$
(8)

and

$$\begin{aligned} \left( {\begin{array}{c}A_b-1\\ d_v-2\end{array}}\right) \approx \left( {\begin{array}{c}p\\ d_v-2\end{array}}\right) \left( \frac{A_b}{p} \right) ^{d_v-2} \end{aligned}$$
(9)

since p is much larger than \(m_0,m_1, d_v\). We rewrite (4) using the approximations in expressions (7) and (8) as

$$\begin{aligned}&p^3 \sum _{A_0=d_v-1}^{\frac{p}{2}}\left( {\begin{array}{c}A_0-1\\ d_v-2\end{array}}\right) \sum _{A_1=d_v-1}^{\frac{p}{2}}\left( {\begin{array}{c}A_1-1\\ d_v-2\end{array}}\right) \left( {\begin{array}{c}p\\ m_0\end{array}}\right) ^2\left( \frac{1}{2}-\frac{A_0}{p}\right) ^{m_0+m_1}\end{aligned}$$
(10)
$$\begin{aligned}&\quad \left( {\begin{array}{c}p\\ m_1\end{array}}\right) ^2 \left( \frac{1}{2}-\frac{A_1}{p}\right) ^{m_0+m_1}\left( \frac{m_0+m_1}{\frac{p}{2}-A_1}+ \frac{m_0+m_1}{\frac{p}{2}-A_0} \right) . \end{aligned}$$
(11)

Applying approximation (9) further reduces expression (10) to

$$\begin{aligned}&p^3\left( {\begin{array}{c}p\\ m_0\end{array}}\right) ^2 \left( {\begin{array}{c}p\\ m_1\end{array}}\right) ^2 \left( {\begin{array}{c}p\\ d_v-2\end{array}}\right) ^2 \sum _{A_0=d_v-1}^{\frac{p}{2}}\left( \frac{A_0}{p} \right) ^{d_v-2} \sum _{A_1=d_v-1}^{\frac{p}{2}}\left( \frac{A_1}{p} \right) ^{d_v-2} \\&\quad \left( \frac{1}{2}-\frac{A_0}{p}\right) ^{m_0+m_1} \left( \frac{1}{2}-\frac{A_1}{p}\right) ^{m_0+m_1}\left( \frac{m_0+m_1}{\frac{p}{2}-A_1}+ \frac{m_0+m_1}{\frac{p}{2}-A_0} \right) \\ =\,&p^2\left( {\begin{array}{c}p\\ d_v-2\end{array}}\right) ^2\left( {\begin{array}{c}p\\ m_0\end{array}}\right) ^2\left( {\begin{array}{c}p\\ m_1\end{array}}\right) ^2m \sum _{A_0=d_v-1}^{\frac{p}{2}} \sum _{A_1=d_v-1}^{\frac{p}{2}}\left( \frac{A_0}{p}\frac{A_1}{p} \right) ^{d_v-2} \left( \frac{1}{2}-\frac{A_0}{p} \right) ^m \\&\quad \left( \frac{1}{2}-\frac{A_1}{p} \right) ^m \left( \frac{1}{\frac{1}{2}-\frac{A_0}{p}}+\frac{1}{\frac{1}{2}-\frac{A_1}{p}} \right) . \end{aligned}$$

Letting \(x=\frac{A_0}{p}, y=\frac{A_1}{p}\), this is approximated by

$$\begin{aligned}&p^2\left( {\begin{array}{c}p\\ d_v\end{array}}\right) ^2\left( {\begin{array}{c}p\\ m_0\end{array}}\right) ^2\left( {\begin{array}{c}p\\ m_1\end{array}}\right) ^2m\frac{{d_v}^2(d_v-1)^2}{(p-d_v+2)^2(p-d_v+1)^2} \\&\cdot p^2\int _{x=0}^{\frac{1}{2}}\int _{y=0}^{\frac{1}{2}}(xy)^{d_v-2}\left( \frac{1}{2}-x \right) ^m \left( \frac{1}{2}-y \right) ^m \left( \frac{1}{\frac{1}{2}-x} + \frac{1}{\frac{1}{2}-y} \right) \mathrm {d}y \mathrm {d}x. \end{aligned}$$

Dividing by \(\left( {\begin{array}{c}p\\ d_v\end{array}}\right) ^2\left( {\begin{array}{c}p\\ m_0\end{array}}\right) ^2\left( {\begin{array}{c}p\\ m_1\end{array}}\right) ^2\), the result follows.    \(\square \)

Evaluating this percentage with the claimed-256-bit ephemeral (CPA-secure) key parameters of LEDAcrypt—\(d_v=11, m=13\)—we determine that 1 in \(2^{72.8}\) ephemeral keys are broken by one iteration of ISD. Similarly for the long-term (CCA-secure) key setting, we evaluate with the claimed 256-bit parameters—\(d_v=13, m=13\)—and conclude the number of long-term keys broken is 1 in \(2^{80.6}\).
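These figures are straightforward to reproduce: by the \(x \leftrightarrow y\) symmetry of the integrand, the double integral in Theorem 4.2 factors into Beta-type integrals of the form \(\int _0^{1/2} x^a(\frac{1}{2}-x)^b \,\mathrm {d}x = (\frac{1}{2})^{a+b+1}\frac{a!\,b!}{(a+b+1)!}\). The following is a minimal Python sketch (ours, for verification only, not part of the LEDAcrypt submission):

```python
from math import factorial, log2

def I(a, b):
    # Beta-type integral over [0, 1/2]:
    #   integral_0^{1/2} x^a (1/2 - x)^b dx = (1/2)^(a+b+1) * a! * b! / (a+b+1)!
    return 0.5 ** (a + b + 1) * factorial(a) * factorial(b) / factorial(a + b + 1)

def single_iteration_fraction(dv, m):
    # Theorem 4.2: dv^2 (dv-1)^2 m * (double integral); by the x <-> y
    # symmetry, the double integral equals 2 * I(dv-2, m-1) * I(dv-2, m).
    return dv ** 2 * (dv - 1) ** 2 * m * 2 * I(dv - 2, m - 1) * I(dv - 2, m)

print(log2(single_iteration_fraction(11, 13)))  # ~ -72.8 (ephemeral / CPA)
print(log2(single_iteration_fraction(13, 13)))  # ~ -80.6 (long-term / CCA)
```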

This result merely determines the number of keys that can be recovered given that the information set of both blocks of M is chosen to be the last \(\frac{p}{2}\) columns.Footnote 2 In the following, we turn to demonstrating a class of additional information sets that are as effective as this one.

Remark 3

We remind the reader that instead of referring to the pairs of \(\frac{p-1}{2}, \frac{p+1}{2}\) columns of blocks of M, we use \(\frac{p}{2}\) in both cases. This has a negligible effect on our results.

4.3 Enumerating Ultra-Weak Keys for All Information Sets

Now we will demonstrate a multi-iteration ISD attack that is effective against the class of all ultra-weak keys. To set up the discussion, we begin by highlighting two further “degrees of freedom,” which will allow us to find additional, relevant information sets to guess:

  1.

    Changing the ring representation: Contiguity of indices depends on the choice of ring representation. The large family of ring isomorphisms on \(\mathbb {Z}[x]/\langle x^p-1\rangle \) given by \(f(x)\rightarrow f(x^t)\) for \(t \in \{1, \ldots , p-1\}\) preserves Hamming weight. For example, we can use the family of polynomials

    $$ H'_i = Q'_{i,j} = 1 + x^t + x^{2t} + ... + x^{\left\lfloor \frac{p}{4}\right\rfloor t} $$

    in this attack, since there exists one t such that \(H_i'\) has consecutive nonzero coefficients. Choices of \(t \in \{1, \ldots , \frac{p-1}{2}\}\) yield independent information sets (noting that choices of t and \(-t\mod p\) yield equivalent information sets).

  2.

    Changing the relative offset of the two consecutive blocks: We can also change the beginning index of the consecutive blocks produced within \(L'_0\) or \(L'_1\) (by modifying the beginning indices of \(H'_i\) and \(Q'_{i, j}\) to suit). Note that shifting both \(L'_0\) and \(L'_1\) by the same offset will recover equivalent keys. However, if we fix the beginning index of \(L'_0\) and allow the beginning index of \(L'_1\) to vary, we can find more, mostly independent, information sets and thereby recover more distinct keys. The exact calculation of how far one should shift \(L'_1\)’s indices for a practically effective attack is somewhat involved; we perform this analysis in the remainder of this subsection.

Recall that in the prior 1-iteration attack, we considered one example class of ultra-weak keys – namely, those keys where the polynomials \(L_0\) and \(L_1\) are of degree at most \(\frac{p}{2}.\) Here, we will now take a broader view on the weakest-possible keys.

Definition 4.3

We define the class of ultra-weak keys to be those where, in some ring representation, both \(H_0Q_{0,0}+H_1Q_{1,0}\) and \(H_0Q_{0,1}+H_1Q_{1,1}\) have nonzero coefficients that lie within a block of \(\frac{p-1}{2}\)-many consecutive (modulo p) degrees.

Our goal now is to find a multi-iteration ISD algorithm (by estimating how far to shift the offset of \(L'_1\) per iteration) that recovers as much of the class of ultra-weak keys as possible without “overly wasting” the attacker’s computational budget. Toward this end, recall that Theorem 4.2 gives a good estimate of the fraction of keys (\(2^{-72.8},\) resp. \(2^{-80.6}\)) recovered by the best-case, single iteration of our ISD algorithm. In what follows, we first calculate the fraction of ultra-weak keys within the total key space.

Let \(2^{-X}\) be the fraction of all keys recovered by the best-case, single iteration of our previous ISD algorithm. Let \(2^{-Y}\) be the fraction of ultra-weak keys among all keys. On the assumption that every ring representation leads to independent information sets (chosen uniformly for each invocation of ISD), and on the assumption that independence of ISD key-recovery is maximized by shifting “as far as possible,” we compute an estimate of the number of index-shifts that should be performed by the optimal ultra-weak-key attacker as \(2^Z = 2^{X-Y}.\) Beyond \(2^Z\) shifted guesses (but not before), the attacker should begin to experience diminishing returns in the number of keys recovered per guess.

Therefore, given a beginning index among the p possible positions, the attacker will shift by \(\frac{p(\frac{p-1}{2})}{2^Z}\) indices at each invocation (where the factor \(\frac{p-1}{2}\) accounts for the effect of the different possible ring representations). By assumption, each such guess will be sufficiently independent to recover, in expectation, as many keys as the initial best guess described by the 1-iteration algorithm. We note that additional ultra-weak keys can certainly be obtained by performing more work (specifically, by shifting less than \(\frac{p(\frac{p-1}{2})}{2^Z}\) per guess), but necessarily at a reduced rate of reward per guess.

Toward this end, we now calculate the number of ultra-weak keys and then the fraction of ultra-weak keys among all keys, following the format of the previous calculation.

Theorem 4.4

The total number of ultra-weak keys is

$$\begin{aligned}&\frac{p-1}{2}p^2\sum _{A_0=d_v-1}^{\frac{p}{2}}\sum _{A_1=d_v-1}^{\frac{p}{2}}\left( {\begin{array}{c}A_0-1\\ d_v-2\end{array}}\right) \left( {\begin{array}{c}A_1-1\\ d_v-2\end{array}}\right) \end{aligned}$$
(12)
$$\begin{aligned}&\cdot \sum _{l_0=0}^{p-1}\left( \left( {\begin{array}{c}\frac{p}{2}-A_0-1\\ m_0-1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2}-A_1-1\\ m_1\end{array}}\right) +\left( {\begin{array}{c}\frac{p}{2}-A_0-1\\ m_0\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2}-A_1-1\\ m_1-1\end{array}}\right) \right) \end{aligned}$$
(13)
$$\begin{aligned}&\cdot \sum _{l_1=0}^{p-1}\left( \left( {\begin{array}{c}\frac{p}{2}-A_0-1\\ m_1-1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2}-A_1-1\\ m_0\end{array}}\right) +\left( {\begin{array}{c}\frac{p}{2}-A_0-1\\ m_1\end{array}}\right) \left( {\begin{array}{c}\frac{p}{2}-A_1-1\\ m_0-1\end{array}}\right) \right) . \end{aligned}$$
(14)

Proof

The proof technique follows as in Theorem 4.1. Details are found in [3], B.1.

   \(\square \)

Theorem 4.5

Let \(m=m_0+m_1, x=\frac{A_0}{p}, y=\frac{A_1}{p}.\) The fraction of ultra-weak keys out of all possible keys is

$$\begin{aligned} \frac{p-1}{2}{d_v}^2(d_v-1)^2&\int _{x=0}^{\frac{1}{2}}\int _{y=0}^{\frac{1}{2}}x^{d_v-2}y^{d_v-2}\left( \frac{1}{2}-x\right) ^m\left( \frac{1}{2}-y\right) ^m \\&\left( \frac{{m_0}^2+{m_1}^2}{(\frac{1}{2}-x)(\frac{1}{2}-y)}+\frac{m_0m_1}{(\frac{1}{2}-x)^2}+\frac{m_0m_1}{(\frac{1}{2}-y)^2} \right) \mathrm {d}y \mathrm {d}x. \end{aligned}$$

Proof

Similar techniques apply. See [3], B.2 for details.    \(\square \)

We evaluate the fraction of ultra-weak keys using the claimed CPA-secure parameters \(p=36877, m=13, d_v=11\) and determine that 1 in \(2^{54.1}\) ephemeral keys is ultra-weak. Evaluating with one of the CCA-secure parameter sets \(p=152267, m=13, d_v=13\), approximately 1 in \(2^{59.7}\) long-term keys is ultra-weak.

Given the above, we can estimate the optimal shift-distance per ISD invocation as \(\frac{36,877(\frac{36,876}{2})}{2^{72.8-54.1}} \approx 1597 \approx 2^{10.6}\) for the ephemeral key parameters and \(\frac{152,267(\frac{152,266}{2})}{2^{80.6-59.7}} \approx 5925 \approx 2^{12.5}\) for the long-term key parameters.
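The same Beta-integral factorization used for Theorem 4.2 applies to Theorem 4.5, whose integrand splits into three such products. The following minimal Python sketch (ours) reproduces the fractions and shift-distances above; the split \(m_0=7, m_1=6\) is the one used for these parameter sets in Sect. 4.6, and small deviations from 1597 and 5925 are due to rounding in the exponents:

```python
from math import factorial, log2

def I(a, b):
    # integral_0^{1/2} x^a (1/2 - x)^b dx = (1/2)^(a+b+1) * a! * b! / (a+b+1)!
    return 0.5 ** (a + b + 1) * factorial(a) * factorial(b) / factorial(a + b + 1)

def ultra_weak_fraction(dv, m0, m1, p):
    # Theorem 4.5: the integrand splits into three Beta-type products.
    m = m0 + m1
    integral = ((m0 ** 2 + m1 ** 2) * I(dv - 2, m - 1) ** 2
                + 2 * m0 * m1 * I(dv - 2, m - 2) * I(dv - 2, m))
    return (p - 1) / 2 * dv ** 2 * (dv - 1) ** 2 * integral

for dv, p, X in [(11, 36877, 72.8), (13, 152267, 80.6)]:
    Y = -log2(ultra_weak_fraction(dv, 7, 6, p))  # ~ 54.1 and ~ 59.7
    shift = p * (p - 1) / 2 / 2 ** (X - Y)       # ~ 1597 and ~ 5925
    print(Y, shift)
```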

The multi-iteration ISD algorithm against the class of ultra-weak keys, then, makes its first guess (one in each ring representation) as in the 1-iteration ISD algorithm. It then shifts the relative offset of the two consecutive blocks by the value calculated above and repeats (again, in each ring representation).

This will not recover all ultra-weak keys, but it will recover a significant fraction of them. In particular, consider those keys for which the support of each block of L, rather than merely fitting in \(\frac{p}{2}\) consecutive positions, fits in blocks that are smaller by at least half (resp. a quarter) of the shift distance: of these, we are guaranteed to recover all (resp. at least half). We can therefore lower bound the fraction of recovered keys by replacing the factors of \(\frac{1}{2}\) with \(\frac{p}{2}\) minus half (resp. a quarter) of the offset, all divided by p, to find the sizes of the corresponding sets of private keys.

The multi-iteration ISD algorithm attacking the ephemeral key parameters will make \(2^{72.8-54.1} \approx 2^{18.7}\) independent guesses and recover at least 1 in \(2^{56.0}\) of the total keys. The multi-iteration ISD algorithm attacking the long-term key parameters will make \(2^{80.6-59.7} \approx 2^{20.9}\) independent guesses and recover at least 1 in \(2^{61.6}\) of the total keys.

4.4 Estimating the Effect of More Advanced Information-Set Decoding

Our attempts to enumerate all weak keys were based on the assumption that the adversary was using an ISD variant that required a row of L to be uniformly 0 on all columns of the information set. The state of the art in information set decoding still allows the adversary to decode provided that a row of L has weight no more than about 6 on the information set. For example, Stern’s algorithm [28] with parameter 3 would attempt to find a low-weight row of L as follows.

The information set is divided into two disjoint sets of \(\frac{p}{2}\) columns. The first row of L to be recovered should have weight at most 3 within each of the two sets. Further, the same row of L should have \(\varOmega (\log (p))\) many consecutive 0’s in column-indices disjoint from those of the information set. If both of these conditions hold, then a matrix inversion is performed (even though up to 6 non-zero bits were contained in the information set).

Note that for reasonably large p, nearly a third of the sparse vectors having weight 6 in the information set will meet both conditions. The most expensive steps in each iteration of Stern’s algorithm are a matrix inversion of size p and a claw finding on functions with logarithmic cost in p and domain sizes of \(\left( {\begin{array}{c}\frac{p}{2}\\ 3\end{array}}\right) \). The claw-finding step is similar in cost to the matrix inversion, both having computational cost \(\approx p^3\); the matrix-inversion step is present in all ISD algorithms. Therefore, with Stern’s algorithm, a single iteration (at a cost similar to that of a single iteration of a simpler ISD algorithm) recovers O(1) of the private keys for which a row of L has weight no more than 6 on the information-set columns.

Recall that we choose the information set to be of size \(\approx \frac{p}{2}\) in \(L'.\) The distribution of the non-zero coordinates within a successful guess of the information set will be more heavily weighted toward the middle of the set and approximately triangular in shape (since these coordinates are produced by convolutions of polynomials). In particular, we will heuristically model both tails of the distribution as small triangles, each containing three bits (on the left and on the right, respectively) that are missed by the choice of information set.

Let \(W = 2d_v(m_0+m_1)\) denote the number of non-zero bits in \(L'.\) Then the actual fraction \(\epsilon \) that the information set (in the context of advanced information set decoding) should target within L, rather than 1/2, can be estimated by geometric area as

$$ \epsilon \cdot \left( 1-\sqrt{\frac{3}{W/2}}\right) = \frac{1}{2} $$

or, re-writing:

$$ \epsilon = \frac{1}{2\left( 1-\sqrt{\frac{3}{W/2}}\right) }. $$

For the claimed-256-bit ephemeral key parameters, we have \(W_{\mathsf {CPA}} = 286\). For the claimed-256-bit long-term key parameters, we have \(W_{\mathsf {CCA}} = 338\). Therefore,

$$\begin{aligned} \epsilon _{\mathsf {CPA}}&= \frac{1}{2\left( 1-\sqrt{\frac{3}{286/2}}\right) } \approx 0.585.\\ \epsilon _{\mathsf {CCA}}&= \frac{1}{2\left( 1-\sqrt{\frac{3}{338/2}}\right) } \approx 0.577. \end{aligned}$$
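As a quick numerical check of these values (a minimal sketch, ours):

```python
from math import sqrt

def epsilon(W):
    # Solve eps * (1 - sqrt(3 / (W / 2))) = 1/2 for eps.
    return 1 / (2 * (1 - sqrt(3 / (W / 2))))

print(epsilon(286))  # ~ 0.585 (W_CPA)
print(epsilon(338))  # ~ 0.577 (W_CCA)
```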

So – heuristically – we can model the effect of using advanced information set decoding algorithms by replacing the \(\frac{1}{2}\)’s in the calculations of the theorems earlier in this section by \(\epsilon _{\mathsf {CPA}}\) or \(\epsilon _{\mathsf {CCA}}\) respectively.

4.5 Rejection Sampling Considerations

We recall that LEDAcrypt’s KeyGen algorithm explicitly requires that the parity-check matrix L be full weight. Intuitively, full weight means that no cancellations occur in the additions or multiplications used to generate L from H and Q. Formally, the full-weight condition on L can be stated as:

$$ \forall i \in \{0, \ldots , n_0 -1 \}, \textsf {weight}(L_i) = d_v \sum _{j=0}^{n_0-1} m_j.$$

When a weak-key notion causes rejections to occur significantly more often for weak keys than for non-weak keys, rejection sampling effectively reduces the probability of weak-key generation compared to our previous analysis. As an extreme example, if, for a given weak-key notion, rejection sampling rejects all weak keys, then no weak keys will ever be sampled. We therefore seek to measure the probability of key rejection both for weak keys and for keys in general, in order to determine whether the effectiveness of this attack is reduced via rejection sampling.

Let \(\mathcal {K}\), \(\mathcal {W} \subset \mathcal {K}\), and KeyGen be the public key space, the weak key space, and the key generation algorithm of LEDAcrypt, respectively. Let \(\mathcal {K'}\), \(\mathcal {W'} \subset \mathcal {K'}\), and KeyGen’ be the associated objects if rejection sampling were omitted from LEDAcrypt. We observe that, since KeyGen samples uniformly from \(\mathcal {K}\),

$$\text {Pr} \left[ pk \in \mathcal {W} | (pk, sk) \leftarrow \textsf {KeyGen()} \right] = \frac{|\mathcal {W}|}{|\mathcal {K}|}. $$

This equality additionally holds when rejection sampling does not occur. Since, until now, all of our analysis has ignored rejection sampling, we have effectively been measuring \(|\mathcal {W'}|{/} |\mathcal {K'}|\). We therefore seek a relation that allows us to determine \(|\mathcal {W}|/|\mathcal {K}|\) from \(|\mathcal {W'}|\) and \(|\mathcal {K'}|\). We observe that

$$\frac{|\mathcal {W}|}{|\mathcal {K}|} = \frac{|\mathcal {W}|}{|\mathcal {K}|} \frac{|\mathcal {W'}|}{|\mathcal {W'}|} \frac{|\mathcal {K'}|}{|\mathcal {K'}|} = \frac{|\mathcal {W'}|}{|\mathcal {K'}|} \frac{|\mathcal {W}|}{|\mathcal {W'}|} \frac{|\mathcal {K'}|}{|\mathcal {K}|}. $$

Therefore, once rejection sampling is taken into account for the first time in our analysis, the probability of generating a weak key changes by exactly a factor of \((|\mathcal {W}|{/}|\mathcal {W'}|) \cdot (|\mathcal {K'}|{/} |\mathcal {K}|)\). This is precisely the probability that a weak key will not be rejected due to weight concerns, divided by the probability that a key in general will not be rejected due to weight concerns.

We note that, as long as the rejection probabilities for both keys in general and weak keys are not especially close to 0 or 1, it is sufficient to sample many keys according to their distributions and observe the proportion of these keys that would be rejected.

In order to practically measure the security gained by rejection sampling for the 1-iteration ISD attack against the ephemeral key parameters, we sample 10,000 keys and 10,000 weak keys according to KeyGen’ and observe how many of each are rejected. Approximately 39.2% of regular keys are rejected, while approximately 67.4% of weak keys are rejected. We therefore conclude that, for this attack and this parameter set, \(\frac{|\mathcal {W}|}{|\mathcal {K}|} = 0.582 \frac{|\mathcal {W'}|}{|\mathcal {K'}|}\). Rejection sampling therefore grants LEDAcrypt less than 1 additional bit of security.
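The regular-key half of this measurement is easy to replicate. The sketch below (ours) models KeyGen’ by drawing each circulant block’s support uniformly at random; the split \(m_0=7, m_1=6\) and the block-weight pattern \(m_{(j-i) \bmod n_0}\) for Q are our assumptions, consistent with the parameter sets used here. Sampling weak keys requires conditioning on the weak-key structure and is omitted for brevity.

```python
import random

# Claimed-256-bit ephemeral (CPA) parameters; m_0 = 7, m_1 = 6 as in Sect. 4.6.
p, n0, dv = 36877, 2, 11
m = [7, 6]

def rand_support(w):
    # Support of a random circulant block of weight w.
    return random.sample(range(p), w)

def sparse_mul(a, b):
    # Support of a(x) * b(x) mod (x^p - 1) over GF(2): keep exponents
    # occurring an odd number of times.
    counts = {}
    for i in a:
        for j in b:
            k = (i + j) % p
            counts[k] = counts.get(k, 0) + 1
    return {k for k, c in counts.items() if c % 2 == 1}

def full_weight_key():
    # One trial of KeyGen' (no rejection): is L = HQ full weight?
    H = [rand_support(dv) for _ in range(n0)]
    Q = [[rand_support(m[(j - i) % n0]) for j in range(n0)] for i in range(n0)]
    for j in range(n0):
        L_j = set()
        for i in range(n0):
            L_j ^= sparse_mul(H[i], Q[i][j])  # GF(2) addition of supports
        if len(L_j) != dv * sum(m):           # a cancellation occurred
            return False
    return True

trials = 10000
rejected = sum(not full_weight_key() for _ in range(trials))
print(rejected / trials)  # should land near the ~39% reported above
```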

This attack analysis can be efficiently reproduced for additional parameter sets and alternative notions of weak key with the same result.

4.6 Putting It All Together

Finally, we re-calculate the results of Sect. 4.2 using Theorems 4.2 and 4.5, but accounting for the attack improvement from advanced information set decoding (Sect. 4.4) and for the security regained through rejection sampling (Sect. 4.5). We re-write the formulas for the reader with \(\epsilon _{\mathsf {CPA}}\) (resp. \(\epsilon _{\mathsf {CCA}}\)) substituted for the constant \(\frac{1}{2}\), and note that the definition of ultra-weak keys has been implicitly modified to have more liberal degree constraints to suit the advanced ISD subroutine now being used.

Let \(x, y, m\) be defined as in Theorem 4.5. For the case of claimed-256-bit security for ephemeral key parameters, the fraction of ultra-weak keys recovered by a single iteration of the advanced ISD algorithm is

$$\begin{aligned} {d_v}^2(d_v-1)^2m \int _{x=0}^{\epsilon } \int _{y=0}^{\epsilon } (xy)^{d_v-2}\left( \left( \epsilon -x\right) \left( \epsilon -y\right) \right) ^{m}\left( \frac{1}{\epsilon -x}+\frac{1}{\epsilon -y}\right) \mathrm {d}y \mathrm {d}x, \end{aligned}$$

and the fraction of these ultra-weak keys out of all possible keys is

$$\begin{aligned} (\epsilon p){d_v}^2(d_v-1)^2&\int _{x=0}^{\epsilon }\int _{y=0}^{\epsilon }x^{d_v-2}y^{d_v-2}\left( \epsilon -x\right) ^m\left( \epsilon -y\right) ^m \\&\left( \frac{{m_0}^2+{m_1}^2}{(\epsilon -x)(\epsilon -y)}+\frac{m_0m_1}{(\epsilon -x)^2}+\frac{m_0m_1}{(\epsilon -y)^2} \right) \mathrm {d}y \mathrm {d}x. \end{aligned}$$

Evaluating these formulae with ephemeral key parameters \(d_v = 11, m_0 = 7, m_1 = 6, p = 36,877\) and substituting \(\epsilon _{\mathsf {CPA}} = .585\) yields 1 key recovered in \(2^{62.62}\) per single iteration, and 1 ultra-weak key in \(2^{43.90}\) of all possible keys. This yields an algorithm making \(2^{62.62-43.90} = 2^{18.72}\) guesses and recovering 1 in \(2^{47.72}\) of the ephemeral keys (accounting for the loss due to rejection sampling and the limited number of iterations).

Substituting \(\epsilon _{\mathsf {CCA}}=.577\) similarly and evaluating with long-term key parameters \(d_v = 13, m_0 = 7, m_1 = 6, p = 152,267\) yields 1 key recovered in \(2^{70.45}\) per single iteration and 1 ultra-weak key in \(2^{49.55}\) of all possible keys. This yields an algorithm making \(2^{70.45 - 49.55} =2^{20.90}\) guesses and recovering 1 in \(2^{52.54}\) of the long-term keys (accounting for the loss due to rejection sampling and the limited number of iterations).
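As before, these evaluations reduce to Beta-type integrals, now over \([0,\epsilon ]\): \(\int _0^{\epsilon } x^a(\epsilon -x)^b \,\mathrm {d}x = \epsilon ^{a+b+1}\frac{a!\,b!}{(a+b+1)!}\). A minimal Python sketch (ours) reproducing the four figures; the small deviation from \(2^{-49.55}\) is due to rounding of \(\epsilon _{\mathsf {CCA}}\):

```python
from math import factorial, log2

def I(a, b, eps):
    # integral_0^eps x^a (eps - x)^b dx = eps^(a+b+1) * a! * b! / (a+b+1)!
    return eps ** (a + b + 1) * factorial(a) * factorial(b) / factorial(a + b + 1)

def per_iteration(dv, m0, m1, eps):
    # Theorem 4.2 with 1/2 replaced by eps.
    m = m0 + m1
    return dv ** 2 * (dv - 1) ** 2 * m * 2 * I(dv - 2, m - 1, eps) * I(dv - 2, m, eps)

def ultra_weak(dv, m0, m1, p, eps):
    # Theorem 4.5 with 1/2 replaced by eps and prefactor eps * p.
    m = m0 + m1
    integral = ((m0 ** 2 + m1 ** 2) * I(dv - 2, m - 1, eps) ** 2
                + 2 * m0 * m1 * I(dv - 2, m - 2, eps) * I(dv - 2, m, eps))
    return eps * p * dv ** 2 * (dv - 1) ** 2 * integral

for dv, p, eps in [(11, 36877, 0.585), (13, 152267, 0.577)]:
    print(-log2(per_iteration(dv, 7, 6, eps)),   # ~ 62.62 and ~ 70.45
          -log2(ultra_weak(dv, 7, 6, p, eps)))   # ~ 43.90 and ~ 49.55
```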

To conclude, we would like to compare this result against the claimed security level of NIST Category 5. Formally, these schemes should be as hard to break as breaking 256-bit AES. Each guess in the ISD algorithms leads to a cost of approximately \(p^3\) bit operations (due to linear algebra and claw finding operations combined). This is \(2^{45.5}\) bit operations for the ephemeral key parameters and \(2^{51.6}\) bit operations for the long-term key parameters. A single AES-256 operation costs approximately \(2^{15}\) bit operations. This yields the main result of this section.
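The cost accounting in the last paragraph is a short calculation; a sketch (ours), using the guess counts from above:

```python
from math import log2

# (p, log2 of the number of guesses) for the ephemeral and long-term sets.
for p, guesses in [(36877, 18.72), (152267, 20.90)]:
    bitops = 3 * log2(p)               # ~ p^3 bit operations per guess
    aes_equiv = guesses + bitops - 15  # one AES-256 operation ~ 2^15 bit ops
    print(bitops, aes_equiv)           # ~ 45.5 -> 49.2 ; ~ 51.6 -> 57.5
```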

Theorem 4.6 (Main)

There is an advanced information set decoding algorithm that costs the same as \(2^{49.22}\) AES-256 operations and recovers 1 in \(2^{47.72}\) of LEDAcrypt’s Category 5 ephemeral keys.

Similarly, there is an advanced information set decoding algorithm that costs the same as \(2^{57.50}\) AES-256 operations and recovers 1 in \(2^{52.54}\) of LEDAcrypt’s Category 5 long-term keys.

Remark 4

Note that \(49.22+47.72 = 96.94 \ll 256\) and \(57.50+52.54 = 110.04 \ll 256.\)

Remark 5

Finally, we recall that we used various heuristics to approximate the above numbers concretely. These simplifying choices can affect at most one or two bits of security compared to a fully formalized calculation (which would come at the expense of making the analysis significantly more burdensome to parse).

5 Attack on All Keys

To conclude, we briefly analyze the asymptotic complexity of our new attack strategy in the context of recovering keys in the average case. We first note that, assuming the LEDAcrypt approach is parameterized in a balanced way – that is, H and Q are similarly sparse – and further assuming that \(n_0\) is a constant, the ordinary ISD attack (with a randomly chosen information set) has a complexity of \(\exp (\tilde{O}(p^{\frac{1}{2}})).\) To see this, observe that all known ISD variants using a random information set to find an asymptotically sparse secret parity-check matrix constructed like the LEDAcrypt private key have complexity \(O\left( \left( \frac{n_0}{n_0-1}\right) ^w\right) \), where \(w = n_0d_vm\) is the row weight of the secret parity-check matrix. Efficient decoding requires \(w = O (p^{\frac{1}{2}})\). By inspection, this complexity is \(\exp (\tilde{O}(p^{\frac{1}{2}}))\).

However, we obtain an improved asymptotic complexity when using structured information sets as follows.

Theorem 5.1

The asymptotic complexity of ISD using an appropriate choice of structured information sets, when attacking all LEDAcrypt keys in the worst case, is \(\exp (\tilde{O}(p^{\frac{1}{4}})).\)

Proof

We analyze the situation with structured information sets. Imagine we select the nonzero coefficients of \(H'\) and \(Q'\) completely at random, aside from a sparsity constraint. The sparsity constraint must be set so that the product \(H'Q'\) (restricted to two cyclic blocks) has row weight no more than p. This constrains the row weight of each cyclic block of \(H'\) and \(Q'\) to be approximately \(\left( \frac{p\ln (2)}{n_0}\right) ^\frac{1}{2} = O(p^{\frac{1}{2}})\). The probability of success per iteration is then at least \(O\left( \left( \frac{\ln (2)}{p n_0}\right) ^{\frac{1}{2} \cdot (\sum _{i=0}^{n_0-1}m_i + n_0 d_v)}\right) \). With balanced parameters, \(d_v\) and the \(m_i\) are \(O(p^{\frac{1}{4}})\); thus the total complexity is indeed \(\exp (\tilde{O}(p^{\frac{1}{4}}))\). Note that when \(H'\) and \(Q'\) are random aside from the sparsity constraint, the probability that the supports of \(H'\) and \(Q'\) contain the supports of H and Q, respectively, does not depend on H and Q, so the structured ISD algorithm is asymptotically better than the unstructured ISD algorithm even when we ignore weak keys.

   \(\square \)

Remark 6

The fact that there exists an asymptotically better attack than standard information set decoding against keys structured like those of LEDAcrypt is not itself particularly surprising. Indeed, the very simple attack that proceeds by enumerating all the possible values of H and Q is also asymptotically \(\exp (\tilde{O}(p^{\frac{1}{4}}))\). However, this simple attack does not affect the concrete parameters presented in the Round 2 submission of LEDAcrypt.

In contrast, we strongly suspect, but have not rigorously proven, that our attack significantly improves on the complexity of standard information set decoding against typical keys randomly chosen for some of the submitted parameter sets of LEDAcrypt. In particular, our estimates suggest that the NIST category 5 parameters with \(n_0=2\) can be attacked with an appropriately chosen distribution for \(H'\) and \(Q'\) (e.g. with each polynomial block of \(H'\) and \(Q'\) chosen to have 5 or 6 consecutive chunks of nonzero coefficients in some ring representation) and that typical keys will be broken at least a few hundred times faster than with ordinary information set decoding.

If it were the case that we were attacking an “analogously-chosen” parameter set for LEDAcrypt targeting higher security levels (512-bit security, 1024-bit security, and so on), we believe a much larger computational advantage would be obtained and (importantly) be very easy to rigorously demonstrate.

6 Conclusion

In this work, we demonstrated a novel, real-world attack against LEDAcrypt – one of 17 remaining 2nd Round candidates for standardization in NIST’s Post-Quantum Cryptography competition. The attack involved a customized form of Information Set Decoding, which carefully guesses the information set in a non-uniform manner so as to exploit the unique product structure of the keys in LEDAcrypt’s design. The attack was most effective against classes of weak keys in the proposed parameter sets asserted to have 256-bit security (demonstrating a trade-off between computational time and fraction of the key space recovered that was better than expected even of a 128-bit secure cryptosystem), but the attack also substantially reduced the security of all parameter sets.

Moreover, we demonstrated that this type of weak key is present throughout the key space of LEDAcrypt, so that simple “patches” such as rejection sampling cannot repair the problem. This was done by demonstrating a continuum of progressively larger classes of less-weak keys, and by showing that the same style of attack reduces the average-case complexity of certain parameter sets.